AWBI-LSTM classifier with hybrid ADASYN-GAN oversampling and optimized FCM undersampling for imbalanced data

Sangeetha Palanisamy, Chitra Duraisamy

Article ID: 8283
Vol 39, Issue 4, 2025
DOI: https://doi.org/10.54517/jbrha8283


Abstract

When faced with imbalanced data, classification techniques in the field of artificial intelligence tend to favor the majority-class samples, which lowers the recognition rates of minority-class samples. Undersampling addresses this problem by reducing the number of majority-class samples while trying to preserve the original data distribution. However, the limitations of current clustering-based undersampling techniques strongly affect the original imbalanced dataset and its overall classification accuracy. To address these issues, this work first pre-processes the highly imbalanced dataset using the Non-Negative Matrix Factorization (NMF) algorithm. Next, Hybrid Extremely Randomized Trees (HERT), an efficient ensemble-learning-based method, is employed to select features quickly. To tackle the class-imbalance problem, Generative Adversarial Network (GAN)-based oversampling is then proposed; this method has shown exceptional capacity to handle class imbalance because it can capture the true data distribution of the minority-class samples and generate new ones. For undersampling, Fuzzy C-Means (FCM) clustering is proposed, which selects useful instances from each cluster and avoids information loss. FCM clustering of the majority class and ADASYN-GAN-based oversampling of the minority class are combined to produce better results. Finally, the sampled dataset is classified using the Adaptive Weight Bi-Directional Long Short-Term Memory (AWBi-LSTM) classifier. Three large, imbalanced datasets are used to evaluate the proposed algorithm, and its efficiency is compared with that of state-of-the-art machine learning (ML) techniques such as XGBoost and random forest. The performance assessment with regard to accuracy, recall, precision, and F1-score demonstrates the effectiveness of the proposed method. Furthermore, the proposed scheme requires less training time than state-of-the-art methods.
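The hybrid sampling stage described above can be sketched in a few dozen lines. The code below is a minimal, illustrative NumPy version of the two generic techniques named in the abstract — FCM clustering to undersample the majority class (keeping the highest-membership points of each cluster) and ADASYN-style interpolation to oversample the minority class. The cluster count, the per-cluster quota rule, and the omission of the GAN generator are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fcm(X, c, m=2.0, iters=100, tol=1e-5):
    """Minimal fuzzy c-means: returns cluster centers and the n x c membership matrix."""
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        um = U ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]           # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U_new = d ** (-2.0 / (m - 1))
        U_new /= U_new.sum(axis=1, keepdims=True)                # standard FCM membership update
        if np.abs(U_new - U).max() < tol:
            return centers, U_new
        U = U_new
    return centers, U

def fcm_undersample(X_maj, n_keep, c=3):
    """Undersample the majority class: keep the highest-membership points of each FCM cluster."""
    _, U = fcm(X_maj, c)
    labels, quota, keep = U.argmax(axis=1), n_keep // c, []
    for j in range(c):
        idx = np.where(labels == j)[0]
        keep.extend(idx[np.argsort(-U[idx, j])][:quota])         # most central points first
    return X_maj[np.array(keep)]

def adasyn(X_min, X_maj, k=5):
    """ADASYN-style oversampling: generate more synthetic points where a minority sample
    has many majority neighbours, i.e. near the decision boundary."""
    X_all = np.vstack([X_min, X_maj])
    n_min, G = len(X_min), len(X_maj) - len(X_min)               # G: total synthetics to create
    d_all = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    # r_i: fraction of majority points among the k nearest neighbours (self excluded)
    r = np.array([(np.argsort(d_all[i])[1:k + 1] >= n_min).mean() for i in range(n_min)])
    r = r / r.sum() if r.sum() > 0 else np.full(n_min, 1.0 / n_min)
    g = np.rint(r * G).astype(int)                               # per-sample synthetic quota
    synth = []
    for i in range(n_min):
        nbrs = np.argsort(d_min[i])[1:k + 1]                     # minority-only neighbours
        for _ in range(g[i]):
            j = rng.choice(nbrs)
            synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(synth).reshape(-1, X_min.shape[1])

# Toy demo: 200 majority vs. 30 minority points in 2-D.
X_maj = rng.normal(0.0, 1.0, size=(200, 2))
X_min = rng.normal(2.5, 0.6, size=(30, 2))
X_maj_ds = fcm_undersample(X_maj, n_keep=60)         # majority shrunk toward cluster cores
X_min_up = np.vstack([X_min, adasyn(X_min, X_maj)])  # minority grown toward balance
```

In the paper's pipeline the synthetic minority samples would instead come from the ADASYN-seeded GAN, and the resampled, balanced set would feed the AWBi-LSTM classifier.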


Keywords

big data platform; feature selection; imbalanced data classification; neural network; clustering


References

[1]          Mohamed AE. Comparative study of four supervised machine learning techniques for classification. International Journal of Applied Science and Technology, 2017; 7 (2).

[2]          Trifonov R, Gotseva D, Angelov V. Binary classification algorithms. International Journal of Development Research, 2017; 7 (11): 16873-16879.

[3]          Aly M. Survey on multiclass classification methods. Neural Netw, 2005; 19(1-9): 2.

[4]          de Carvalho AC, Freitas AA. A tutorial on multi-label classification techniques. Foundations of Computational Intelligence Volume 5, 2009: 177-195.

[5]          Chawla NV, Japkowicz N, Kotcz A. Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 2004; 6(1): 1-6.

[6]          Sohony I, Pratap R, Nambiar U. Ensemble learning for credit card fraud detection. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data; 2018. pp. 289-294.

[7]          Manek AS, Samhitha MR, Shruthy S, Bhat VH, Shenoy PD et al. RePID-OK: spam detection using repetitive preprocessing. In IEEE 2013 International Conference on Cloud & Ubiquitous Computing & Emerging Technologies; 2013. pp. 144-149.

[8]          Gupta S, Gupta MK. A comprehensive data‐level investigation of cancer diagnosis on imbalanced data. Computational Intelligence, 2022; 38 (1): 156-186.

[9]          Padurariu C, Breaban ME. Dealing with data imbalance in text classification. Procedia Computer Science, 2019; 159: 736-745.

[10]      Estabrooks A, Jo T, Japkowicz N. A multiple resampling method for learning from imbalanced datasets. Computational intelligence, 2004; 20 (1): 18-36.

[11]      He H, Garcia EA. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 2009; 21(9): 1263-1284.

[12]      Weiss GM. Mining with rarity: a unifying framework. ACM SIGKDD Explorations Newsletter, 2004; 6(1): 7-19.

[13]      Visa S, Ralescu A. Issues in mining imbalanced data sets-a review paper. In Proceedings of the sixteen midwest artificial intelligence and cognitive science conference; 2005. pp. 67-73.

[14]      López V, Fernández A, García S, Palade V, Herrera F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Information Sciences, 2013; 250: 113-141.

[15]      Chawla NV. Data mining for imbalanced datasets: An overview. Data mining and knowledge discovery handbook, 2009: 875-886.

[16]      Tsai CF, Lin WC, Hu YH, Yao GT. Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Information Sciences, 2019; 477: 47-54.

[17]      Zhu M, Xia J, Jin X, Yan M, Cai G, Yan J, Ning G. Class weights random forest algorithm for processing class imbalanced medical data. IEEE Access, 2018; 6: 4641-4652.

[18]      Li J, Fong S, Yuan M, Wong RK. Adaptive multi-objective swarm crossover optimization for imbalanced data classification. In International Conference on Advanced Data Mining and Applications; Springer, Cham; 2016. pp. 374-390.

[19]      Li M, Xiong A, Wang L, Deng S, Ye J. ACO resampling: enhancing the performance of oversampling methods for class imbalance classification. Knowledge-Based Systems, 2020; 196: 105818.

[20]      Febriantono MA, Pramono SH, Rahmadwati R, Naghdy G. Classification of multiclass imbalanced data using cost-sensitive decision tree C5.0. IAES International Journal of Artificial Intelligence, 2020; 9(1): 65.

[21]      Babu MC, Pushpa S. Genetic algorithm-based PCA classification for imbalanced dataset. In Intelligent Computing in Engineering; Springer, Singapore; 2020. pp. 541-552.

[22]      Ji S, Zhang Z, Ying S, Wang L, Zhao X, Gao Y. Kullback–Leibler divergence metric learning. IEEE Transactions on Cybernetics, 2020; 52(4): 2047-2058.

[23]      Kalantar B, Ueda N, Idrees MO, Janizadeh S, Ahmadi K, Shabani F. Forest fire susceptibility prediction based on machine learning models with resampling algorithms on remote sensing data. Remote Sensing, 2020; 12(22): 3682.

[24]      Bezdek JC, Hathaway RJ. Optimization of fuzzy clustering criteria using genetic algorithms. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence; 1994. pp. 589-594.

[25]      He H, Bai Y, Garcia EA, Li S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence); 2008. pp. 1322-1328.

Supporting Agencies

This study did not receive any financial support or funding from external sources.



Copyright (c) 2025 Sangeetha Palanisamy*, Chitra Duraisamy

This work is licensed under a Creative Commons Attribution 4.0 International License.

