Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering
Prince Sultan University · Van Yüzüncü Yıl Üniversitesi · +9 more institutions
Abstract
The classification of imbalanced datasets is a prominent task in text mining and machine learning. In such datasets the samples are not evenly distributed across classes: one class contains a large number of samples while the other has only a few. Models trained on imbalanced datasets tend to overfit, which results in poor performance. In this study, we compare several oversampling techniques, such as the synthetic minority oversampling technique (SMOTE), support vector machine SMOTE (SVM-SMOTE), Border-line SMOTE, K-means SMOTE, and adaptive synthetic (ADASYN) oversampling, to address the issue of imbalanced datasets and enhance the performance of machine learning models. Preprocessing significantly…
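As an illustration of the basic idea behind SMOTE named in the abstract, the following is a minimal sketch and not the study's implementation. It creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch`, the toy data, and the values of `k` and `seed` are assumptions made for this example; the other variants listed above differ mainly in how they choose where synthetic samples are generated.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; it assumes at least two
    minority samples given as tuples of floats.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (itself excluded).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random point on the line segment between the sample and that neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 10 majority samples and 3 minority samples.
majority = [(float(i), float(i % 3)) for i in range(10)]
minority = [(0.0, 5.0), (1.0, 6.0), (2.0, 5.5)]

# Generate enough synthetic points to match the majority class size.
new_points = smote_sketch(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by the minority class rather than duplicating existing rows.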
Citation impact
- FWCI: 44.69
- Percentile: 100%
- References: 50
Authors (7)
- Muhammad Mujahid
Prince Sultan University
- Erol Kına
Van Yüzüncü Yıl Üniversitesi
- Furqan Rustam
University College Dublin
- Mónica Gracia Villar
Universidad Internacional, Universidad Europea del Atlántico, Centro Universitário Internacional
- Eduardo Silva Alvarado
Ibero American University, Universidad Internacional, Universidad Europea del Atlántico, Universidad de la Romana, Ibero-American University Puebla
Topics & keywords
- Computer science
- Oversampling
- Computational Science and Engineering
- Feature (linguistics)
- Feature engineering
- Machine learning
- Artificial intelligence
- Science and engineering