Article · Journal of Big Data · Jun 17, 2024 · Gold OA

Data oversampling and imbalanced datasets: an investigation of performance for machine learning and feature engineering

Prince Sultan University · Van Yüzüncü Yıl Üniversitesi · +9 more institutions

Indexed in Crossref, DOAJ

Abstract

The classification of imbalanced datasets is a prominent task in text mining and machine learning. In such datasets the samples are not uniformly distributed across classes: one class contains a large number of samples while the other has only a few. Imbalanced datasets cause the model to overfit, resulting in poor performance. In this study, we compare different oversampling techniques, namely the synthetic minority oversampling technique (SMOTE), support vector machine SMOTE (SVM-SMOTE), Borderline-SMOTE, K-means SMOTE, and adaptive synthetic (ADASYN) oversampling, to address the issue of imbalanced datasets and enhance the performance of machine learning models. Preprocessing significantly…
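To make the core idea behind SMOTE concrete, the following is a minimal sketch of its interpolation step, written with the Python standard library only. It is not the implementation used in the study: the function name `smote_sketch`, its parameters, and the toy data are assumptions made for illustration. In practice a library such as imbalanced-learn, which provides `SMOTE`, `BorderlineSMOTE`, `SVMSMOTE`, `KMeansSMOTE`, and `ADASYN` in `imblearn.over_sampling`, would normally be used instead.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority samples.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with two features (made-up values, for illustration only).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. The other variants named above differ mainly in which base samples they select or how many points they generate around each one.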


Funding