Machine Learning with Oversampling and Undersampling Techniques: Overview Study and Experimental Results

Mohammed, Roweida; Rawashdeh, Jumanah; Abdullah, Malak

doi:10.1109/icics49469.2020.239556

Article · Apr 1, 2020 · Closed access

Roweida Mohammed, Jumanah Rawashdeh, Malak Abdullah

Jordan University of Science and Technology

Indexed in Crossref

Abstract

Data imbalance in machine learning refers to an unequal distribution of classes within a dataset. The issue arises mostly in classification tasks, where the distribution of classes or labels in a given dataset is not uniform. The most straightforward remedy is resampling: adding records to the minority class or deleting records from the majority class. In this paper, we experiment with the two widely adopted resampling techniques, oversampling and undersampling. To explore both techniques, we chose a public imbalanced dataset from the Kaggle website, Santander Customer Transaction Prediction, and applied a group of well-known machine learning…
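The two resampling ideas named in the abstract can be illustrated with a minimal sketch. It assumes the simplest variants (random duplication of minority records and random deletion of majority records), which may differ from the specific methods evaluated in the paper. The helper names `random_oversample` and `random_undersample` and the synthetic 90/10 dataset are hypothetical and only for illustration; they are not taken from the paper or from the Santander dataset.

```python
import numpy as np


def random_oversample(X, y, rng):
    """Duplicate randomly chosen rows of each smaller class until all
    classes have as many rows as the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Sample with replacement: the same minority row may be copied repeatedly.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        parts.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]


def random_undersample(X, y, rng):
    """Keep a random subset of each class so that all classes have as many
    rows as the smallest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.min()
    parts = [
        # Sample without replacement: each kept row appears exactly once.
        rng.choice(np.flatnonzero(y == cls), size=target, replace=False)
        for cls in classes
    ]
    idx = np.concatenate(parts)
    return X[idx], y[idx]


# Synthetic imbalanced data (an assumption for illustration): 90 vs 10 rows.
rng = np.random.default_rng(0)
y = np.array([0] * 90 + [1] * 10)
X = rng.normal(size=(100, 3))

X_over, y_over = random_oversample(X, y, rng)
X_under, y_under = random_undersample(X, y, rng)

print(np.bincount(y_over))   # [90 90]
print(np.bincount(y_under))  # [10 10]
```

In practice, resampling of this kind is usually applied to the training split only, so that evaluation data keeps the original class distribution.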

Citation impact

Total citations: 670
FWCI: 36.45
Percentile: 100%
References: 34

Authors: 3

Topics & keywords

Keywords

Undersampling
Oversampling
Resampling
Computer science
Machine learning
Artificial intelligence
Class (philosophy)
Data mining
