SMOTE: Synthetic Minority Over-sampling Technique
University of Notre Dame · Sandia National Laboratories California · University of South Florida
Abstract
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the…
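The two resampling ideas named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names `undersample_majority` and `smote_like_oversample`, the parameters `keep_fraction`, `n_new`, and `k`, and the fixed seeds are all assumptions made for illustration. The over-sampling step follows the commonly described SMOTE scheme (interpolating between a minority example and one of its nearest minority-class neighbours), which the abstract text above does not itself spell out.

```python
import random


def undersample_majority(X, y, majority_label, keep_fraction, seed=0):
    """Randomly keep only `keep_fraction` of the majority-class rows (hypothetical helper)."""
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    others = [i for i, label in enumerate(y) if label != majority_label]
    n_keep = max(1, round(keep_fraction * len(majority)))
    kept = sorted(rng.sample(majority, n_keep) + others)
    return [X[i] for i in kept], [y[i] for i in kept]


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic minority points (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority-class neighbours.
    Assumes `minority` holds at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours by squared Euclidean distance, excluding the base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

In this sketch the two steps are independent: a caller might first thin the majority class with `undersample_majority`, then append the output of `smote_like_oversample` to the minority class before training a classifier.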
Citation impact
- FWCI: 39.22
- Percentile: 100%
- References: 43
Authors
- Nitesh V. Chawla (corresponding author)
University of Notre Dame, Sandia National Laboratories California, University of South Florida
- Kevin W. Bowyer
University of Notre Dame, Sandia National Laboratories California, University of South Florida
- Lawrence Hall
University of Notre Dame, Sandia National Laboratories California, University of South Florida
- W. Philip Kegelmeyer
University of Notre Dame, Sandia National Laboratories California, University of South Florida
Topics & keywords
- Oversampling
- Classifier (UML)
- Naive Bayes classifier
- Artificial intelligence
- Receiver operating characteristic
- Computer science
- Pattern recognition (psychology)
- Prior probability
- Reduced inequalities