A study of the behavior of several methods for balancing machine learning training data
Brazilian Society of Computational and Applied Mathematics
Abstract
Several factors can influence the performance of learning systems. One that has been reported is class imbalance, in which the training examples belonging to one class heavily outnumber those of the other class. In this situation, which arises in real-world data describing an infrequent but important event, the learning system may have difficulty learning the concept related to the minority class. In this work we perform a broad experimental evaluation of ten methods, three of them proposed by the authors, for dealing with the class imbalance problem on thirteen UCI data sets. Our experiments provide evidence that class…
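To make the idea of balancing training data concrete, here is a minimal sketch of two generic resampling strategies: random over-sampling of smaller classes and random under-sampling of larger ones. It is an illustration only and is not taken from the paper; the function names `random_oversample` and `random_undersample`, the helper `_group_by_class`, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter


def _group_by_class(examples, labels):
    """Hypothetical helper: map each class label to its list of examples."""
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


def random_undersample(examples, labels, seed=0):
    """Keep a random subset of the larger classes so that every class
    is as small as the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(examples, labels)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw without replacement; the remaining examples are discarded.
        out_x.extend(rng.sample(xs, target))
        out_y.extend([y] * target)
    return out_x, out_y


# Toy imbalanced data: 5 minority examples (label 1), 95 majority (label 0).
X = list(range(100))
y = [1] * 5 + [0] * 95

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y), Counter(y_over), Counter(y_under))
```

Over-sampling keeps all of the original data but repeats minority examples, whereas under-sampling discards majority examples; which trade-off works better in practice is the kind of question an experimental evaluation like the one described above would address.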
Citation impact
- FWCI: 37.46
- Percentile: 100%
- References: 26
Authors: 3
Topics & keywords
- Computer science
- Class (philosophy)
- Machine learning
- Artificial intelligence
- Sampling (signal processing)
- Simple random sample
- Event (particle physics)
- Data mining