Article · Dec 1, 2015 · Closed access

Calibrating Probability with Undersampling for Unbalanced Classification

Université Libre de Bruxelles · Médecins du Monde · +1 more institution

Indexed in Crossref

Abstract

Undersampling is a popular technique for reducing the skew in class distributions of unbalanced datasets. However, it is well known that undersampling one class modifies the priors of the training set and consequently biases the posterior probabilities of a classifier. In this paper, we study analytically and experimentally how undersampling affects the posterior probability of a machine learning model. We formalize the problem of undersampling and explore the relationship between conditional probability in the presence and absence of undersampling. Although the bias due to undersampling does not affect the ranking order returned by the posterior probability, it significantly impacts the classification…
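The relationship the abstract describes can be illustrated with a minimal sketch. It assumes that every positive instance is kept and that each negative instance is kept independently of its features with probability `beta`; under that assumption Bayes' rule gives the biased posterior p_s = p / (p + beta * (1 - p)), which can be inverted to recover p. The function names and the symbol `beta` are illustrative choices here, not necessarily the paper's notation.

```python
def undersampled_posterior(p, beta):
    """Posterior a model would learn after undersampling.

    Assumption: all positives are kept and each negative is kept with
    probability beta (0 < beta <= 1), independently of its features.
    """
    return p / (p + beta * (1.0 - p))


def calibrate(p_s, beta):
    """Invert the map above to recover the posterior on the original data."""
    return beta * p_s / (beta * p_s - p_s + 1.0)


if __name__ == "__main__":
    beta = 0.1  # hypothetical rate: keep 10% of the negatives
    for p in (0.01, 0.1, 0.5):
        p_s = undersampled_posterior(p, beta)
        print(f"true={p:.2f}  biased={p_s:.4f}  recovered={calibrate(p_s, beta):.4f}")
```

Because the map from p to p_s is strictly increasing, the ranking of instances is unchanged, but a fixed threshold shifts: with `beta = 0.1`, a biased score above 0.5 corresponds to a true posterior above beta / (1 + beta), roughly 0.09.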

Citation impact

Total citations: 587
FWCI: 22.26
Percentile: 100%
References: 31

Authors: 4

Topics & keywords

Keywords
  • Undersampling
  • Posterior probability
  • Prior probability
  • Skew
  • Computer science
  • Sampling (signal processing)
  • Artificial intelligence
  • Bayesian probability

Funding