Article · Journal of Artificial Intelligence Research · Oct 1, 2003 · Diamond OA

Learning When Training Data are Costly: The Effect of Class Distribution on Tree Induction

AT&T (United States) · New York University

Indexed in: arXiv · Crossref · DOAJ

Abstract

For large, real-world inductive learning problems, the number of training examples often must be limited due to the costs associated with procuring, preparing, and storing the training examples and/or the computational costs associated with learning from them. In such circumstances, one question of practical importance is: if only n training examples can be selected, in what proportion should the classes be represented? In this article we help to answer this question by analyzing, for a fixed training-set size, the relationship between the class distribution of the training data and the performance of classification trees induced from these data. We study twenty-six data sets and, for each, determine the best…
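To make the question in the abstract concrete, the sketch below draws a fixed-size training set from a labeled pool while controlling the fraction of examples taken from one class. It is only an illustration of the setup the abstract describes, not the authors' experimental procedure: the function name `sample_fixed_size`, its parameters, and the toy pool of 100 positive and 900 negative labels are all assumptions introduced here.

```python
import random
from collections import Counter


def sample_fixed_size(labels, n, positive_fraction, positive_label=1, seed=0):
    """Return indices of n examples, round(n * positive_fraction) of them positive.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == positive_label]
    neg = [i for i, y in enumerate(labels) if y != positive_label]

    n_pos = round(n * positive_fraction)
    n_neg = n - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough examples of one class for this distribution")

    # Sample each class without replacement, then shuffle the combined indices.
    chosen = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(chosen)
    return chosen


if __name__ == "__main__":
    # Toy pool (assumed): 100 positives followed by 900 negatives.
    labels = [1] * 100 + [0] * 900
    for frac in (0.1, 0.25, 0.5):
        idx = sample_fixed_size(labels, n=200, positive_fraction=frac)
        counts = Counter(labels[i] for i in idx)
        print(f"positive fraction {frac}: {counts[1]} positive, {counts[0]} negative")
```

Each training set has the same size n, so any difference in the performance of a classifier trained on these sets can be attributed to the class distribution rather than to the amount of data.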

Citation impact

Total citations: 926
FWCI: 37.48
Percentile: 100%
References: 47

Authors

2

Topics & keywords

Keywords
  • Classifier (UML)
  • Computer science
  • Machine learning
  • Artificial intelligence
  • Training set
  • Class (philosophy)
  • Data mining