Article · Mar 1, 2003 · Closed access

An extensive empirical study of feature selection metrics for text classification

Hewlett-Packard (United States)

Abstract

Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison of twelve feature selection methods (e.g. Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives—accuracy, F-measure, precision, and recall—since each is appropriate in different situations. The results reveal that a new feature selection metric we call ‘Bi-Normal Separation…
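As a rough illustration of the kind of metric the abstract names, the sketch below scores a single binary word feature by Information Gain from a 2x2 contingency table. It is not taken from the paper: the function names (`entropy`, `information_gain`) and the count parameters (`tp`, `fp`, `fn`, `tn`) are hypothetical choices made for this example, and it assumes a two-class problem with at least one document.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a two-class split given positive/negative counts."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:  # skip empty cells; 0 * log(0) is treated as 0
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(tp, fp, fn, tn):
    """Information gain of a binary word feature (illustrative helper).

    tp: positive documents containing the word
    fp: negative documents containing the word
    fn: positive documents without the word
    tn: negative documents without the word
    Assumes tp + fp + fn + tn > 0.
    """
    total = tp + fp + fn + tn
    with_word = tp + fp
    without_word = fn + tn
    # Class entropy before looking at the word.
    prior = entropy(tp + fn, fp + tn)
    # Expected class entropy after splitting on word presence.
    conditional = (
        (with_word / total) * entropy(tp, fp)
        + (without_word / total) * entropy(fn, tn)
    )
    return prior - conditional
```

In a feature selection setting, a score like this would typically be computed for every word in the vocabulary, and only the highest-scoring words kept as features for the classifier.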

Citation impact

Total citations: 2,390
FWCI: 49.63
Percentile: 100%
References: 15

Authors

1

Topics & keywords

Keywords
  • Computer science
  • Feature selection
  • Artificial intelligence
  • Margin (machine learning)
  • Machine learning
  • Benchmark (surveying)
  • Feature (linguistics)
  • Metric (unit)