Article | Aug 21, 2003 | Closed access

Tackling the Poor Assumptions of Naive Bayes Text Classifiers

Massachusetts Institute of Technology

Abstract

Naive Bayes is often used as a baseline in text classification because it is fast and easy to implement. Its severe assumptions make such efficiency possible but also adversely affect the quality of its results. In this paper we propose simple, heuristic solutions to some of the problems with Naive Bayes classifiers, addressing both systemic issues and problems that arise because text is not actually generated according to a multinomial model. We find that our simple corrections result in a fast algorithm that is competitive with state-of-the-art text classification algorithms such as the Support Vector Machine.
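For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a plain multinomial Naive Bayes text classifier with additive (Laplace) smoothing. It is not the corrected algorithm proposed in the paper, which the abstract does not specify. The function names, the `alpha` parameter, and the toy documents are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a plain multinomial Naive Bayes model (illustrative baseline only).

    docs   -- list of token lists
    labels -- list of class labels, one per document
    alpha  -- additive smoothing constant (assumed value, not from the paper)
    """
    vocab = sorted({w for tokens in docs for w in tokens})
    class_doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)

    model = {}
    for c, n_docs in class_doc_counts.items():
        total_words = sum(word_counts[c].values())
        denom = total_words + alpha * len(vocab)
        log_prior = math.log(n_docs / len(docs))
        # Smoothed log-likelihood of each vocabulary word given class c.
        log_lik = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
        model[c] = (log_prior, log_lik)
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior; unseen words are skipped."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
    return max(model, key=score)


# Hypothetical toy corpus, invented for this sketch.
toy_docs = [
    ["cheap", "pills", "cheap"],
    ["meeting", "agenda"],
    ["cheap", "offer"],
    ["project", "meeting", "notes"],
]
toy_labels = ["spam", "ham", "spam", "ham"]
toy_model = train_multinomial_nb(toy_docs, toy_labels)
```

Calling `predict(toy_model, ["cheap", "offer"])` scores each class by summing the log prior and the per-word log-likelihoods, treating every word as independent given the class. That independence assumption, together with the multinomial model of word counts, is what the abstract identifies as the source of both the method's speed and its weaknesses.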

Citation impact

Total citations: 952
FWCI: 27.62
Percentile: 100%
References: 15

Authors

4

Topics & keywords

Keywords
  • Naive Bayes classifier
  • Computer science
  • Machine learning
  • Support vector machine
  • Bayes' theorem
  • Artificial intelligence
  • Simple (philosophy)
  • Heuristic
UN Sustainable Development Goals
  • No poverty