Baselines and Bigrams: Simple, Good Sentiment and Topic Classification

Wang, Sida; Manning, Christopher D.

Article · Jul 8, 2012 · Closed access

Abstract

Variants of Naive Bayes (NB) and Support Vector Machines (SVM) are often used as baseline methods for text classification, but their performance varies greatly depending on the model variant, the features used, and the task/dataset. We show that: (i) the inclusion of word bigram features gives consistent gains on sentiment analysis tasks; (ii) for short snippet sentiment tasks, NB actually does better than SVMs (while for longer documents the opposite result holds); (iii) a simple but novel SVM variant using NB log-count ratios as feature values consistently performs well across tasks and datasets. Based on these observations, we identify simple NB and SVM variants which outperform most published results on sentiment…
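
Point (iii) can be illustrated with a minimal sketch. The abstract does not state the formula, so the code below assumes the usual smoothed log-count ratio: per-feature counts are summed within each class, add-alpha smoothed, L1-normalized, and compared on a log scale. The function name `log_count_ratio`, the `alpha` parameter, and the toy data are hypothetical and chosen only for illustration. The scaled features would then be passed to a linear SVM, which is not shown here.

```python
import numpy as np

def log_count_ratio(X, y, alpha=1.0):
    """Smoothed NB log-count ratio per feature (illustrative assumption).

    X: (n_docs, n_features) binary or count matrix.
    y: labels in {0, 1}, where 1 marks the positive class.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = alpha + X[y == 1].sum(axis=0)  # smoothed counts in the positive class
    q = alpha + X[y == 0].sum(axis=0)  # smoothed counts in the negative class
    return np.log((p / p.sum()) / (q / q.sum()))

# Toy data: 4 documents, 4 features (e.g. unigram/bigram indicators).
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1],
              [0, 1, 0, 1]])
y = np.array([1, 1, 0, 0])

r = log_count_ratio(X, y)
X_scaled = X * r  # each feature value is weighted by its log-count ratio
```

Features that occur mostly in positive documents receive a positive ratio, features that occur mostly in negative documents receive a negative one, and features spread evenly across classes are pushed toward zero.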

Citation impact

967 total citations

FWCI: 45.54
Percentile: 100%
References: 21

Topics & keywords

Keywords

Bigram
Computer science
Support vector machine
Word (group theory)
Artificial intelligence
Naive Bayes classifier
Simple (philosophy)
Snippet

UN Sustainable Development Goals

Reduced inequalities