Article · May 13, 2013 · Closed access

A biterm topic model for short texts

Institute of Computing Technology · Chinese Academy of Sciences

Indexed in Crossref

Abstract

Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g. LDA and PLSA) to such short texts may not work well. The fundamental reason is that conventional topic models implicitly capture document-level word co-occurrence patterns to reveal topics, and thus suffer from the severe data sparsity in short documents. In this paper, we propose a novel way of modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, in BTM we learn the topics by directly modeling the generation of word co-occurrence patterns (i.e. biterms) in the whole…
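
As a rough illustration of the biterm idea described in the abstract, the following Python sketch extracts word pairs from each short text and aggregates their counts over the whole corpus. It is only a sketch under stated assumptions: a biterm is taken to be an unordered pair of words co-occurring in the same short text, tokenization is naive whitespace splitting, and the helper names `extract_biterms` and `corpus_biterm_counts` are hypothetical rather than taken from the paper. It covers only the co-occurrence counting step, not topic inference.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(text):
    """Return the unordered word pairs (biterms) co-occurring in one short text."""
    # Assumption: naive lowercase whitespace tokenization, no stop-word removal.
    tokens = text.lower().split()
    # Sort each pair so ("topic", "model") and ("model", "topic") are the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(texts):
    """Aggregate biterm counts over the whole corpus rather than per document."""
    counts = Counter()
    for text in texts:
        counts.update(extract_biterms(text))
    return counts


# Hypothetical toy corpus of three short texts.
corpus = ["apple iphone launch", "iphone launch event", "apple pie recipe"]
biterm_counts = corpus_biterm_counts(corpus)
```

Pooling the pairs across all texts is what sidesteps the per-document sparsity the abstract mentions: a pair such as `("iphone", "launch")` accumulates evidence from every short text in which it appears.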

Citation impact

Total citations: 1,112
FWCI: 69.18
Percentile: 100%
References: 40

Authors

4

Topics & keywords

Keywords
  • Topic model
  • Computer science
  • Generality
  • Word (group theory)
  • Natural language processing
  • Task (project management)
  • Artificial intelligence
  • Information retrieval
UN Sustainable Development Goals
  • Quality Education