Article · IEEE Transactions on Knowledge and Data Engineering · Mar 26, 2014 · Closed access

BTM: Topic Modeling over Short Texts

Chinese Academy of Sciences · Institute of Computing Technology

Indexed in Crossref

Abstract

Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging problem for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics, so their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e.,…
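
The co-occurrence patterns mentioned in the abstract can be illustrated with a minimal sketch. The abstract is truncated before it defines the term, so the definition used here is an assumption: a biterm is taken to be an unordered pair of words that appear together in the same short text. The function names `extract_biterms` and `count_biterms` and the toy corpus are hypothetical, and this sketch covers only the pair-extraction step, not the topic inference itself.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one short text.

    Assumption: a biterm is any pair of tokens co-occurring in the same
    short text; each pair is sorted so (a, b) and (b, a) are identical.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over the whole corpus, not per document."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus of already-tokenized short texts.
docs = [
    ["apple", "iphone", "launch"],
    ["apple", "pie", "recipe"],
    ["iphone", "launch", "event"],
]

biterm_counts = count_biterms(docs)
for biterm, n in biterm_counts.most_common(3):
    print(biterm, n)
```

Pooling the pairs across the corpus is what lets co-occurrence statistics accumulate even when each individual text holds only a few words, which is the sparsity problem the abstract attributes to document-level models.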

Citation impact

Total citations: 554
FWCI: 43.55
Percentile: 100%
References: 60

Authors

4

Topics & keywords

Keywords
  • Topic model
  • Latent Dirichlet allocation
  • Computer science
  • Inference
  • Probabilistic latent semantic analysis
  • Artificial intelligence
  • Natural language processing
  • Word (group theory)
UN Sustainable Development Goals
  • Quality Education

Funding