BTM: Topic Modeling over Short Texts
Chinese Academy of Sciences · Institute of Computing Technology
Abstract
Short texts are popular on today's web, especially with the emergence of social media. Inferring topics from large-scale short texts has become a critical but challenging problem for many content analysis tasks. Conventional topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) learn topics from document-level word co-occurrences by modeling each document as a mixture of topics; their inference suffers from the sparsity of word co-occurrence patterns in short texts. In this paper, we propose a novel approach to short text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns (i.e.,…
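To make the notion of a word co-occurrence pattern concrete, the following is a minimal sketch of collecting unordered word pairs (biterms) from a small corpus. It assumes that each short text is already tokenized and that the whole text counts as a single co-occurrence context; the function names `extract_biterms` and `count_biterms` and the toy corpus are illustrative and are not taken from the paper. It covers only the pair-extraction step, not topic inference.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair from one tokenized short text.

    Assumption: the whole text is treated as one co-occurrence context.
    Each pair is sorted so that (a, b) and (b, a) map to the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Aggregate biterm counts over the whole corpus, not per document."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus of pre-tokenized short texts.
docs = [
    ["apple", "iphone", "release"],
    ["apple", "fruit"],
]
biterm_counts = count_biterms(docs)
```

Because the counts are pooled across all texts, a pair that is rare within any single short text can still accumulate evidence at the corpus level, which is the sparsity problem the abstract describes for document-level models.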
Citation impact
- FWCI: 43.55
- Percentile: 100%
- References: 60
Authors (4)
- Xueqi Cheng (corresponding author)
  Chinese Academy of Sciences, Institute of Computing Technology
- Xiaohui Yan
  Institute of Computing Technology, Chinese Academy of Sciences
- Yanyan Lan
  Chinese Academy of Sciences, Institute of Computing Technology
- Jiafeng Guo
  Institute of Computing Technology, Chinese Academy of Sciences
Topics & keywords
- Topic model
- Latent Dirichlet allocation
- Computer science
- Inference
- Probabilistic latent semantic analysis
- Artificial intelligence
- Natural language processing
- Word (group theory)
- Quality Education