A biterm topic model for short texts
Institute of Computing Technology · Chinese Academy of Sciences
Abstract
Uncovering the topics within short texts, such as tweets and instant messages, has become an important task for many content analysis applications. However, directly applying conventional topic models (e.g. LDA and PLSA) to such short texts may not work well. The fundamental reason is that conventional topic models implicitly capture document-level word co-occurrence patterns to reveal topics, and thus suffer from severe data sparsity in short documents. In this paper, we propose a novel way of modeling topics in short texts, referred to as the biterm topic model (BTM). Specifically, in BTM we learn the topics by directly modeling the generation of word co-occurrence patterns (i.e. biterms) in the whole…
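To make the notion of a biterm concrete, the following is a minimal sketch of how biterms might be collected from a small corpus. It assumes that a biterm is an unordered pair of words co-occurring in the same short text and that each whole text serves as one context. The function names (`extract_biterms`, `corpus_biterm_counts`) and the toy documents are hypothetical, introduced only for illustration. The sketch covers biterm extraction only, not the topic inference that BTM performs over the resulting biterms.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one tokenized short text.

    Assumption: the whole short text is treated as a single context.
    """
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Aggregate biterm counts over the whole corpus, not per document."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus of two already-tokenized short texts.
docs = [
    ["apple", "iphone", "release"],
    ["apple", "iphone", "price"],
]
counts = corpus_biterm_counts(docs)
```

Pooling the pairs across the corpus is what lets co-occurrence statistics accumulate even when each individual text contains only a handful of words.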
Citation impact
- FWCI: 69.18
- Percentile: 100%
- References: 40
Authors
- Xiaohui Yan (corresponding author)
Institute of Computing Technology, Chinese Academy of Sciences
- Jiafeng Guo
Institute of Computing Technology, Chinese Academy of Sciences
- Yanyan Lan
Institute of Computing Technology, Chinese Academy of Sciences
- Xueqi Cheng
Institute of Computing Technology, Chinese Academy of Sciences
Topics & keywords
- Topic model
- Computer science
- Generality
- Word (group theory)
- Natural language processing
- Task (project management)
- Artificial intelligence
- Information retrieval
- Quality Education