Article | Jan 1, 2006 | Closed access
Topic modeling
Indexed in Crossref
Abstract
Some models of textual corpora employ text generation methods involving n-gram statistics, while others use latent topic variables inferred using the "bag-of-words" assumption, in which word order is ignored. Previously, these methods have not been combined. In this work, I explore a hierarchical generative probabilistic model that incorporates both n-gram statistics and latent topic variables by extending a unigram topic model to include properties of a hierarchical Dirichlet bigram language model. The model hyperparameters are inferred using a Gibbs EM algorithm. On two data sets, each of 150 documents, the new model exhibits better predictive accuracy than either a hierarchical Dirichlet bigram language…
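The abstract describes the model only at a high level, so the following is a minimal sketch of one way such a bigram topic model could generate text, not the author's implementation. It assumes each document has a topic mixture `theta`, and that each word is drawn from a distribution `phi[z][prev]` indexed by both the current topic `z` and the previous word `prev`. The symmetric Dirichlet priors, the hyperparameter values `alpha` and `beta`, the uniform choice of an initial previous word, and all function names are illustrative assumptions; the hierarchical prior and the Gibbs EM inference mentioned in the abstract are not reproduced here.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(length, theta, phi, vocab_size, rng):
    """Generate word ids; each word depends on its topic and the previous word."""
    num_topics = len(theta)
    # Assumption: the first word is conditioned on a uniformly chosen "previous" word.
    prev = rng.randrange(vocab_size)
    words = []
    for _ in range(length):
        # Latent topic variable for this position, drawn from the document's mixture.
        z = rng.choices(range(num_topics), weights=theta)[0]
        # Bigram step: the word distribution is indexed by (topic, previous word).
        w = rng.choices(range(vocab_size), weights=phi[z][prev])[0]
        words.append(w)
        prev = w
    return words


rng = random.Random(0)
num_topics, vocab_size = 3, 5
alpha, beta = 0.5, 0.5  # hypothetical hyperparameters, fixed rather than inferred

# One topic mixture for a single document.
theta = sample_dirichlet(alpha, num_topics, rng)
# One word distribution per (topic, previous word) pair.
phi = [
    [sample_dirichlet(beta, vocab_size, rng) for _ in range(vocab_size)]
    for _ in range(num_topics)
]

doc = generate_document(20, theta, phi, vocab_size, rng)
print(doc)
```

Under these assumptions, replacing `phi[z][prev]` with a distribution indexed only by `z` gives a unigram topic model, and dropping the topic index gives a plain bigram language model, which are the two kinds of model the abstract says are being combined.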
Citation impact
1,065 total citations
- FWCI: 20.51
- Percentile: 100%
- References: 8
Authors: 1
Topics & keywords
Topics
Keywords
- Bigram
- Language model
- Computer science
- Artificial intelligence
- Latent Dirichlet allocation
- Topic model
- n-gram
- Probabilistic latent semantic analysis
UN Sustainable Development Goals
- Quality Education