Abstract

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative…
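The three levels described above (corpus-level parameters, per-document topic proportions, per-word topic assignments) can be illustrated with a minimal sketch of the generative process. This is an illustration under stated assumptions, not the authors' code: the function name `sample_lda_corpus` and the parameter names `alpha` and `beta` are hypothetical choices, the document length is fixed rather than drawn at random, and the variational inference and EM estimation steps are not shown.

```python
import numpy as np

def sample_lda_corpus(n_docs, doc_len, alpha, beta, seed=0):
    """Draw a toy corpus from an LDA-style generative process.

    alpha: length-K Dirichlet parameter over topics (corpus level).
    beta:  K x V matrix, each row a topic's distribution over words.
    Assumption: every document has the same fixed length doc_len.
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n_topics, vocab_size = beta.shape

    docs, thetas = [], []
    for _ in range(n_docs):
        # Document level: topic proportions for this document.
        theta = rng.dirichlet(alpha)
        # Word level: one topic assignment per word position.
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # Each word is drawn from the distribution of its assigned topic.
        words = np.array([rng.choice(vocab_size, p=beta[k]) for k in z])
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)

# Toy usage: 2 topics over a 4-word vocabulary.
beta = np.array([[0.7, 0.3, 0.0, 0.0],
                 [0.0, 0.0, 0.4, 0.6]])
docs, thetas = sample_lda_corpus(n_docs=5, doc_len=20, alpha=[0.5, 0.5], beta=beta)
```

Each row of `thetas` is the low-dimensional vector of topic probabilities that serves as the explicit representation of the corresponding document.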

Citation impact

Total citations: 27,031
FWCI: 127.42
Percentile: 100%
References: 28

Authors

3

Topics & keywords

Keywords
  • Latent Dirichlet allocation
  • Computer science
  • Topic model
  • Hierarchical Dirichlet process
  • Dirichlet distribution
  • Inference
  • Probabilistic logic
  • Mixture model