Finding scientific topics

University of California, Irvine · Massachusetts Institute of Technology · +1 more institution

PubMed
Indexed in: Crossref, PubMed

Abstract

A first step in identifying the content of a document is determining which topics that document addresses. We describe a generative model for documents, introduced by Blei, Ng, and Jordan [Blei, D. M., Ng, A. Y. & Jordan, M. I. (2003) J. Machine Learn. Res. 3, 993-1022], in which each document is generated by choosing a distribution over topics and then choosing each word in the document from a topic selected according to this distribution. We then present a Markov chain Monte Carlo algorithm for inference in this model. We use this algorithm to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics. We show that the extracted topics capture meaningful structure in the…
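The generative process summarized in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' code: it covers only document generation, not the Markov chain Monte Carlo inference step. It assumes a symmetric Dirichlet prior over topic proportions, as in the Blei, Ng, and Jordan model. The function names, the toy topic-word table, and the value of `alpha` are hypothetical choices made for illustration.

```python
import random


def sample_dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing independent gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, alpha, length, rng):
    """Generate one document as a list of word ids.

    topic_word[k][w] is the probability of word w under topic k
    (a toy, hand-written table here; an assumption for illustration).
    """
    num_topics = len(topic_word)
    vocab_ids = range(len(topic_word[0]))

    # Step 1: choose this document's distribution over topics.
    theta = sample_dirichlet(alpha, num_topics, rng)

    topics, words = [], []
    for _ in range(length):
        # Step 2: select a topic according to that distribution.
        z = rng.choices(range(num_topics), weights=theta)[0]
        # Step 3: choose the word from the selected topic.
        w = rng.choices(vocab_ids, weights=topic_word[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy topics over a four-word vocabulary (hypothetical values).
    toy_topics = [
        [0.5, 0.5, 0.0, 0.0],  # topic 0 uses only words 0 and 1
        [0.0, 0.0, 0.5, 0.5],  # topic 1 uses only words 2 and 3
    ]
    theta, topics, words = generate_document(toy_topics, alpha=1.0, length=10, rng=rng)
    print("topic proportions:", theta)
    print("topic assignments:", topics)
    print("word ids:", words)
```

Inference runs in the opposite direction: given only the observed words, it estimates the topic assignments and the topic-word distributions that this sketch takes as given.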

Citation impact

Total citations: 5,985
FWCI: 55.50
Percentile: 100%
References: 12

Authors

2

Topics & keywords

Keywords
  • Computer science
  • Markov chain Monte Carlo
  • Inference
  • Generative model
  • Bayesian inference
  • Bayesian probability
  • Selection (genetic algorithm)
  • Model selection
UN Sustainable Development Goals
  • Quality Education