Article · ACM SIGIR Forum · Aug 2, 2017 · Closed access

Probabilistic Latent Semantic Indexing

International Computer Science Institute · University of California, Berkeley

Indexed in Crossref

Abstract

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as…
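
As a rough illustration of the kind of latent class model the abstract describes, the sketch below fits a small topic mixture to a document-by-word count matrix. It makes several assumptions that are not taken from this page: it uses the conditional parameterization P(w|d) = Σ_z P(w|z) P(z|d), it runs plain EM rather than the generalized variant the abstract mentions, and the function name `plsa_em` and the toy counts are hypothetical.

```python
import random


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Plain EM for P(w|d) = sum_z P(w|z) P(z|d) on a doc-by-word count matrix.

    Illustrative only: this is standard EM, not the generalized variant
    referred to in the abstract.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        if total == 0:  # e.g. an empty document: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [x / total for x in row]

    # Random positive initialization; every row is a probability distribution.
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]

    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, w) up to normalization.
                joint = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    r = n * joint[z] / total
                    new_w_z[z][w] += r
                    new_z_d[d][z] += r
        # M-step: renormalize expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


# Toy corpus (hypothetical): two documents use words 0-1, two use words 2-3.
counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 5, 2], [0, 0, 2, 5]]
p_w_z, p_z_d = plsa_em(counts, n_topics=2, n_iter=30)
```

Each row of `p_w_z` is a word distribution for one latent class, and each row of `p_z_d` gives the class mixture for one document. Words that co-occur in the same documents tend to share a class, which is the mechanism by which such a model can relate synonyms without requiring exact term matches.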

Citation impact

Total citations: 4,061
FWCI: 449.57
Percentile: 100%
References: 20

Authors

1

Topics & keywords

Keywords
  • Computer science
  • Probabilistic latent semantic analysis
  • Probabilistic logic
  • Search engine indexing
  • Artificial intelligence
  • Generalization
  • Mathematics