Article · ACM Transactions on Information Systems · Oct 1, 2002 · Closed access

Probabilistic models of information retrieval based on measuring the divergence from randomness

Gianni Amati · Cornelis J. van Rijsbergen

Fondazione "Ugo Bordoni" · University of Glasgow

Indexed in Crossref

Abstract

We introduce a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes, we study the binomial distribution and Bose-Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document-query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed…
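
To illustrate the divergence idea described in the abstract, the sketch below scores a term by its information content, -log2 P(tf), when within-document occurrences are assumed to follow a binomial model of randomness. It is a minimal sketch under stated assumptions, not the paper's full weighting scheme: the function name and the parameters `tf` (occurrences in one document), `F` (occurrences in the whole collection), and `N` (number of documents) are illustrative choices, the success probability is assumed to be 1/N, and the term frequency normalizations mentioned in the abstract are left out.

```python
import math


def binomial_information(tf: int, F: int, N: int) -> float:
    """Return -log2 P(tf) under Binomial(F, p = 1/N).

    Hypothetical helper for illustration only:
      tf: occurrences of the term in one document
      F:  occurrences of the term in the whole collection
      N:  number of documents in the collection (assumed >= 2)
    """
    if N < 2:
        raise ValueError("N must be at least 2")
    if not 0 <= tf <= F:
        raise ValueError("tf must satisfy 0 <= tf <= F")

    p = 1.0 / N
    # Log of the binomial coefficient C(F, tf), via log-gamma for stability.
    log_choose = (
        math.lgamma(F + 1) - math.lgamma(tf + 1) - math.lgamma(F - tf + 1)
    )
    # Natural log of the binomial probability of seeing exactly tf occurrences.
    log_prob = log_choose + tf * math.log(p) + (F - tf) * math.log1p(-p)
    # Convert to bits and negate: unlikely counts carry more information.
    return -log_prob / math.log(2)


if __name__ == "__main__":
    # A term with 100 occurrences spread over 1000 documents.
    for tf in (0, 1, 5):
        print(tf, binomial_information(tf, F=100, N=1000))
```

A count far above what the random model expects receives a larger score, which is the sense in which the term diverges from randomness in that document.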

Citation impact

Total citations: 881
FWCI: 24.30
Percentile: 100%
References: 48

Authors

2

Topics & keywords

Keywords
  • Normalization (sociology)
  • Divergence-from-randomness model
  • Computer science
  • Weighting
  • Term Discrimination
  • Randomness
  • Probabilistic logic
  • Term (time)
UN Sustainable Development Goals
  • Quality Education