Probabilistic models of information retrieval based on measuring the divergence from randomness
Fondazione "Ugo Bordoni" · University of Glasgow
Abstract
We introduce and create a framework for deriving probabilistic models of Information Retrieval. The models are nonparametric models of IR obtained in the language model approach. We derive term-weighting models by measuring the divergence of the actual term distribution from that obtained under a random process. Among the random processes, we study the binomial distribution and Bose–Einstein statistics. We define two types of term frequency normalization for tuning term weights in the document–query matching process. The first normalization assumes that documents have the same length and measures the information gain with the observed term once it has been accepted as a good descriptor of the observed…
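The two-part construction described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the usual divergence-from-randomness reading, in which a term's weight is the product of an informative content Inf1 = -log2 Prob1, taken from a random model of term occurrences, and a first-normalization factor Inf2 = 1 - Prob2. Here Prob2 is assumed to come from Laplace's law of succession, which gives Inf2 = 1/(tf + 1). The symbols tf (occurrences of the term in the document), F (occurrences in the whole collection), and N (number of documents), along with all function names, are hypothetical illustrative choices. Document-length normalization is omitted, which matches the first normalization's equal-length assumption.

```python
import math


def _check(tf: int, F: int, N: int) -> None:
    # Guard the domain: N >= 2 keeps log1p(-1/N) finite, F >= 1 keeps lambda > 0.
    if N < 2 or F < 1 or not 0 <= tf <= F:
        raise ValueError("expected N >= 2, F >= 1 and 0 <= tf <= F")


def inf1_binomial(tf: int, F: int, N: int) -> float:
    """-log2 of the binomial probability that exactly tf of the term's F
    collection occurrences fall in one document, when each occurrence is
    placed in one of N documents uniformly at random (p = 1/N)."""
    _check(tf, F, N)
    p = 1.0 / N
    # log C(F, tf) via lgamma, so large collection frequencies do not overflow.
    log_comb = math.lgamma(F + 1) - math.lgamma(tf + 1) - math.lgamma(F - tf + 1)
    log_prob = log_comb + tf * math.log(p) + (F - tf) * math.log1p(-p)
    return -log_prob / math.log(2)


def inf1_bose_einstein(tf: int, F: int, N: int) -> float:
    """-log2 of the geometric approximation to Bose-Einstein statistics,
    Prob1 = (1 / (1 + lam)) * (lam / (1 + lam)) ** tf with lam = F / N."""
    _check(tf, F, N)
    lam = F / N
    return -math.log2(1.0 / (1.0 + lam)) - tf * math.log2(lam / (1.0 + lam))


def inf2_laplace(tf: int) -> float:
    """Assumed first normalization: 1 - Prob2 with Prob2 = tf / (tf + 1)."""
    return 1.0 / (tf + 1.0)


def dfr_weight(tf: int, F: int, N: int, model: str = "binomial") -> float:
    """Hypothetical term weight: Inf1 from the chosen random model times Inf2."""
    if model == "binomial":
        inf1 = inf1_binomial(tf, F, N)
    elif model == "bose_einstein":
        inf1 = inf1_bose_einstein(tf, F, N)
    else:
        raise ValueError(f"unknown model: {model!r}")
    return inf1 * inf2_laplace(tf)


if __name__ == "__main__":
    # A rare term (F = 10 in N = 1000 documents) seen 5 times in one document
    # diverges far more from randomness than the same term seen once.
    for tf in (1, 5):
        print(tf, dfr_weight(tf, 10, 1000), dfr_weight(tf, 10, 1000, "bose_einstein"))
```

In this sketch the random model rewards occurrences that are unlikely under chance, while the Laplace factor shrinks as tf grows, so each additional occurrence of an already accepted term contributes less than the first one.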
Citation impact
- FWCI (field-weighted citation impact): 24.30
- Percentile: 100%
- References: 48
Authors
- Gianni Amati (corresponding author), Fondazione "Ugo Bordoni"
- Cornelis J. van Rijsbergen, University of Glasgow
Topics & keywords
- Normalization
- Divergence-from-randomness model
- Computer science
- Weighting
- Term Discrimination
- Randomness
- Probabilistic logic
- Term