Simple BM25 extension to multiple weighted fields
Microsoft (United States) · Microsoft Research (United Kingdom)
Abstract
This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents. In the past it has been common to compute scores for the individual fields (e.g. title and body) independently and then combine these scores (typically linearly) to arrive at a final score for the document. We highlight how this approach can lead to poor performance by breaking the carefully constructed non-linear saturation of term frequency in the BM25 function. We propose a much more intuitive alternative which weights term frequencies before the nonlinear term frequency saturation function is applied. In this scheme, a structured document with a title weight of two is mapped to an unstructured document…
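The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. Document-length normalisation is omitted, and the function names, the value `K1 = 1.2`, the field weights, and the example term counts are all assumptions chosen for illustration.

```python
K1 = 1.2  # assumed saturation parameter; the abstract does not give a value


def saturate(tf, k1=K1):
    """BM25-style term-frequency saturation (length normalisation omitted).

    Grows with tf but never reaches k1 + 1.
    """
    return tf * (k1 + 1) / (tf + k1)


def score_linear_combination(field_tfs, field_weights, idf=1.0):
    """Baseline described in the abstract: saturate each field separately,
    then combine the per-field scores linearly."""
    return idf * sum(field_weights[f] * saturate(tf) for f, tf in field_tfs.items())


def score_weighted_tf(field_tfs, field_weights, idf=1.0):
    """Scheme described in the abstract: weight the term frequencies first,
    then apply the saturation function once to the combined frequency."""
    combined_tf = sum(field_weights[f] * tf for f, tf in field_tfs.items())
    return idf * saturate(combined_tf)


# Hypothetical document: the query term occurs once in the title and once in the body.
weights = {"title": 2.0, "body": 1.0}
tfs = {"title": 1, "body": 1}

linear = score_linear_combination(tfs, weights)  # 2 * 1.0 + 1 * 1.0 = 3.0
weighted = score_weighted_tf(tfs, weights)       # saturate(3), about 1.57
upper_bound = K1 + 1                             # limit of saturate() as tf grows
```

In this example the linear combination scores 3.0, above the single-term limit of `K1 + 1 = 2.2`, so one query term matched in two fields can outscore any number of occurrences in a single unweighted field. The weighted-frequency score stays below that limit because saturation is applied only once.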
Citation impact
- FWCI: 45.24
- Percentile: 100%
- References: 20
Authors: 3
Topics & keywords
- Extension (predicate logic)
- Simple (philosophy)
- Computer science
- Algorithm
- Artificial intelligence
- Programming language
- No poverty