Foundations and Trends® in Information Retrieval > Vol 3 > Issue 4

The Probabilistic Relevance Framework: BM25 and Beyond

Stephen Robertson, Microsoft Research, UK, ser@microsoft.com
Hugo Zaragoza, Yahoo! Research, Spain, hugoz@yahoo-inc.com
 
Suggested Citation
Stephen Robertson and Hugo Zaragoza (2009), "The Probabilistic Relevance Framework: BM25 and Beyond", Foundations and Trends® in Information Retrieval: Vol. 3: No. 4, pp 333-389. http://dx.doi.org/10.1561/1500000019

Published: 17 Dec 2009
© 2009 S. Robertson and H. Zaragoza
 
Subjects
Collaborative filtering and recommender systems; Metasearch, rank aggregation and data fusion
 

In this article:
1 Introduction
2 Development of the Basic Model
3 Derived Models
4 Comparison with Other Models
5 Parameter Optimisation
6 Conclusions
References

Abstract

The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). This, in turn, has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features and parameter optimisation for models with free parameters.

DOI:10.1561/1500000019
ISBN: 978-1-60198-308-4 (book), 68 pp., $55.00
ISBN: 978-1-60198-309-1 (e-book), 68 pp., $100.00

The Probabilistic Relevance Framework

The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account structure and link-graph information. This, in turn, has led to one of the most successful web-search and corporate-search algorithms, BM25F. The Probabilistic Relevance Framework: BM25 and Beyond presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. Besides presenting a full derivation of the PRF ranking algorithms, it provides many insights about document retrieval in general, and points to many open challenges in this area. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features and parameter optimisation for models with free parameters. The Probabilistic Relevance Framework: BM25 and Beyond is self-contained and accessible to anyone with basic knowledge of probability and inference.

 
INR-019