Test Collection Based Evaluation of Information Retrieval Systems

Mark Sanderson

doi:10.1561/1500000009

Foundations and Trends® in Information Retrieval, Vol. 4, Issue 4

By Mark Sanderson, The Information School, University of Sheffield, UK, m.sanderson@shef.ac.uk

Suggested Citation

Mark Sanderson (2010), "Test Collection Based Evaluation of Information Retrieval Systems", Foundations and Trends® in Information Retrieval: Vol. 4: No. 4, pp 247-375. http://dx.doi.org/10.1561/1500000009

Publication Date: 22 Jun 2010

Subjects

Evaluation issues and test collections for IR

Abstract

Use of test collections and evaluation measures to assess the effectiveness of information retrieval systems has its origins in work dating back to the early 1950s. Over the nearly 60 years since that work started, the use of test collections has become a de facto standard of evaluation. This monograph surveys the research conducted and explains the methods and measures devised for evaluating retrieval systems, including a detailed look at the use of statistical significance testing in retrieval experimentation. It also reviews more recent examinations of the validity of the test collection approach and its evaluation measures, and outlines trends in current research exploiting query logs and live labs. At its core, the modern-day test collection is little different from the structures that the pioneering researchers of the 1950s and 1960s conceived. This tutorial and review shows that, despite its age, this long-standing evaluation method is still a highly valued tool for retrieval research.

Book details

ISBN: 978-1-60198-361-9

140 pp. $125.00

Table of contents:

1: Introduction

2: The Initial Development of Test Collections

3: TREC and its Ad Hoc Track

4: Post Ad Hoc Collections and Measures

5: Beyond the Mean: Comparison and Significance

6: Examining the Test Collection Methodologies and Measures

7: Alternate Needs and Data Sources for Evaluation

8: Conclusions

Acknowledgements

References

