Foundations and Trends® in Information Retrieval > Vol 12 > Issue 4-5

Efficient Query Processing for Scalable Web Search

Nicola Tonellotto, National Research Council of Italy, Italy, nicola.tonellotto@isti.cnr.it Craig Macdonald, University of Glasgow, UK, craig.macdonald@glasgow.ac.uk Iadh Ounis, University of Glasgow, UK, iadh.ounis@glasgow.ac.uk
 
Suggested Citation
Nicola Tonellotto, Craig Macdonald and Iadh Ounis (2018), "Efficient Query Processing for Scalable Web Search", Foundations and Trends® in Information Retrieval: Vol. 12: No. 4-5, pp 319-500. http://dx.doi.org/10.1561/1500000057

Published: 23 Dec 2018
© 2018 N. Tonellotto, C. Macdonald and I. Ounis
 
Subjects
Architectures for IR,  Performance issues for IR systems,  Web search
 

Free Preview:

Article Help

Share

Download article
In this article:
1. Introduction
2. Modern Infrastructure Foundations
3. Dynamic Pruning Query Processing
4. Query Efficiency Prediction for Dynamic Pruning
5. Impact-Sorted Indexes
6. Learning-to-Rank and Cascades
7. Open Directions and Conclusions
Acknowledgements
References

Abstract

Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also providing the latest trends in the literature in efficient query processing, including the coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query tradeoffs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time, energy-efficient and modern hardware and software architectures.

DOI:10.1561/1500000057
ISBN: 978-1-68083-542-7
188 pp. $99.00
Buy book
 
ISBN: 978-1-68083-543-4
188 pp. $260.00
Buy E-book (.pdf)
Table of contents:
1. Introduction
2. Modern Infrastructure Foundations
3. Dynamic Pruning Query Processing
4. Query Efficiency Prediction for Dynamic Pruning
5. Impact-Sorted Indexes
6. Learning-to-Rank and Cascades
7. Open Directions and Conclusions
Acknowledgements
References

Efficient Query Processing for Scalable Web Search

Every day, millions of users rely on search engines to satisfy the information needs required for performing many routine tasks. The effectiveness and efficiency of a search engine are two prime goals that form a natural trade-off. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency.

This survey comprehensively reviews the foundations of search engines, from index layouts to basic query processing strategies, while also providing the latest trends in the literature in efficient query processing. It goes on to describe techniques in applying a cascading infrastructure within search systems, such as learned models obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques to ensure that the required retrieval speed targets can be met. Finally, the authors bring the reader completely up-to-date by describing techniques for the efficient deployment of learned models in a multi-stage ranking system.

Efficient Query Processing for Scalable Web Search will be a valuable reference for researchers and developers working on cutting edge web search system design where effective and efficient search is an integral part of the design.

 
INR-057