Foundations and Trends® in Information Retrieval > Vol 16 > Issue 3

Pre-training Methods in Information Retrieval

By Yixing Fan, ICT, CAS, China, fanyixing@ict.ac.cn | Xiaohui Xie, Tsinghua University, China, xiexiaohui@mail.tsinghua.edu.cn | Yinqiong Cai, ICT, CAS, China, caiyinqiong18s@ict.ac.cn | Jia Chen, Tsinghua University, China, chenjia0831@gmail.com | Xinyu Ma, ICT, CAS, China, maxinyu17g@ict.ac.cn | Xiangsheng Li, Tsinghua University, China, lixsh6@gmail.com | Ruqing Zhang, ICT, CAS, China, zhangruqing@ict.ac.cn | Jiafeng Guo, ICT, CAS, China, guojiafeng@ict.ac.cn

 
Suggested Citation
Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang and Jiafeng Guo (2022), "Pre-training Methods in Information Retrieval", Foundations and Trends® in Information Retrieval: Vol. 16: No. 3, pp 178-317. http://dx.doi.org/10.1561/1500000100

Publication Date: 18 Aug 2022
© 2022 Y. Fan et al.
 
Subjects
Architectures for IR,  Formal models and language models for IR,  Natural language processing for IR,  Web search
 

In this article:
1. Introduction
2. Background
3. Pre-training Methods Applied in the Retrieval Component
4. Pre-training Methods Applied in the Re-ranking Component
5. Pre-training Methods Applied in Other Components
6. Pre-training Methods Designed for IR
7. Resources of Pre-training Methods in IR
8. Challenges and Future Work
9. Conclusion
Acknowledgements
References

Abstract

The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list in response to a user's information need. In recent years, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model sizes, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of IR. Recently, a large number of works dedicated to the application of PTMs in IR have been introduced to improve retrieval performance. Considering the rapid progress of this direction, this survey aims to provide a systematic review of pre-training methods in IR. Specifically, we present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, we introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Finally, we discuss some open challenges and highlight several promising directions, with the hope of inspiring and facilitating more work on these topics in future research.

DOI:10.1561/1500000100
ISBN (paperback): 978-1-63828-062-0, 156 pp., $99.00
ISBN (e-book, PDF): 978-1-63828-063-7, 156 pp., $145.00

Pre-training Methods in Information Retrieval

Information retrieval (IR) is a fundamental task in many real-world applications, such as Web search, question answering systems, and digital libraries. The core of IR is to identify information resources relevant to a user's information need. Since there may be more than one relevant resource, the returned result is often organized as a ranked list of documents ordered by their degree of relevance to the information need. This ranking property distinguishes IR from other tasks, and researchers have devoted substantial effort to developing a variety of ranking models.
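To make the ranked-list notion concrete, the following toy sketch scores documents against a query with a simple TF-IDF sum and returns them ordered by score. The corpus, query, and scoring function are all invented for illustration; real ranking models (including the pre-trained ones this survey covers) are far more sophisticated.

```python
from collections import Counter
import math

def score(query, doc_tokens, doc_freq, n_docs):
    """Illustrative TF-IDF score: a stand-in for a real ranking model."""
    tf = Counter(doc_tokens)
    s = 0.0
    for term in query:
        if term in tf:
            # Rarer terms get higher inverse-document-frequency weight.
            idf = math.log((n_docs + 1) / (doc_freq[term] + 1))
            s += tf[term] * idf
    return s

# Hypothetical toy corpus.
docs = {
    "d1": "neural ranking models for information retrieval".split(),
    "d2": "pre training methods for language models".split(),
    "d3": "web search and question answering systems".split(),
}
doc_freq = Counter(t for toks in docs.values() for t in set(toks))
query = "pre training for retrieval".split()

# The result is returned as a ranked list, most relevant first.
ranked = sorted(docs, key=lambda d: score(query, docs[d], doc_freq, len(docs)),
                reverse=True)
print(ranked)  # → ['d2', 'd1', 'd3']
```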

In recent years, the resurgence of deep learning has greatly advanced this field and led to a hot topic named NeuIR (neural information retrieval), especially the paradigm of pre-training methods (PTMs). Owing to sophisticated pre-training objectives and huge model sizes, pre-trained models can learn universal language representations from massive textual data that are beneficial to the ranking task of IR. Considering the rapid progress of this direction, this survey provides a systematic review of PTMs in IR. The authors present an overview of PTMs applied in different components of an IR system, including the retrieval component, the re-ranking component, and other components. In addition, they introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards. Lastly, they discuss some open challenges and highlight several promising directions, with the hope of inspiring and facilitating more work on these topics in future research.
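The retrieval and re-ranking components mentioned above form the classic two-stage pipeline: a cheap first-stage retriever narrows the corpus to a small candidate set, and a more expensive re-ranker reorders those candidates. The sketch below mimics that structure with toy scoring functions; in a real system the second stage would call a pre-trained model such as a BERT-style cross-encoder, and all data and score definitions here are invented for illustration.

```python
# Hypothetical toy corpus.
corpus = {
    "d1": "pre training methods in information retrieval".split(),
    "d2": "deep learning for web search".split(),
    "d3": "pre trained language models".split(),
}

def retrieve(query, k=2):
    """Stage 1: cheap term-overlap retrieval over the whole corpus."""
    overlap = {d: len(set(query) & set(toks)) for d, toks in corpus.items()}
    return sorted(overlap, key=overlap.get, reverse=True)[:k]

def rerank(query, candidates):
    """Stage 2: a costlier score applied only to the candidates.
    A real system would invoke a pre-trained cross-encoder here; this
    toy score also rewards query terms appearing in document order."""
    def score(d):
        toks = corpus[d]
        positions = [toks.index(t) for t in query if t in toks]
        in_order = all(a < b for a, b in zip(positions, positions[1:]))
        return len(positions) + (1 if in_order and positions else 0)
    return sorted(candidates, key=score, reverse=True)

query = "pre training retrieval".split()
candidates = retrieve(query)      # small candidate set from stage 1
final = rerank(query, candidates)
print(final)  # → ['d1', 'd3']
```

The point of the split is efficiency: the expensive model only ever sees the top-k candidates, not the full corpus.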

 
INR-100