Learning to rank for Information Retrieval (IR) is the task of automatically constructing a ranking model from training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can potentially be enhanced by learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages of each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Empirical evaluations of typical learning-to-rank methods are then presented, using the LETOR collection as a benchmark dataset; the results seem to suggest that the listwise approach is the most effective of the three. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms and can be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.
Learning to Rank for Information Retrieval is an introduction to the field of learning to rank, a hot research topic in information retrieval and machine learning. It categorizes the state-of-the-art learning-to-rank algorithms into three approaches from a unified machine learning perspective, describes the loss functions and learning mechanisms of the different approaches, reveals their relationships and differences, shows their empirical performance on real IR applications, and discusses their theoretical properties such as generalization ability. As a tutorial, Learning to Rank for Information Retrieval helps readers find answers to the following critical questions: In what respects are learning-to-rank algorithms similar, and in what respects do they differ? What are the strengths and weaknesses of each algorithm? Which learning-to-rank algorithm empirically performs the best? Is ranking a new machine learning problem? What are the unique theoretical issues for ranking as compared to classification and regression? Learning to Rank for Information Retrieval is both a guide for beginners who are embarking on research in this area and a useful reference for established researchers and practitioners.
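As a rough illustration of how the pointwise, pairwise, and listwise approaches differ in the loss they compute for a single query, consider the minimal Python sketch below. It is not taken from the tutorial: the function names, the toy scores and labels, and the specific losses (squared error, a logistic loss over document pairs, and a softmax cross entropy over the whole list) are assumptions chosen purely for illustration, and each approach admits many other loss functions.

```python
import math

def pointwise_loss(scores, labels):
    """Pointwise (illustrative): mean squared error between each score and its label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_loss(scores, labels):
    """Pairwise (illustrative): mean logistic loss over pairs where i is more relevant than j."""
    total, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                # Small when document i is scored well above document j.
                total += math.log1p(math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return total / pairs if pairs else 0.0

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def listwise_loss(scores, labels):
    """Listwise (illustrative): cross entropy between softmax(labels) and softmax(scores)."""
    p = softmax([float(y) for y in labels])
    q = softmax(scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Hypothetical toy query: three documents with graded relevance labels.
labels = [2, 1, 0]
good_scores = [2.0, 1.0, 0.0]   # orders documents the same way as the labels
bad_scores = [0.0, 1.0, 2.0]    # orders documents in reverse

for name, loss in [("pointwise", pointwise_loss),
                   ("pairwise", pairwise_loss),
                   ("listwise", listwise_loss)]:
    print(name, loss(good_scores, labels), loss(bad_scores, labels))
```

In this toy setting all three losses are lower for the correctly ordered scores than for the reversed ones; what changes is the unit each loss looks at: a single document, a pair of documents, or the entire ranked list for the query.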