By Alexandru L. Ginsca, CEA, LIST, 91190 Gif-sur-Yvette, France, firstname.lastname@example.org | Adrian Popescu, CEA, LIST, 91190 Gif-sur-Yvette, France, email@example.com | Mihai Lupu, Vienna University of Technology, Austria, firstname.lastname@example.org
Credibility, as the general concept covering trustworthiness and expertise, but also quality and reliability, is strongly debated in philosophy, psychology, and sociology, and its adoption in computer science is therefore fraught with difficulties. Yet its importance has grown in the information access community because of two complementary factors: on the one hand, it is often difficult to point precisely to the source of a piece of information; on the other hand, complex algorithms based on statistical machine learning and artificial intelligence make decisions on behalf of users, with little oversight from the users themselves.

This survey presents a detailed analysis of existing credibility models from different information seeking research areas, with a focus on the Web and its pervasive social component. It shows that there is a very rich body of work pertaining to different aspects and interpretations of credibility, particularly for different types of textual content (e.g., Web sites, blogs, tweets), but also for other modalities (videos, images, audio) and topics (e.g., health care).

After an introduction placing credibility in the context of other sciences and relating it to trust, we argue for a four-way decomposition of credibility: expertise and trustworthiness, well documented in the literature and predominantly related to the information source; and quality and reliability, raised to the status of equal partners because the source is often impossible to identify, and predominantly related to the content. The second half of the survey provides the reader with access points to the literature, grouped by research interest. Section 3 reviews general research directions: the factors that contribute to credibility assessment by human consumers of information; the models used to combine these factors; and the methods to predict credibility. A smaller section is dedicated to informing users about the credibility learned from the data.
Sections 4, 5, and 6 go into further detail, covering domain-specific credibility, social media credibility, and multimedia credibility, respectively. While each of them is best understood in the context of Sections 1 and 2, they can be read independently of each other. The last section of this survey addresses a topic not commonly considered under “credibility”: the credibility of the system itself, independent of the data creators. This topic is of particular importance in domains where the user is professionally motivated and where there are no concerns about the credibility of the data (e.g., e-discovery and patent search). While there is little explicit work on this topic, we argue that it is an open research direction worthy of future exploration. Finally, as an additional help to the reader, an appendix lists the existing test collections that cater specifically to some aspect of credibility. Overall, this review will provide the reader with an organised and comprehensive reference guide to the state of the art and the problems at hand, rather than a final answer to the question of what credibility is for computer science. Even within the relatively limited scope of an exact science, such an answer is not possible for a concept that is itself widely debated in philosophy and social sciences.