Foundations and Trends® in Information Retrieval > Vol 11 > Issue 2-3

Applications of Topic Models

By Jordan Boyd-Graber, University of Maryland, USA, jbg@umiacs.umd.edu | Yuening Hu, Google, Inc., USA, ynhu@google.com | David Mimno, Cornell University, USA, mimno@cornell.edu

 
Suggested Citation
Jordan Boyd-Graber, Yuening Hu and David Mimno (2017), "Applications of Topic Models", Foundations and Trends® in Information Retrieval: Vol. 11: No. 2-3, pp 143-296. http://dx.doi.org/10.1561/1500000030

Publication Date: 20 Jul 2017
© 2017 J. Boyd-Graber, Y. Hu and D. Mimno
 
Subjects
Design and Evaluation, Information visualization, Applications of IR, Information categorization and clustering, Natural language processing for IR, Summarization, Clustering, Bayesian learning, Dimensionality reduction, Markov chain Monte Carlo, Variational inference, Visualization, Data Mining
 

In this article:
1. The What and Wherefore of Topic Models
2. Ad-hoc Information Retrieval
3. Evaluation and Interpretation
4. Historical Documents
5. Understanding Scientific Publications
6. Fiction and Literature
7. Computational Social Science
8. Multilingual Data and Machine Translation
9. Building a Topic Model
10. Conclusion
References

Abstract

How can a single person understand what’s going on in a collection of millions of documents? This is an increasingly common problem: sifting through an organization’s e-mails, understanding a decade’s worth of newspapers, or characterizing a scientific field’s research. Topic models are a statistical framework that helps users understand large document collections: not just to find individual documents but to understand the general themes present in the collection. This survey describes recent academic and industrial applications of topic models, with the goal of equipping a young researcher to build their own applications of topic models. In addition to topic models’ effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, this survey also reviews topic models’ ability to unlock large text collections for qualitative analysis. We review their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.

DOI:10.1561/1500000030
ISBN: 978-1-68083-308-9 (paperback), 174 pp., $99.00
ISBN: 978-1-68083-309-6 (e-book, .pdf), 174 pp., $260.00
Table of contents:
1. The What and Wherefore of Topic Models
2. Ad-hoc Information Retrieval
3. Evaluation and Interpretation
4. Historical Documents
5. Understanding Scientific Publications
6. Fiction and Literature
7. Computational Social Science
8. Multilingual Data and Machine Translation
9. Building a Topic Model
10. Conclusion
References

Applications of Topic Models

How can a single person understand what’s going on in a collection of millions of documents? This is an increasingly widespread problem: sifting through an organization’s e-mails, understanding a decade’s worth of newspapers, or characterizing a scientific field’s research. This monograph explores the ways that humans and computers make sense of document collections through tools called topic models. Topic models are a statistical framework that helps users understand large document collections: not just to find individual documents but to understand the general themes present in the collection.

Applications of Topic Models describes recent academic and industrial applications of topic models. In addition to topic models’ effective application to traditional problems like information retrieval, visualization, statistical inference, multilingual modeling, and linguistic understanding, Applications of Topic Models also reviews topic models’ ability to unlock large text collections for qualitative analysis. It reviews their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.

Applications of Topic Models is aimed at readers with some knowledge of document processing, a basic understanding of probability, and an interest in many application domains. It discusses the information needs of each application area and how those specific needs affect models, curation procedures, and interpretations. By the end of the monograph, it is hoped that readers will be excited enough to build their own topic models. It should also be of interest to topic model experts, as the coverage of diverse applications may expose models and approaches they have not seen before.

 
INR-030