Foundations and Trends® in Databases > Vol 10 > Issue 2-4

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

By Gerhard Weikum, Max Planck Institute for Informatics, Germany, weikum@mpi-inf.mpg.de | Xin Luna Dong, Amazon, USA, lunadong@amazon.com | Simon Razniewski, Max Planck Institute for Informatics, Germany, srazniew@mpi-inf.mpg.de | Fabian Suchanek, Telecom Paris University, France, suchanek@telecom-paris.fr

 
Suggested Citation
Gerhard Weikum, Xin Luna Dong, Simon Razniewski and Fabian Suchanek (2021), "Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases", Foundations and Trends® in Databases: Vol. 10: No. 2-4, pp 108-490. http://dx.doi.org/10.1561/1900000064

Publication Date: 12 Jul 2021
© 2021 Gerhard Weikum, Xin Luna Dong, Simon Razniewski and Fabian Suchanek
 
Subjects
Data cleaning and information extraction, Natural language processing for IR, Text mining, Web search, Graphical models, Relational learning, Deep learning, Databases on the web, Semantic web
 

In this article:
1. What Is This All About
2. Foundations and Architecture
3. Knowledge Integration from Premium Sources
4. KB Construction: Entity Discovery and Typing
5. Entity Canonicalization
6. KB Construction: Attributes and Relationships
7. Open Schema Construction
8. Knowledge Base Curation
9. Case Studies
10. Wrap-Up
Acknowledgements
References

Abstract

Equipping machines with comprehensive knowledge of the world’s entities and their relationships has been a longstanding goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics.

This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.

DOI: 10.1561/1900000064
Paperback: ISBN 978-1-68083-836-7, 400 pp., $99.00
E-book (PDF): ISBN 978-1-68083-837-4, 400 pp., $280.00

This monograph surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and curating large knowledge bases from online content, with emphasis on semi-structured web pages (lists, tables, etc.) and unstructured text sources. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.

The intended audience is students and researchers interested in a wide spectrum of topics: from machine knowledge and data quality to machine learning and data science as well as applications in web content mining and natural language understanding. It will also be of interest to industrial practitioners working on semantic technologies for web, social media, or enterprise content.

 