The Journal of Web Science > Vol 4 > Issue 2

Multi-Cultural Interlinking of Web Taxonomies with ACROSS

Natalia Boldyrev, Max Planck Institute for Informatics, Saarland Informatics Campus, Germany, natalia@mpiinf.mpg.de; Marc Spaniol, Université de Caen Normandie, France; Gerhard Weikum, Max Planck Institute for Informatics, Saarland Informatics Campus, Germany, weikum@mpiinf.mpg.de
 
Suggested Citation
Natalia Boldyrev, Marc Spaniol and Gerhard Weikum (2018), "Multi-Cultural Interlinking of Web Taxonomies with ACROSS", The Journal of Web Science: Vol. 4: No. 2, pp 20-33. http://dx.doi.org/10.1561/106.00000012

Published: 14 Feb 2018
© 2018 N. Boldyrev, M. Spaniol and G. Weikum
 
Open Access

This article is published under the terms of CC BY-NC-ND 2.0.

In this article:
1. Introduction
2. Computational Model
3. Basic Methods
4. Advanced Alignment Methods
5. Seeding Strategies
6. Experimental Evaluation
7. Related Work
8. Conclusions and Future Work
References

Abstract

The Web hosts a huge variety of multi-cultural taxonomies. They encompass e-commerce product catalogs, general-purpose knowledge bases, and numerous domain-specific category systems. The enormous heterogeneity of these sources poses a challenge when multiple taxonomies have to be interlinked. In this paper, we introduce the ACROSS system to support the alignment of independently created Web taxonomies. Each taxonomy is shaped by its unique culture, which is three-fold: the categorization criteria of the taxonomy, its language, and its socioeconomic background. For mapping categories between different taxonomies, ACROSS harnesses instance-level features as well as distant supervision from an intermediate source such as multiple Wikipedia editions. ACROSS includes a reasoning step based on combinatorial optimization. To reduce the run time of the reasoning procedure without sacrificing quality, we study two models of user involvement. Our experiments with heterogeneous taxonomies from different domains demonstrate the viability of our approach and its improvement over state-of-the-art baselines.

DOI:10.1561/106.00000012