By Vlad Niculae, University of Amsterdam, Netherlands | Caio Corro, Université de Rennes, France | Nikita Nangia, Amazon, USA | Tsvetomila Mihaylova, Aalto University, Finland | André F. T. Martins, Instituto Superior Técnico, Portugal and Instituto de Telecomunicações, Portugal and Unbabel, Portugal
Many types of data from fields including natural language processing, computer vision, and bioinformatics are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging as neural networks are typically designed for continuous computation.
This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
Machine learning (ML) is often employed to build predictive models for analyzing rich data, such as images, text, or sound. Most such data is governed by underlying structured representations, such as segmentation, hierarchy, or graph structure. Practical ML systems are commonly structured as pipelines that include off-the-shelf components producing structured representations of the input, which are then used as features in subsequent steps of the pipeline. On the one hand, such architectures require that these components, or the data to train them, be available, and since a component may not be built with the downstream goal in mind, pipelines are prone to error propagation. On the other hand, they are transparent: the predicted structures can be directly inspected and used to interpret downstream predictions. In contrast, deep neural networks rival and even outperform pipelines by learning dense, continuous representations of the data, driven solely by the downstream objective.
This monograph is about neural network models that induce discrete latent structure, combining the strengths of both end-to-end and pipeline systems. We do not assume any single downstream application in natural language processing or computer vision; instead, the presentation follows an abstract framework that allows us to focus on the technical aspects of end-to-end learning with deep neural networks.
The text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. The presentation relies on consistent notations for a wide range of models. As such, many new connections between latent structure learning strategies are revealed, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
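To make the contrast between these three strategies concrete, the sketch below (not taken from the monograph; the toy objective, variable names, and PyTorch usage are illustrative assumptions) applies each of them to the simplest possible discrete latent structure, a single categorical choice among K options: a temperature-controlled softmax relaxation, a straight-through surrogate gradient, and a one-sample score-function (REINFORCE) estimate of the gradient of the expected loss.

```python
# Minimal sketch: three ways to backpropagate through a single categorical choice.
# The downstream loss, scores, and values below are toy assumptions for illustration.
import torch

torch.manual_seed(0)
K = 4
scores = torch.zeros(K, requires_grad=True)   # unnormalized scores for K discrete options
values = torch.tensor([0.1, 0.9, 0.3, 0.5])   # toy downstream reward of picking each option


def downstream_loss(z):
    # z is a (soft or hard) one-hot vector over the K options
    return -(z * values).sum()


# 1) Continuous relaxation: replace argmax with a temperature-controlled softmax,
#    so the selected "structure" becomes a differentiable soft selection.
tau = 0.5
z_soft = torch.softmax(scores / tau, dim=-1)
downstream_loss(z_soft).backward()
print("relaxation grad:", scores.grad)

# 2) Surrogate gradients (straight-through): use the hard argmax in the forward pass,
#    but backpropagate as if the softmax had been used.
scores.grad = None
z_soft = torch.softmax(scores, dim=-1)
z_hard = torch.nn.functional.one_hot(z_soft.argmax(), K).float()
z_st = z_hard + z_soft - z_soft.detach()      # forward value: z_hard; gradient: via z_soft
downstream_loss(z_st).backward()
print("straight-through grad:", scores.grad)

# 3) Probabilistic estimation: treat the choice as a random variable and estimate the
#    gradient of the expected loss with the score-function (REINFORCE) estimator.
scores.grad = None
dist = torch.distributions.Categorical(probs=torch.softmax(scores, dim=-1))
z_sample = dist.sample()
loss_sample = downstream_loss(torch.nn.functional.one_hot(z_sample, K).float())
(loss_sample.detach() * dist.log_prob(z_sample)).backward()
print("score-function grad (1 sample):", scores.grad)
```

Even in this toy setting, the three estimators produce different gradients for the same scores, hinting at the substantially different applicability and properties mentioned above.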