
Discrete Latent Structure in Neural Networks

By Vlad Niculae, University of Amsterdam, Netherlands | Caio Corro, Université de Rennes, France | Nikita Nangia, Amazon, USA | Tsvetomila Mihaylova, Aalto University, Finland | André F. T. Martins, Instituto Superior Técnico, Portugal and Instituto de Telecomunicações, Portugal and Unbabel, Portugal

 
Suggested Citation
Vlad Niculae, Caio Corro, Nikita Nangia, Tsvetomila Mihaylova and André F. T. Martins (2025), "Discrete Latent Structure in Neural Networks", Foundations and Trends® in Signal Processing: Vol. 19: No. 2, pp 99-211. http://dx.doi.org/10.1561/2000000134

Publication Date: 02 Jun 2025
© 2025 V. Niculae et al.
 
Subjects
Pattern recognition and learning, Learning and statistical methods, Speech and spoken language processing, Statistical/Machine learning, Classification and prediction, Deep learning
 


Abstract

Many types of data from fields including natural language processing, computer vision, and bioinformatics are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging as neural networks are typically designed for continuous computation.

This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.

DOI: 10.1561/2000000134
ISBN (paperback): 978-1-63828-570-0, 126 pp., $85.00
ISBN (e-book, PDF): 978-1-63828-571-7, 126 pp., $160.00
Table of contents:
1. Introduction
2. Structure Prediction Background
3. Continuous Relaxations
4. Surrogate Gradients
5. Probabilistic Latent Variables
6. Conclusions
Acknowledgements
References

Discrete Latent Structure in Neural Networks

Machine learning (ML) is often employed to build predictive models for analyzing rich data, such as images, text, or sound. Most such data is governed by underlying structured representations, such as segmentations, hierarchies, or graphs. Practical ML systems are commonly built as pipelines, incorporating off-the-shelf components that produce structured representations of the input, which are then used as features in subsequent steps. On the one hand, such architectures require these components to be available, or data with which to train them; and since a component is rarely built with the downstream goal in mind, pipelines are prone to error propagation. On the other hand, pipelines are transparent: the predicted structures can be directly inspected and used to interpret downstream predictions. In contrast, deep neural networks rival and even outperform pipelines by learning dense, continuous representations of the data, driven solely by the downstream objective.

This monograph is about neural network models that induce discrete latent structure, combining the strengths of both end-to-end and pipeline systems. No single downstream application in natural language processing or computer vision is assumed; instead, the presentation follows an abstract framework that makes it possible to focus on the technical aspects of end-to-end learning with deep neural networks.

The text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. The presentation relies on consistent notations for a wide range of models. As such, many new connections between latent structure learning strategies are revealed, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
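To make the distinction between these strategies concrete, the following toy sketch (not taken from the monograph; it assumes PyTorch and a hypothetical 4-way categorical latent choice with a fixed payoff per choice) contrasts a continuous relaxation with a straight-through surrogate gradient for the same discrete decision.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, requires_grad=True)   # scores over 4 discrete choices
values = torch.tensor([1.0, 2.0, 3.0, 4.0])   # toy downstream payoff per choice

# 1) Continuous relaxation: replace the discrete argmax with a
#    temperature-controlled softmax, which is differentiable everywhere.
tau = 0.5
z_soft = F.softmax(logits / tau, dim=-1)
loss_relaxed = -(z_soft * values).sum()
loss_relaxed.backward()
print("relaxation gradient:      ", logits.grad)

# 2) Surrogate gradient (straight-through): use a discrete one-hot choice in
#    the forward pass, but let gradients flow as if the softmax had been used.
logits.grad = None
z_soft = F.softmax(logits / tau, dim=-1)
z_hard = F.one_hot(z_soft.argmax(), num_classes=4).float()
z_st = z_hard - z_soft.detach() + z_soft      # forward: one-hot; backward: softmax
loss_st = -(z_st * values).sum()
loss_st.backward()
print("straight-through gradient:", logits.grad)

The third strategy, probabilistic estimation, would instead keep the sampled choice discrete and estimate the gradient of an expected loss, for example with a score-function (REINFORCE-style) estimator.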

 
SIG-134