
Universal Features for High-Dimensional Learning and Inference

By Shao-Lun Huang, Tsinghua-Berkeley Shenzhen Institute, China, shaolun.huang@sz.tsinghua.edu.cn | Anuran Makur, Purdue University, USA, amakur@purdue.edu | Gregory W. Wornell, Massachusetts Institute of Technology, USA, gww@mit.edu | Lizhong Zheng, Massachusetts Institute of Technology, USA, lizhong@mit.edu

 
Suggested Citation
Shao-Lun Huang, Anuran Makur, Gregory W. Wornell and Lizhong Zheng (2024), "Universal Features for High-Dimensional Learning and Inference", Foundations and Trends® in Communications and Information Theory: Vol. 21: No. 1-2, pp 1-299. http://dx.doi.org/10.1561/0100000107

Publication Date: 05 Feb 2024
© 2024 S-L. Huang et al.
 
Subjects
Information theory and statistics, Information theory and computer science, Pattern recognition and learning, Detection and estimation, Statistical/Machine learning, Statistical signal processing, Spectral methods, Dimensionality reduction, Classification and prediction, Clustering
 


Abstract

This monograph develops unifying perspectives on the problem of identifying universal low-dimensional features from high-dimensional data for inference tasks in settings involving learning. For such problems, natural notions of universality are introduced, and a local equivalence among them is established. The analysis is naturally expressed via information geometry, which provides both conceptual and computational insights. The development reveals the complementary roles of the singular value decomposition, Hirschfeld-Gebelein-Rényi maximal correlation, the canonical correlation and principal component analyses of Hotelling and Pearson, Tishby’s information bottleneck, Wyner’s and Gács-Körner’s common information, Ky Fan k-norms, and Breiman and Friedman’s alternating conditional expectations algorithm. Among other uses, the framework facilitates understanding and optimizing aspects of learning systems, including multinomial logistic (softmax) regression and neural network architecture, matrix factorization methods for collaborative filtering and other applications, rank-constrained multivariate linear regression, and forms of semi-supervised learning.
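To make the abstract's central object concrete, the following minimal sketch (ours, not from the monograph; the pmf values and variable names are illustrative) computes the modal decomposition of a small joint distribution. The matrix B with entries P(x,y)/sqrt(Px(x)·Py(y)) has largest singular value 1; its second singular value equals the Hirschfeld-Gebelein-Rényi maximal correlation, and the corresponding singular vectors, rescaled by the marginals, give the dominant feature functions of X and Y.

```python
import numpy as np

# Minimal sketch of the modal decomposition of a joint pmf (illustrative
# values, not from the monograph). With B(x, y) = P(x, y) / sqrt(Px(x) Py(y)),
# the top singular value of B is 1 (singular vectors sqrt(Px), sqrt(Py));
# the second singular value is the Hirschfeld-Gebelein-Renyi maximal
# correlation, and its singular vectors yield the dominant features.

P = np.array([[0.20, 0.10, 0.05],   # example joint pmf P_{X,Y}
              [0.05, 0.25, 0.05],
              [0.05, 0.05, 0.20]])

Px = P.sum(axis=1)                  # marginal P_X
Py = P.sum(axis=0)                  # marginal P_Y

B = P / np.sqrt(np.outer(Px, Py))   # entrywise P(x,y) / sqrt(Px(x) Py(y))
U, s, Vt = np.linalg.svd(B)

print("singular values:", s)        # s[0] ~ 1.0; s[1] = maximal correlation

# Dominant features: zero mean and unit variance under the marginals,
# since U[:, 1] is unit norm and orthogonal to sqrt(Px).
f1 = U[:, 1] / np.sqrt(Px)
g1 = Vt[1, :] / np.sqrt(Py)
print("f1(x):", f1)
print("g1(y):", g1)
```

Truncating the SVD after the top k modes then yields k-dimensional feature pairs of the kind the monograph characterizes as universal.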

DOI: 10.1561/0100000107
Paperback: ISBN 978-1-63828-176-4, 320 pp., $99.00
E-book (PDF): ISBN 978-1-63828-177-1, 320 pp., $310.00
Table of contents:
1. Introduction
2. The Modal Decomposition of Joint Distributions
3. Variational Characterization of the Modal Decomposition
4. Local Information Geometry
5. Universal Feature Characterizations
6. Learning Modal Decompositions
7. Collaborative Filtering and Matrix Factorization
8. Softmax Regression
9. Gaussian Distributions and Linear Features
10. Nonlinear Features and Non-Gaussian Distributions
11. Semi-Supervised Learning
12. Modal Decomposition of Markov Random Fields
13. Emerging Applications and Related Developments
Acknowledgements
Appendices
References

Universal Features for High-Dimensional Learning and Inference

In many contemporary and emerging applications of machine learning and statistical inference, the phenomena of interest are characterized by variables defined over large alphabets. The increasing size of both the data and the set of possible inferences, together with the limited available training data, creates a need to understand which inference tasks can be carried out most effectively and, in turn, which features of the data are most relevant to them.

In this monograph, the authors develop the idea of extracting “universally good” features, and establish that diverse notions of such universality lead to precisely the same features. Their information-theoretic approach yields a local information-geometric analysis that facilitates the computation of these features in a host of applications.
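As one hedged illustration of such computation (a minimal sketch assuming finite alphabets and a known joint pmf; the function name and example distribution are ours, not the authors'), the alternating conditional expectations (ACE) iteration mentioned in the abstract recovers the dominant mode by alternately taking conditional expectations and renormalizing:

```python
import numpy as np

# Minimal sketch (our illustration, not the authors' implementation) of
# Breiman and Friedman's alternating conditional expectations (ACE) idea
# for the dominant mode, assuming a known finite-alphabet joint pmf P:
# alternate f(x) <- E[g(Y) | X = x] and g(y) <- E[f(X) | Y = y], centering
# and normalizing under the marginals at each step.

def ace_top_mode(P, iters=200, seed=0):
    Px, Py = P.sum(axis=1), P.sum(axis=0)
    g = np.random.default_rng(seed).standard_normal(P.shape[1])
    for _ in range(iters):
        f = (P @ g) / Px                 # f(x) = E[g(Y) | X = x]
        f -= f @ Px                      # center: E[f(X)] = 0
        f /= np.sqrt((f ** 2) @ Px)      # scale:  E[f(X)^2] = 1
        g = (P.T @ f) / Py               # g(y) = E[f(X) | Y = y]
        g -= g @ Py
        g /= np.sqrt((g ** 2) @ Py)
    return f, g, f @ (P @ g)             # E[f(X) g(Y)]: maximal correlation

P = np.array([[0.20, 0.10, 0.05],        # same illustrative pmf as above
              [0.05, 0.25, 0.05],
              [0.05, 0.05, 0.20]])
f, g, rho = ace_top_mode(P)
print("maximal correlation ~", rho)
```

The centering step projects out the constant (top) mode, so the iteration is a power method that converges to the maximal-correlation feature pair; with data rather than a known pmf, the conditional expectations would be replaced by empirical averages.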

The authors provide a comprehensive treatment that guides the reader from basic principles to advanced techniques, including many new results. They emphasize a development from first principles, with common, unifying terminology and notation, and pointers to the rich surrounding literature, both historical and contemporary.

Written for students and researchers, this monograph is a complete, information-theoretic treatment of a recognized and current problem in machine learning and statistical inference.

 