Foundations and Trends® in Machine Learning >
Vol 10 > Issue 3-4

Prateek Jain and Purushottam Kar (2017), "Non-convex Optimization for Machine Learning", Foundations and Trends® in Machine Learning: Vol. 10: No. 3-4, pp 142-336. http://dx.doi.org/10.1561/2200000058

© 2017 P. Jain and P. Kar

Optimization, Robustness, Statistical learning theory, Sparse representations, Computational learning, Design and analysis of algorithms

Download article
**In this article:**

Preface

Mathematical Notation

Part I: Introduction and Basic Tools

1. Introduction

2. Mathematical Tools

Part II: Non-convex Optimization Primitives

3. Non-Convex Projected Gradient Descent

4. Alternating Minimization

5. The EM Algorithm

6. Stochastic Optimization Techniques

Part III: Applications

7. Sparse Recovery

8. Low-rank Matrix Recovery

9. Robust Linear Regression

10. Phase Retrieval

References

A vast majority of machine learning algorithms train their models and perform inference by solving optimization problems. In order to capture the learning and prediction problems accurately, structural constraints such as sparsity or low rank are frequently imposed or else the objective itself is designed to be a non-convex function. This is especially true of algorithms that operate in high-dimensional spaces or that train non-linear models such as tensor models and deep networks. The freedom to express the learning problem as a non-convex optimization problem gives immense modeling power to the algorithm designer, but often such problems are NP-hard to solve. A popular workaround to this has been to relax non-convex problems to convex ones and use traditional methods to solve the (convex) relaxed optimization problems. However this approach may be lossy and nevertheless presents significant challenges for large scale optimization. On the other hand, direct approaches to non-convex optimization have met with resounding success in several domains and remain the methods of choice for the practitioner, as they frequently outperform relaxation-based techniques – popular heuristics include projected gradient descent and alternating minimization. However, these are often poorly understood in terms of their convergence and other properties. This monograph presents a selection of recent advances that bridge a long-standing gap in our understanding of these heuristics. We hope that an insight into the inner workings of these methods will allow the reader to appreciate the unique marriage of task structure and generative models that allow these heuristic techniques to (provably) succeed. The monograph will lead the reader through several widely used nonconvex optimization techniques, as well as applications thereof. The goal of this monograph is to both, introduce the rich literature in this area, as well as equip the reader with the tools and techniques needed to analyze these simple procedures for non-convex problems.

Preface

Mathematical Notation

Part I: Introduction and Basic Tools

1. Introduction

2. Mathematical Tools

Part II: Non-convex Optimization Primitives

3. Non-Convex Projected Gradient Descent

4. Alternating Minimization

5. The EM Algorithm

6. Stochastic Optimization Techniques

Part III: Applications

7. Sparse Recovery

8. Low-rank Matrix Recovery

9. Robust Linear Regression

10. Phase Retrieval

References

*Non-convex Optimization for Machine Learning* takes an in-depth look at the basics of non-convex optimization with applications to machine learning. It introduces the rich literature in this area, as well as equips the reader with the tools and techniques needed to apply and analyze simple but powerful procedures for non-convex problems.

*Non-convex Optimization for Machine Learning* is as self-contained as possible while not losing focus of the main topic of non-convex optimization techniques. The monograph initiates the discussion with entire chapters devoted to presenting a tutorial-like treatment of basic concepts in convex analysis and optimization, as well as their non-convex counterparts. The monograph concludes with a look at four interesting applications in the areas of machine learning and signal processing, and exploring how the non-convex optimization techniques introduced earlier can be used to solve these problems. The monograph also contains, for each of the topics discussed, exercises and figures designed to engage the reader, as well as extensive bibliographic notes pointing towards classical works and recent advances.

*Non-convex Optimization for Machine Learning* can be used for a semester-length course on the basics of non-convex optimization with applications to machine learning. On the other hand, it is also possible to cherry pick individual portions, such the chapter on sparse recovery, or the EM algorithm, for inclusion in a broader course. Several courses such as those in machine learning, optimization, and signal processing may benefit from the inclusion of such topics.