Foundations and Trends® in Machine Learning >
Vol 3 > Issue 2

Michael W. Mahoney (2011), "Randomized Algorithms for Matrices and Data", Foundations and TrendsĀ® in Machine Learning: Vol. 3: No. 2, pp 123-224. http://dx.doi.org/10.1561/2200000035

© 2011 M. W. Mahoney

Download article
**In this article:**

1 Introduction

2 Matrices in Large-scale Scientific Data Analysis

3 Randomization Applied to Matrix Problems

4 Randomized Algorithms for Least-squares Approximation

5 Randomized Algorithms for Low-rank Matrix Approximation

6 Empirical Observations

7 A Few General Thoughts, and A Few Lessons Learned

8 Conclusion

Acknowledgments

References

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely since matrices are popular structures with which to model data drawn from a wide range of application domains, and this work was performed by individuals from many different research communities. While the most obvious benefit of randomization is that it can lead to faster algorithms, either in worst-case asymptotic theory and/or numerical implementation, there are numerous other benefits that are at least as important. For example, the use of randomization can lead to simpler algorithms that are easier to analyze or reason about when applied in counterintuitive settings; it can lead to algorithms with more interpretable output, which is of interest in applications where analyst time rather than just computational time is of interest; it can lead implicitly to regularization and more robust output; and randomized algorithms can often be organized to exploit modern computational architectures better than classical numerical methods.

1: Introduction

2: Matrices in large-scale scientific data analysis

3: Randomization applied to matrix problems

4: Randomized algorithms for least-squares approximation

5: Randomized algorithms for low-rank matrix approximation

6: Empirical observations

7: A few general thoughts, and a few lessons learned

8: Conclusion

Acknowledgements

References

Randomized algorithms for very large matrix problems have received a great deal of attention in recent years. Much of this work was motivated by problems in large-scale data analysis, largely since matrices are popular structures with which to model data drawn from a wide range of application domains, and the success of this line of work opens the possibility of performing matrix-based computations with truly massive data sets. Originating within theoretical computer science, this work was subsequently extended and applied in important ways by researchers from numerical linear algebra, statistics, applied mathematics, data analysis, and machine learning, as well as domain scientists. Randomized Algorithms for Matrices and Data provides a detailed overview, appropriate for both students and researchers from all of these areas, of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis. By focusing on ubiquitous and fundamental problems such as least-squares approximation and low-rank matrix approximation that have been at the center of recent developments, an emphasis is placed on a few simple core ideas that underlie not only recent theoretical advances but also the usefulness of these algorithmic tools in large-scale data applications.