Shang-Hua Teng (2016), "Scalable Algorithms for Data and Network Analysis", Foundations and Trends® in Theoretical Computer Science: Vol. 12: No. 1–2, pp 1-274. http://dx.doi.org/10.1561/0400000051

© 2016 S.-H. Teng

Algorithmic game theory, Computational aspects of combinatorics and graph theory, Computational complexity, Computational geometry, Computational Models and Complexity, Data structures, Design and analysis of algorithms, Operations Research, Parallel algorithms, Randomness in Computation, Data Mining, Economics of information and the Web, Scalability, Social Networking, Spectral methods, Robustness, Optimization, Markov chain Monte Carlo, Graphical models, Game theoretic learning, Dimensionality reduction, Data mining, Clustering, Web search, Natural language processing for IR

Download article
**In this article:**

Preface

1. Scalable Algorithms

2. Networks and Data

3. Significant Nodes: Sampling - Making Data Smaller

4. Clustering: Local Exploration of Networks

5. Partitioning: Geometric Techniques for Data Analysis

6. Sparsification: Sparsification - Making Networks Simpler

7. Electrical Flows: Laplacian Paradigm for Network Analysis

8. Remarks and Discussions

Acknowledgements

References

In the age of Big Data, efficient algorithms are now in higher demand more than ever before. While Big Data takes us into the asymptotic world envisioned by our pioneers, it also challenges the classical notion of efficient algorithms: Algorithms that used to be considered efficient, according to polynomial-time characterization, may no longer be adequate for solving today’s problems. It is not just desirable, but essential, that efficient algorithms should be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation. In this tutorial, I will survey a family of algorithmic techniques for the design of provably-good scalable algorithms. These techniques include local network exploration, advanced sampling, sparsification, and geometric partitioning. They also include spectral graph-theoretical methods, such as those used for computing electrical flows and sampling from Gaussian Markov random fields. These methods exemplify the fusion of combinatorial, numerical, and statistical thinking in network analysis. I will illustrate the use of these techniques by a few basic problems that are fundamental in network analysis, particularly for the identification of significant nodes and coherent clusters/communities in social and information networks. I also take this opportunity to discuss some frameworks beyond graph-theoretical models for studying conceptual questions to understand multifaceted network data that arise in social influence, network dynamics, and Internet economics.

Preface

1. Scalable Algorithms

2. Networks and Data

3. Significant Nodes: Sampling - Making Data Smaller

4. Clustering: Local Exploration of Networks

5. Partitioning: Geometric Techniques for Data Analysis

6. Sparsification: Sparsification - Making Networks Simpler

7. Electrical Flows: Laplacian Paradigm for Network Analysis

8. Remarks and Discussions

Acknowledgements

References

In the age of Big Data, efficient algorithms are in higher demand more than ever before. While Big Data takes us into the asymptotic world envisioned by our pioneers, it also challenges the classical notion of efficient algorithms: Algorithms that used to be considered efficient, according to polynomial-time characterization, may no longer be adequate for solving today's problems. It is not just desirable but essential that efficient algorithms should be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation.

*Scalable Algorithms for Data and Network Analysis* surveys a family of algorithmic techniques for the design of scalable algorithms. These techniques include local network exploration, advanced sampling,
sparsification, and geometric partitioning. They also include spectral graph-theoretical methods, such as are used for computing electrical flows and sampling from Gaussian Markov random fields. These methods exemplify the
fusion of combinatorial, numerical, and statistical thinking in network analysis. *Scalable Algorithms for Data and Network Analysis* illustrates the use of these techniques by a few basic problems that are fundamental in
analyzing network data, particularly for the identification of significant nodes and coherent clusters/communities in social and information networks. It also discusses some frameworks beyond graph-theoretical models for
studying conceptual questions that arise in network analysis and social influences.