Foundations and Trends® in Theoretical Computer Science > Vol 12 > Issue 1–2

Scalable Algorithms for Data and Network Analysis

Shang-Hua Teng, University of Southern California, Los Angeles, USA, shanghua@usc.edu
 
Suggested Citation
Shang-Hua Teng (2016), "Scalable Algorithms for Data and Network Analysis", Foundations and Trends® in Theoretical Computer Science: Vol. 12: No. 1–2, pp 1-274. http://dx.doi.org/10.1561/0400000051

Published: 30 May 2016
© 2016 S.-H. Teng
 
Subjects
Algorithmic game theory,  Computational aspects of combinatorics and graph theory,  Computational complexity,  Computational geometry,  Computational Models and Complexity,  Data structures,  Design and analysis of algorithms,  Operations Research,  Parallel algorithms,  Randomness in Computation,  Data Mining,  Economics of information and the Web,  Scalability,  Social Networking,  Spectral methods,  Robustness,  Optimization,  Markov chain Monte Carlo,  Graphical models,  Game theoretic learning,  Dimensionality reduction,  Data mining,  Clustering,  Web search,  Natural language processing for IR
 

Free Preview:

Article Help

Share

Download article
In this article:
Preface
1. Scalable Algorithms
2. Networks and Data
3. Significant Nodes: Sampling - Making Data Smaller
4. Clustering: Local Exploration of Networks
5. Partitioning: Geometric Techniques for Data Analysis
6. Sparsification: Sparsification - Making Networks Simpler
7. Electrical Flows: Laplacian Paradigm for Network Analysis
8. Remarks and Discussions
Acknowledgements
References

Abstract

In the age of Big Data, efficient algorithms are now in higher demand more than ever before. While Big Data takes us into the asymptotic world envisioned by our pioneers, it also challenges the classical notion of efficient algorithms: Algorithms that used to be considered efficient, according to polynomial-time characterization, may no longer be adequate for solving today’s problems. It is not just desirable, but essential, that efficient algorithms should be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation. In this tutorial, I will survey a family of algorithmic techniques for the design of provably-good scalable algorithms. These techniques include local network exploration, advanced sampling, sparsification, and geometric partitioning. They also include spectral graph-theoretical methods, such as those used for computing electrical flows and sampling from Gaussian Markov random fields. These methods exemplify the fusion of combinatorial, numerical, and statistical thinking in network analysis. I will illustrate the use of these techniques by a few basic problems that are fundamental in network analysis, particularly for the identification of significant nodes and coherent clusters/communities in social and information networks. I also take this opportunity to discuss some frameworks beyond graph-theoretical models for studying conceptual questions to understand multifaceted network data that arise in social influence, network dynamics, and Internet economics.

DOI:10.1561/0400000051
ISBN: 978-1-68083-130-6
290 pp. $99.00
Buy book
 
ISBN: 978-1-68083-131-3
290 pp. $260.00
Buy E-book
Table of contents:
Preface
1. Scalable Algorithms
2. Networks and Data
3. Significant Nodes: Sampling - Making Data Smaller
4. Clustering: Local Exploration of Networks
5. Partitioning: Geometric Techniques for Data Analysis
6. Sparsification: Sparsification - Making Networks Simpler
7. Electrical Flows: Laplacian Paradigm for Network Analysis
8. Remarks and Discussions
Acknowledgements
References

Scalable Algorithms for Data and Network Analysis

In the age of Big Data, efficient algorithms are in higher demand more than ever before. While Big Data takes us into the asymptotic world envisioned by our pioneers, it also challenges the classical notion of efficient algorithms: Algorithms that used to be considered efficient, according to polynomial-time characterization, may no longer be adequate for solving today's problems. It is not just desirable but essential that efficient algorithms should be scalable. In other words, their complexity should be nearly linear or sub-linear with respect to the problem size. Thus, scalability, not just polynomial-time computability, should be elevated as the central complexity notion for characterizing efficient computation.

Scalable Algorithms for Data and Network Analysis surveys a family of algorithmic techniques for the design of scalable algorithms. These techniques include local network exploration, advanced sampling, sparsification, and geometric partitioning. They also include spectral graph-theoretical methods, such as are used for computing electrical flows and sampling from Gaussian Markov random fields. These methods exemplify the fusion of combinatorial, numerical, and statistical thinking in network analysis. Scalable Algorithms for Data and Network Analysis illustrates the use of these techniques by a few basic problems that are fundamental in analyzing network data, particularly for the identification of significant nodes and coherent clusters/communities in social and information networks. It also discusses some frameworks beyond graph-theoretical models for studying conceptual questions that arise in network analysis and social influences.

 
TCS-051