David A. Bader
IEEE Fellow
AAAS Fellow
Professor
College of Computing
Georgia Tech
Atlanta, GA 30332


 
 

Parallel Programming for Graph Analysis

16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP)

Saturday, February 12, 2011
San Antonio, TX

An increasingly fast-paced, digital world has produced an ever-growing volume of petabyte-sized datasets. At the same time, terabytes of new, unstructured data arrive daily. As the desire to ask more detailed questions about these massive streams has grown, parallel software and hardware have only recently begun to enable complex analytics in this non-scientific space.

In this tutorial, we will discuss the open problems facing us with analyzing this "data deluge". We will present algorithms and data structures capable of analyzing spatio-temporal data at massive scale on parallel systems. We will try to understand the difficulties and bottlenecks in parallel graph algorithm design on current systems and will show how multithreaded and hybrid systems can overcome these challenges. We will demonstrate how parallel graph algorithms can be implemented on a variety of architectures using different programming models.

The goal of this tutorial is to provide a comprehensive introduction to the field of parallel graph analysis to an audience with computing background, interested in participating in research and/or commercial applications of this field. Moreover, we will cover leading-edge technical and algorithmic developments in the field and discuss open problems and potential solutions.

Topics to be Covered

  1. Network Analysis
    • Introduction to Graph Theory & Data Structures
    • Motivating Applications in Data
    • Opportunities & Challenges
    • Parallel, Multicore, & Multithreaded Architectural Support for Graph Processing
    • Mapping Graph Algorithms to Architectures
    • Open Discussion
  2. Static Parallel Algorithms
    • Programming Models
    • Parallel Prefix & List Ranking
    • Graph Search, Spanning Tree, Connected Components
    • Minimum Spanning Tree Matroid Algorithms
    • Social Networking Algorithms
    • Betweenness Centrality
    • Community Detection
    • Open Discussion
  3. Dynamic Parallel Algorithms
    • Streaming Data Analysis
    • Data Structures for Streaming Data
    • Tracking Clustering Coefficients
    • Tracking Connected Components
    • Anomaly Detection
    • Open Discussion
  4. Programming & Software
    • Programming Environments: OpenMP, MPI, MapReduce, CUDA/OpenCL, UPC, X10
    • Graph Libraries: PBGL, MTGL, LEDA, STXXL, SNAP, GraphCT, STING, Pregel
    • Advanced Topics
    • Question & Answer
    • Open Discussion

Presenters

  • David A. Bader is a Full Professor in the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology, and Executive Director for High Performance Computing. Dr. Bader is a lead scientist in the DARPA Ubiquitous High Performance Computing (UHPC) program. He received his Ph.D. in 1996 from The University of Maryland, and his research is supported through highly-competitive research awards, primarily from NSF, NIH, DARPA, and DOE. Dr. Bader serves on the Research Advisory Council for Internet2, the Steering Committees of the IPDPS and HiPC conferences, the General Chair of IPDPS 2010 and Chair of SIAM PP12. He is an associate editor for several high impact publications including the Journal of Parallel and Distributed Computing (JPDC), ACM Journal of Experimental Algorithmics (JEA), IEEE DSOnline, Parallel Computing, and Journal of Computational Science, and has been an associate editor for the IEEE Transactions on Parallel and Distributed Systems (TPDS). Dr. Bader's interests are at the intersection of high-performance computing and real-world applications, including computational biology and genomics and massive-scale data analytics. He has co-chaired a series of meetings, the IEEE International Workshop on High-Performance Computational Biology (HiCOMB), co-organized the NSF Workshop on Petascale Computing in the Biological Sciences, written several book chapters, and co-edited special issues of the Journal of Parallel and Distributed Computing (JPDC) and IEEE TPDS on high-performance computational biology. He is also a leading expert on multicore, manycore, and multithreaded computing for data-intensive applications such as those in massive-scale graph analytics. He has co-authored over 100 articles in peer-reviewed journals and conferences, and his main areas of research are in parallel algorithms, combinatorial optimization, massive-scale social networks, and computational biology and genomics. Prof. Bader is an IEEE Fellow, a National Science Foundation CAREER Award recipient, and has received numerous industrial awards from IBM, NVIDIA, Intel, Sun Microsystems, and Microsoft Research. He served as a member of the IBM PERCS team for the DARPA High Productivity Computing Systems program, was a distinguished speaker in the IEEE Computer Society Distinguished Visitors Program, and has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor.
  • David Ediger is a Ph.D. student in Electrical & Computer Engineering and a research assistant in the High Performance Computing Lab at Georgia Tech. He completed his B.S. in Computer Engineering in 2008 under the direction of Dr. Tarek El-Ghazawi at The George Washington University focusing on Unified Parallel C (UPC) and reconfigurable systems and an M.S. in Electrical & Computer Engineering from Georgia Tech in 2010. David is principal developer of GraphCT, the graph characterization package for highly parallel, massive graph analysis.
  • E. Jason Riedy is a Research Scientist II in the School of Computational Science and Engineering at Georgia Tech. He is developing a software framework for parallel analysis of massive, streaming graph data (STING). He has codes extending and included in the widely used packages LAPACK and the extended-precision BLAS (XBLAS). He has contributed to GNU Octave, GNU Emacs, GNU R packages, git, and other free software. He was a member of the IEEE 754 revision committee and has presented widely on software development and research. His Ph.D. is from the University of California, Berkeley in 2010 under Dr. James Demmel on sparse linear algebra and parallel graph optimization. His dual B.S. in Computer Science and Mathematics is from the University of Florida in 1998.

Publication History

Versions of this tutorial appeared as:
  1. David A. Bader, David Ediger, E. Jason Riedy, ``Parallel Programming for Graph Analysis,'' The 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2011), San Antonio, TX, February 12, 2011.

 
 

Last updated: September 28, 2010

 




Computational Biology



Parallel Computing



Combinatorics