David A. Bader
IEEE Fellow
AAAS Fellow
Professor
College of Computing
Georgia Tech
Atlanta, GA 30332


 
 

 

Parallel Community Detection for Massive Graphs

Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with a massively parallel algorithm for community detection that scales to current data sizes, clustering a real-world graph of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT, and of over 100 million vertices and over 3 billion edges in under 500 seconds on a four-processor Intel E7-8870-based server. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore's sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance.

Publication History

Versions of this paper appeared as:
  1. E.J. Riedy, H. Meyerhenke, D. Ediger, and D.A. Bader, ``Parallel Community Detection for Massive Graphs,'' The 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Torun, Poland, September 11-14, 2011. Lecture Notes in Computer Science, 7203:286-296, 2012.
  2. E.J. Riedy, D. Ediger, D.A. Bader, and H. Meyerhencke, ``Parallel Community Detection for Massive Graphs,'' 10th DIMACS Implementation Challenge -- Graph Partitioning and Graph Clustering, Atlanta, GA, February 13-14, 2012.

Download this report in Adobe PDF


 
 

Last updated: July 8, 2012

 




Computational Biology



Parallel Computing



Combinatorics