David A. Bader
IEEE Fellow
AAAS Fellow
College of Computing
Georgia Tech
Atlanta, GA 30332



Parallel Community Detection for Massive Graphs

Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with a massively parallel algorithm for community detection that scales to current data sizes, clustering a real-world graph of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT, and of over 100 million vertices and over 3 billion edges in under 500 seconds on a four-processor Intel E7-8870-based server. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore's sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance.

Publication History

Versions of this paper appeared as:
  1. E.J. Riedy, H. Meyerhenke, D. Ediger, and D.A. Bader, ``Parallel Community Detection for Massive Graphs,'' The 9th International Conference on Parallel Processing and Applied Mathematics (PPAM 2011), Torun, Poland, September 11-14, 2011. Lecture Notes in Computer Science, 7203:286-296, 2012.
  2. E.J. Riedy, D. Ediger, D.A. Bader, and H. Meyerhencke, ``Parallel Community Detection for Massive Graphs,'' 10th DIMACS Implementation Challenge -- Graph Partitioning and Graph Clustering, Atlanta, GA, February 13-14, 2012.

Download this report in Adobe PDF


Last updated: July 8, 2012


Computational Biology

Parallel Computing