Tackling the current volume of graph-structured data requires
parallel tools. We extend our work on analyzing such massive graph
data with a massively parallel algorithm for community detection that
scales to current data sizes, clustering a real-world graph of over 122
million vertices and nearly 2 billion edges in under 7300 seconds on a
massively multithreaded Cray XMT, and a graph of over 100
million vertices and over 3 billion edges in under 500 seconds on a four-processor
Intel E7-8870-based server. Our algorithm achieves moderate
parallel scalability without sacrificing sequential operational complexity.
Community detection partitions a graph into subgraphs that are more densely
connected internally than to the rest of the graph. We take
an agglomerative approach similar to Clauset, Newman, and Moore's
sequential algorithm, merging pairs of connected intermediate subgraphs
to optimize different graph properties. Working in parallel opens new
approaches to high performance.
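The agglomerative idea above can be sketched as follows. This is a minimal sequential illustration in the spirit of Clauset-Newman-Moore, not the paper's parallel algorithm: it repeatedly merges the pair of connected communities whose merge yields the largest modularity gain, using the standard gain formula ΔQ = l_ij/m − d_i·d_j/(2m²), where l_ij counts edges between communities i and j, d_i is the total degree of community i, and m is the edge count. The function name and data layout are illustrative assumptions.

```python
from collections import defaultdict

def greedy_merge(edges):
    """Greedy agglomerative merging on an undirected edge list.

    Returns a vertex -> community-label map once no merge of two
    connected communities would increase modularity.
    """
    m = len(edges)
    comm = {}                 # vertex -> current community label
    deg = defaultdict(int)    # vertex degree
    for u, v in edges:
        comm.setdefault(u, u)
        comm.setdefault(v, v)
        deg[u] += 1
        deg[v] += 1

    while True:
        # d[c]: total degree of community c
        # l[(i, j)]: number of edges between communities i and j (i < j)
        d = defaultdict(int)
        l = defaultdict(int)
        for v, c in comm.items():
            d[c] += deg[v]
        for u, v in edges:
            cu, cv = comm[u], comm[v]
            if cu != cv:
                l[tuple(sorted((cu, cv)))] += 1

        # Modularity gain of merging communities i and j:
        #   dQ = l_ij / m - d_i * d_j / (2 m^2)
        best, best_dq = None, 1e-12
        for (i, j), lij in l.items():
            dq = lij / m - d[i] * d[j] / (2.0 * m * m)
            if dq > best_dq:
                best, best_dq = (i, j), dq

        if best is None:          # no improving merge remains
            return dict(comm)
        i, j = best
        for v, c in comm.items():  # relabel community j as i
            if c == j:
                comm[v] = i
```

For example, on two triangles joined by a single edge, the greedy merges recover the two triangles as communities. The paper's contribution is performing many such merges concurrently rather than one at a time.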