Implementing parallel graph algorithms in large,
shared memory machines, such as the Cray XMT, can be
challenging for programmers. Synchronization, deadlock, hotspotting,
and others can be barriers to obtaining linear scalability.
Alternative programming models, such as the bulk synchronous
parallel programming model used in Google’s Pregel, have been
proposed for large graph analytics. This model eases programmer
complexity by eliminating deadlock and simplifying data sharing.
We investigate the algorithmic effects of the BSP model for
graph algorithms and compare performance and scalability with
hand-tuned, open source software using GraphCT. We analyze
the innermost iterations of these algorithms to understand the
differences in scalability between BSP and shared memory
algorithms. We show that scalable performance can be obtained
with graph algorithms in the BSP model on the Cray XMT. These
algorithms perform within a factor of 10 of hand-tuned C code.