Graph
theoretic techniques are used in variety of important
computational problems in the areas of computational
physics, mechanics, and fluid flow. We present a new,
parallel algorithm for detecting cycles in partitioned,
directed graphs that is both scalable in the graph and
machine size, and performs well in practice. As an example,
on a $p = 64$ processor cluster, we have solved an extremely
large and difficult input problem with $n = 2^{28}$ vertices
in less than five minutes. Our parallel algorithm uses
a new graph representation, called Packed-Intervals,
has a theoretical running time for this input of $\tau
\log p + \bigO{\frac{n}{p}} \sigma + \bigO{\frac{n}{p}}$
for $n \geq p^{4}$, and achieves good speedup for any
$n \gg p$ in practice. Our study includes both an efficient
parallel algorithm and an experimental study.