Many large-scale optimization problems rely on graph-theoretic
solutions, yet high-performance computing has traditionally focused
on regular applications with a high degree of locality. We describe
our novel methodology for designing and implementing irregular parallel
algorithms that attain significant performance on high-end computer
systems. Our results for several fundamental graph-theoretic problems
are the first to demonstrate that significant parallel speedups are
attainable for arbitrary instances of a variety of graph problems;
building on these results, we are developing a library of fundamental
routines for discrete optimization (especially in computational
biology) on shared-memory systems.
Phylogenies derived from gene order data may prove crucial in answering
some fundamental questions in biomolecular evolution. High-performance
algorithm engineering offers a battery of tools that can reduce,
sometimes spectacularly, the running time of existing approaches.
We discuss one such application, GRAPPA, which demonstrated over
a billion-fold speedup in running time (on a variety of real and simulated
datasets) by combining low-level algorithmic improvements, cache-aware
programming, careful performance tuning, and massive parallelism. We
show how these techniques are directly applicable to a large variety of
problems in computational biology.