This paper discusses high performance clustering through a
series of critical topics: architectural design, system
software infrastructure, and programming environment.
We do so through an overview of a large-scale,
high performance SuperCluster (named Roadrunner)
in production at the University of New Mexico (UNM) Albuquerque
High Performance Computing Center (AHPCC). This SuperCluster,
sponsored by the U.S. National Science Foundation (NSF)
and the National Computational Science Alliance (NCSA),
is based almost entirely on freely available, vendor-independent
software, including its operating system (Linux),
job scheduler (PBS), compilers (GNU/EGCS), and parallel
programming libraries (MPI). The Globus toolkit, also
available for this platform, allows high performance
distributed computing applications to use geographically
distributed resources such as this SuperCluster. In addition
to describing the design and analysis of the Roadrunner
SuperCluster, we present experimental analyses from grand
challenge applications and discuss future directions for SuperClusters.