Abstract
Developing applications on distributed-memory parallel computers is
notoriously hard and requires considerable programmer effort to obtain
an efficient implementation. Unfortunately, programs developed for one
target parallel computer cannot easily be ported to another. High
Performance Fortran (HPF) was proposed as a step towards portable
parallel programming. The mapping of data and computation to processors
is specified using ALIGN and DISTRIBUTE directives; the advantage of
this approach is that a compiler can generate a message-passing program
from the directives. The problem of finding optimal data and
computation distributions is nontrivial, hence there is a need to
automate this task. An effective parallel implementation must address
three issues: parallelism, locality of reference, and load balance. In
this thesis research, we assume input programs with explicit
parallelism and propose compiler techniques for improving locality of
reference and load balance. Locality of reference can be improved by
aligning arrays and loop iterations; processor idle time can be
minimized by distributing computation equally among processors. Even on
a shared-memory machine, an efficient implementation requires
optimizing for locality.
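The interaction of alignment and distribution described above can be sketched in a few lines. The following is a minimal illustration, not HPF itself: it assumes a one-dimensional template of 16 cells, four processors, a BLOCK distribution, and affine alignments of the form stride*i + offset.

```python
# Sketch of HPF-style ALIGN + DISTRIBUTE (illustrative assumptions:
# 1-D template of 16 cells, 4 processors, BLOCK distribution).

TEMPLATE_SIZE = 16
NPROCS = 4
BLOCK = TEMPLATE_SIZE // NPROCS  # BLOCK distribution: 4 cells per processor

def align(i, stride=1, offset=0):
    """Affine alignment: array element i -> template cell stride*i + offset."""
    return stride * i + offset

def owner(cell):
    """Processor owning a template cell under BLOCK distribution."""
    return cell // BLOCK

# If A(i) and B(i) are both aligned with T(i), a statement such as
# A(i) = A(i) + B(i) is purely local on every processor.
# If instead B(i) is aligned with T(i+1), nonlocal references appear
# wherever the offset crosses a block boundary:
boundary = [i for i in range(TEMPLATE_SIZE - 1)
            if owner(align(i)) != owner(align(i, offset=1))]
print(boundary)  # indices where A(i) and B(i) land on different processors
```

Under these assumptions the misaligned case forces communication only at the three block boundaries, which is the kind of overhead that good alignment choices eliminate.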
Alignments specify a mapping of array elements and loop iterations to
points in a template (a multidimensional grid). HPF supports a
restricted form of alignment that allows affine mappings between axes.
Other researchers have developed techniques for finding affine mappings
from the data and computation spaces to the template space. For
programs with triangular iteration and data spaces, or loops that
reference elements mirrored about a hyperplane, these techniques end up
serializing a parallel dimension. We present a new technique that folds
such spaces and finds optimal affine alignments between the folded
spaces. Often it is not possible to find alignments that eliminate all
nonlocal references; in such cases, data must be realigned. We are
developing a global optimization algorithm that reduces the overhead
due to realignment and to nonlocal references.
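The folding idea can be illustrated on the mirror case. The example below is a hypothetical sketch, not the thesis algorithm: a loop that references both A(i) and A(N-1-i) has no single affine alignment placing each mirror pair in one template cell for every i, but folding the index space about its midpoint does.

```python
# Hypothetical illustration of folding an index space so that mirror
# references (i and N-1-i) map to the same point of the folded space.

N = 10  # assumed problem size for the sketch

def fold(i, n=N):
    """Fold the index space about its midpoint: i and n-1-i coincide."""
    return min(i, n - 1 - i)

# Every mirror pair lands on the same folded index, so an ordinary
# affine alignment of the folded space to the template keeps both
# references local:
for i in range(N):
    assert fold(i) == fold(N - 1 - i)

print(sorted({fold(i) for i in range(N)}))  # folded space: [0, 1, 2, 3, 4]
```

The folded space is half the size of the original, so aligning it with the template avoids serializing the parallel dimension that a direct affine alignment would sacrifice.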