Abstract

Developing applications on distributed-memory parallel computers is notoriously hard and requires considerable programmer effort to obtain an efficient implementation. Unfortunately, programs developed for one target parallel computer cannot easily be ported to another. High Performance Fortran (HPF) was proposed as a step towards facilitating portable parallel programming. In HPF, the mapping of data and computation to processors is specified using ALIGN and DISTRIBUTE directives, and the advantage of this approach is that a compiler can then generate a message-passing program from that specification. Finding optimal data and computation distributions is nontrivial, however, so there is a need to automate this task. An effective parallel implementation must address three issues: parallelism, locality of reference, and load balance. In this thesis research, we assume input programs with explicit parallelism and propose compiler techniques for improving locality of reference and load balance. Locality of reference can be improved by aligning arrays and iterations (of loop nests); processor idle time can be minimized by distributing computation equally among processors. Even on a shared-memory machine, obtaining an efficient implementation requires optimizing for locality.
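For concreteness, the following minimal HPF fragment sketches how such directives place data; the array names, sizes, and offsets are purely illustrative and are not taken from the thesis.

```fortran
PROGRAM align_example
  REAL :: a(100), b(100)
  INTEGER :: i
  ! Abstract processor arrangement and template (an index grid).
!HPF$ PROCESSORS p(4)
!HPF$ TEMPLATE t(101)
  ! Element a(j) shares a processor with template cell t(j);
  ! b(j) is shifted by one cell, so b(j-1) lands with a(j).
!HPF$ ALIGN a(j) WITH t(j)
!HPF$ ALIGN b(j) WITH t(j+1)
  ! Divide the template into contiguous blocks, one per processor.
!HPF$ DISTRIBUTE t(BLOCK) ONTO p
  a = 0.0
  b = 1.0
  ! Given the alignment above, each iteration reads only local data.
  FORALL (i = 2:100) a(i) = b(i-1)
END PROGRAM align_example
```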

Alignments specify a mapping of array elements and iterations to points in a template (a multidimensional grid). HPF has a restricted form of alignment that allows affine mappings between axes. Other researchers have developed techniques for finding affine mappings from the data and computation spaces to the template space. For programs with triangular iteration and data spaces, or with loops that reference mirror-image elements with respect to a hyperplane, these techniques end up serializing a parallel dimension. We present a new technique that folds the spaces and finds optimal affine alignments between the folded spaces. Often it is not possible to find alignments that eliminate all nonlocal references; in such cases, data must be realigned. We are developing a global optimization algorithm that reduces the overhead of realignment and of references to nonlocal data.
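As an illustration of the mirror-image case (a hypothetical loop, not an example from the thesis), consider a computation in which iteration i reads both a(i) and a(n+1-i). An axis-to-axis affine alignment cannot keep both references local for every i, so the existing techniques fall back to serializing the loop; the folding technique described above is meant to handle exactly this pattern.

```fortran
PROGRAM mirror_example
  INTEGER, PARAMETER :: n = 100
  REAL :: a(n), s(n)
  INTEGER :: i
  a = 1.0
  ! Iteration i reads a(i) and its mirror image a(n+1-i).
  ! Any affine axis alignment that keeps a(i) local to iteration i
  ! places a(n+1-i) on another processor (and vice versa), so
  ! alignment alone forces this otherwise parallel loop to serialize.
  FORALL (i = 1:n) s(i) = a(i) + a(n+1-i)
END PROGRAM mirror_example
```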