Visualizing Concurrent Programs

Presented by

What to Draw? When to Draw? An Essay on Parallel Program Visualization
Barton P. Miller

Understanding parallel programs is hard, and visualizing them often proves even harder. Many visualization systems produce aesthetically appealing views that are nonetheless of limited use. In his paper, Miller presents a number of guidelines for alleviating these problems and making visualization an effective tool for program understanding and presentation.

Why is visualization difficult?

It is often easier to draw a pretty picture than a useful one. A comment we often hear is: ``that looks really neat, what does it mean?'' We must strive to make visualizations intuitive and easy to comprehend, rather than spend effort on views that are complex but merely aesthetically appealing. Creators of program visualizations face two major problems. First, good models for programming constructs are hard to create, mainly because we cannot appeal to real, physical models and concepts and must instead derive meaningful models from artificial constructs. Second, mental models invariably differ from person to person, so a mapping from abstract to concrete that is intuitive to one person may have little or no meaning to others.

Comment: This is a common problem that the user interface community has thoroughly investigated. Concepts such as ``consistency,'' ``affordances,'' and ``metaphors'' seem applicable in software visualization as well.

Miller does not explain why visualizing parallel programs is inherently harder than visualizing sequential programs. To remedy this, I provide a list of reasons of my own (note that this list is far from comprehensive):

Criteria for making good visualizations

Miller lists the following guidelines for visualization design:

Question: The title of the paper implies that the criteria for making good visualizations are chiefly concerned with parallel programs. However, few of these criteria are specific to parallel programs; most apply equally to serial programs. What concerns arise when visualizing parallel programs that are not inherent in serial program visualization? Conversely, which sequential program visualization concepts are applicable to parallel program visualization?


Visualizing the Performance of Parallel Programs
Michael T. Heath and Jennifer A. Etheridge

In this paper, the authors describe the ``ParaGraph'' system--a tool for displaying performance data gathered from executions of message passing parallel programs. ParaGraph lets the user visualize a large number of important features, ranging from load balancing and processor utilization, through interprocess communication performance and process/message graphs, to task decomposition. To collect performance data during execution for later off-line viewing, ParaGraph relies on the Portable Instrumented Communication Library (PICL), which facilitates generation of trace files without requiring the programmer to annotate the source code. The software was developed at Oak Ridge National Laboratory.

ParaGraph design goals

The three primary design goals of ParaGraph are summarized below:

Overview

One of the key features of ParaGraph is its (or rather PICL's) ability to collect data while the program runs and later let the user examine this data in depth from several different perspectives. This is accomplished by telling PICL to buffer all important execution events in memory and then write them to a trace file. In this way, profiling overhead is minimized so that serious perturbations of the actual execution can be avoided. In addition to keeping the overhead small, PICL goes to great lengths to ensure that the time-stamped events are synchronized across processors. ParaGraph reads the trace files and lets the user replay the execution off-line, which ensures that the order of events is the same each time the execution is played back. The execution can be played back continuously, with the option to pause and resume the animation, or the user may single-step through the execution one event at a time.
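The buffer-then-replay scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names TraceEvent, TraceBuffer, load_trace and replay are hypothetical, the JSON-lines file layout is invented for the example and is not PICL's trace format, and the time stamps are assumed to be synchronized across processors already.

```python
import json
from dataclasses import dataclass, asdict
from typing import Iterator, List


@dataclass
class TraceEvent:
    """One recorded event (hypothetical record layout, not PICL's)."""
    timestamp: float  # assumed to be synchronized across processors already
    processor: int
    kind: str         # e.g. "send", "recv", "busy", "idle"
    detail: str = ""


class TraceBuffer:
    """Collects events in memory so that no file I/O happens during the run."""

    def __init__(self) -> None:
        self._events: List[TraceEvent] = []

    def record(self, timestamp: float, processor: int, kind: str, detail: str = "") -> None:
        # Appending to a list is cheap, which keeps the perturbation small.
        self._events.append(TraceEvent(timestamp, processor, kind, detail))

    def flush(self, path: str) -> None:
        # Write the buffered events to the trace file, one JSON object per line.
        with open(path, "w") as f:
            for ev in self._events:
                f.write(json.dumps(asdict(ev)) + "\n")
        self._events.clear()


def load_trace(path: str) -> List[TraceEvent]:
    """Read a trace file and put its events into one fixed, global order."""
    with open(path) as f:
        events = [TraceEvent(**json.loads(line)) for line in f if line.strip()]
    # Sorting by (timestamp, processor) makes every replay see the same order.
    events.sort(key=lambda ev: (ev.timestamp, ev.processor))
    return events


def replay(events: List[TraceEvent]) -> Iterator[TraceEvent]:
    """Yield events one at a time; calling next() on the result is a single step."""
    for ev in events:
        yield ev
```

Iterating over replay(events) in a loop corresponds to continuous playback, while calling next() on the iterator corresponds to single-stepping. Because the order is fixed when the trace is loaded, every playback shows the events in the same sequence.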

Comment: The lack of speed control for the animation, as well as long response times when pausing the animation, makes ParaGraph cumbersome to use at times. In addition, there is no provision for reverse execution, which could easily have been incorporated.

ParaGraph features nearly 30 different performance views, and allows the user to link in customized views. The views are invoked from a set of menus that are organized as follows:

Note that many of the views in ParaGraph are redundant and display the same data from different perspectives. This allows the user to select a view that most appropriately displays a certain characteristic of the execution.
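To make the idea of redundant views concrete, here is a minimal Python sketch in which two different summaries are computed from the same data. The input layout (a dictionary of busy intervals per processor) and the function names utilization and busy_count_at are assumptions made for this example; they do not describe how ParaGraph stores or computes its views.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) of one busy period, in seconds


def utilization(busy: Dict[int, List[Interval]], run_length: float) -> Dict[int, float]:
    """Per-processor view: fraction of the whole run each processor was busy."""
    return {
        proc: sum(end - start for start, end in intervals) / run_length
        for proc, intervals in busy.items()
    }


def busy_count_at(busy: Dict[int, List[Interval]], t: float) -> int:
    """Aggregate view: how many processors are busy at time t."""
    return sum(
        any(start <= t < end for start, end in intervals)
        for intervals in busy.values()
    )
```

The first function answers ``how well was each processor used?'' and the second answers ``how much of the machine was in use at a given moment?'', yet both read exactly the same intervals.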

Comment: While ParaGraph supports animation to some extent (whether its dynamic views are even ``animated'' is debatable), it by no means supports algorithm animation, contrary to the authors' claim. The trace files simply contain no semantic information, and one cannot infer from watching the views which algorithm produced the performance data.

Question: It is not immediately clear whether ParaGraph visualizes performance data per processor or per process. Certain views, as well as comments made in the paper, suggest the former. How could ParaGraph be extended to support both? It is certainly conceivable to run a large number of message passing processes on a uniprocessor machine, and today's portable communication libraries (e.g. MPI and PVM) allow process distribution decisions to be made rather arbitrarily and independently of the underlying architecture. A good visualizer should handle arbitrary process mappings as well as varying process granularities (i.e. both lightweight and heavyweight processes).
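One possible direction, sketched here in Python, is to record events per process and to derive per-processor data from an explicit process-to-processor mapping. Everything in this sketch is hypothetical (the BusyInterval layout, the placement dictionary and both function names), and it assumes that processes sharing a processor are time-sliced, so their busy intervals do not overlap.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (process id, start time, end time) of one interval in which a process was busy
BusyInterval = Tuple[int, float, float]


def busy_time_per_process(intervals: List[BusyInterval]) -> Dict[int, float]:
    """Per-process view: total busy time of each process."""
    totals: Dict[int, float] = defaultdict(float)
    for pid, start, end in intervals:
        totals[pid] += end - start
    return dict(totals)


def busy_time_per_processor(
    intervals: List[BusyInterval], placement: Dict[int, int]
) -> Dict[int, float]:
    """Per-processor view: fold process busy time onto the hosting processor.

    placement maps a process id to the processor it runs on. Summing is only
    valid under the stated assumption that co-located processes never overlap.
    """
    totals: Dict[int, float] = defaultdict(float)
    for pid, start, end in intervals:
        totals[placement[pid]] += end - start
    return dict(totals)
```

With a placement such as {0: 0, 1: 0, 2: 1}, processes 0 and 1 are charged to processor 0 and process 2 to processor 1. Changing only the mapping yields a different per-processor view from the same trace.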