Collecting data from concurrent programs poses further obstacles: producing a snapshot of program state is more complex in a concurrent program; memory may be distributed, and messages in transit may be difficult to access; and system clocks may not be synchronized across the processors involved in the computation and may drift at different rates.
Graphical visualization can be a powerful tool for understanding and explaining complex tasks such as parallel computation. Appropriate displays of concurrent programs can help the viewer develop intuition about performance and correctness problems that stem from unanticipated interactions between processes. In the following sections, we discuss the tasks inherent in creating a visualization of a parallel program: data collection, analysis, and display.
Instrumentation can be done at various levels: hardware, operating system, run-time environment, application program, and so on. Each level yields different kinds of information. At the hardware level, one can obtain process CPU times, program counter samples, cache misses, etc.; at the operating system level, messages sent and received, process creation, scheduling, etc.; run-time environment instrumentation can provide the state of various run-time queues, the acquisition and release of locks, procedure calls and returns, etc.; and application-level instrumentation can provide information about abstract, high-level, and user-defined events.
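As a rough illustration of application-level instrumentation, the sketch below records user-defined events and procedure calls and returns into an in-memory buffer. The `Tracer` class, the `trace_calls` decorator, and the event fields are hypothetical names chosen for this example, not the interface of any system discussed here. Timestamps come from the local `time.perf_counter_ns()` clock, which would not be comparable across processors.

```python
import functools
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """One trace record: when it happened, which process, and what."""
    timestamp_ns: int
    process: int
    kind: str
    detail: str


@dataclass
class Tracer:
    """Hypothetical in-memory trace buffer for a single process."""
    process: int
    events: list = field(default_factory=list)

    def record(self, kind: str, detail: str = "") -> None:
        # Local clock reading only; not synchronized with other processors.
        self.events.append(Event(time.perf_counter_ns(), self.process, kind, detail))

    def trace_calls(self, func):
        """Decorator that records procedure entry and exit events."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.record("call", func.__name__)
            try:
                return func(*args, **kwargs)
            finally:
                self.record("return", func.__name__)
        return wrapper


tracer = Tracer(process=0)


@tracer.trace_calls
def solve_block(n: int) -> int:
    tracer.record("user", f"phase=solve n={n}")  # a user-defined, high-level event
    return sum(range(n))


solve_block(10)
for e in tracer.events:
    print(e.kind, e.detail)
```

Every recorded event perturbs the program slightly, which is one reason the amount and level of instrumentation matter.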
Data collection for concurrent systems is more complicated than for sequential systems. Concurrent programs tend to be long-running and to produce large amounts of data, and it is hard to determine globally consistent states because the data is distributed across separate memories.
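One way to make "globally consistent" concrete: a snapshot (a cut) that takes a prefix of each process's local history is consistent if it never includes the receipt of a message without also including its send. The sketch below checks that condition; the `Message` record, the indexing scheme, and `is_consistent` are assumptions made for illustration.

```python
from typing import NamedTuple, Sequence


class Message(NamedTuple):
    sender: int
    send_index: int   # position of the send event in the sender's local history
    receiver: int
    recv_index: int   # position of the receive event in the receiver's local history


def is_consistent(cut: Sequence[int], messages: Sequence[Message]) -> bool:
    """cut[p] = number of events of process p included in the snapshot (a prefix)."""
    for m in messages:
        received = m.recv_index < cut[m.receiver]
        sent = m.send_index < cut[m.sender]
        if received and not sent:
            # The snapshot would show an effect without its cause.
            return False
    return True


msgs = [Message(sender=0, send_index=1, receiver=1, recv_index=0)]
print(is_consistent([2, 1], msgs))  # send and receive both included
print(is_consistent([1, 1], msgs))  # receive included, send missing
print(is_consistent([2, 0], msgs))  # send included, message still in transit
```

The third case is consistent but leaves a message in transit, which is exactly the state that is hard to capture when messages live outside any one process's memory.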
On-line versus post-mortem visualization: On-line visualization can provide an up-to-the-moment view of the computation's progress and can reduce storage overhead, since the trace data need not all be saved. However, the display cannot be too detailed. Post-mortem visualization allows a more detailed display that can proceed at a user-specified pace, and generally perturbs the program to a lesser extent. Because the trace must be stored, filtering and reduction of the data may be needed to reduce storage overhead.
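Filtering and reduction can be as simple as discarding uninteresting event kinds and replacing individual events with counts per time interval. The following is a minimal sketch that assumes a hypothetical trace format of `(timestamp, process, kind)` tuples; real systems use richer records and more elaborate reductions.

```python
from collections import Counter
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical trace record: (timestamp in seconds, process id, event kind)
Record = Tuple[float, int, str]


def filter_events(trace: Iterable[Record], keep: Set[str]) -> Iterator[Record]:
    """Filtering: drop event kinds the analyst is not interested in."""
    return (r for r in trace if r[2] in keep)


def reduce_to_counts(trace: Iterable[Record], bin_width: float) -> Counter:
    """Reduction: replace individual events with counts per (time bin, process, kind)."""
    counts: Counter = Counter()
    for t, proc, kind in trace:
        counts[(int(t // bin_width), proc, kind)] += 1
    return counts


trace = [(0.1, 0, "send"), (0.2, 0, "compute"), (0.4, 1, "recv"),
         (1.3, 0, "send"), (1.7, 1, "recv"), (1.8, 1, "send")]
summary = reduce_to_counts(filter_events(trace, {"send", "recv"}), bin_width=1.0)
print(sorted(summary.items()))
```

Because `filter_events` returns a generator, the two stages can run as a stream, which is also how such reductions could be applied on-line before anything is written to storage.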
Collecting data from a concurrent program yields multiple streams of events, and these streams must be appropriately ordered and merged for analysis and visualization. Lamport defined a consistent ordering of events in a distributed system in terms of the happened-before relation.
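A minimal sketch of merging per-process streams, assuming each event already carries a Lamport timestamp; the `Event` record and `merge_streams` are hypothetical. Ordering by the pair (Lamport time, process id) gives a total order that respects happened-before, with ties between concurrent events broken arbitrarily by process id.

```python
import heapq
from typing import List, NamedTuple


class Event(NamedTuple):
    lamport: int   # Lamport timestamp assigned when the event was recorded
    process: int   # id of the recording process, used to break ties
    label: str


def merge_streams(streams: List[List[Event]]) -> List[Event]:
    """Merge per-process streams into one total order.

    Each stream is already sorted by (lamport, process), because Lamport
    timestamps strictly increase within a process, so heapq.merge applies.
    """
    return list(heapq.merge(*streams, key=lambda e: (e.lamport, e.process)))


p0 = [Event(1, 0, "p0 compute"), Event(2, 0, "p0 send m"), Event(3, 0, "p0 compute")]
p1 = [Event(1, 1, "p1 compute"), Event(3, 1, "p1 recv m"), Event(4, 1, "p1 send ack")]
for e in merge_streams([p0, p1]):
    print(e.lamport, e.process, e.label)
```

The send of `m` (time 2) precedes its receipt (time 3) in the merged order, as happened-before requires.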
IPS-2 is an example of a system that creates intermediate data structures, such as procedure call, synchronization, and other flow graphs, as part of the analysis process. Pablo is an example of a system in which the analysis phase is user-directed: the user manipulates and graphically interconnects a set of performance data transformation modules. EBBA is a high-level debugging tool that allows the user to specify models of program behavior consisting of abstract and primitive events. It uses clustering and filtering techniques to examine the stream of primitive events from the program, and applies a pattern-matching algorithm to construct user-specified abstract events. TraceViewer is a tool for detecting non-determinism.
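To suggest how primitive events might be assembled into an abstract event, the sketch below uses a simple greedy matcher: it filters out primitive events that do not appear in the pattern and reports each occurrence of the pattern as an ordered subsequence. This is not EBBA's actual algorithm or model language; `recognize` and the event names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def recognize(primitive: Iterable[str], pattern: Sequence[str],
              name: str) -> List[Tuple[str, int, int]]:
    """Emit (name, start_pos, end_pos) each time `pattern` occurs as an ordered
    subsequence of the primitive stream. Greedy: a partial match is kept when an
    unexpected event arrives, and a new match starts only after one completes."""
    if not pattern:
        return []
    matches = []
    i = 0          # index of the next pattern element expected
    start = 0
    for pos, ev in enumerate(primitive):
        if ev not in pattern:
            continue                   # filtering: irrelevant primitive event
        if ev == pattern[i]:
            if i == 0:
                start = pos
            i += 1
            if i == len(pattern):      # abstract event complete
                matches.append((name, start, pos))
                i = 0
    return matches


stream = ["acquire", "compute", "write", "compute", "release",
          "acquire", "write", "release"]
print(recognize(stream, ["acquire", "write", "release"], "critical_update"))
# [('critical_update', 0, 4), ('critical_update', 5, 7)]
```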
One form of analysis peculiar to parallel systems is determining the order of events. At the simplest level, this might involve assigning timestamps to events. Lamport's logical clocks provide a convenient way to obtain a consistent causal ordering.
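A minimal sketch of Lamport's logical clock rules, written as a hypothetical `LamportClock` class: each process increments its counter before every local or send event, a message carries the sender's timestamp, and a receiver advances its counter past both its own value and the message's timestamp.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    time: int = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        # A send is itself an event; the returned timestamp travels with the message.
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        # Advance past both the local clock and the sender's timestamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.local_event()      # p.time becomes 1
t = p.send()         # p.time becomes 2; the message carries 2
q.local_event()      # q.time becomes 1
r = q.receive(t)     # q.time becomes max(1, 2) + 1 = 3
assert t < r         # send happened-before receive, and the timestamps agree
```

The guarantee runs in one direction only: if a happened-before b then a's timestamp is smaller, but a smaller timestamp does not imply happened-before, so detecting concurrent events needs a stronger mechanism such as vector clocks.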
The Gthreads package provides an animated program call graph view, which is constructed dynamically as threads are forked, functions are called, and each thread's point of execution moves from function to function. Conch provides a message-passing view in which processes are arranged around the outside of a circle and messages are represented as colored disks that move into the center of the ring when sent and out to the receiving process upon receipt. The AIMS system presents a network topology view of the system under study. Many debuggers show a running display of communication over time in their time-process diagrams. A number of performance evaluation systems provide displays of statistical information.
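The ring layout of a Conch-style message-passing view reduces to simple geometry. The sketch below, with hypothetical function names, places processes evenly on a unit circle and interpolates a message disk's position. As a simplifying assumption, a single `progress` value in [0, 1] drives the disk from the sender to the center and then out to the receiver; a display driven by trace events would instead hold the disk in the center until the receive event arrives.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def ring_position(proc: int, nprocs: int, radius: float = 1.0) -> Point:
    """Place process `proc` evenly around a circle centered at the origin."""
    angle = 2 * math.pi * proc / nprocs
    return (radius * math.cos(angle), radius * math.sin(angle))


def lerp(a: Point, b: Point, s: float) -> Point:
    """Linear interpolation from a (s = 0) to b (s = 1)."""
    return (a[0] + (b[0] - a[0]) * s, a[1] + (b[1] - a[1]) * s)


def message_disk(src: int, dst: int, nprocs: int, progress: float) -> Point:
    """Sender -> center for progress in [0, 0.5]; center -> receiver for (0.5, 1]."""
    center = (0.0, 0.0)
    if progress <= 0.5:
        return lerp(ring_position(src, nprocs), center, progress / 0.5)
    return lerp(center, ring_position(dst, nprocs), (progress - 0.5) / 0.5)


for s in (0.0, 0.25, 0.5, 0.75, 1.0):
    x, y = message_disk(src=0, dst=2, nprocs=4, progress=s)
    print(f"{s:.2f}: ({x:+.2f}, {y:+.2f})")
```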
The examples above illustrate generic displays rather than application-specific displays. Application-specific displays require somewhat more effort on the part of the developer; Voyeur, PARADISE, BALSA, Tango, Pavane, and POLKA are some examples.