CS 6480 Computer Visualization Techniques

Fall 1999

INSTRUCTION TRACE VISUALIZATION

Final Report -- December 8, 1999

Bill Leahy

Introduction

One of the visualization techniques used for both analysis and teaching of certain aspects of computer operation is plotting the memory address accessed versus time. Such a plot illustrates concepts such as temporal and spatial locality and makes certain events, such as operating system kernel calls, more visible.

Production of such traces involves capturing sequences of addresses either from an actual processor in operation or from a processor simulation. Normally the programs run for such exercises are well-known performance benchmark programs.

While taking CS 6760 Parallel Computer Architecture, I was given three such address sequences for projects modeling cache systems, which is another application for such address data.

I was interested in attempting to visualize these datasets in a way that would be both interesting and visually appealing. My hope was to produce either static illustrations or perhaps animations that would be useful to students trying to visualize typical processor behavior.

Problem

The three datasets are from benchmark programs representing compilation, matrix manipulation and text processing. They are designated cc1, spice and tex1. Each dataset contains on the order of 1 million addresses, each accompanied by a code designating whether the address represents an instruction fetch, a data read or a data write. The address space is 32 bits, so we face the prospect of plotting x values (time) ranging from 1 to 1 million and y values ranging from 1 to approximately 4 billion. This appeared to be difficult, especially since we would want the ability to zoom in and examine local detail.

Design

In order to compact the image in some fashion, a different model of the address space was imagined. Instead of a linear scale from address 0 to address FFFF FFFF, what if the memory space were imagined as a square matrix? The dimensions would then be roughly 65,000 by 65,000. This would mean that we could map the low-order 16 bits of the address to the x axis, the high-order 16 bits to the y axis, and plot time (actually the sequence number) along the z axis. We initially envisioned positioning some object (perhaps a sphere?) at each point, perhaps using color to indicate the type of access (fetch, read or write).
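As a concrete illustration of this mapping, here is a minimal sketch in C; the structure and function names are purely illustrative and are not part of any actual program from this project.

    /* Map one 32-bit address and its sequence number to a point in the
     * imagined 65,536 x 65,536 address plane. */
    typedef struct {
        unsigned x;      /* low-order 16 bits of the address  */
        unsigned y;      /* high-order 16 bits of the address */
        unsigned long z; /* sequence number, i.e. time        */
    } TracePoint;

    TracePoint map_address(unsigned long addr, unsigned long seq)
    {
        TracePoint p;
        p.x = (unsigned)(addr & 0xFFFFUL);          /* column within the square */
        p.y = (unsigned)((addr >> 16) & 0xFFFFUL);  /* row within the square    */
        p.z = seq;                                  /* plot time along z        */
        return p;
    }

An address such as 0x1234ABCD would thus land at x = 0xABCD, y = 0x1234, at a height z equal to its position in the trace.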

Implementation

We decided to use Iris Explorer as the primary visualization program. We assumed that it would be well suited to the SGI machines we were to use, and, based on the class demonstrations, it seemed to be the less difficult package to use.

Iris Explorer (just Explorer from now on) is one example of a class of software packages based on what is known as the dataflow paradigm. In this design, modules are positioned on a map and connected in a graph-like structure that controls the processing of data from input to eventual rendering. Typical modules include color mapping, isosurfaces, and annotation, all leading up to a rendering module that actually produces the visible image on screen.

Since the data files were in hexadecimal notation, we assumed that we would need to write a fairly simple preprocessing program to get the data into a format readable by Explorer and then use Explorer for the bulk of the processing.

In conversations with Markus Deshon, we learned that the data input phase is typically the most difficult part of the entire process. Normally it involves using an auxiliary package known as DataScribe, which allows conversion of typical ASCII data such as that produced by finite element analysis programs.

To make a long story short, we spent quite some time on this, and even with the assistance of Mr. Deshon we concluded that the DataScribe package was more trouble than it was worth. As an alternative, it is possible to use Explorer modules to read data directly, provided the data is precisely formatted in a specified arrangement. Since we were forced to preprocess the data anyway, this seemed to be a significantly easier path.
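For illustration, the preprocessing step can be sketched in C as follows. The input field layout (an access-type code followed by a hexadecimal address) and the whitespace-separated output columns are assumptions made for this sketch; the real output must match whatever arrangement the particular Explorer read module expects.

    #include <stdio.h>

    /* Read "type hex-address" records from stdin and write one
     * "x y z type" row per access, applying the address split
     * described in the Design section. */
    int main(void)
    {
        unsigned type;          /* access-type code from the trace     */
        unsigned long addr;     /* 32-bit address, read as hexadecimal */
        unsigned long seq = 0;  /* sequence number, used as time (z)   */

        while (scanf("%u %lx", &type, &addr) == 2) {
            unsigned x = (unsigned)(addr & 0xFFFFUL);          /* low 16 bits  */
            unsigned y = (unsigned)((addr >> 16) & 0xFFFFUL);  /* high 16 bits */
            printf("%u %u %lu %u\n", x, y, seq, type);
            seq++;
        }
        return 0;
    }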

As noted earlier, the initial concept was to render the data with some object, such as a colored sphere, representing each data point. We quickly learned that we would encounter problems with as few as 3,000 points. With some adjustment of parameters this number could be raised to perhaps 10,000 points, but the degradation of interactivity was so severe that we abandoned this approach. Instead we elected to simply use a coloring technique, mapping the type of access to a color. This means that a sequence consisting of a fetch followed by a data read is represented by a line from one point to the other, with the fetch color at one end blending smoothly into the data-read color at the other. We found the visual effect quite pleasing and elected to remain with this technique.
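The effect can be sketched as follows. The fetch color (magenta) matches what is visible in Figure 3, but the read and write colors here are simply placeholders for whatever the Explorer colormap assigns, and the blend function mimics the per-vertex color interpolation the renderer performs along each line segment.

    /* Illustrative access-type -> RGB table and the linear blend applied
     * along a line segment between two consecutive accesses. */
    typedef struct { float r, g, b; } Color;

    static const Color type_color[3] = {
        { 1.0f, 0.0f, 1.0f },  /* instruction fetch: magenta (per Figure 3) */
        { 0.0f, 1.0f, 0.0f },  /* data read: placeholder color              */
        { 1.0f, 0.5f, 0.0f },  /* data write: placeholder color             */
    };

    /* t runs from 0 at the first access to 1 at the second. */
    Color blend(Color a, Color b, float t)
    {
        Color c;
        c.r = a.r + t * (b.r - a.r);
        c.g = a.g + t * (b.g - a.g);
        c.b = a.b + t * (b.b - a.b);
        return c;
    }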

There were some other minor problems. Our dataset essentially amounts to following a particle moving through three-space as a function of time. Even though Explorer correctly "understands" the data, and can in fact draw a bounding box, when we selected the axis module it would only draw one axis. We developed an auxiliary dataset as a workaround, but this clearly illustrates the drawback of this type of package compared with a fully user-programmed solution: if the user's analysis requirements veer outside the abilities the package is expected to offer, the available solutions become limited and possibly painful, ranging from writing custom modules in C to, in the extreme case, going back to square one.

As we began to produce visualizations, we determined that the results were more "readable" if we scaled the z (time) axis by a factor of ten. We also felt that a reasonable amount of data to examine at one time was on the order of 100,000 to perhaps 200,000 accesses.

Results

We show an overview of the first 100,000 accesses of the cc1 (gcc) dataset. The initial impression may be that of a very chaotic structure, but as we examine it more closely we begin to see some interesting aspects.

Figure 1: First 100,000 memory accesses of benchmark program cc1 (gcc). Accesses run from background to foreground. Address 0 is at lower left. Address FFFFFFFF is at upper right. Thus the stack will be located at the top, in high address space.

The top surface of the data represents the high-order addresses and is thus the location of the various stacks. The triangular aspect of the structure suggests numerous sequences of the form: fetch an instruction, access the stack, access a memory location.

Figure 2: A closer look at cc1. Lines connecting two accesses are color coded, with the color at each end representing the type of access, per the key.


As we look closer we can see the jagged structures typical of loop calculations. We also see that although data is located in widely separated regions of memory, within each region the accesses cluster, and it is this clustering that allows caching to be used so effectively.

Figure 3: Still closer to cc1. The magenta structures along the lower edge are sequences of instructions being fetched.


Figure 4: Even closer where we can begin to see the detail of sequences of instructions.


We can also look at the other benchmarks and discover that some programs in fact exhibit extremely structured behavior.

Figure 5: Benchmark spice showing the first 100,000 memory accesses. Note that spice shows even more structure than cc1, which suggests that excellent cache performance is likely.


Figure 6: Benchmark tex1. Text processing.


A possible client for this work is Professor Ken Mackenzie, who teaches undergraduate and graduate courses in computer architecture. We demonstrated these visualizations to him, asking whether they would be useful for demonstrating concepts. He replied in the affirmative and made several interesting suggestions: it would be interesting to produce visualizations of subsets of the data (just fetches, just reads, or perhaps with stack accesses eliminated), and animations such as MPEG fly-throughs of the data might also be very interesting.

We made further modifications to the preprocessing program to allow this type of selection and will eventually produce animations illustrating the various subsets.
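The selection itself can be sketched as a simple filter added to the preprocessor. The type codes, the keep_mask interface, and the "high addresses are stack" cutoff below are assumptions made for illustration; the actual program's options and its test for stack accesses may differ.

    /* Decide whether one trace record should be emitted.  keep_mask is a
     * bitwise OR of the access types to keep, e.g. (1u << TYPE_FETCH). */
    #define TYPE_FETCH 0
    #define TYPE_READ  1
    #define TYPE_WRITE 2

    #define STACK_CUTOFF 0xF0000000UL  /* assumed cutoff: treat higher addresses as stack */

    int keep_record(unsigned type, unsigned long addr,
                    unsigned keep_mask, int suppress_stack)
    {
        if (!(keep_mask & (1u << type)))
            return 0;                              /* unwanted access type    */
        if (suppress_stack && addr >= STACK_CUTOFF)
            return 0;                              /* suppressed stack access */
        return 1;
    }

Under these assumptions, Figure 7 corresponds to keep_mask = (1u << TYPE_FETCH), and Figure 10 to keeping reads and writes with suppress_stack set.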

Figure 7: Instruction fetches only. (The color in this figure is not correct.)


Figure 8: Closer view of fetch instructions.


Figure 9: Extreme closeup of fetch instructions. At this magnification we can see the individual address references.


Figure 10: Illustrating data reads and writes with stack accesses suppressed. Here we see clearly that the program's data storage requirements are relatively limited.


Figure 11: Closer view of reads and writes.


Figure 12: Zooming even closer to reads and writes.


Figure 13: Overall view of stack accesses. Wide areas probably indicate multiple stacks.


Figure 14: Closeup of stack accesses showing single and multiple stacks.


Summary

This visualization work appears to be successful in giving the viewer some insight into what happens, with respect to memory accesses, during execution of a computer program.

The use of Iris Explorer made a number of very powerful features available with no effort (e.g., the Render module's user interface is very sophisticated).

I have requested that Markus Deshon obtain the NT version of Iris Explorer; the Georgia Tech site license would allow us to use this software on any NT machine on campus at no additional cost. One of the most interesting and appealing parts of this work is the ability to move about and explore the dataset.

Neither still shots nor even MPEG animations can completely recapture the experience of interactively positioning the viewpoint.

Future Work

Professor Mackenzie has offered to use simulation programs in his possession to produce additional data sets to be visualized.

We will also be producing various mpeg animations for use in classroom demos.