Main People Publications Research Tools

Aristotle Analysis System

Java Architecture for Bytecode Analysis (JABA)

Tarantula Fault Localization System

 

Aristotle Users Manual



Back

Contents






1. Introduction

Aristotle provides a simple menu interface that lets you run Aristotle program analysis tools and view analysis results. This manual describes how to use the menu interface. The next section describes setup tasks that you must complete in order to use the interface. Subsequent sections discuss the interface in general, describe particular menus and the tasks you can accomplish using them, and describe the formats in which Aristotle presents analysis information.

2. Initial Setup

Before you can use Aristotle's menu interface, you must perform a few setup tasks. If you do not understand the following tasks, seek help from someone familiar with Unix.

  1. Create a directory to serve as your "Aristotle database directory". Aristotle stores analysis information in this directory. Reserve this database directory for use by Aristotle, but do not run Aristotle from this directory.
  2. Find out the name of the directory in which Aristotle is installed on your system. We'll refer to that directory as the "Aristotle system directory". Edit your ".cshrc" or ".login" file (or whatever defines your path on your system) and add <aristotle-system-directory>/bin to your path, where <aristotle-system-directory> is the full pathname of your Aristotle system directory.
  3. Edit your ".cshrc" file (or your system's equivalent) by adding the following lines, and modifying the portion of the third line that appears in angle brackets. The third line defines an environment variable used by several Aristotle tools. The third word in that line must give the full pathname of your database directory (with no trailing slash).
         # Specify the location of the Aristotle database directory
         # (the third word should name your database directory).
         setenv ARISTOTLE_DB_DIR <your-database-directory>
    
  4. Log out and in again, or "source" the .cshrc file, to cause your changes to take effect.

You can now invoke Aristotle's menu interface. See the next section for instructions.

3. General Information about the Aristotle Menu Interface

This section overviews the Aristotle menu interface. It tells you how to invoke and exit the interface and how to use the menus, and describes general characteristics of the menus.

Invoking and exiting the menu interface

To invoke the Aristotle menu interface, type:

     aristotle <source-file>

where <source-file> is an optional argument that names a source file in your current directory that you wish to work with, including its extension (e.g. ".c"). The Aristotle Top Menu is displayed.

You can exit the menu interface from any menu by typing "x" and pressing the <Return> key.

The Aristotle menus are designed to present a simple interface for the user. Their usage should be self-evident and on-screen help is available.

Menu default settings. Most default toggle and option values displayed by Aristotle menus are configurable using the Aristotle Setup facility available on the Aristotle Top Menu. If you change any of the defaults you will have to save your configuration, if you would like these settings to persist on subsequent Aristotle sessions. You should save the setup file in your home directory so that Aristotle can find it each time.

Task Dependencies. Many analysis tasks performed by Aristotle depend upon outputs provided by other analysis tasks. For example, to view a control dependence graph for a program P, you must first build that graph using Aristotle's control dependence graph builder. Before you may use the control dependence graph builder, however, you must first run P through the Aristotle parser/analyzer.

The Aristotle menus are aware of task dependencies. When you attempt to perform a specific task using the Aristotle menu interface, the interface determines whether prerequisite data exists, and if not, invokes analysis tools necessary to produce that data. Thus, for example, when you attempt to view the control dependence graph for P, Aristotle automatically runs the parser/analyzer and control dependence graph builders, if necessary. We refer to this automatic invocation of tasks as "autotasking".

Autotasking may not always be convenient. For example, if you view a control dependence graph for program P, and then alter P, and try to view the control dependence graph for the alterred version, Aristotle will detect the presence of the control dependence information for P in the database and not infer that the program needs to be reanalyzed. To force reanalysis, you can remove prior analysis information for P from your database (using Analysis database management) or use Setup to request Aristotle to ignore previous analysis data.

4. Restrictions and Limitations

Work is ongoing to increase Aristotle's robustness and range of functionality. The Parser/Analyzer may generate warnings, but they can generally be ignored. At this time, we are aware of the following restrictions and limitations on Aristotle's correct operation.

  • At present, Aristotle operates only on single source files. To run the Aristotle parser/analyzer on a source file, you must be in the same directory as that file.
  • The Aristotle parser/analyzer does not analyze programs that contain parse errors. When you run the parser/analyzer from the Aristotle menu system, the system first attempts to compile your source program using GCC. If this compilation fails, the system displays an error message and does not proceed further. You may use Setup facilities to turn this compilation-check off.
  • The Aristotle parser/analyzer does not identify all embedded function calls. Only the deepest embedded function calls are reported. For example, in the statement, "a( a( i, c(j) ), a( j, k ) );" only the c(j) call and the a( j, k ) call are reported.
  • The Aristotle parser/analyzer treats statements that contain the C "?" operator as single statements, rather than conditional statements.


5. Interpreting Table Output

Aristotle stores analysis results in your database directory. You can view these results from the menus using Aristotle's table viewers. You can also view graphs from the menus using Aristotle's graph viewers. The following sections may help you interpret the analysis results that are presented using these two mechanisms.

Table Contents Menu

For purposes of illustration, here is a simple table containing control flow information for a function, example.c, that contains two subroutines, main and get:

     Control flow information for file: example.c

     Function: main
     CFG root node: 1

      node  label   endline    cf-successors (node-id(edge-id,"edge-label")
     ----------------------------------------------------------------------  
         1      E         6    3(1,"") 
         2      X        14    
         3      D         7    4(2,"") 
         4      F         8    5(3,"") 
         5      C         9    6(4,"") 
         6      P         0    7(5,"T") 8(6,"F") 
         7      S        10    9(7,"") 
         8      S        12    9(8,"") 
         9      J         0    2(9,"") 


     Function: get
     CFG root node: 1

      node  label   endline    cf-successors (node-id(edge-id,"edge-label")
     ---------------------------------------------------------------------- 
         1      E        19    3(1,"") 
         2      X        24    
         3      D        20    4(2,"") 
         4      F        21    5(3,"") 
         5      S        22    2(4,"") 

Although each table that you can view using the table viewers contains different information, there are a few characteristics common to most tables:

  1. Most tables report information on a per-function basis. That is, for each function in your source file, the table lists a block of information for that function. In the example, the table gives a block of information for function main, followed by a block of information for function get.
  2. Most tables contain (at least) columns labelled "node", "label" and "endline". The table in the example contains these three columns. These columns have the following meaning:
    • node - an integer representing a unique (within the procedure) node identifier
    • label - a node type designation, chosen from the following:
                 E - Entry
                 X - Exit
                 F - Function Call
                 T - Return from Function Call
                 C - Conditional
                 P - Predicate
                 J - Join 
                 S - Simple Statement
                 B - Break
                 G - Goto
                 L - Label
                 W - Switch
                 R - Region	
                 D - Declaration
      
    • endline - the line number of the source code to which the node corresponds. For multiple-line statements, endline gives the location of the first line on which the statement appears. (The name "endline" is used for historical reasons.) Some nodes, such as region nodes, do not correspond to source lines; for these nodes endline is 0.

You can view the following tables using Aristotle.

    Table type 1 - cfg (control flow graph information).
    The table given as an example at the top of this section is a cfg information table. The cfg information table contains the three general columns described above, and a fourth column called "cf-successors". The cf-successors column lists, by node id, the control flow successors of the node named in the left hand column. In other words, there is an edge in the control flow graph from the node in the left-hand column to each "cf-successor node". The column also lists, for each successor, the unique edge-identifier for that edge, and the label attached to that edge (possibly an empty label).
    Table type 2 - cdg (control dependence graph information).
    This table contains the three general columns mentioned above, along with a column called "cd-successors". The cd-successors column lists, by node id, the control dependence successors of the node named in the left hand column. In other words, there is an edge in the control dependence graph from the node in the left-hand column to each cd-successor node. Some cdg edges have labels attached to them; these labels are reported inside parentheses, following the node id in the cd-successors column.
    Table type 3 - ddg (data dependence graph information).
    This table contains the three general columns mentioned above, and a column called "dd-successors". The data dependence successors column is divided into two parts: a list of flow dependence successors and a list of output dependence successors. The column lists, by node id, the nodes that are data dependent on the node in the left hand column, along with (in parentheses) the name of the variable that causes this dependence. A similar list denotes output dependencies.
    Table type 4 - pdg (program dependence graph information).
    This table contains the three general columns mentioned above, and a column called "successor-nodes" There are three types of successor-nodes: control dependence (CD), data dependence (DD), and control flow (CF) successors. These successors are listed the same way as in the tables for the control flow graph, control dependence graph, and data dependence graph.
    Table type 5 - map (source-code-to-node mapping information).
    This table contains the three general columns mentioned above, and columns called "type" "subtype" "strtline" "strtrtl" and "endrtl". The "type" field corresponds to the single-letter label that appears in column two, that is, the label is a single letter distillation of the type. The subtype field further classifies the node type. Subtypes are chosen from the following:
         RETURN_STATEMENT         DO_WHILE_CONDITION
         EXIT_STATEMENT           FOR_CONDITION	
         BREAK_STATEMENT          ENTRY_REGION
         CONTINUE_STATEMENT       IF_CLAUSE_REGION
         GOTO_STATEMENT           ELSE_CLAUSE_REGION
         OTHER_STATEMENT          LOOP_TRUE_REGION	
         IF_PREDICATE             LOOP_FALSE_REGION
         WHILE_PREDICATE          WHILE_HEADER_REGION 
         DO_WHILE_PREDICATE       DO_WHILE_HEADER_REGION
         FOR_PREDICATE            FOR_HEADER_REGION
         SWITCH_PREDICATE         CASE_REGION
         IF_CONDITION             SUMMARY_REGION
         WHILE_CONDITION          DONT_CARE
    
    The strtline column, in the future, may hold the location of the first line on which a statement appears, which for multi-line statements may differ from the location of the last line. Due to details of the implementation of the parser, from which we determine these line numbers, this field is currently set to 0. The strtrtl and endrtl columns contain information which is for the internal use of the Aristotle system only. For nodes not associated with code (e.g. region nodes) these fields have value 0.
    Table type 6 - sym (symbol table information).
    At the beginning of this table, the global identifiers that appear in the file are listed. For each identifier, a negative id number is assigned and appears in the first column. The second column indicates whether the identifier is a pointer, the third column indicates whether it is a static variable, and the fourth column indicates whether it is externally declared. The last column lists the name of the variable.

    After listing global identifiers, this table reports information that is local to each function in the source file. For each function, the table lists the function name, and any formal parameters used in the function. The table then lists the non-global identifiers used in the function. The sub-table containing information about non-global parameters has six columns. The first column lists the variable id, the second column says whether the variable is a LOCAL variable or PARAMETER, the third column indicates whether the variable is a pointer, the fourth column indicates whether the variable is static, and the fifth column indicates whether the variable is external. The last column lists the name of the variable.

    Next, for the given function, the table lists call sites within the function. For each call site, the table lists the node number of the call site, the last line of the call statement, the procedure called, the node number of the corresponding return node, and the parameters used in the call.

    The last part of the symbol table information for each function lists exit site information. For each exit site, the table lists the node number, the node label (X for the Exit node, or S for Return or Exit statements), and the last line of the exit statement.

    Table type 7 - du (local definition-use information).
    This table contains the three general columns described above, and a column labeled "identifiers defined/used (and their occurence-types)". This column lists, by name, the variables that are defined and used in the statement corresponding to the node named in the left-hand column. In addition to identifying the variables defined and used at a statement, this table also notes the types of uses and definitions that occurred in that statement. In the following explication, suppose x, p, q, and r are scalar, pointer, pointer- pointer variables, and, pointer- pointer- pointer variables respectively; that is, suppose a function func contains the following declarations:
        int x;
        int *p;
        int **q;
        int ***r;
    
    Suppose further that we are considering the occurences of uses and/or definitions in statement s in func. Aristotle distinguishes the following types of definition and use occurences.
      Scalar variable use (svu).
      If s uses the value of x, then s has a scalar variable use of x. For example, the statement "*p = x" has an svu of x.
      Scalar variable assignment (sva).
      If s sets the value of x, then s has a scalar variable assignment of x. For example, the statement "x = *p" has an sva of x.
      Indirect variable use (ivu).
      If s uses the value of *p (the memory location pointed to by p), then s has an indirect variable use of p. For example, the statement "x = *p" has an ivu of p. IVU is used to represent an arbitrary number of indirections. So, "x = **q" and "x = ***r" are also an indirect variable use.
      Indirect variable assignment (iva).
      If s sets the value of *p (the memory location pointed to by p), then s has an indirect variable assignment of p. For example, the statement "*p = x" has an iva of p. IVA is used to represent an arbitrary number of indirections. So, "**q = x" and "***r = x" are also an indirect variable assignment.
      Pointer variable use (pvu).
      Whenever s includes a pointer dereference of p or pp, s has a pointer variable use of p or pp. For example, the statement "*p = x", in addition to having an iva of p, also has a pvu of p.
      Pointer variable assignment (pva).
      Whenever s includes an assignment to p itself, or to pp itself (rather than to the memory location that the variable refers to), s contains a pointer variable assigment of p or pp. For example, the statement "p = &x" has a pva of p.
      Address use (au).
      Whenever s takes an address of a variable, s contains an address use of that variable. For example, the statement "p = &x", has an au of x.

    For each variable used or defined at a node, that variable's occurence types are listed, following the variable, in parentheses.

    Table type 8 - rd (reaching definitions information).
    This table lists the three general columns described above, with a fourth column called "reaching definitions" that lists the variable definitions that reach the node listed in the first column. The reaching definitions column lists the node at which the reaching definition was defined, followed by a colon and the name of the variable whose definition reaches.
    Table type 9 - src (preprocessed source file with line numbers).
    This table lists the source code from which graphs are built, along with line numbers.
    Table type 10 - dom ((post)dominator trees and (post)dominator information).
    This table contains the three general columns described above, and a column labeled "dom info", that lists information on dominance and postdominance relationships in the control flow graph for the source file. For each node in the control flow graph for a source file, eight types of (post)dominance information are listed:
      DOMINATES:
      A list of the cfg nodes that the node in the left-hand column dominates.
      DOMINATED BY:
      A list of the cfg nodes that the node in the left-hand column is dominated by.
      DTREE PARENT:
      The parent, in the dominator tree for the cfg, of the node in the left-hand column.
      DTREE CHILDREN:
      A list of the children, in the dominator tree for the cfg, of the node in the left-hand column.
      P-DOMINATES:
      A list of the cfg nodes that the node in the left-hand column postdominates.
      P-DOMINATED BY:
      A list of the cfg nodes that the node in the left-hand column is postdominated by.
      PDTREE PARENT:
      The parent, in the postdominator tree for the cfg, of the node in the left-hand column.
      PDTREE CHILDREN:
      A list of the children, in the postdominator tree for the cfg, of the node in the left-hand column.
    Table type 11 - cg (call graph).
    This table reports call graph information for a source file, in five columns. A "node id" column assigns a unique integer node identifier to each node in the call graph (where each node represents a procedure.) The "procedure" column lists the name of the procedure associated with that node. The "defined?" column contains either "YES" or "NO"; the former indicates that the procedure corresponding to the node is present in the source file, the latter indicates that it is external to (its source is not given in) the source file. Note that such nodes are necessarily leaves in the call graph, because we cannot determine the procedures that their associated procedure calls. The "calls procedure(s)" column lists, one per line, the names of the procedures that are called by the procedure associated with the node. The "node id" column gives the associated node number of such called procedures. When the procedure contains no calls, this is indicated.


Graph Viewer Output

Aristotle uses the XVCG tool to display graphs. XVCG is an X11 software package for Visualization of Compiler Graphs. The Aristotle distribution includes an XVCG binary for Solaris in the bin subdirectory. Since the bin directory should be in your path for running Aristotle, XVCG should be able to run without any further setup.

XVCG is distributed under the terms of the GNU General Public License and is available by anonymous ftp at ftp.cs.uni-sb.de (134.96.254.254) in the directory /pub/graphics/vcg. There is a web site in Germany at the University of Saarland ( http://www.cs.uni-sb.de:80/RW/users/sander/html/gsvcg1.html). More information, including source code and documentation, is available on the website. Refer to XVCG documentation for usage details.

6. Definitions and Terms Used

This section contains a short list of the following program analysis terms which are frequently encountered. The topics are:

6.1. Call Graph (CG)

A call graph models the calling relationships between entities in a program and is useful for flow-insensitive analyses and program understanding. Source files may contain calls to external routines, including standard library routines that cannot themselves be analyzed for calling relationships. In such cases, the call graph includes nodes for the called routines but marks them as unexpanded.


6.2. Control Dependence Graph (CDG)

A control dependence graph summarizes the control conditions necessary for a statement to execute. It also identifies those statements that are guaranteed to execute if a given statement has executed. See Figure 1 below.

A control dependence graph contains several types of nodes:

  • Statement nodes - represent simple statements, are shown as ellipses in the figure.
  • Predicate nodes - from which labeled edges originate, are shown as rectangles in the figure.

Figure 1. Control Dependence Graph

    S1.   read i
    S2.   sum = 0
    S3.   done = 0
    P4.   while i <= 5 and !done do
    S5.      read j
    P6.      if j >= 0 then
    S7.         sum = sum + j
    P8.         if sum > 100 then
    S9.            done = 1
                else
    S10.           i = i + 1
             endif
          endwhile
    S11.  print sum
In this illustration, S5, P6, and P4 are control dependent on P4 being true.





6.3. Control Flow Graph (CFG)

A control flow graph is a directed graph in which each node represents a basic block and each edge represents the flow of control between basic blocks. A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halt or possibility of branching except at the end. See Figure 2 below.

In a control flow graph, if there is an edge from node X to node Y, we say Y is a successor of X and that X is a predecessor of Y.

Figure 2. Control Flow Graph





S1. read i S2. sum = 0 S3. done = 0 P4. while i <= 5 and !done do S5. read j P6. if j >= 0 then S7. sum = sum + j P8. if sum > 100 then S9. done = 1 else S10. i = i + 1 endif endwhile S11. print sum





6.4. Dataflow Analysis and Reaching Definitions (DFA and RD)

Dataflow analysis gathers information about the definitions and uses of variables in a program. A definition is a variable access in which a variable's value is changed. A use is a variable access in which a variable's value is fetched without being changed. In Figure 3 below, statements S1 and S10 contain definitions of i, and statements S2 and S7 contain definitions of sum. Similarly, statements P4 and S10 contain uses of i and statements P6 and S7 contain uses of j.

By performing Dataflow Analysis in Aristotle, one can obtain reaching definitions information. The reaching definitions table lists each node along with the variable definitions that reach that node. The reaching definitions column of the table lists the node at which the reaching definition was defined, followed by a colon and the name of the variable whose definition reaches.

Figure 3. Data Flow Analysis - Program Segment Illustration

    S1.   read i
    S2.   sum = 0
    S3.   done = 0
    P4.   while i <= 5 and !done do
    S5.      read j
    P6.      if j >= 0 then
    S7.         sum = sum + j
    P8.         if sum > 100 then
    S9.            done = 1
                else
    S10.           i = i + 1
             endif
          endwhile
    S11.  print sum
6.5. Data Dependence Graph (DDG)

A data dependence graph contains nodes that represent program statements and edges that represent data dependencies between statements. See figure 4 below.

A flow dependence exists between statements S1 and S2 if S1 defines variable v and S2 uses v and there is a path in the program from S1 to S2 on which v is not redefined, that is, the definition of v in S1 reaches the use of v in S2. If statement S2 is flow dependent on statement S1, then (S1, S2) is a definition-use pair.

Figure 4. Data Dependence Graph

    S1.   read i
    S2.   sum = 0
    S3.   done = 0
    P4.   while i <= 5 and !done do
    S5.      read j
    P6.      if j >= 0 then
    S7.         sum = sum + j
    P8.         if sum > 100 then
    S9.            done = 1
                else
    S10.           i = i + 1
             endif
          endwhile
    S11.  print sum
In this illustration, because the definition of i in S1 reaches the use of i in P4, (S1, P4) is a definition-use pair for i.





6.6. Dominator Tree (DT)

A dominator tree describes the dominance relationship between nodes in a control flow graph (statements in a program).

Given nodes X and Y in a control flow graph, node X dominates node Y if and only if every directed path from entry to Y (not including Y) contains X.

A dominator tree has the following characteristics:

  • the initial node is the entry node
  • each node dominates only its descendents in the tree


6.7. Local Definitions and Uses (DU)

A Local Definitions and Uses table identifies the variables that are defined and used in a statement corresponding to a given node.

The Local Definitions and Uses table also identifies the following types of definition and use occurrences:

  • Scalar variable use (svu).
  • Scalar variable assignment (sva)
  • Indirect variable use (ivu)
  • Indirect variable assignment (iva)
  • Pointer variable use (pvu)
  • Pointer variable assignment (pva)
  • Address use (au)


6.8. Postdominator Tree (PDT)

A postdominator tree describes the postdominance relationship between nodes in a control flow graph (statements in a program). See Figure 5 below.

Given nodes X and Y in a control flow graph, node X is post-dominated by node Y if and only if every directed path from X to exit (not including X) contains Y.

A postdominator tree has the following characteristics:

  • the initial node is the exit node
  • each node postdominates only its descendents in the tree

Figure 5. Postdominator Tree





1. read (n) 2. i = 1 3. sum = 0 4. while i <= n do 5. sum = 0 6. j = 1 7 while j <= i do 8. sum = sum + j 9. j = j + 1 endwhile 10 write (sum, i) 11. i = i + 1 endwhile 12. write (sum, i)





6.9. Preprocessed Source File with Line Numbers (SRC)

This is a source file listing with the line numbers displayed at the left of each program statement. Provided for reference purposes.

6.10. Program Dependence Graph (PDG)

A program dependence graph represents both control dependencies and data dependencies for a program. A PDG consists of a control dependence graph (called a control dependence subgraph when part of a PDG), to which data dependence edges (which, together with the PDG nodes, form a data dependence subgraph) have been added.




6.11. Source Code To Node Mapping (MAP)

A Source Code To Node Mapping is a cross-reference between the nodes and the source line numbers assigned to each line of the source code.

The type of node - entry, exit, declaration, predicate, etc. - is also displayed.





6.12. Symbol Table (SYM)

A symbol table provides the following information for a given source file:
  • global identifiers
  • function names
  • formal parameters
  • non-global identifiers
This table shows if an identifier is a pointer, a static variable, and whether the variable is externally declared.

Georgia Tech | College of Computing | Software Engineering | Aristotle Home
Updated November 14, 2005 by Jim Jones