This directory contains the postscript files of Linda Wills' Ph.D. dissertation. Below are ftp instructions for how to obtain them. If you have any questions, please contact Linda Wills at linda@cc.gatech.edu. ______________________________________________________________________ AUTOMATED PROGRAM RECOGNITION BY GRAPH PARSING Linda Mary Wills MIT Doctoral Dissertation, July, 1992 MIT-AI-TR 1358 Abstract The recognition of standard computational structures (cliches) in a program can help an experienced programmer understand the program. Based on the known relationships between the cliches, a hierarchical description of the program's design can be recovered. We develop and study a graph parsing approach to automating program recognition in which programs are represented as attributed dataflow graphs and a library of cliches is encoded as an attributed graph grammar. Graph parsing is used to recognize cliches in the code. This approach has been implemented in a system called GRASPR, which stands for ``GRAph-based System for Program Recognition.'' We demonstrate that this graph parsing approach is a feasible and useful way to automate program recognition. In studying this approach, we have experimented with two medium-sized, real-world simulator programs. There are three aspects of our study. First, we evaluate our representation's ability to suppress many common forms of program variation which hinder recognition. Second, we investigate the expressiveness of our graph grammar formalism for capturing programming cliches. Third, we empirically and analytically study the computational cost of our recognition approach with respect to the real-world simulator programs. CONTENTS title.ps: Title Page ack.ps: Acknowledgments toc.ps: Table of Contents ch1.ps: Chapter 1: Introduction -- An introduction and overview of the dissertation. ch2.ps: Chapter 2: The Knowldege, Program Corpus, and Recognition Examples -- A description of the cliche library and our experiences in acquiring it. The simulator programs we are experimenting with are also described. Examples are given demonstrating GRASPR's capabilities in recognizing the given cliches in the simulator programs. Chapter 3: The Flow Graph Formalism -- Definition of the attributed flow graph formalism which we use to represent programs and cliches. ch3a.ps: A. Flow Graphs and Grammars -- describes the formalism and casts program recognition as a flow graph parsing problem. ch3b.ps: B. Chart Parsing Flow Graphs -- presents a flow graph chart parsing algorithm, which provides a flexible recognition architecture. The chapter ends with a summary of related work in the general area of graph grammar formalisms. ch4.ps: Chapter 4: Applying Parsing to Recognition -- The issues that arise in applying flow graph parsing to program recognition and how GRASPR solves them. ch5.ps: Chapter 5: Capabilities and Limitations -- The capabilities and limitations of the parsing approach, considered in terms of the classes of program variations tolerated by GRASPR and the expressiveness of flow graph grammars. ch6.ps: Chapter 6: Analysis -- Empirical and analytical studies of the computational cost of our approach. ch7.ps: Chapter 7: Conclusions -- A summary of the strengths and weaknesses of the parsing approach, ideas for future work (particularly in the context of a hybrid understanding system), and a brief comparative summary of related work in program recognition. app.ps: Appendix A -- The NP-completeness of flow graph recognition. lof.ps: List of Figures bib.ps: Bibliography ______________________________________________________________________ ftp ftp.cc.gatech.edu Connected to solaria.cc.gatech.edu. 220 solaria FTP server (SunOS 4.1) ready. Name (ftp.cc.gatech.edu:linda): anonymous 331 Guest login ok, send ident as password. Password: 230 Guest login ok, access restrictions apply. ftp> binary 200 Type set to I. ftp> cd pub/groups/reverse/repository/recognition 250 CWD command successful. 884 bytes received in 0.18 seconds (4.9 Kbytes/s) ftp> get README ... ftp> quit 221 Goodbye.