Appendix - A Shell Script to Create Relational Tables


Introduction

This note describes make-sql-cmds, a shell script that creates a series of tables in a relational database. The tables contain data describing the code for version 2.7b of the Mosaic World-Wide Web browser. This script is not specific to Mosaic; it may be used with any code for which the proper data files exist.

The script accepts one of the four arguments "globals", "symbols", "calls", or "procs". Each argument produces one table: the globals table contains information about global variable definitions; the symbols table contains information about every symbol appearing in the code; the calls table contains information about each procedure call appearing in the code; and the procs table contains information about the procedure definitions appearing in the code.

make-sql-cmds does not use the source-code files to generate the tables; instead, it uses the data files created for Sun's source-code browsing tool. The browsing data files are generated by giving the -xsb option to Sun's C compiler; this creates a binary-encoded file containing source-code information. Because the binary format is hard to deal with, each binary file is run through the sbdump command, which produces an ASCII text version of the binary file; it is these text files that make-sql-cmds uses to generate the tables.

make-sql-cmds produces as output (to std-out) a series of SQL commands; these commands, when fed to a relational database, create the source-code tables. make-sql-cmds does no source code analysis itself; it merely produces the tables used for source-code analysis.

Implementation

As stated in the introduction, make-sql-cmds can generate four tables; the choice of tables is specified by a command-line argument captured in the shell variable what. The actual table generation is delegated to a shell function containing an AWK script.
<generate the table>= (U->)

case $what in
  globals) make_global_defs
           ;;

  calls)   make_call_info
           ;;

  symbols) make_symbol_refs
           ;;

  procs)   make_proc_refs
           ;;

  *)       echo \"$what\" is an unknown choice.  Choices are \"globals\", \
                \"symbols\", \"calls\", or \"procs\". 1>&2 
           exit 1
           ;;
esac 

Creating the Who-Calls-Whom Table

The make_call_info() shell function generates a table of who-calls-whom information.
<make_call_info() shell function>= (U->)

function make_call_info {
  gawk '
    <create the call-info table>
    <get the file and directory names>
    <get the calling information>
    '
  }

make_call_info() first outputs the SQL command to create the table that will hold the procedure call information. The table is called call_info and has the following fields:

<create the call-info table>= (<-U U->)

BEGIN {
  table_name = "call_info"
  printf "create table %s (", table_name
  printf "dir varchar(80) not null,"
  printf "file varchar(40) not null,"
  printf "caller varchar(40) not null,"
  printf "calling varchar(40) not null,"
  printf "lno int unsigned not null"
  printf ");\n"
  }

The names of the file and its containing directory are given in the Source name section of the sbdump file. The line after the section header holds the full path name of the file. This line is split into pieces at the directory separator /, and the pieces are reassembled into file, the file name, and dir, the full path leading to file.

<get the file and directory names>= (<-U U-> U-> U->)

/.... Source name section/ {
  getline
  i = split($2, names, "/")
  file = names[i]
  dir = ""
  for (j = 2; j < i; j++) dir = dir "/" names[j]
  }
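
The splitting step can be tried in isolation. The following is a minimal sketch, not part of make-sql-cmds: the function name split_path, the "name:" label, and the sample path are all invented for illustration, and plain awk stands in for gawk.

```shell
#!/bin/sh
# Sketch only: the "name:" label and the sample paths are invented for illustration.
split_path() {
  printf 'name: %s\n' "$1" |
  awk '{
    n = split($2, names, "/")     # for "/a/b/c.c": names[1] is "", names[n] is "c.c"
    file = names[n]
    dir = ""
    for (j = 2; j < n; j++)       # rejoin everything between the leading "" and the file
      dir = dir "/" names[j]
    printf "dir=%s file=%s\n", dir, file
  }'
}

split_path /proj/mosaic/src/gui.c    # prints: dir=/proj/mosaic/src file=gui.c
```

Because the loop starts at index 2, it relies on the path being absolute (so that names[1] is empty). Given a relative path such as src/gui.c, the first component would be dropped from dir.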

The calling information in an sbdump file is contained in lines that begin with a number and the words Call From. The fourth field is the caller's name, the sixth field is the called procedure's name, and the eighth field is the line number at which the call occurred. If a procedure is called from within a macro, the associated line number is negative; in that case it is negated to make it positive. Strings in SQL are delimited by single quotes (they may also be delimited by double quotes). Because the AWK script is itself enclosed in single quotes, a literal single quote in the code would end the shell quoting prematurely; using \047, the octal value of a single quote, avoids the problem.

<get the calling information>= (<-U)

/[0-9]*: Call From:/ {
  if ($8 < 0) $8 = -$8
  printf "insert into %s values (", table_name
  printf "\047%s\047, ", dir
  printf "\047%s\047, ", file
  printf "\047%s\047, ", $4
  printf "\047%s\047, ", $6
  printf "%d", $8
  printf ");\n"
  }
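
The quoting trick and the sign fix can be seen on a single line. This is a sketch under assumptions: the helper call_to_sql and the sample input line are invented, and only the field positions (4, 6, and 8) follow the description above; the real sbdump wording of the other fields may differ.

```shell
#!/bin/sh
# Sketch only: the sample input line is invented; just its field positions
# (4 = caller, 6 = callee, 8 = line number) follow the description above.
call_to_sql() {   # usage: call_to_sql DIR FILE LINE
  printf '%s\n' "$3" |
  awk -v dir="$1" -v file="$2" '/[0-9]*: Call From:/ {
    lno = $8 + 0                            # force a numeric value
    if (lno < 0) lno = -lno                 # calls inside macros have negative line numbers
    printf "insert into call_info values ("
    printf "\047%s\047, \047%s\047, ", dir, file      # \047 is the octal code for a single quote
    printf "\047%s\047, \047%s\047, %d);\n", $4, $6, lno
  }'
}

call_to_sql src gui.c '17: Call From: main To: init_display Line: -42'
# prints: insert into call_info values ('src', 'gui.c', 'main', 'init_display', 42);
```

Writing \047 inside the printf format keeps every quote character out of the AWK source text, so the enclosing shell quotes stay balanced.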

Creating the Global Definitions Table

The make_global_defs() shell function generates a table of information on global variable definitions.
<make_global_defs() shell function>= (U->)

function make_global_defs {
  gawk '
    <create the global-defs table>
    <get the file and directory names>
    <get the global definition information>
    '
  }

make_global_defs() first outputs the SQL command to create the table that will hold the global variable definition information. The table is called global_defs and has the following fields:

<create the global-defs table>= (<-U)

BEGIN {
  table_name = "global_defs"
  printf "create table %s (", table_name
  printf "dir varchar(80) not null,"
  printf "file varchar(40) not null,"
  printf "name varchar(40) not null,"
  printf "lno int unsigned not null"
  printf ");\n"
  }

The global-definition information in an sbdump file is contained in lines that begin with a number and the words Symbol on.

<get the global definition information>= (<-U)

/[0-9]*: Symbol on/ {
   i = split($7, parts, "_")
   if ((i >= 5) && (parts[3] == "def") && (parts[4] == "var") &&
       (parts[5] == "global")) {
     printf "insert into %s values (", table_name
     printf "\047%s\047, ", dir
     printf "\047%s\047, ", file
     printf "\047%s\047, ", substr($6, 2, length($6) - 2)
     printf "%d", substr($5, 1, length($5) - 1)
     printf ");\n"
     }
   }
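
The test on the seventh field can be exercised separately. The sketch below is hedged: the helper names is_global_def and strip_ends are invented, and the tags passed in are made-up examples shaped only to fit the test in the chunk above (third, fourth, and fifth underscore-separated parts equal to def, var, and global).

```shell
#!/bin/sh
# Sketch only: the tag names passed in below are invented examples.
is_global_def() {
  printf '%s\n' "$1" |
  awk '{
    n = split($1, parts, "_")
    ok = (n >= 5 && parts[3] == "def" && parts[4] == "var" && parts[5] == "global")
    print (ok ? "yes" : "no")
  }'
}

strip_ends() {   # drop the first and last character, as done for the quoted name
  printf '%s\n' "$1" | awk '{ print substr($1, 2, length($1) - 2) }'
}

is_global_def cb_c_def_var_global   # prints: yes
is_global_def cb_c_ref_var_local    # prints: no
strip_ends "'app_count'"            # prints: app_count
```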

Creating the Symbol References Table

The make_symbol_refs() shell function generates a table of information on symbol references.
<make_symbol_refs() shell function>= (U->)

function make_symbol_refs {
  gawk '
    <create the symbol-reference table>
    <get the file and directory names>
    <get the symbol reference information>
    <report error lines>
    '
  }

First output the SQL command to create the table that will hold the symbol reference information. The table is called symbols and has the following fields:

<create the symbol-reference table>= (<-U)

BEGIN {
  relname = "symbols"
  printf "create table %s (", relname
  printf "dir varchar(80) not null, "
  printf "file varchar(40) not null, "
  printf "lno int unsigned not null, "
  printf "name varchar(40) not null, "
  printf "type char(40) not null"
  printf ");\n"
  }

The symbol-reference information in an sbdump file is contained in lines that begin with a number and the words Symbol on.

<get the symbol reference information>= (<-U)

/[0-9]*: Symbol on/ {
   type = $(NF - 2)
   if (!match(type, "^cb_")) {
     loc = dir "/" file
     if (length(loc) > 55)
       loc = "[...]" substr(loc, length(loc) - 50)
     printf "Bad symbol line from %s:\n  \"%s\"\n", loc, $0 > "/dev/stderr"
     }
   else {
     symbol = $0
     sub("^[^\047]*\047", "", symbol)
     sub("\047[^\047]*$", "", symbol)
     if (match(symbol, "\047")) bad_lines++
     else {
       printf "insert into %s values (", relname
       printf "\047%s\047, ", dir
       printf "\047%s\047, ", file
       printf "%d, ", substr($5, 1, length($5) - 1)
       printf "\047%s\047, ", symbol
       gsub("_", " ", type)
       printf "\047%s\047", type
       printf ");\n"
       }
     }
   }
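
The two sub() calls keep whatever lies between the first and the last single quote on the line. A small sketch of that step follows; the helper name extract_symbol and the sample lines are invented, and the word DROPPED stands in for the bad_lines counter.

```shell
#!/bin/sh
# Sketch only: the sample lines are invented; DROPPED stands in for bad_lines++.
extract_symbol() {
  printf '%s\n' "$1" |
  awk '{
    symbol = $0
    sub("^[^\047]*\047", "", symbol)    # remove everything up to and including the first quote
    sub("\047[^\047]*$", "", symbol)    # remove the last quote and everything after it
    if (index(symbol, "\047") > 0)      # a quote left inside would break the SQL string
      print "DROPPED"
    else
      print symbol
  }'
}

extract_symbol "3: Symbol on line 10: 'count' cb_c_ref_var a b"   # prints: count
extract_symbol "4: Symbol on line 11: 'a'b' cb_c_ref_var a b"     # prints: DROPPED
```

A symbol that itself contains a single quote cannot be written as a quoted SQL string without escaping, which is why such lines are counted and skipped rather than inserted.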

If any symbol-reference lines were dropped because of quoting problems, print a message to std-err indicating so.

<report error lines>= (<-U)

END {
   if (bad_lines > 0)
     printf "Dropped %d lines with symbols containing single quotes.\n",
       bad_lines > "/dev/stderr"
   }

Creating the Procedure References Table

The make_proc_refs() shell function generates a table of information on procedure definitions.
<make_proc_refs() shell function>= (U->)

function make_proc_refs {
  gawk '
    <create the procedure-reference table>
    <get the file and directory names>
    <get the procedure reference information>
    '
  }

First output the SQL command to create the table that will hold the procedure reference information. The table is called proc_refs and has the following fields:

<create the procedure-reference table>= (<-U)

BEGIN {
  relname = "proc_refs"
  printf "create table %s (", relname
  printf "dir varchar(80) not null, "
  printf "file varchar(40) not null, "
  printf "proc varchar(40) not null, "
  printf "start int unsigned not null, "
  printf "end int unsigned not null"
  printf ");\n"
  }

The procedure-reference information in an sbdump file is contained in lines that begin with a number and the words Function name.

<get the procedure reference information>= (<-U)

/[0-9]*: Function name:/ {
   printf "insert into %s values (", relname
   printf "\047%s\047, ", dir
   printf "\047%s\047, ", file
   printf "\047%s\047, ", substr($4, 1, length($4) - 1)
   if (split($7, lnos, "[{},]") != 4) {
     print "Line-number parsing error." > "/dev/stderr"
     exit
     }
   printf "%s, %s);\n", lnos[2], lnos[3]
   }
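
The split on "[{},]" implies that the seventh field has the shape {start,end}; that shape is inferred from the code, not confirmed from sbdump documentation. A sketch of the parse, with an invented helper name parse_range and invented line numbers:

```shell
#!/bin/sh
# Sketch only: the {start,end} shape is inferred from the split() call above.
parse_range() {
  printf '%s\n' "$1" |
  awk '{
    # "{120,187}" splits into "", "120", "187", "": exactly four pieces
    if (split($1, lnos, "[{},]") != 4) {
      print "Line-number parsing error." | "cat 1>&2"
      exit 1
    }
    printf "start=%s end=%s\n", lnos[2], lnos[3]
  }'
}

parse_range '{120,187}'    # prints: start=120 end=187
```

The empty first and last pieces come from the opening and closing braces, so a count other than four signals a field that is not a simple pair of numbers.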

<make-sql-cmds>=

#!/bin/ksh

<shell boilerplate>

<make_call_info() shell function>
<make_global_defs() shell function>
<make_symbol_refs() shell function>
<make_proc_refs() shell function>

<process the command-line options>

<generate the file data> |
<generate the table> |
<clean the data>

There must be exactly one command-line argument giving the type of table to be generated; any other arguments are interpreted as the names of sbdump files.

<process the command-line options>= (<-U)

[ $# -lt 1 ] && badcmd

files=''
what=''
while [ $# -gt 0 ]
do case $1 in
   globals|symbols|procs|calls)
        [ "$what" ] && badcmd
        what=$1
        ;;

   *)   files="$files $1"
        ;;

   esac
   shift 1
done

[ ! "$what" ] && badcmd

If sbdump files were given on the command line, extract data from only those files; otherwise search the whole source directory tree and extract data from every sbdump file found.

<generate the file data>= (<-U)

cd /net/projects/groups/morale/src/mosaic-2.7b

if [ "$files" ]
then for f in $files ; do cat $f ; done
else for f in `find . -name '*bdd' -print` ; do cat $f ; done
fi 

The full paths for files are long, which leads to larger databases and to cramped, badly formatted output. Stripping the longest common prefix from the full path names makes things better.

<clean the data>= (<-U)

sed "s;$PWD/;;g" 
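
The effect of the sed command can be checked without changing directories. In this sketch the prefix is passed as an argument in place of $PWD; the helper name strip_prefix and the sample paths are invented.

```shell
#!/bin/sh
# Sketch only: the prefix is an argument here; the script itself uses $PWD.
strip_prefix() {   # usage: strip_prefix PREFIX  (text to clean arrives on stdin)
  sed "s;$1/;;g"   # ";" is the delimiter, so the slashes in the path need no escaping
}

printf '%s\n' "insert into call_info values ('/proj/mosaic/src', 'gui.c');" |
  strip_prefix /proj/mosaic
# prints: insert into call_info values ('src', 'gui.c');
```

Two caveats apply to this approach: sed treats the prefix as a regular expression, so a character such as "." in the path matches any character, and a path containing ";" would break the command.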

The usual: the program name and two functions that report an error and exit.

<shell boilerplate>= (<-U)

pgmname=`basename $0`

function oops {

  # Print $1 to std-err and die.

  echo 1>&2 "$1."
  exit 1
  }


function badcmd {

  # Print a bad command message and die.

  echo 1>&2 "Command format is"
  oops "  \"$pgmname [fname]... globals | symbols | procs | calls\""
  }