Appendix - A CGI Script for Keyword Searching


This appendix describes an AWK script that serves as a CGI script for processing keyword searches through documents.

<gen-view>=

#!/bin/ksh

echo "$QUERY_STRING" |
gawk '<define the query evaluation functions>

      BEGIN {
        FS = "&"
        print "Content-type: text/html"
        print ""
        print "<html><head></head>"

        <read the document concordance>
        }

       {<parse the browser request>

        page_set = eval(args["query"])

        if (page_set == "")
          printf "Can%ct find any pages containing \"%s\".", 39, args["query"]

        else {
          <generate the thumbnail page>
          }
        }

      END {
        print "</html>"
        }
     '

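A keyword search is given as a sequence of words and the boolean operators "and" and "or". To keep things simple, the search is given in postfix form: the operands come first and the operator follows them, so "loader verifier and" asks for the pages containing both words.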

eval() accepts the keyword search e and splits it into words. stack is used as an argument stack for the searches; top is the index of the top of the stack.

Upon successful evaluation of the search, stack[1] contains the set (represented as a space-separated string) of numbers for the pages matching the search. Leading and trailing spaces are trimmed from the set, which is then returned as the result of the evaluation.

<define the query evaluation functions>= (<-U) [D->]

function eval(e,  c, i, top, stack, tok) {

  words = ""
  # tok is a local array; e is a scalar and cannot double as the split target.
  c = split(e, tok, "  *")
  top = 0
  for (i = 1; i <= c; i++)
    if (tok[i] == "and") {
      stack[top - 1] = eval_and(stack[top], stack[top - 1])
      top--
      }
    else if (tok[i] == "or") {
      stack[top - 1] = eval_or(stack[top], stack[top - 1])
      top--
      }
    else
      stack[++top] = eval_word(tok[i])

  sub("^ *", "", stack[1])
  sub(" *$", "", stack[1])

  return stack[1]
  }

eval_and() returns the intersection of the two page sets a and b.

<define the query evaluation functions>+= (<-U) [<-D->]

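function eval_and(a, b,  ab, i, n, pg) {

  # Split into a local array and compare with index(), so page names
  # are matched literally rather than as regular expressions.
  n = split(a, pg, "  *")
  ab = ""
  for (i = 1; i <= n; i++)
    if (pg[i] != "" && index(" " b " ", " " pg[i] " ")) ab = ab " " pg[i]

  return ab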
  }

eval_or() returns the union of the two page sets a and b.

<define the query evaluation functions>+= (<-U) [<-D->]

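function eval_or(a, b,  ab, i, n, pg) {

  # Start from b and append each page of a that is not already present.
  n = split(a, pg, "  *")
  ab = b
  for (i = 1; i <= n; i++)
    if (pg[i] != "" && !index(" " ab " ", " " pg[i] " ")) ab = ab " " pg[i]

  return ab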
  }

eval_word() returns the set (represented as a space-separated string) of page numbers containing the word w. words is a global variable representing the set of words used in the keyword search; pages is the concordance.

<define the query evaluation functions>+= (<-U) [<-D]

function eval_word(w) {
  if (!index(words " ", " " w " ")) words = words " " w
  return pages[w]
  }
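
The evaluation functions above can be tried outside the CGI setting. What follows is a minimal stand-alone sketch, not the script itself: postfix_search, set_and and set_or are hypothetical names, the three-word concordance is made up for illustration, and plain awk is assumed in place of gawk.

```shell
# Hypothetical demo: evaluate a postfix keyword search against a tiny,
# made-up concordance and print the matching page set.
postfix_search() {
  awk -v query="$1" '
    # Intersection of two space-separated page sets.
    function set_and(a, b,  n, i, pg, out) {
      n = split(a, pg, " ")
      out = ""
      for (i = 1; i <= n; i++)
        if (index(" " b " ", " " pg[i] " ")) out = out " " pg[i]
      return out
    }
    # Union of two space-separated page sets.
    function set_or(a, b,  n, i, pg, out) {
      n = split(a, pg, " ")
      out = b
      for (i = 1; i <= n; i++)
        if (!index(" " out " ", " " pg[i] " ")) out = out " " pg[i]
      return out
    }
    BEGIN {
      # Assumed concordance: word -> pages containing it.
      pages["loader"]   = "1 2 3"
      pages["verifier"] = "2 3 4"
      pages["checksum"] = "5"

      n = split(query, tok, " ")
      top = 0
      for (i = 1; i <= n; i++)
        if (tok[i] == "and") {
          stack[top - 1] = set_and(stack[top], stack[top - 1])
          top--
        } else if (tok[i] == "or") {
          stack[top - 1] = set_or(stack[top], stack[top - 1])
          top--
        } else
          stack[++top] = pages[tok[i]]

      result = stack[1]
      sub(/^ +/, "", result)
      sub(/ +$/, "", result)
      print result
    }'
}

postfix_search "loader verifier and"
postfix_search "loader verifier and checksum or"
```

With this sample data the first search prints the pages shared by loader and verifier, and the second adds the checksum page to that set. A word missing from the concordance contributes an empty set.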

The file slv-conc is the concordance for the Software Loader/Verifier specification. The concordance keeps detailed information about the words appearing in the specification; most of that information is thrown away so that only the page numbers remain. The global array pages holds the relevant information: pages[w] is the set (represented as a space-separated string) of numbers for the pages containing the word w.

<read the document concordance>= (<-U)

while ((getline l < "slv-conc") > 0) {
  w = l
  sub(" .*", "", w)
  p = l
  sub("^[^ ]* *", "", p)
  pages[w] = p
  }
close("slv-conc")
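
The layout of slv-conc is not shown here, so the following sketch assumes the simplest layout the code above can read: one line per word, with the word first and its page numbers after it. lookup_pages is a hypothetical helper, the two sample lines are invented, and plain awk is assumed.

```shell
# Hypothetical helper: load a concordance file ($1) and print the page
# set recorded for one word ($2).
lookup_pages() {
  awk -v word="$2" '
    {
      w = $0; sub(/ .*/, "", w)        # everything before the first space
      p = $0; sub(/^[^ ]* */, "", p)   # everything after the word
      pages[w] = p
    }
    END { print pages[word] }' "$1"
}

# Build a throwaway concordance with made-up contents and query it.
conc=$(mktemp)
printf '%s\n' 'loader 1 2 3' 'verifier 2 3 4' > "$conc"
lookup_pages "$conc" verifier
rm -f "$conc"
```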

The keyword search arrives in name-value form with spaces replaced by plus signs (the encoding is actually more involved than that, but this code doesn't deal with the more complex issues). Split things apart and clean things up so that the name n has the value args[n].

<parse the browser request>= (<-U)

for (i = 1; i <= NF; i++) {
  if (split($i, parts, "=") != 2)
    print "split() != two" > "/dev/stderr"
  gsub("[+]", " ", parts[2])
  args[parts[1]] = parts[2]
  }
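
The effect of this parsing step can be checked in isolation. The sketch below is an illustration under assumptions: parse_query is a hypothetical name, the cols parameter in the sample request is invented, and, like the chunk above, it does not decode %XX escapes. It splits at the first "=" only, so a value containing "=" is kept whole.

```shell
# Hypothetical helper: print each name=value pair of a query string,
# with plus signs in the value turned back into spaces.
parse_query() {
  printf '%s\n' "$1" |
  awk '
    BEGIN { FS = "&" }
    {
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")            # position of the first "="
        if (eq == 0) continue          # skip fields without a value
        name = substr($i, 1, eq - 1)
        value = substr($i, eq + 1)
        gsub(/[+]/, " ", value)
        printf "%s=[%s]\n", name, value
      }
    }'
}

parse_query "query=loader+verifier+and&cols=4"
```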

Assuming the result of the keyword search contains at least one page, create the response page, which is a table containing alternating rows of page thumbnails and page labels.

<generate the thumbnail page>= (<-U)

page_cnt = split(page_set, pgs, "  *")
col_cnt = 4

print "<table>"
for (p = 1; p <= page_cnt; p += col_cnt) {
  <create a thumbnail row>
  <create a label row>
  }
print "</table>"


Create a row of page thumbnails. Also use the gen-gif program to create the thumbnail gifs, and the gen-page program to create the full-sized version of the page in case the thumbnail is selected.

<create a thumbnail row>= (<-U)

print "<tr>"
for (i = p; (i <= page_cnt) && (i < p + col_cnt); i++) {
  printf "<td>"
  printf "<a href=\"../%s.html\"><img src=\"../%s.gif\"></a>", pgs[i], pgs[i]
  print "</td>"

  c = "gen-gif " words " < ../slv/" pgs[i] " > " pgs[i] ".gif"
  system(c)
  c = "gen-page " " < ../slv/" pgs[i] " > " pgs[i] ".html"
  system(c)
  }
print "</tr>"

Create the row of labels appearing under the previous row of page thumbnails.

<create a label row>= (<-U)

print "<tr align = center>"
for (i = p; (i <= page_cnt) && (i < p + col_cnt); i++) {
  f = pgs[i]
  sub("\\.txt$", "", f)
  sub("<br>.", " ", f)
  print "<td>" f "</td>"
  }
print "</tr>"
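
The table is built by walking the page set in steps of col_cnt, with each row holding at most col_cnt pages. The following sketch shows only that grouping, printing one line per table row in place of the HTML. rows is a hypothetical name and plain awk is assumed. The inner loop stops at i < p + col_cnt, so no page is repeated in the next row.

```shell
# Hypothetical helper: group a space-separated page set ($1) into rows
# of at most $2 pages, one row per output line.
rows() {
  awk -v page_set="$1" -v col_cnt="$2" '
    BEGIN {
      page_cnt = split(page_set, pgs, " ")
      for (p = 1; p <= page_cnt; p += col_cnt) {
        row = ""
        for (i = p; i <= page_cnt && i < p + col_cnt; i++)
          row = row (i > p ? " " : "") pgs[i]
        print row
      }
    }'
}

rows "1 2 3 4 5 6" 4
```

Six pages in four columns give one full row followed by a row of two.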