This appendix describes an AWK script that serves as a CGI script for processing keyword searches through documents.
<gen-view>=
#!/bin/ksh
echo "$QUERY_STRING" |
gawk '<define the query evaluation functions>
BEGIN {
    FS = "&"
    print "Content-type: text/html"
    print ""
    print "<html><head></head>"
    <read the document concordance>
}
{
    <parse the browser request>
    page_set = eval(args["query"])
    if (page_set == "")
        # 39 is the character code of the apostrophe, which cannot appear
        # literally inside the single-quoted script.
        printf "Can%ct find any pages containing \"%s\".", 39, args["query"]
    else {
        <generate the thumbnail page>
    }
}
END {
    print "</html>"
}
'
A keyword search is given as a sequence of words and the boolean operations "and" and "or". To keep things simple, the search is given in postfix format; that is, each operation follows its two operands.
eval() accepts the keyword search e and splits it into words. stack is used as an argument stack for the searches; top is the index of the top of the stack. Upon successful evaluation of the search, stack[1] contains the set (represented as a space-separated string) of numbers for the pages matching the search. Leading and trailing spaces are trimmed from the set and it's returned as the result of the evaluation.
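<define the query evaluation functions>=
function eval(e, c, i, top, stack, tok) {
    words = ""
    # Split into the local array tok: the scalar parameter e cannot also
    # serve as the target array. A single-space separator splits on runs
    # of blanks and ignores leading and trailing ones.
    c = split(e, tok, " ")
    top = 0
    for (i = 1; i <= c; i++)
        if (tok[i] == "and") {
            stack[top - 1] = eval_and(stack[top], stack[top - 1])
            top--
        }
        else if (tok[i] == "or") {
            stack[top - 1] = eval_or(stack[top], stack[top - 1])
            top--
        }
        else
            stack[++top] = eval_word(tok[i])
    sub("^ *", "", stack[1])
    sub(" *$", "", stack[1])    # anchored at the end to trim trailing spaces
    return stack[1]
}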
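The stack discipline described above can be seen in isolation with a small sketch. It runs the same kind of loop as eval(), but pushes the words themselves rather than page sets, so the result is the query rewritten as a parenthesized infix expression. The function name postfix_to_infix and the sample words are made up for illustration, and the sketch assumes a well-formed query (it does not check for stack underflow).

```shell
#!/bin/sh
# Hypothetical helper: rewrite a postfix query as a parenthesized infix
# expression, using the same push/pop pattern as eval().
postfix_to_infix() {
    awk -v query="$1" '
    BEGIN {
        n = split(query, tok, " ")
        top = 0
        for (i = 1; i <= n; i++) {
            if (tok[i] == "and" || tok[i] == "or") {
                # Pop two operands, push the combined expression.
                stack[top - 1] = "(" stack[top - 1] " " tok[i] " " stack[top] ")"
                top--
            } else {
                # A plain word is an operand: push it.
                stack[++top] = tok[i]
            }
        }
        print stack[1]
    }'
}

postfix_to_infix "cat dog and emu or"    # prints ((cat and dog) or emu)
```

Reading the output back shows why postfix needs no parentheses or precedence rules: the order of the tokens alone fixes the grouping.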
eval_and() returns the intersection of the two page sets a and b.
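<define the query evaluation functions>+=
function eval_and(a, b, ab, i, n, item) {
    # Split into a local array; the scalar parameter a cannot double as one.
    n = split(a, item, " ")
    ab = ""
    for (i = 1; i <= n; i++)
        # index() compares literally; match() would treat item[i] as a regexp.
        if (index(" " b " ", " " item[i] " "))
            ab = ab " " item[i]
    return ab
}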
eval_or() returns the union of the two page sets a and b.
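<define the query evaluation functions>+=
function eval_or(a, b, ab, i, n, item) {
    # Split into a local array; the scalar parameter a cannot double as one.
    n = split(a, item, " ")
    ab = b
    for (i = 1; i <= n; i++)
        # index() compares literally; match() would treat item[i] as a regexp.
        if (!index(" " ab " ", " " item[i] " "))
            ab = ab " " item[i]
    return ab
}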
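To make the string-as-set representation concrete, here is a minimal standalone sketch of both operations. The names page_sets and member are hypothetical, and the page numbers are invented sample data. Membership is tested by padding both the set and the element with spaces, so that page 3 does not match inside page 13.

```shell
#!/bin/sh
# Hypothetical helper: apply "and" or "or" to two space-separated page sets.
page_sets() {
    awk -v op="$1" -v a="$2" -v b="$3" '
    # True when x is one of the space-separated members of set.
    function member(set, x) {
        return index(" " set " ", " " x " ") != 0
    }
    BEGIN {
        n = split(a, item, " ")
        # A union starts from b; an intersection starts empty.
        out = (op == "or") ? b : ""
        for (i = 1; i <= n; i++) {
            if (op == "and")
                keep = member(b, item[i])
            else
                keep = !member(out, item[i])
            if (keep)
                out = (out == "" ? item[i] : out " " item[i])
        }
        print out
    }'
}

page_sets and "1 3 5" "3 4 5"    # prints 3 5
page_sets or  "1 3 5" "3 4"      # prints 3 4 1 5
```

The union is not sorted: members of b come first, followed by the members of a that b lacks.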
eval_word() returns the set (represented as a space-separated string) of page numbers containing the word w. words is a global variable representing the set of words used in the keyword search; pages is the concordance.
<define the query evaluation functions>+=
function eval_word(w) {
    # index() compares literally, so regexp characters in w are harmless.
    if (!index(words " ", " " w " "))
        words = words " " w
    return pages[w]
}
The file slv-conc is the concordance for the Software Loader/Verifier specification. The concordance keeps detailed information about the words appearing in the specification; most of that information is thrown away so that only the page numbers remain. The global array pages holds the relevant information: pages[w] is the set (represented as a space-separated string) of numbers for the pages containing the word w.
<read the document concordance>=
while ((getline l < "slv-conc") > 0) {
    w = l
    sub(" .*", "", w)
    p = l
    sub("^[^ ]* *", "", p)
    pages[w] = p
}
close("slv-conc")
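The two sub() calls are easier to follow on a concrete line. The sketch below assumes each concordance line has the form "word page page ...", which is a guess at the layout of slv-conc rather than something stated here; the helper name lookup_word and the sample lines are likewise invented. It reads the lines from standard input instead of from the file.

```shell
#!/bin/sh
# Hypothetical helper: read concordance lines of the assumed form
# "word page page ..." from stdin and print the pages for one word.
lookup_word() {
    awk -v word="$1" '
    {
        w = $0
        sub(/ .*/, "", w)         # keep everything before the first space
        p = $0
        sub(/^[^ ]* */, "", p)    # drop the first word and the spaces after it
        pages[w] = p
    }
    END { print pages[word] }'
}

printf "cat 1 3 5\ndog 3 4\n" | lookup_word dog    # prints 3 4
```

A word that is absent from the input yields an empty string, which is the case the main script reports as finding no pages.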
The keyword search arrives in name-value form with spaces replaced by plus signs (the encoding is actually more involved than that, but this code doesn't deal with the more complex issues). Split things apart and clean things up so that the name n has the value args[n].
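<parse the browser request>=
for (i = 1; i <= NF; i++) {
    # $i is the i-th name=value field; a bare i is only the loop counter.
    if (split($i, parts, "=") != 2)
        print "split() != two" > "/dev/stderr"
    # "[+]" matches a literal plus sign; a bare "+" is not a valid regexp.
    gsub("[+]", " ", parts[2])
    args[parts[1]] = parts[2]
}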
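As a quick check of the parsing step, the following sketch decodes a sample request string outside the CGI environment. The helper name get_arg and the second field cols=4 are hypothetical; only the query field appears in the script. Like the code above, it handles the plus-for-space substitution and nothing else (no percent-decoding).

```shell
#!/bin/sh
# Hypothetical helper: print the decoded value of one field of a
# QUERY_STRING-style request ("name=value&name=value").
get_arg() {
    printf "%s\n" "$1" | awk -v name="$2" '
    BEGIN { FS = "&" }
    {
        for (i = 1; i <= NF; i++) {
            eq = index($i, "=")
            if (eq == 0)
                continue                # skip fields without a value
            key = substr($i, 1, eq - 1)
            val = substr($i, eq + 1)
            gsub(/[+]/, " ", val)       # plus signs stand for spaces
            args[key] = val
        }
        print args[name]
    }'
}

get_arg "query=cat+dog+and&cols=4" query    # prints cat dog and
```

Splitting at the first "=" with index() rather than split() is a design choice of this sketch: it keeps any later "=" characters as part of the value.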
Assuming the result of the keyword search contains at least one page, create the response page, which is a table containing alternating rows of page thumbnails and page labels.
<generate the thumbnail page>=
page_cnt = split(page_set, pgs, " *")
col_cnt = 4
print "<table>"
for (p = 1; p <= page_cnt; p += col_cnt) {
    <create a thumbnail row>
    <create a label row>
}
print "</table>"
Create a row of page thumbnails. Also use the gen-gif program to create the thumbnail gifs, and the gen-page program to create the full-sized version of the page in case the thumbnail is selected.
<create a thumbnail row>=
print "<tr>"
# Strict upper bound: a row holds at most col_cnt cells, so the last page
# of one row is not repeated as the first page of the next.
for (i = p; (i <= page_cnt) && (i < p + col_cnt); i++) {
    printf "<td>"
    printf "<a href=\"../%s.html\"><img src=\"../%s.gif\"></a>", pgs[i], pgs[i]
    print "</td>"
    c = "gen-gif " words " < ../slv/" pgs[i] " > " pgs[i] ".gif"
    system(c)
    c = "gen-page " " < ../slv/" pgs[i] " > " pgs[i] ".html"
    system(c)
}
print "</tr>"
Create the row of labels appearing under the previous row of page thumbnails.
<create a label row>=
print "<tr align = center>"
# Strict upper bound: a row holds at most col_cnt cells.
for (i = p; (i <= page_cnt) && (i < p + col_cnt); i++) {
    f = pgs[i]
    sub("[.]txt", "", f)    # "[.]" matches a literal dot
    sub("<br>.", " ", f)
    print "<td>" f "</td>"
}
print "</tr>"
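The way the outer loop over p and the inner loop over i divide the pages into rows can be checked without generating any HTML. The sketch below only prints which pages land in which row; the helper name layout_rows and the sample page list are invented. It uses a strict upper bound on the inner loop, so that every row holds at most col_cnt pages and no page appears in two rows.

```shell
#!/bin/sh
# Hypothetical helper: show how a page set is divided into table rows
# of at most col_cnt cells each.
layout_rows() {
    awk -v page_set="$1" -v col_cnt="$2" '
    BEGIN {
        page_cnt = split(page_set, pgs, " ")
        # p is the index of the first page in each row.
        for (p = 1; p <= page_cnt; p += col_cnt) {
            row = ""
            for (i = p; i <= page_cnt && i < p + col_cnt; i++)
                row = (row == "" ? pgs[i] : row " " pgs[i])
            print "row: " row
        }
    }'
}

layout_rows "1 2 3 4 5 6" 4
# prints:
# row: 1 2 3 4
# row: 5 6
```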