Appendix - A Keyword Searching Tool


It is common for the terminology established by specification and design documents to carry over into later phases of a project; after all, these early documents set the domain of discourse for the project. For example, when the specification requires high performance for repeated queries and the design document suggests caching as a technique for avoiding duplicated effort, the program text is likely to contain the words "performance" and "cache" in comments (e.g., "... cache the query and its response to avoid resending the same query...") and in code (e.g., the routine InitQueryCache()).

In the opposite direction, words in the program text may originate in preliminary project documents, and understanding the context in which those words occur there may help explain their appearance in the program text. A maintainer examining the program text may wonder why so many routines are built on the phrase "QueryCache". By searching the early project documents, the maintainer may gain insight into why "QueryCache" appears to be important to the program.
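
As an illustration of moving from program text to search terms, the following minimal sketch splits a routine name such as InitQueryCache into lowercase words that could be typed into a keyword search. It is not part of the Keyword Searcher as described here; the function name identifier_keywords and the splitting rules are assumptions made for this example.

```python
import re

# Matches, in order of preference: an acronym that is followed by a
# capitalized word (the "HTTP" in "HTTPServer"), a capitalized or
# lowercase word, a trailing acronym, or a run of digits.
_WORD_IN_IDENTIFIER = re.compile(
    r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+"
)


def identifier_keywords(identifier):
    """Split a CamelCase or snake_case identifier into lowercase words.

    Hypothetical helper: the splitting rules are an assumption for this
    sketch, not something the original tool is known to do.
    """
    return [word.lower() for word in _WORD_IN_IDENTIFIER.findall(identifier)]


print(identifier_keywords("InitQueryCache"))  # ['init', 'query', 'cache']
```

The words "query" and "cache" obtained this way are the kind of terms a maintainer might then look for in the specification and design documents.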

The Keyword Searcher is a prototype Web-based search engine for examining project documents in the way suggested by the previous paragraphs. By issuing keyword search requests, the maintainer can examine document pages according to the words they contain, which gives a selective, focused view into the documents.

Figure 1 shows the Keyword Searcher's query page. The page accepts a keyword search and sends it off to a CGI script for further processing. In this example, the maintainer is asking to see all pages containing the words "software" and "download".
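
The hand-off from the query page to the CGI script can be sketched as follows. This is only a guess at the mechanics: it assumes the form submits a single URL-encoded field, here called "keywords" (a hypothetical name), with the words separated by spaces. The function parse_keyword_query is likewise made up for the example.

```python
from urllib.parse import parse_qs


def parse_keyword_query(query_string, field="keywords"):
    """Extract a list of lowercase keywords from a URL-encoded query string.

    The field name "keywords" is an assumption; the real form may use a
    different name or a different encoding of the search.
    """
    keywords = []
    for value in parse_qs(query_string).get(field, []):
        for word in value.split():
            word = word.lower()
            if word not in keywords:  # drop repeated keywords, keep order
                keywords.append(word)
    return keywords


# A form submission for the search in this example might arrive as:
print(parse_keyword_query("keywords=software+download"))
# ['software', 'download']
```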


Figure 1


The CGI script parses the keyword search and applies it to a concordance created from the documents of interest; the result is the set of pages containing the words matching the keyword search. The CGI script sends the page set to a program that constructs the response page shown in Figure 2. Each keyword is color-coded ("software" is green and "download" is red in this example) to give some idea of each page's match density. The thumbnail pages are presented in no particular order.
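
A concordance lookup of the kind described above might look like the following sketch. It assumes the concordance maps each word to the pages on which it occurs, together with an occurrence count, and that a search for several keywords returns only the pages containing all of them. The names build_concordance and search, the page identifiers, and the sample text are all hypothetical.

```python
import re
from collections import Counter, defaultdict

_WORD = re.compile(r"[a-z0-9]+")


def build_concordance(pages):
    """Map each word to a Counter of {page_id: occurrences on that page}.

    `pages` is a dict of page_id -> page text (an assumed representation).
    """
    concordance = defaultdict(Counter)
    for page_id, text in pages.items():
        for word in _WORD.findall(text.lower()):
            concordance[word][page_id] += 1
    return concordance


def search(concordance, keywords):
    """Return {page_id: {keyword: count}} for pages containing every keyword.

    The per-keyword counts are a stand-in for the match density that the
    response page conveys through color coding.
    """
    keywords = [keyword.lower() for keyword in keywords]
    if not keywords:
        return {}
    # Pages on which each keyword occurs; a keyword with no entry yields
    # an empty set, so the intersection is empty as well.
    page_sets = [set(concordance.get(keyword, ())) for keyword in keywords]
    matching = set.intersection(*page_sets)
    return {
        page_id: {keyword: concordance[keyword][page_id] for keyword in keywords}
        for page_id in matching
    }


sample_pages = {
    "01": "Download the software. The software is free.",
    "02": "Software design notes.",
    "03": "Download instructions.",
}
concordance = build_concordance(sample_pages)
print(search(concordance, ["software", "download"]))
# {'01': {'software': 2, 'download': 1}}
```

Because the matching pages are collected in a set, they come back in no particular order, which is consistent with the unordered thumbnails on the response page.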


Figure 2


Clicking on a thumbnail retrieves the associated full-sized page; for example, clicking on page 01 in Figure 2 produces the page shown in Figure 3.


Figure 3


By using the browser's forward and back buttons, and by opening new pages, the maintainer can build a detailed set of connections between the terminology appearing in the program text and in the preliminary project documents.


This page last modified on 24 September 1997.