WebSnatcher: WWW Prefetching and Caching

Students: Maria Gullickson, Catherine Eiccholz

Publication: "Using Experience to Guide Web Server Selection," Maria L. Gullickson, Catherine E. Eiccholz, Ann L. Chervenak, and Ellen W. Zegura, to appear in Multimedia Computing and Networking, January 1999.

One obvious use for the massive storage provided by the Personal Terabyte is to prefetch and cache data that the user is likely to need, thereby avoiding network delays. To experiment with prefetching data on the World Wide Web, we have written an application called WebSnatcher. A WebSnatcher user creates a profile reflecting his or her interests. This profile is composed of web site locations, bookmark files from Netscape, and keywords. WebSnatcher initiates searches based on the user's specified keywords on up to six commercial search engines, including Yahoo and Alta Vista. After the searches are complete, WebSnatcher prefetches the pages that best match the keywords of the query. WebSnatcher also prefetches any other pages specified in the user profile. The prefetching results are stored in a directory on the user's local file system, providing fast display of the data without incurring network delays and allowing indexing of the search results.
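The profile-driven flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not WebSnatcher's actual code: the names Profile, score, select_pages, cache_path, and prefetch are hypothetical, the keyword-count ranking is an assumed stand-in for however WebSnatcher judges the "best match", and the search results and the fetch function are supplied by the caller so that the sketch needs no network access. Bookmark files are omitted for brevity.

```python
from dataclasses import dataclass, field
from hashlib import sha256
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Profile:
    """Hypothetical user profile: explicit site URLs plus search keywords."""
    sites: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def score(text: str, keywords: List[str]) -> int:
    """Count case-insensitive keyword occurrences (an assumed ranking rule)."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def select_pages(results: Dict[str, str], keywords: List[str], limit: int) -> List[str]:
    """Return the URLs of the `limit` best-scoring results, best first.

    `results` maps each URL returned by a search to its summary text.
    """
    ranked = sorted(results, key=lambda url: score(results[url], keywords), reverse=True)
    return ranked[:limit]


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable file name inside the local cache directory."""
    return cache_dir / (sha256(url.encode("utf-8")).hexdigest() + ".html")


def prefetch(
    profile: Profile,
    results: Dict[str, str],
    fetch: Callable[[str], bytes],
    cache_dir: Path,
    limit: int = 10,
) -> Dict[str, Path]:
    """Store the best search hits and all profile sites under `cache_dir`."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    wanted = select_pages(results, profile.keywords, limit) + profile.sites
    stored: Dict[str, Path] = {}
    for url in dict.fromkeys(wanted):  # drop duplicates, keep order
        path = cache_path(cache_dir, url)
        path.write_bytes(fetch(url))
        stored[url] = path
    return stored
```

In this sketch, a later request for a page would first check cache_path(cache_dir, url) and read the local copy if it exists, which is where the savings in network delay would come from.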

WebSnatcher software and a Georgia Tech technical report describing its design (GIT-CC-98-01) are available on the WebSnatcher Home Page.

The latest version of WebSnatcher includes anycasting networking technology. Ellen Zegura's anycasting work involves choosing one of a set of equivalent servers on the network to satisfy a particular request, with the choice of server made based on past performance. We incorporate a variation of anycasting in the WebSnatcher application, and study more than a dozen different algorithms for choosing a server from a set of equivalent servers to handle a request from WebSnatcher.
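To make the idea of choosing a server based on past performance concrete, here is one possible policy, sketched in Python. It is an assumption made for illustration and is not claimed to be one of the algorithms studied for WebSnatcher: it keeps an exponentially weighted moving average of observed response times per server, tries any server that has not yet been measured, and otherwise picks the server with the lowest estimate. The class name ServerHistory and the smoothing parameter alpha are hypothetical.

```python
from typing import Dict, List


class ServerHistory:
    """Tracks a smoothed response-time estimate for each equivalent server."""

    def __init__(self, alpha: float = 0.3) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest measurement
        self.estimate: Dict[str, float] = {}

    def record(self, server: str, seconds: float) -> None:
        """Fold a newly observed response time into the server's estimate."""
        old = self.estimate.get(server)
        if old is None:
            self.estimate[server] = seconds
        else:
            self.estimate[server] = self.alpha * seconds + (1.0 - self.alpha) * old

    def choose(self, servers: List[str]) -> str:
        """Pick an unmeasured server if any, else the one with the lowest estimate."""
        if not servers:
            raise ValueError("no servers to choose from")
        for server in servers:
            if server not in self.estimate:
                return server
        return min(servers, key=lambda s: self.estimate[s])
```

A caller would time each fetch, pass the result to record, and call choose before the next request. A larger alpha makes the estimate react faster to recent slowdowns, while a smaller alpha favors long-term history; which works better is exactly the kind of question a comparison of selection algorithms would address.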

Using a mechanism like anycasting in the context of WebSnatcher is important for two reasons. First, it gives users better interactive performance when fetching data that will be accessed soon, because requests go to a server that has historically performed well. Second, prefetching by large numbers of people could generate large amounts of network traffic. A mechanism like anycasting would allow individuals to prefetch more responsibly by avoiding heavily loaded network paths and servers.