
Introduction

The Internet presents a large and growing number of information sources: airline schedules, online shopping malls, retail product catalogs, stock market quotes, job listings, weather forecasts, and many more. Such information can be gathered either manually through a web browser or automatically, that is, by computer programs (rather than humans) that interact with the corresponding information sources. Recently, many systems have been built that automatically gather and manipulate such information on behalf of information consumers. One of the most popular mechanisms used by these systems is to extract content using wrappers [4, 3, 5, 2, 10, 1, 9].

A wrapper can be seen as a procedure designed to extract the content of a particular information source and deliver the content of interest in a self-describing representation. Although many wrappers to date are hand-written, it is widely recognized that constructing wrappers for web sources by hand is labor intensive for a number of reasons:

The number of information sources of interest is often very large and the content and presentation structures of different information sources may vary significantly.
New information sources of interest are frequently added to the Web.
The format of online information changes frequently.
Hand-written wrappers incur high maintenance costs.

Therefore, mechanisms and technology to aid the construction and sharing of wrapper programs are essential for automatic manipulation of Web information.
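
To make the notion of a wrapper concrete, the following is a minimal sketch of a hand-written wrapper in Python. It is an illustration only and is not taken from the Continual Queries system: the page fragment SAMPLE_PAGE, the class TableWrapper, and the function wrap are hypothetical names, and the sketch assumes a source whose content of interest is a single HTML table with a header row. The wrapper extracts the table cells and delivers them as records keyed by the column headers, which is one simple form of a self-describing representation.

```python
from html.parser import HTMLParser
import json

# Hypothetical page fragment (assumption); a real source would be fetched over HTTP.
SAMPLE_PAGE = """
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>ACME</td><td>12.50</td></tr>
  <tr><td>INIT</td><td>7.25</td></tr>
</table>
"""


class TableWrapper(HTMLParser):
    """Collects the cell text of every table row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def wrap(page):
    """Return the table content as records keyed by the header row."""
    parser = TableWrapper()
    parser.feed(page)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    # Each record names its own fields, so a consumer needs no layout knowledge.
    print(json.dumps(wrap(SAMPLE_PAGE), indent=2))
```

Even this small wrapper is tied to one presentation structure: if the source moved its data out of a table or dropped the header row, the code would have to be rewritten, which is the maintenance burden noted in the list above.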

In this paper we first summarize our experience with wrapper construction in the Continual Queries project (http://www.cse.ogi.edu/DISC/CQ/). Then we discuss the mechanisms for constructing and maintaining the wrapper frames library, and describe the metadata structure and methods for designing and implementing a distributed wrapper repository. We conclude the paper with a discussion of related work and a summary of our contributions.

Ling Liu
Sun Feb 7 00:31:54 PST 1999