
Constructing and Using Wrappers: The Component Library

A wrapper framework consists of a generic code structure and a collection of code components that can be tailored, through source-specific code customization, to build specialized wrappers. A key component of a wrapper framework is the wrapper API and a library of code that implements this API. Specifically, a wrapper API implemented in Java typically includes the Java class hierarchy rooted at the Wrapper class.
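The split between a generic code structure and source-specific customization can be pictured as a template method on an abstract root class. The following is a minimal Java sketch under stated assumptions: the method names query, retrieve, and extract and the subclass LineSourceWrapper are hypothetical illustrations, not the actual XWRAP class hierarchy or API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical root of the wrapper class hierarchy. Method names are
// illustrative assumptions, not the actual XWRAP API.
abstract class Wrapper {
    // Generic code structure shared by every wrapper (template method).
    public final List<String> query(String q) {
        String page = retrieve(q);   // source-specific retrieval
        return extract(page);        // source-specific extraction
    }

    // Customization points that each specialized wrapper must implement.
    protected abstract String retrieve(String q);
    protected abstract List<String> extract(String page);
}

// Hypothetical specialized wrapper for a source whose "page" is a block of
// newline-separated records.
class LineSourceWrapper extends Wrapper {
    private final String content;

    LineSourceWrapper(String content) {
        this.content = content;
    }

    @Override
    protected String retrieve(String q) {
        // A real wrapper would send q to the remote source; this sketch
        // ignores q and returns fixed content.
        return content;
    }

    @Override
    protected List<String> extract(String page) {
        List<String> records = new ArrayList<>();
        for (String line : page.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed); // keep non-blank lines as records
            }
        }
        return records;
    }
}
```

A specialized wrapper overrides only the customization points, while query(...) stays fixed in the base class. This is one plausible way to realize the generic structure, not necessarily the one XWRAP uses.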

A major challenge in designing an extensible framework for wrapper construction is identifying the mandatory functionality of a wrapper and cleanly separating its optional functionality from undesirable functionality. For example, should sophisticated retrieval mechanisms, error handling, and the choice between streaming mode and blocking mode be considered mandatory wrapper functionality? Are performance, robustness, statistics, proxies, and optimization better treated as optional functionality? Should data massaging and sophisticated recovery strategies be considered undesirable wrapper functionality?

Based on our experience in building wrappers, the mandatory functionality of a wrapper should contain those capabilities that are crucial for achieving its basic goal. For example, to improve the quality of information extraction, a wrapper needs to provide sophisticated retrieval and filtering mechanisms, as well as simple error handling strategies. Examples of such strategies include handling timeouts with user-specified thresholds and providing a status method that returns the runtime status of the wrapper. Furthermore, to improve the responsiveness of a wrapper, both a blocking mode and a streaming mode of interaction between the wrapper and its applications should be provided. When a wrapper runs in streaming mode, an application can fire the wrapper (e.g., by issuing a query) and receive a stream of returned data, rather than blocking until the wrapper query terminates. To provide the streaming interface, the Wrapper's fire(...) method returns a synchronized queue: the wrapper runs in its own thread and writes to the queue, while the application reads from it. Because many applications will not take advantage of the streaming mode, a much simpler blocking interface is also provided.
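The two interaction modes, the timeout threshold, and the status method can be sketched together in Java. This is a hedged illustration, not XWRAP's implementation: the class StreamingWrapper, its produce, run, and status methods, and the Status values are hypothetical. A java.util.concurrent.BlockingQueue stands in for the synchronized queue returned by fire(...), and an empty Optional on the queue is an assumed end-of-stream marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical wrapper base class offering streaming and blocking modes.
// It handles one query at a time; all names are illustrative assumptions.
abstract class StreamingWrapper {
    enum Status { IDLE, RUNNING, DONE, FAILED, TIMED_OUT }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.IDLE);
    private volatile Thread worker;

    // Source-specific part: pass each extracted record to the sink.
    protected abstract void produce(String query, Consumer<String> sink);

    // Status method: reports the runtime status of the wrapper.
    public Status status() {
        return status.get();
    }

    // Streaming mode: returns immediately; a worker thread writes records to
    // the queue as they are extracted, and an empty Optional ends the stream.
    public BlockingQueue<Optional<String>> fire(String query) {
        BlockingQueue<Optional<String>> queue = new LinkedBlockingQueue<>();
        status.set(Status.RUNNING);
        worker = new Thread(() -> {
            try {
                produce(query, record -> queue.add(Optional.of(record)));
                status.compareAndSet(Status.RUNNING, Status.DONE);
            } catch (RuntimeException e) {
                status.compareAndSet(Status.RUNNING, Status.FAILED);
            } finally {
                queue.add(Optional.empty()); // always terminate the stream
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive for a stalled source
        worker.start();
        return queue;
    }

    // Blocking mode: collects records until the stream ends or the
    // user-specified timeout elapses, then returns what was collected.
    public List<String> run(String query, long timeoutMillis) {
        BlockingQueue<Optional<String>> queue = fire(query);
        List<String> results = new ArrayList<>();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (true) {
                long remaining = deadline - System.nanoTime();
                Optional<String> item = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (item == null) { // timed out waiting for the next record
                    status.compareAndSet(Status.RUNNING, Status.TIMED_OUT);
                    worker.interrupt(); // ask the producer thread to stop
                    return results;
                }
                if (!item.isPresent()) {
                    return results; // end of stream
                }
                results.add(item.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt
            return results;
        }
    }
}
```

In this sketch the blocking run(...) method is simply a loop over the streaming interface, so both modes share one retrieval path. On timeout it returns the partial results and interrupts the producer thread, which a well-behaved produce(...) implementation should honor.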

In addition to the mandatory functionality, a wrapper may offer a number of optional features that are important and desirable, including performance statistics and mechanisms for wrapper query optimization. For example, many applications need more detailed statistics about wrapper execution (e.g., clock times, bytes transferred, and the number of tuples or objects returned) than simply whether the status is done or running. It is also desirable for a wrapper to act as a responsible optimizer that prevents applications from sending dozens of queries per second to a site, in particular by allowing an application-configurable inter-query delay.
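Execution statistics and an inter-query delay can be layered around a wrapper's retrieval and extraction steps. The sketch below is a minimal Java illustration under assumptions: the classes PoliteWrapper and QueryStats, their fields, and the choice to measure the delay between query start times are hypothetical, and bytes transferred is approximated as the UTF-8 size of the retrieved page.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical per-query statistics record; field names are assumptions.
final class QueryStats {
    final long elapsedMillis;     // wall-clock time for retrieval + extraction
    final long bytesTransferred;  // UTF-8 size of the retrieved page
    final int tuplesReturned;     // number of extracted records

    QueryStats(long elapsedMillis, long bytesTransferred, int tuplesReturned) {
        this.elapsedMillis = elapsedMillis;
        this.bytesTransferred = bytesTransferred;
        this.tuplesReturned = tuplesReturned;
    }
}

// Hypothetical wrapper layer that enforces a minimum, application-configurable
// delay between the starts of consecutive queries and records statistics.
class PoliteWrapper {
    private final Function<String, String> fetch;          // query -> page
    private final Function<String, List<String>> extract;  // page -> tuples
    private final long interQueryDelayMillis;
    private boolean hasQueried = false;
    private long lastStartNanos;
    private int queriesIssued = 0;
    private QueryStats lastStats;

    PoliteWrapper(Function<String, String> fetch,
                  Function<String, List<String>> extract,
                  long interQueryDelayMillis) {
        if (interQueryDelayMillis < 0) {
            throw new IllegalArgumentException("delay must be non-negative");
        }
        this.fetch = fetch;
        this.extract = extract;
        this.interQueryDelayMillis = interQueryDelayMillis;
    }

    // Synchronized, so concurrent callers are spaced out as well: they wait
    // on the monitor while one caller sleeps out the delay.
    public synchronized List<String> query(String q) {
        waitForTurn();
        long start = System.nanoTime();
        hasQueried = true;
        lastStartNanos = start;
        String page = fetch.apply(q);
        List<String> tuples = extract.apply(page);
        long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        lastStats = new QueryStats(elapsed,
                page.getBytes(StandardCharsets.UTF_8).length, tuples.size());
        queriesIssued++;
        return tuples;
    }

    public synchronized QueryStats lastStats() { return lastStats; }

    public synchronized int queriesIssued() { return queriesIssued; }

    // Sleeps until at least interQueryDelayMillis have passed since the
    // previous query started.
    private void waitForTurn() {
        if (!hasQueried) return;
        long waitNanos = TimeUnit.MILLISECONDS.toNanos(interQueryDelayMillis)
                - (System.nanoTime() - lastStartNanos);
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }
}
```

Measuring the delay from the start of one query to the start of the next bounds the request rate at the source regardless of how long each query takes; measuring from the end of one query instead would be an equally reasonable design.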

In the current design of XWRAP, we consider the following functionality undesirable, primarily because of the complexity it introduces.

Several features are useful but are not considered in the current design of XWRAP, including mechanisms that enable wrappers to learn to adapt to changes at source sites, and support for complex data types. Whether such functionality is desirable remains an open question.



Ling Liu
Sun Feb 7 00:31:54 PST 1999