Overview


 

-------

Introduction

XWRAP Elite is a software toolkit for generating XML-enabling wrapper programs for Web sources. By XML-enabling, we mean that the wrapper programs generated by XWRAP Elite can transform an HTML document into an XML document and deliver the extracted data content in XML format along with a DTD.
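To make the notion of an XML-enabling wrapper concrete, here is a minimal sketch in modern Java (10+) of the final output step only. Objects that have already been extracted, each represented as a map from element tag name to text, are serialized as an XML document with an internal DTD. The class name `XmlEnablingSketch`, the method `toXml`, and the sample data are hypothetical. The sketch assumes every object has the same elements in the same order. It is not the code XWRAP Elite generates.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class XmlEnablingSketch {

    // Escape the characters that may not appear verbatim in XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Serializes extracted objects as XML with an internal DTD subset.
    // Assumption (hypothetical): all objects share the element tags of the first object, in order.
    static String toXml(String root, String objectTag, List<Map<String, String>> objects) {
        List<String> tags = objects.isEmpty() ? List.of() : List.copyOf(objects.get(0).keySet());
        String content = tags.isEmpty() ? "EMPTY" : "(" + String.join(", ", tags) + ")";

        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\"?>\n");
        // DTD: the root holds zero or more objects; each object holds its elements as text.
        sb.append("<!DOCTYPE ").append(root).append(" [\n");
        sb.append("  <!ELEMENT ").append(root).append(" (").append(objectTag).append(")*>\n");
        sb.append("  <!ELEMENT ").append(objectTag).append(" ").append(content).append(">\n");
        for (String tag : tags) {
            sb.append("  <!ELEMENT ").append(tag).append(" (#PCDATA)>\n");
        }
        sb.append("]>\n");

        // Document body: one element per object, one child element per extracted field.
        sb.append("<").append(root).append(">\n");
        for (Map<String, String> obj : objects) {
            if (tags.isEmpty()) { // no elements: matches the EMPTY declaration above
                sb.append("  <").append(objectTag).append("/>\n");
                continue;
            }
            sb.append("  <").append(objectTag).append(">\n");
            for (String tag : tags) {
                sb.append("    <").append(tag).append(">")
                  .append(escape(obj.getOrDefault(tag, "")))
                  .append("</").append(tag).append(">\n");
            }
            sb.append("  </").append(objectTag).append(">\n");
        }
        sb.append("</").append(root).append(">\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample object, standing in for one extracted search result.
        Map<String, String> book = new LinkedHashMap<>();
        book.put("title", "Example JDBC Book & Guide");
        book.put("price", "$10.00");
        System.out.print(toXml("books", "book", List.of(book)));
    }
}
```

Running `main` prints a `books` document with one `book` object. The DTD declares `book` as `(title, price)`, and the `&` in the sample title is escaped as `&amp;`.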

The XWRAP Elite service is provided free of charge. Using the Elite service, you can build a ready-to-go wrapper for any of your favorite web sites in just a few minutes. You can learn how to build a wrapper program with a few clicks (see the walkthrough). The wrappers are generated as Java classes.

The core technology of the XWRAP Elite toolkit is the automatic discovery and extraction of objects of interest and their elements. The object and element extraction heuristics are computed and derived automatically for any given web page. For data-object-intensive web sites such as bn.com, ebay.com, cnet.com and buy.com, our object and element extraction algorithms achieve 95% to 100% accuracy. Another distinctive feature of our extraction algorithms is their robustness against changes in how a web site presents its content. The XWRAP Elite toolkit has been tested on thousands of web pages and hundreds of web sources.

XML-enabling wrappers make the content of Web sources easily accessible to applications that need to filter, fuse, integrate and summarize data from multiple, disparate Web information sources. In addition to the object and element extraction algorithms, the XWRAP Elite service provides an XML Wrapper Query Language (XML-WQL) as the data exchange language. XML-WQL accepts application query requests (either keyword-based or content-sensitive queries) and returns the matching source documents or source objects in XML format. Other components of the XWRAP Elite service include the automatic extraction of the user query interfaces of the Web site to be wrapped, the code generator, the code testing module and the code packaging component.
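The following modern Java (16+) sketch gives a rough idea of what an object extraction heuristic can look like. It implements one simple idea. Under the subtree that holds a result list, the child tag that occurs most often is assumed to separate one data object from the next, and the children are split into objects at each occurrence of that tag. This "highest-count tag" rule is a generic heuristic chosen for illustration, not XWRAP Elite's actual extraction algorithm. The `Node` type and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SeparatorHeuristicSketch {

    // Hypothetical minimal tag-tree node; a real wrapper would build this from parsed HTML.
    record Node(String tag, String text, List<Node> children) {
        static Node leaf(String tag, String text) {
            return new Node(tag, text, List.of());
        }
    }

    // Returns the child tag occurring most often under parent; ties go to the tag seen first.
    static String highestCountTag(Node parent) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Node child : parent.children()) {
            counts.merge(child.tag(), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Splits the children into objects, starting a new object at each separator tag.
    static List<List<Node>> splitObjects(Node parent) {
        String separator = highestCountTag(parent);
        List<List<Node>> objects = new ArrayList<>();
        List<Node> current = null;
        for (Node child : parent.children()) {
            if (child.tag().equals(separator)) {
                current = new ArrayList<>();
                objects.add(current);
            }
            if (current != null) { // nodes before the first separator are ignored
                current.add(child);
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        // Hypothetical result-list subtree: a heading, then repeated <b>/<i>/<br> groups.
        Node list = new Node("td", "", List.of(
                Node.leaf("h2", "Results"),
                Node.leaf("b", "Title 1"), Node.leaf("i", "Author 1"), Node.leaf("br", ""),
                Node.leaf("b", "Title 2"), Node.leaf("i", "Author 2"), Node.leaf("br", "")));
        System.out.println("separator = " + highestCountTag(list));
        for (List<Node> obj : splitObjects(list)) {
            System.out.println(obj.stream().map(Node::text).filter(t -> !t.isEmpty()).toList());
        }
    }
}
```

On the sample subtree, `main` prints `separator = b`, followed by `[Title 1, Author 1]` and `[Title 2, Author 2]`. The leading `h2` heading is dropped because it precedes the first separator. A single rule like this is easily misled by page layout, so in practice several such rules are combined and their output checked.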

For a demonstration of the usefulness of XWRAP Elite wrappers, you may visit the Adaptive Query Routing system. To download ready-made XWRAP Elite wrappers, visit the XWRAP wrapper repository.

How does it work?

As a user of XWRAP Elite, you can use the toolkit to generate a wrapper program for your favorite Web site in three consecutive phases:

  • Phase 1 - Object & Element Extraction
    This phase helps you generate an object and element extraction component that extracts the data from an HTML page and converts it into an XML document. You can obtain the source code for the extraction component by following these steps:
    1. First, enter the website for which you want to generate a wrapper, for example fatbrain.com.
    2. Then browse the website through our proxy-like service and find the kind of web page you want to wrap. For example, you run a search for all JDBC books and arrive at the search result page, which lists all the JDBC books offered at the fatbrain.com site. If the result page is the kind of page you want to wrap, click the Data Extraction button in the Elite service panel at the top-left corner.
    3. XWRAP Elite now automatically discovers and separates the objects and elements in the sample page you have chosen. At this step, the toolkit computes and learns a set of object extraction heuristics and element extraction heuristics. These heuristics form the core of the extraction component of the wrapper to be generated. If you are not happy with the first extraction result, you can refine the object extraction and element extraction results using our easy-to-use GUI (see Walkthrough and Examples for details). When you are satisfied with the extraction result, click the button to proceed to the next step.
    4. At the element extraction refinement step, you can refine the extraction by restricting or relaxing the number of elements per object, or by refining the data types of the elements. When you are happy with the extraction result, enter a tag name of your choice for each extracted element. Then click the button to proceed to the code generation step for the extraction component.
    5. At the code generation step, XWRAP Elite produces the extraction module as a Java class. The class takes a URL (and a query string if the page is accessed by the HTTP POST method) and outputs XML data using the tags you specified. You may download the source code, or run a few more tests before downloading. If you want to generate a wrapper that can filter the XML document, continue with Phase 2.

  • Phase 2 - Search Interface Extraction
    This phase allows you to build a component that constructs the URL of a web page from given keywords. Elite automatically captures the URL (and the query string if the page is accessed by the HTTP POST method). You identify the dynamic parts of the URL and the query string as keywords. Elite then automatically generates a Java class that takes keywords as input and outputs a URL (and a query string).

  • Phase 3 - Code Packaging
    This phase automatically integrates the two source code components by generating the wrapper main program as a wrapper class. This wrapper class takes keywords as input and produces XML data as output. You can then download the complete source code and object code to any computer on which you would like to run the wrapper program. If you would like to share your wrapper with others, you may also register it with the XWRAP wrapper repository while remaining the sole owner of your wrappers.
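The Phase 2 component described above can be sketched in modern Java (10+). The parameters captured from a sample query are kept fixed, and the one parameter the user marked as the dynamic part is filled in with the caller's keywords. The class name, base URL and parameter names (`category`, `keyword`) are hypothetical placeholders. The sketch assumes the keywords map to a single query parameter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryBuilderSketch {

    private final String baseUrl;
    private final Map<String, String> fixedParams; // captured from the sample query, unchanged
    private final String keywordParam;             // hypothetical: the dynamic part marked by the user
    private final boolean usePost;                 // true if the page is accessed by HTTP POST

    QueryBuilderSketch(String baseUrl, Map<String, String> fixedParams, String keywordParam, boolean usePost) {
        this.baseUrl = baseUrl;
        this.fixedParams = new LinkedHashMap<>(fixedParams); // defensive copy keeps insertion order
        this.keywordParam = keywordParam;
        this.usePost = usePost;
    }

    // Builds the URL-encoded query string with the keywords substituted in.
    String queryString(String keywords) {
        Map<String, String> params = new LinkedHashMap<>(fixedParams);
        params.put(keywordParam, keywords);
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // For GET the query string is appended to the URL; for POST it is sent separately as the body.
    String url(String keywords) {
        return usePost ? baseUrl : baseUrl + "?" + queryString(keywords);
    }

    public static void main(String[] args) {
        Map<String, String> fixed = new LinkedHashMap<>();
        fixed.put("category", "books");
        QueryBuilderSketch builder =
                new QueryBuilderSketch("http://www.example.com/search", fixed, "keyword", false);
        System.out.println(builder.url("JDBC programming"));
    }
}
```

With the sample values, `main` prints `http://www.example.com/search?category=books&keyword=JDBC+programming`. For a POST source, `url` returns only the base URL, and `queryString` supplies the request body.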

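Two small interfaces can sketch how a Phase 3 wrapper class chains the two components: keywords to URL, then URL to XML. The names `SearchInterface`, `Extractor`, `Request` and `WrapperSketch` are hypothetical, and both components are stubs. This is an assumed structure in modern Java (16+), not the code XWRAP Elite generates.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WrapperSketch {

    // Hypothetical request type: the URL plus, for HTTP POST sources, the body (null for GET).
    record Request(String url, String postBody) {}

    // Stands in for the Phase 2 component: keywords in, request out.
    interface SearchInterface {
        Request build(String keywords);
    }

    // Stands in for the Phase 1 component: request in, XML document out.
    interface Extractor {
        String extract(Request request);
    }

    private final SearchInterface search;
    private final Extractor extractor;

    WrapperSketch(SearchInterface search, Extractor extractor) {
        this.search = search;
        this.extractor = extractor;
    }

    // The wrapper's entry point: keywords as input, XML data as output.
    String query(String keywords) {
        return extractor.extract(search.build(keywords));
    }

    public static void main(String[] args) {
        // Stub search interface: a hypothetical GET URL with the keywords URL-encoded.
        SearchInterface search = kw -> new Request(
                "http://www.example.com/search?q=" + URLEncoder.encode(kw, StandardCharsets.UTF_8), null);
        // Stub extractor: a real one would fetch req.url() and apply the learned heuristics.
        Extractor extractor = req -> "<results><url>" + req.url().replace("&", "&amp;") + "</url></results>";
        System.out.println(new WrapperSketch(search, extractor).query("JDBC books"));
    }
}
```

Running `main` prints `<results><url>http://www.example.com/search?q=JDBC+books</url></results>`. Because each phase sits behind its own interface, either component can be regenerated without touching the other. For example, the search interface could be rebuilt after a site changes its search form.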

-------


For problems or questions regarding this web site, contact XWRAP Elite.
Last updated: April, 2000.