XWRAP Original

an eXtensible Wrapper Generation System


 

-------

Overview

XWRAP is an XML-enabled software system for the semi-automatic generation of wrapper programs for Web sources. The XWRAP software technology consists of XWRAP Original and XWRAP Elite.

By semi-automatic, we mean that XWRAP provides an interactive interface that assists wrapper developers and makes the creation of wrappers easy and customizable. By XML-enabled, we mean that the generated wrapper program produces an XML representation of source documents from non-XML Web sources. For sources that are already XML, it produces a content-sensitive XML representation by removing undesirable parts of the source documents, such as advertisements.
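The content-sensitive filtering described above can be illustrated with a minimal sketch. It assumes the undesirable parts can be identified by element name; the tag names `advertisement` and `banner` and the function `strip_unwanted` are hypothetical and are not part of XWRAP.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; XWRAP's actual filtering rules are not shown here.
UNWANTED_TAGS = {"advertisement", "banner"}

def strip_unwanted(root, unwanted=UNWANTED_TAGS):
    """Remove, in place, every descendant of `root` whose tag is in `unwanted`."""
    # Snapshot the elements first so that removals do not disturb iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted:
                parent.remove(child)
    return root

source = """<page>
  <advertisement>Buy now!</advertisement>
  <item><title>Paper A</title><banner>Sale</banner></item>
  <item><title>Paper B</title></item>
</page>"""

cleaned = strip_unwanted(ET.fromstring(source))
print(ET.tostring(cleaned, encoding="unicode"))
```

Only the two `item` elements and their titles remain in the printed document. A real wrapper would presumably decide what to drop from rules defined by the wrapper developer rather than from a fixed tag list.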

The goal of the XWRAP toolkit is to make the content of Web sources easily accessible to any kind of application. We achieve this goal through three important capabilities:

  • We use XML as the XWRAP data exchange language for accepting application requests (either keyword-based or content-sensitive) and for returning the matching source documents in XML format. This capability guarantees the delivery of both the data and the metadata of the source content and facilitates further processing by the applications.
  • The wrapper programs generated by XWRAP can automatically perform the types of content extraction and information filtering designed by the users (wrapper developers).
  • We demonstrate the usefulness of the wrapper programs through the Query Router system and the Continual Query system.
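As a rough illustration of the first capability, the sketch below builds an XML reply that carries metadata alongside the extracted data. The element names (`wrapper-response`, `metadata`, `record`) and the function `build_response` are assumptions made for this example; the page does not specify XWRAP's actual exchange format.

```python
import xml.etree.ElementTree as ET

def build_response(source_url, records):
    """Wrap extracted records in one XML document with a metadata header.

    The layout is hypothetical, chosen only to show data and metadata
    travelling together in a single XML reply.
    """
    root = ET.Element("wrapper-response")
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "source").text = source_url
    ET.SubElement(meta, "record-count").text = str(len(records))
    data = ET.SubElement(root, "data")
    for record in records:
        rec = ET.SubElement(data, "record")
        # Each field name becomes an element, so the reply is self-describing.
        for field, value in record.items():
            ET.SubElement(rec, field).text = value
    return root

response = build_response(
    "http://example.com/listing",  # placeholder URL
    [{"title": "Paper A", "year": "1998"}],
)
print(ET.tostring(response, encoding="unicode"))
```

Because field names travel with the values, a consuming application can process the reply without knowing the layout of the original Web page.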

How does it work?

As a user of the XWRAP Original toolkit, you can generate a wrapper program for your favorite Web site in eight steps:

  1. Start XWRAP Original with a Java Applet Viewer or a browser and enter your favorite URL. XWRAP Original fetches the source document for you, and you use this sample document to train XWRAP to build a wrapper program for your favorite Web site.
  2. Click Source Normalization to start the tree formation of your source document. You can now view the source document as an XML-conformant tree graph.
  3. Click Semantic Token Extraction to start semantic token recognition. This is the first time that XWRAP needs your help, in recognizing which semantic tokens are of interest to you. After a few interactive exchanges, XWRAP generates a comma-delimited file of all the semantic tokens and their attribute-value pair relationships, as well as the set of S-token extraction rules.
  4. You are now ready to enter the Hierarchical Structure Extraction phase, where XWRAP needs your help in identifying the logical and presentation layout structure of the source document. XWRAP produces an XWRAP template in an XML-compatible format, which describes a set of hierarchical structure rules for the source document.
  5. Click the Learn button to start training XWRAP to generate an XML representation of the source document.
  6. Click the WP Generation button to have XWRAP generate the executable wrapper program.
  7. In the Test stage, enter the URL of another source document from the same site so that XWRAP can work with you to test whether the wrapper program behaves correctly. You may run the test as many times as you wish before releasing the wrapper program.
  8. Click Release to let XWRAP prepare a software package and a plug-in for your wrapper program.
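Steps 2 and 3 above can be sketched roughly as follows. This is not XWRAP's implementation: it assumes that source normalization amounts to turning loosely written HTML into a well-formed tree, and that a semantic token can be approximated by a text fragment together with the path of the element that contains it. The names `NormalizingParser`, `normalize`, and `tokens` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}  # tags with no end tag

class NormalizingParser(HTMLParser):
    """Builds a well-formed element tree from possibly sloppy HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close everything up to the nearest matching open tag;
        # stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        node = self.stack[-1]
        if len(node):  # text that follows a child element is stored as its tail
            node[-1].tail = (node[-1].tail or "") + text
        else:
            node.text = (node.text or "") + text

def normalize(html):
    parser = NormalizingParser()
    parser.feed(html)
    parser.close()
    return parser.root

def tokens(node, path=""):
    """Yield (element path, text) pairs in document order."""
    here = f"{path}/{node.tag}"
    if node.text:
        yield here, node.text
    for child in node:
        yield from tokens(child, here)
        if child.tail:
            yield here, child.tail

# The <p> element is never closed and <br> has no end tag.
sample = "<html><body><h1>Forecast</h1><p>City: <b>Atlanta</b><br>High: <b>72F</b></body></html>"
rows = list(tokens(normalize(sample)))

# Write the tokens as a comma-delimited file (kept in memory here).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["path", "text"])
writer.writerows(rows)
print(buffer.getvalue())
```

In the toolkit itself this stage is interactive: the developer indicates which tokens matter, and XWRAP derives extraction rules from those choices. The sketch simply lists every text fragment.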

Wrapper Generation Toolkit

Will be available soon.

Click here to view the screenshots of XWRAP Original.

Demonstration wrappers

Our Projects Using XWRAP Technology

  • CQ - Continual Queries
  • Query Routing - Routing queries to appropriate sources over the Internet.
  • TAM - Transactional Workflow Management System for Distributed Data Intensive Systems
  • Infosphere - Infopipes for Fresh Information Delivery
  • DIOM - Distributed Interoperable Object Management System for Intelligent Mediation

Related Projects


For problems or questions regarding this web site, contact xwrap-help@cc.gatech.edu.
Last updated: November 06, 1998.