Overview
XWRAP is an XML-enabled software system
for semi-automatic generation of wrapper programs for Web sources.
The XWRAP software technology consists of XWRAP Original and XWRAP Elite.
By semi-automatic, we mean that XWRAP provides an interactive interface
program to assist the wrapper developers and make the creation of
wrappers easy and customizable.
By XML-enabled, we mean that the generated wrapper program
can produce an XML representation of the source documents for
non-XML Web sources, and a content-sensitive XML
representation for XML source documents by removing undesirable
parts of the source documents, such as advertisements.
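As a rough illustration of the content-sensitive filtering described above, the following Python sketch removes unwanted elements from an XML source document. It is not XWRAP's implementation; the function name `strip_unwanted` and the `advertisement` tag are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def strip_unwanted(xml_text, unwanted_tags):
    """Return xml_text with every element whose tag is in unwanted_tags removed."""
    root = ET.fromstring(xml_text)
    # Walk a snapshot of the tree, because children are removed during the walk.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag in unwanted_tags:
                # Note: removing a child also drops any text that trailed it.
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


# Hypothetical source document; the tag names are illustrative only.
sample = (
    "<page>"
    "<advertisement>Buy now</advertisement>"
    "<item><title>Forecast</title></item>"
    "</page>"
)
print(strip_unwanted(sample, {"advertisement"}))
```

In practice the set of unwanted parts would come from the filtering choices made by the wrapper developer rather than from a fixed tag list.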
The goal of the XWRAP toolkit is to make the content of Web sources
easily accessible to any kind of application. We achieve this goal
through three important capabilities:
-
We use XML as the XWRAP data exchange language for accepting
application requests (either keyword-based or content-sensitive
requests) and returning the matching source documents in XML format.
This capability guarantees the delivery of both data and metadata
of the source content and facilitates further processing by
the applications.
-
The wrapper programs generated by XWRAP can automatically
perform the types of content extraction and information filtering
designed by the users (wrapper developers).
-
We demonstrate the usefulness of the wrapper programs through the Query
Router system and the Continual Query system.
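To make the first capability more concrete, the sketch below shows one way a keyword-based request could be accepted as XML and answered with the matching documents in XML, carrying both data (the document text) and metadata (its source). The element and attribute names (`request`, `keyword`, `response`, `document`, `source`), the function `answer_request`, and the in-memory document list are all hypothetical; the actual XWRAP exchange format is not described here.

```python
import xml.etree.ElementTree as ET


def answer_request(request_xml, documents):
    """Answer a keyword request; documents is a list of {'url', 'text'} dicts."""
    request = ET.fromstring(request_xml)
    keyword = (request.findtext("keyword") or "").lower()
    response = ET.Element("response", {"keyword": keyword})
    for doc in documents:
        # Case-insensitive substring match stands in for real matching logic.
        if keyword and keyword in doc["text"].lower():
            item = ET.SubElement(response, "document", {"source": doc["url"]})
            item.text = doc["text"]
    return ET.tostring(response, encoding="unicode")


# Hypothetical wrapped documents.
docs = [
    {"url": "http://example.org/a", "text": "Rain expected tonight"},
    {"url": "http://example.org/b", "text": "Sunny all day"},
]
print(answer_request("<request><keyword>rain</keyword></request>", docs))
```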
How does it work
As a user of the XWRAP Original toolkit, you can generate
the wrapper program for your favorite Web site in the following steps:
-
Start XWRAP Original with a Java Applet Viewer or a browser,
and enter your favorite URL.
XWRAP Original then fetches the source document
for you, so that you can use this sample document to train XWRAP to
build a wrapper program for your favorite Web site.
-
Click Source Normalization to start the tree
formation of your source document.
Now you can view the source document in an XML-conformed tree graph.
-
Click Semantic Token Extraction to start semantic
token recognition. This is the first point at which XWRAP needs your
help, in recognizing which semantic tokens are of interest to you.
After a few interactive exchanges, XWRAP will generate a
comma-delimited file of all the semantic tokens and their attribute
and value pair relationships, as well as the set of
S-token extraction rules.
-
Now you are ready to enter the Hierarchical Structure
Extraction phase, where XWRAP will need your help in identifying
the logical and presentation layout structure of the source document.
XWRAP will produce an XWRAP template in XML-compatible format, which
describes a set of hierarchical structure rules of the source document.
-
By clicking the Learn button, you start training XWRAP to learn
how to generate an XML representation of the source document.
-
When you click the WP Generation button, XWRAP will generate the
executable wrapper program.
-
In the Test stage, you need to enter the URL of another source document
from the same site so that XWRAP can work with you to test
whether the wrapper program behaves correctly.
You may run the test as many times as you wish before releasing the
wrapper program.
-
Now you can click Release to let XWRAP prepare a
software package and a plug-in for your wrapper program.
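The first steps above (source normalization into a tree, then semantic token extraction into a comma-delimited file) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not XWRAP's algorithm: the `TreeBuilder` class, the `normalize`, `extract_tokens`, and `to_csv` functions, and the rule "a table row with exactly two cells is a token and value pair" are all hypothetical stand-ins for the rules XWRAP learns interactively.

```python
import csv
import io
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TreeBuilder(HTMLParser):
    """Build an XML-style element tree from loosely written HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:  # void tags never get a closing tag
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        node = self.stack[-1]
        if len(node):  # text after a child element belongs to that child's tail
            node[-1].tail = (node[-1].tail or "") + text
        else:
            node.text = (node.text or "") + text


def normalize(html):
    """Source normalization: HTML text in, element tree out."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def extract_tokens(root):
    """Hypothetical extraction rule: a two-cell table row is a (token, value) pair."""
    pairs = []
    for row in root.iter("tr"):
        cells = ["".join(c.itertext()).strip() for c in row if c.tag in ("td", "th")]
        if len(cells) == 2:
            pairs.append((cells[0], cells[1]))
    return pairs


def to_csv(pairs):
    """Write the extracted pairs as comma-delimited text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["token", "value"])
    writer.writerows(pairs)
    return buf.getvalue()


# Hypothetical sample page; note the unclosed <br>.
SAMPLE_HTML = (
    "<html><body><table>"
    "<tr><td>Temp<br></td><td>71 F</td></tr>"
    "<tr><td>Wind</td><td>NW 5 mph</td></tr>"
    "</table></body></html>"
)
print(to_csv(extract_tokens(normalize(SAMPLE_HTML))))
```

The Test step corresponds to running the same functions on a second page from the same site and checking that the extracted pairs still look right.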
Wrapper Generation Toolkit
Will be available soon.