Building A Domain Specific Aggregation Service Using Wrappers

Sponsor: Ling Liu / Wei Han
lingliu@cc.gatech.edu, weihan@cc.gatech.edu
223 CCB
Area: Systems and Databases

Problem
The Web presents a large and growing amount of useful information, such as airline schedules, online shopping malls, and weather forecasts. There is an increasing demand to extract and use such information from various Web sites in response to information consumers' requests. However, most of the information is in HTML format, which other programs cannot directly interpret. A popular approach to extracting information from Web sources is to use wrappers: interface programs that translate the useful information into a more structured format, such as XML.

XWRAPElite is an online wrapper generation system developed at Georgia Tech. It helps developers build, automatically or semi-automatically, wrappers that extract information from data-rich Web pages into XML data. You can find XWRAPElite at http://www.cc.gatech.edu/projects/disl/XWRAPElite/. Wrappers generated by XWRAPElite can be stored in a wrapper repository at http://www.cc.gatech.edu/projects/disl/wrapperrepository/. A valuable extension to XWRAPElite is to build applications on top of the wrappers it generates.

Your objective in this project is to design and implement a domain-specific application by composing a set of wrappers. The application should include a wrapper composition module and a GUI for data display. You may generate the set of wrappers with XWRAPElite or use wrappers from the repository. The GUI should be Web-browser-based. The goal of this project is to use a set of wrappers to achieve more sophisticated functionality than any single wrapper provides. Practical applications include integration of product search results (such as a semi-join), comparison of product prices and/or features, and query routing. You are encouraged to put your own insight into this project. For instance, you can also make the following improvements.
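
To make the idea of wrapper composition concrete, here is a minimal Java sketch of two of the operations mentioned above: a semi-join and a price comparison over the XML output of two wrappers. It is an illustration only, not the XWRAPElite output format or API. The XML layout (results/item/name/price), the class name WrapperComposer, and the source labels "storeA" and "storeB" are all assumptions made for this example; real wrappers will produce their own element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical composition module; all names here are illustrative.
final class WrapperComposer {

    /** One product record taken from a wrapper's XML output. */
    static final class Offer {
        final String source;  // which wrapper produced the record
        final String name;
        final double price;

        Offer(String source, String name, double price) {
            this.source = source;
            this.name = name;
            this.price = price;
        }
    }

    /** Parses the assumed layout: <results><item><name/><price/></item>...</results>. */
    static List<Offer> parseOffers(String xml, String source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<Offer> offers = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                String name = item.getElementsByTagName("name").item(0).getTextContent().trim();
                String price = item.getElementsByTagName("price").item(0).getTextContent().trim();
                offers.add(new Offer(source, name, Double.parseDouble(price)));
            }
            return offers;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unexpected wrapper output from " + source, e);
        }
    }

    /** Semi-join: keeps the offers in left whose name also occurs in right (case-insensitive). */
    static List<Offer> semiJoin(List<Offer> left, List<Offer> right) {
        Set<String> names = new HashSet<>();
        for (Offer o : right) {
            names.add(o.name.toLowerCase());
        }
        List<Offer> result = new ArrayList<>();
        for (Offer o : left) {
            if (names.contains(o.name.toLowerCase())) {
                result.add(o);
            }
        }
        return result;
    }

    /** Price comparison: the cheapest offer per lower-cased product name across all sources. */
    static Map<String, Offer> cheapestByName(List<List<Offer>> sources) {
        Map<String, Offer> best = new TreeMap<>();
        for (List<Offer> offers : sources) {
            for (Offer o : offers) {
                String key = o.name.toLowerCase();
                Offer current = best.get(key);
                if (current == null || o.price < current.price) {
                    best.put(key, o);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String a = "<results><item><name>Widget</name><price>9.99</price></item>"
                 + "<item><name>Gadget</name><price>20.00</price></item></results>";
        String b = "<results><item><name>widget</name><price>8.50</price></item></results>";
        List<Offer> storeA = parseOffers(a, "storeA");
        List<Offer> storeB = parseOffers(b, "storeB");

        for (Offer o : semiJoin(storeA, storeB)) {
            System.out.println("In both stores: " + o.name);
        }
        for (Map.Entry<String, Offer> e : cheapestByName(Arrays.asList(storeA, storeB)).entrySet()) {
            System.out.println(e.getKey() + " is cheapest at " + e.getValue().source);
        }
    }
}
```

Matching products by lower-cased name is a deliberately naive choice for this sketch. In a real aggregation service, deciding when two records from different sites describe the same product is likely to be one of the harder design problems.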

Background

You are expected to have a solid grasp of Java/CGI/HTML programming. Java will be the main programming language. An understanding of basic XML/XSL/CSS/WML technology will be useful in the project. You will need to learn and apply many new things in this project, so you are expected to be a quick learner. The time is well spent, since you will be learning cutting-edge technologies.

Links

Here are some links to help you get started (be sure to read the licensing documents before you download the software packages):


Deliverables

A report describing your aggregation service, the insights and lessons from your project, and future improvements/extensions.
The source code for your aggregation service, including all configuration files, and screenshots of your GUI. The whole system must be portable across both Unix systems and Windows NT.

Evaluation
You will be graded on the novelty and quality of your aggregation service and your report.