Sponsor | Ling Liu / James Caverlee, {lingliu, caverlee}@cc.gatech.edu, 216 / 225B CCB |
Area | Systems and Databases |
The past few years have witnessed great strides in the accessibility and manageability of vast amounts of Web data. In particular, the widespread adoption of general-purpose search engines like Google and AllTheWeb has added a layer of organization to an otherwise unwieldy medium. But with the rise of high-quality, data-intensive web services on the so-called Deep Web (or Hidden Web) and the emergence of the web services paradigm, these popular tools are becoming less relevant. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed those of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web.
These challenges point to a growing need for efficient mechanisms for discovering and ranking data-intensive services. Such mechanisms are critical for organizations that want to take advantage of the opportunities offered by web services: engaging in business collaborations and service compositions, identifying potential service partners, understanding service competitors, and increasing the competitive edge of their own service offerings.
In the context of service discovery, we have previously introduced two related systems, BASIL and DynaBot.
This mini-project focuses on implementing a combined source-biased crawler that incorporates the BASIL algorithms into the DynaBot crawling architecture. Sample source code and relevant papers will be provided. There are a number of exciting opportunities to incorporate interesting research ideas into the crawler, so we look forward to hearing from interested students.
Deliverables:
Evaluation:
You will be graded on the novelty and quality of your report and implementation.