Sponsor: Ling Liu (lingliu@cc.gatech.edu)
Area: Information and Knowledge Management
Problem
Relevance analysis is a central problem for search engines, which return ranked
lists that present the most relevant web documents to the user first. It
consists of three parts: term-based proximity analysis, anchor text analysis,
and link analysis. PageRank and hub-authority algorithms are well-known
examples of link analysis. PageRank in particular is global: it is based on
links between roughly all pages on the web and is independent of the specific
query. However, link information within the set of documents that a search
engine returns for a specific user query could also help improve relevance.
Users may want documents that are related to each other by hyperlinks to be
listed together. In this project, we want to determine whether the first few
documents returned by a search engine are related by hyperlinks and, if so,
what the link structure looks like.
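One possible way to frame the question is as a small graph problem: keep only the hyperlinks whose source and target both appear in the result list, then group results that are connected by those links. The sketch below is an illustration under stated assumptions, not a prescribed solution. It assumes the outgoing links of each result page have already been collected into a dictionary, and the names `normalize`, `induced_link_graph`, and `linked_groups` are hypothetical helpers introduced here.

```python
from urllib.parse import urldefrag


def normalize(url):
    """Drop the fragment and trailing slashes so equivalent URLs compare equal."""
    return urldefrag(url).url.rstrip("/")


def induced_link_graph(results, outlinks):
    """Keep only links whose source and target are both in the result list.

    results  -- result URLs in rank order (assumed input)
    outlinks -- dict mapping a result URL to the URLs it links to (assumed input)
    """
    graph = {normalize(u): set() for u in results}
    for src, targets in outlinks.items():
        s = normalize(src)
        if s not in graph:
            continue
        for target in targets:
            t = normalize(target)
            if t in graph and t != s:  # ignore self-links and off-list targets
                graph[s].add(t)
    return graph


def linked_groups(graph):
    """Group results connected by hyperlinks, ignoring link direction."""
    rank = {u: i for i, u in enumerate(graph)}
    undirected = {u: set(vs) for u, vs in graph.items()}
    for u, vs in graph.items():
        for v in vs:
            undirected[v].add(u)

    seen, groups = set(), []
    for start in graph:  # dict order follows the original rank order
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # depth-first traversal of one connected group
            u = stack.pop()
            group.append(u)
            for v in undirected[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(group, key=rank.get))
    return groups
```

Calling `linked_groups(induced_link_graph(results, outlinks))` yields one list per group of mutually reachable results, each kept in rank order, with unlinked results as singleton groups. Treating links as undirected is a design choice made for grouping only; the directed `graph` still describes the actual link structure.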
The project requires you to use the Google API to retrieve the search results. Google offers a programmatic interface at http://www.google.com/apis. You can build your program on the Java sample code or write new code in another language, such as Perl or Python, using the SOAP protocol.
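Once the result URLs have been obtained, each result page's outgoing hyperlinks have to be extracted before any link structure can be examined. The following is a minimal sketch of that step using only the Python standard library. It deliberately does not show the Google API call or the page download, and it assumes the page HTML is already available as a string. `LinkExtractor` and `extract_links` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the outgoing links of one page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Resolving relative links against the page URL matters here, because two result pages on the same site often reference each other with relative paths.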
Here is what you need to do.
Deliverables
Evaluation
Based on the report turned in to the sponsor of the project by the due date.