Portfolio - Liam Mac Dermed
Research: Semantic Search
Overview:

For the final project of a search engines class, I developed a method for evaluating webpages based on their semantic relevance. Together with two partners, I built a system that used WordNet to cluster sentences into semantically related categories. Each category represented sentences conveying the same information. The more sentences in a given category, the more support those sentences have across webpages, and the more likely they are to be important. Webpage ranking is then simply a matter of choosing the best-supported documents: those with the highest concentration of well-supported sentences. Unfortunately, this process was computationally very demanding and isn't a viable option for search engines at this time. However, it was well worth exploring as a possible avenue for research, as computational power may soon catch up.
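
The ranking idea described above can be illustrated with a minimal sketch. It is not the project's code: the original used WordNet-based semantic comparisons (in the C# segment), whereas this sketch swaps in a plain word-overlap (Jaccard) similarity so that it runs without any external data. The function names, the greedy clustering strategy, and the `threshold` and `min_support` parameters are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def tokens(sentence: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for WordNet-based semantics."""
    return {w.strip(".,;:!?\"'()").lower() for w in sentence.split()} - {""}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets (assumed substitute for a WordNet measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_sentences(
    pages: Dict[str, List[str]], threshold: float = 0.5
) -> List[List[Tuple[str, str]]]:
    """Greedy single-pass clustering: a sentence joins the first cluster whose
    first member is similar enough; otherwise it starts a new cluster."""
    clusters: List[List[Tuple[str, str]]] = []
    for page_id, sentences in pages.items():
        for sentence in sentences:
            for cluster in clusters:
                if similarity(sentence, cluster[0][1]) >= threshold:
                    cluster.append((page_id, sentence))
                    break
            else:
                clusters.append([(page_id, sentence)])
    return clusters


def rank_pages(
    pages: Dict[str, List[str]], threshold: float = 0.5, min_support: int = 2
) -> List[Tuple[str, float]]:
    """Score each page by the fraction of its sentences that fall in a
    well-supported cluster (one with at least min_support sentences)."""
    supported = {page_id: 0 for page_id in pages}
    for cluster in cluster_sentences(pages, threshold):
        if len(cluster) >= min_support:
            for page_id, _ in cluster:
                supported[page_id] += 1
    scores = {
        page_id: (supported[page_id] / len(sents) if sents else 0.0)
        for page_id, sents in pages.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    example_pages = {
        "a": ["The cat sat on the mat.", "Dogs bark loudly."],
        "b": ["The cat sat on a mat.", "Stocks fell today."],
        "c": ["Bananas are yellow."],
    }
    for page_id, score in rank_pages(example_pages):
        print(page_id, score)
```

Even in this simplified form, every sentence is compared against the existing clusters, which hints at why the full WordNet-based version was so computationally demanding.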

Downloads:
Download Paper
simsFinalProject.doc
Final paper for the project - includes work-flow diagram, details, and survey results.
Download Java Segment
Shunter(Q-expand).zip
Code for query expansion and document retrieval - written in Java. Based on LuceQE.
Download Perl Segment
SentenceParser.pl
Code for parsing the fetched webpage text from Q-expand to XML - written in Perl.
Download C# Segment
Shunter(page-rank).zip
Code for doing semantic comparisons on sentences and clustering them into groups.
NOTE: requires WordNet and the .NET Framework
contact:   -   last edited 08/01/07