Paper #: SE 28
Title: Authoritative Sources in a Hyperlinked Environment

(1) Problems

This paper deals with the problem of identifying authoritative websites. An authoritative website for a topic may not use the words of that topic very often, so methods that simply index keywords and return the pages with the most occurrences of them can miss it. Conversely, spammers can put up websites packed with occurrences of a certain keyword so that they look like authorities.

(2) New Idea and Strengths

The paper introduces hub-authority analysis, an iterative approach that exploits the interlocking, mutually reinforcing relationship between hubs and authorities. A "hub" is a website that links to several authoritative websites. An "authority" is a website that many people consider authoritative on some issue. Real authorities tend to be pointed to by many hubs, and good hubs point to many real authorities. An iterative algorithm can take advantage of this circular, cooperative relationship to zero in on the hubs and authorities.

First, it is necessary to build an "enriched" subset of websites (a "focused subgraph") that is more likely to contain good sources for a particular topic. Then the hub-authority algorithm is applied to that enriched set. The paper justifies the algorithm with eigenvector analysis: if A is the adjacency matrix of the focused subgraph, the authority weights converge to the principal eigenvector of A^T A and the hub weights converge to the principal eigenvector of A A^T.

(3) Weaknesses and Extensions

The weakness of the paper is that the technique has proven theoretically interesting but apparently not very useful in practice; the major search engines are not using it. Ranking based on the sites most often visited by users of a search engine has been able to identify authoritative sites well enough. However, for emerging authoritative websites and emerging web communities, that kind of ranking seems unlikely to work, and the paper's approach may do better.
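To make the iterative hub-authority computation from section (2) concrete, here is a minimal Python sketch of the update loop (commonly called HITS). It is an illustration, not the paper's own code. It assumes the focused subgraph has already been built and is passed in as a dict mapping each page to the list of pages it links to; the function name `hits`, that input format, and the fixed iteration count are hypothetical choices for this sketch. Each round sets a page's authority score to the sum of the hub scores of the pages linking to it, then sets its hub score to the sum of the authority scores of the pages it links to, and rescales both score vectors so their squares sum to 1.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for a directed link graph.

    links: hypothetical input format, mapping each page to the pages it links to.
    """
    # Collect every page that appears as a link source or a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Start every page with equal hub and authority weight.
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages pointing to each page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub update: sum the (new) authority scores of the pages each page points to.
        new_hub = {p: sum(new_auth[dst] for dst in links.get(p, ())) for p in pages}

        # Rescale so the squared scores sum to 1; guard against an all-zero vector.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth
```

On a toy graph such as `hits({"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]})`, page a1 gets the highest authority score because every hub points to it, and h1 and h2 come out as stronger hubs than h3 because they point to both authorities. A fixed iteration count keeps the sketch short; stopping when the scores change by less than some tolerance would be a natural alternative.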
From a mathematical, theoretical vantage point, the paper is very nice. Extensions proposed in the paper include using web traffic patterns in addition to link-structure information, using eigenvector-based heuristics, and applying link-based methods to information needs other than the broad-topic queries emphasized in the paper.

-- END --