Paper #: Week 2 Paper 1
Title: The Anatomy of a Large-Scale Hypertextual Web Search Engine

(1) Problems
The search engines available before this paper was published had several major problems that the paper tries to address:
i) Quality of the results returned
ii) Amount of the web crawled and indexed
iii) Scalability to the much larger numbers of web documents expected in the future
iv) Efficiency of operation
v) Lack of published literature and academic research in this field

(2) New Idea and Strengths
This paper was quite revolutionary for the search engine industry.
a) Ranking with PageRank: Google's ranking approach is quite different from those used by other search engines. Instead of relying only on how often the query words occur in a page (its "hits"), it also uses the link structure of the web, summarized as each page's PageRank, and the anchor text of links pointing to the page. Within a page, it also takes into account the position, font size and capitalization of each word occurrence. These word features are combined with PageRank to rank results; they are not part of the PageRank computation itself, which depends only on links. The authors argue, and illustrate with sample search results, that this combination improves the quality of results.
b) At the time the paper was written, Google had crawled and indexed 24 million pages. The authors show, though not in much detail, how the system could scale to a much higher number of pages while using storage efficiently.
c) Through diagrams and brief descriptions, they explain how several parts of their design work.

(3) Weaknesses and Extensions
My main difficulty with the paper was visualizing and following exactly how the system works. The authors give a gist of the design without the details one would need to build a similar system, though this is understandable since the idea led them to launch a company.
As an extension, the URLServer that feeds the crawlers could be partitioned by top-level domain, with one URLServer per domain (for example, an .edu URLServer and a .com URLServer), so that it does not become a bottleneck for the crawlers.
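To make the link-based part of the ranking concrete, here is a minimal sketch of PageRank computed by simple iteration of the formula given in the paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1..Tn are the pages linking to A, C(T) is the number of links out of T, and d = 0.85 is the damping factor. The function name, the toy three-page graph and the fixed iteration count are my own assumptions for illustration. The sketch does not model how Google combines PageRank with anchor text and hit features, since the paper does not spell that out in full.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages.

    links: hypothetical toy graph mapping each page to the pages it links to.
    Every page must appear as a key. Pages with no out-links simply pass
    on no rank here (the paper removes such dangling links beforehand).
    """
    pages = list(links)
    ranks = {page: 1.0 for page in pages}  # start every page at rank 1
    for _ in range(iterations):
        new_ranks = {}
        for page in pages:
            # Each page T linking here contributes its rank split evenly
            # across its C(T) out-links.
            incoming = sum(
                ranks[src] / len(outs)
                for src, outs in links.items()
                if page in outs
            )
            new_ranks[page] = (1 - d) + d * incoming
        ranks = new_ranks
    return ranks


toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(rank, 3))
```

On this toy graph, C ranks highest because it is linked from both A and B, followed by A and then B. With this form of the formula (no division by the number of pages), the ranks average to 1 rather than summing to 1.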