The Anatomy of a Large-Scale Hypertextual Web Search Engine
Sergey Brin, Lawrence Page

Problems

This paper discusses Google, a new web search engine that aims to solve many of the emerging challenges of finding information on the rapidly growing World Wide Web.

(1) Scaling: Google is designed to scale as the web grows in size (amount of information). Scalability is essential because the web is growing at breakneck speed: storage space must be used efficiently, the indexing system must process hundreds of gigabytes of data efficiently, and the search engine must be able to handle millions of queries every day (thousands of queries a second).

(2) Quality of Search: Despite the information explosion on the web, search engine users are typically interested only in the first few links returned. Given this constraint, the number of matches an engine returns matters less than the relevance and quality of the first few links, since those are the ones the user will actually follow. Google attempts to provide better-quality results by introducing a novel ranking system (PageRank) that is used to prioritize the results.

Strengths and New Ideas

(1) PageRank: PageRank is the most striking feature of the Google search engine. Google's emphasis on better-quality results, rather than on indexing a larger portion of the web, is justified by the fact that a typical user rarely looks beyond the first few matches returned. Google builds a citation (link) graph of the web that allows it to calculate each page's PageRank. PageRank is a measure of a web page's citation importance that aligns well with the human idea of importance, so it is an effective way to prioritize search results.
PageRank is based on the idea that a page's importance increases with the number of back links it has. Intuitively, the more back links a page has, the more useful it is likely to be: people tend to link to good-quality information, so many links to one page are a strong indicator that people agree on the quality of the information available there. In addition, the rank of a page depends on the PageRank of the pages that point to it, since a better-quality page would intuitively point to good information.

(2) Anchor Text: Google associates the text of a link with the page the link points to, in addition to the page the link is on. This is based on the observation that link text is often a more accurate description of a web page than the content of the page itself. In addition, associating links with the pages they point to allows Google to index a wider variety of documents, such as images, programs, and databases.

(3) Mention of Bottlenecks: The paper also discusses the bottlenecks encountered in the development of Google, the main one being disk seek time. Although efficiency was not Google's primary focus, fast response time is crucial for a commercial search engine that must handle millions of queries. The description of these bottlenecks alerts people new to the field to pitfalls they should watch for when designing their own search engines.

Weaknesses and Extensions

(1) Although the authors discuss the data structures and major components, the treatment is not thorough or detailed. For a system as large and complicated as Google, there has to be a lot more going on than is covered in the paper. The discussion presented is more of an overview.

(2) The use of anchor links may have a negative side. The authors acknowledge that anchor links might point to non-existent pages, although they reduce the probability of such an occurrence by sorting the results.
While the number of anchor links returned can be handled currently, will the approach still scale when the web grows by an order of magnitude? This might need to be looked into.

(3) The authors claim that Google provides better results than other commercial search engines. However, a detailed comparison between Google and other search engines is not provided.
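
Illustrative sketch of the PageRank idea

The PageRank idea summarized under Strengths (a page matters more when many pages link to it, and more still when those pages are themselves highly ranked) can be illustrated with a small iterative computation. This is only a minimal sketch under stated assumptions, not the implementation described in the paper: it uses the common normalized textbook formulation in which ranks sum to 1, and the function name `pagerank`, the `links` dictionary, the damping value 0.85, and the fixed iteration count are all choices made here for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a small link graph.

    links: dict mapping each page to the list of pages it links to.
    All names and defaults here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform rank distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-page web: "c" has three back links, "a" has one.
    example = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(example).items()):
        print(page, round(score, 3))
```

In this toy graph, "c" ends up with the highest rank because it has the most back links, and "a" outranks "b" and "d" because its single back link comes from the highly ranked "c". That matches both intuitions in the review: the number of back links matters, and so does the rank of the pages providing them.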