Paper #: Week 2 Paper 2
Title: Mercator: A Scalable, Extensible Web Crawler

(1) Problems

This paper describes the design and operation of the authors' web crawler, Mercator. They contend that few research papers describe how web crawlers work or how they are implemented. They also claim that their design and implementation are scalable and extensible. Scalability lets the crawler fetch tens of millions of web documents. Extensibility lets third parties extend and modify Mercator.

(2) New Idea and Strengths

Scalability: The authors explain how their architecture scales to tens of millions of documents by keeping its data structures within a bounded amount of memory and storing the bulk of them on disk.

Extensibility: Mercator is designed in a modular fashion and implemented in Java. Its components are designed so that they can be replaced with others, or new components can be added, via the configuration file. I feel that this is a very nice design.

Explanation with results: The explanation is clear and easy to understand, and it is supported by results and comparisons. A person interested in developing a crawler can use this paper as a handy reference.

(3) Weaknesses and Extensions

It is difficult to spot weaknesses beyond the ones the authors have already pointed out in their section on "Crawler Traps and Other Hazards". The comparison with other crawlers could have been more of an apples-to-apples comparison, though I am not sure whether such results have been published by the developers of competing web crawlers.

-- END --