Paper #: Week 7 Paper 38
Title: Focused crawling: a new approach to topic-specific Web resource discovery

(1) Problems
This paper addresses the problem of crawling the web more effectively through focused crawling. It was written in the pre-Google era, when the results returned by search engines were often of poor quality. As a solution, it describes a focused crawler that selectively seeks out pages relevant to a predefined set of topics.

(2) New Idea and Strengths
- The paper offers a good solution to web crawling. It follows a divide-and-conquer strategy: by crawling well for predefined topics instead of performing a general crawl of the whole web, the paper argues that the results will be much better. In the evolution of crawlers, this is a good idea.
- Because it crawls only pages related to a specific topic, the focused crawler requires only a modest investment in hardware and network resources. This is a clear advantage.
- The classifier (which judges whether a page is relevant to the topic) and the distiller (which identifies hub pages that point to many relevant pages) are well designed.
- The authors back up their idea with evaluation results.
- Diagrams of the UI and the system design make the paper easier for the reader to understand.

(3) Weaknesses and Extensions
- I am not sure about the paper's relevance in the Google era. It was written before Google, when these challenges were still unsolved. Google has since addressed most of the concerns the paper raises: a general crawler combined with good indexing appears to have been sufficient.
- The paper describes a UI that I find quite complex. It requires human intervention, which is not ideal.
- A possible extension would be to evaluate whether the paper's idea remains relevant today.

-- END --