Paper #: Week 7 Paper 16
Title: Crawling towards Eternity: Building An Archive of The World Wide Web

(1) Problems

This article addresses a very nice problem: that of archiving the internet. An archive of the internet can be very useful as a research tool and as a time capsule for the future. Using this data, people can trace the evolution and development of trends, websites, technologies, etc. The author describes the design of the Internet Archive system, which archives the internet.

(2) New Idea and Strengths

- This is a very well written article that provides a clear, high-level description of the Internet Archive system.
- The author walks through the problems and considerations involved in designing the crawler for the archiver. This is done in easy-to-understand language.
- The use of diagrams to explain the design enhances the reader's understanding.
- The decision to crawl the web on a site-by-site basis is a sensible one, since it avoids the cost of having to fetch a new URL every time.
- The design of the database for the indexing mechanisms is well thought out. It is optimised at a high level for better performance (instead of searching through huge tables and incurring performance costs).
- Though the bitmap approach may generate false positives, causing some unseen URLs to be declared as seen, it is far cheaper than running a query on every URL to check whether it has been seen. At worst, a few unseen URLs will be lost forever.

(3) Weaknesses and Extensions

- The article differs from the other papers we have read in this class in its style of writing. I guess it is oriented towards a larger audience, and hence the language and content are kept high level.
- The article lacks experimental results, so comparing it to another archiving system will be difficult.
- A more technical treatment of the Internet Archive system might be a good extension.
- Another extension might be to check the statistics of the Internet Archive now. It would be very interesting to see the growth in the size of the web through the years.

All in all, I think this is a very good article, and the author has done a great job of making it easy to read and understand.

-- END --