Attention: The information contained in this page is subject to change.
There will be several background readings assigned each week. The readings will either be handed out a week in advance or listed on the Web page for required readings. You may also access this information from the course schedule.
You are expected to read the material each week and write 2-3 paragraphs per reading giving your impressions and thoughts. The summaries should be informal and brief, and should consist of your own comments on the readings, NOT a rehash of the content.
You should email your summaries to the TAs, Bhuvan Bamba and Anand Murugappan ({bhuvan, anandm}@cc.gatech.edu), preferably before each class but no later than
The summary for each reading assignment is expected to consist of 1 paragraph on each of the following three aspects: (1) the positive aspect of the paper; (2) the negative aspect of the paper; and (3) a brief discussion on how the idea or method proposed or used in evaluation may be applied to your own project for the course.
You may want to keep these guidelines in mind when reading papers.
You may find the following short article helpful:
Efficient Reading of Papers in Science and Technology By Michael J. Hanson and updated by D. McNamee
2. Web Servers
4. System Level Issues
5. Advanced Internet System
6. Web2.0/Web3.0 and Social Networks
You are expected to read all papers in the required reading list, but to write a summary for only one paper selected from the 2-3 required readings associated with each lecture. Please use the Summary Template to write the reading summaries and follow the summary submission suggestions to submit them.
Questions: How do we build a search engine that scales up as the Web grows?
1. Google, The Anatomy of a Large-scale Hypertextual Web Search Engine. Sergey Brin and
2. Inktomi: An Investigation of Documents from the World Wide Web Allison Woodruff, Paul M. Aoki, Eric Brewer, Paul Gauthier, and Lawrence A. Rowe
3. Harvest: Scalable Internet Resource Discovery: Research Problems and Approaches C. Mic Bowman (Transarc Corp.), Peter Danzig (Univ. Southern California), Udi Manber (Univ. of Arizona), and Michael Schwartz (Univ. Colorado), Appeared in CACM 1994 (Download Harvest Indexer) (Harvest Papers)
4. Harvest: The Harvest Information Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz, Computer Networks and ISDN Systems, 28 (1995) pp. 119-125
5. Harvest: A Scalable, Customizable Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, and Duane P. Wessels, Technical Report CU-CS-732-94, Department of Computer Science,
6. Customized Information Extraction as a Basis for Resource Discovery Darren R. Hardy and Michael F. Schwartz, ACM Transactions on Computer Systems.
7. Indie: Distributed Indexing of Autonomous Internet Services Peter Danzig, Shih-Hao Li, Katia Obraczka. Journal of Computer Systems, 5(4), 1992. Original description of Indie in 1991 ACM SIGIR
8. Internet resource discovery services Katia Obraczka, Peter Danzig, and Shih-Hao Li, IEEE Computer, Sept. 1993.
9. Research Problems for Scalable Internet Resource Discovery C. Mic Bowman, Peter B. Danzig, and Michael F. Schwartz, 1993 IEEE Computer.
10. GLIMPSE: A Tool to Search Through Entire File Systems Udi Manber and Sun Wu (
11. WebGlimpse--Combining Browsing and Searching Udi Manber, Mike Smith, and Burra Gopal (
12. Mercator: A Scalable, Extensible Web Crawler Allan Heydon and Marc Najork,
13. A technique for measuring the relative size and overlap of public Web search engines Krishna Bharat and Andrei Broder (DIGITAL,
14. The Connectivity Server: fast access to linkage information on the Web Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian, Proceedings of the 7th International World Wide Web Conference,
15. Efficient Crawling through URL Ordering Junghoo Cho, Hector Garcia-Molina, and Lawrence Page, Proceedings of the 7th International World Wide Web Conference, pages 161-172, April 1998
16. Crawling towards Eternity: Building an Archive of the World Wide Web Mike Burner, Web Techniques Magazine, 2(5), May 1997
17. The Truth about the Web: Crawling towards Eternity Z. Smith, Web Techniques Magazine, 2(5), May 1997
18. Measuring Index Quality using Random Walks on the Web Monika Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc A. Najork, Proceedings of the 8th International World Wide Web Conference, pages 213-225, May 1999
19. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery Soumen Chakrabarti, Martin van den Berg, Byron Dom, Proceedings of the 8th International World Wide Web Conference, May 1999
20. Finding What People Want: Experiences with the WebCrawler Brian Pinkerton, Proceedings of the 2nd International World Wide Web Conference, 1994
21. SPHINX: A Framework for Creating Personal, Site-specific Web Crawlers Robert C. Miller and Krishna Bharat, Proceedings of the 7th International World Wide Web Conference, pages 119-130, April 1998
22. Information Retrieval on the World Wide Web Venkat N. Gudivada, Vijay V. Raghavan, William I. Grosky, and Rajesh Kasangottu, IEEE Internet Computing, vol. 1, number 5, September/October, 1997.
23. GENVL and WWW: Tools for Taming the Web Oliver McBryan, Proceedings of the First Int'l World Wide Web Conference, CERN, Geneva, May 1994.
24. A World Wide Web Resource Discovery System Budi Yuwono, Savio L. Y. Lam, Jerry H. Ying, Dik L. Lee, Proceedings of the 4th World Wide Web Conference, 1995.
25. A Survey of Information Retrieval and Filtering Methods Christos Faloutsos and Douglas Oard (
26. Guidelines for Robot Writers, Martijn Koster, 1993
27. Robots in the Web: threat or treat? Martijn Koster, NEXOR, April 1995, [1997: Updated links and addresses]; A Standard for Robot Exclusion Martijn Koster.
28. Authoritative Sources in a Hyperlinked Environment, J. Kleinberg. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). Also appears as IBM Research Report RJ 10076, May 1997. ( IBM Clever Searching Project)
29. How Search Engines Rank Web Pages Danny Courtois and Sullivan.
30. Evaluation of Web search engines and the search for better ranking algorithms. Mildrid Ljosland e-mail: Mildrid.Ljosland@idi.ntnu.no Norwegian University of Science and Technology. the SIGIR99 Workshop on Evaluation of Web Retrieval, August 19, 1999
31. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Proceedings of the 7th World-Wide Web conference, 1998. Copyright owned by Elsevier Sciences, Amsterdam.
32. Inferring Web Communities from Link Topologies. D. Gibson, J. Kleinberg, and P. Raghavan. Proceedings of The Ninth ACM Conference on Hypertext and Hypermedia, 1998. Copyright owned by ACM.
33. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. S. Chakrabarti, B. Dom, R. Agrawal, P. Raghavan. VLDB Journal, 1998 (invited).
34. Enhanced hypertext categorization using hyperlinks. S. Chakrabarti, B. Dom and P. Indyk. Proceedings of ACM SIGMOD 1998.
35. Hypersearching the web. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Scientific American, June, 1999.
36. Mining the link structure of the World Wide Web. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. IEEE Computer.
37. Trawling the Web for emerging cyber-communities. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Eighth World Wide Web conference,
38. Focused crawling: a new approach to topic specific resource discovery. S. Chakrabarti, M. Van den Berg, B. Dom Eighth World Wide Web conference, Toronto, 1999.
39. The web as a graph: Measurements, models and methods. J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Proceedings of the International Conference on Combinatorics and Computing, 1999; invited paper.
40. Extracting large scale knowledge bases from the web. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. IEEE International conference on Very Large Databases (VLDB), Edinburgh, Scotland.
41. Clustering categorical data: an approach based on dynamical systems. D. Gibson, J. Kleinberg and P. Raghavan. Proceedings of the VLDB conference, 1998.
42. Search and Ranking Algorithms for Locating Resources on the World Wide Web B. Yuwono and D. Lee. IEEE conference on Data Engineering, 1996 (pp391-400).
43. A Machine Learning Architecture for Optimizing Web Search Engines, J. Boyan, D. Freitag, and T. Joachims. AAAI Workshop on Internet-based Information Systems, 1996.
44. SIBRIS: the Sandwich Interactive Browsing and Ranking Information System S. Wade, P. Willett, and D. Bawden. Journal of Information Science, 15, 1989, pp249-260
45. Estimating the Usefulness of Search Engines
46. The effectiveness of GlOSS for the Text Database Discovery Problem L. Gravano, H. Garcia-Molina, A. Tomasic. SIGMOD 1994. (GlOSS)
Questions: What are the key technologies for building high-performance, scalable Web servers?
1. Measuring the Capacity of a Web Server Gaurav Banga and Peter Druschel, Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems,
2. Internet Web Servers: Workload Characterization and Performance Implications Arlitt and Williamson, ACM/IEEE Transactions on Networking, 5(5):631-645, Oct. 1997. A short version titled "Web Server Workload Characterization: The Search for Invariants" appeared in ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996.
3. Locating Nearby Copies of Replicated Internet Servers, James D. Guyton and Michael F. Schwartz. ACM SIGCOMM, 1995.
4. HACC: An Architecture for Cluster-Based Web Servers Xiaolan Zhang, Michael Barrientos, J. Bradley Chen, Margo Seltzer (
5. Trace-Driven Simulation of Document Caching Strategies for Internet Web Servers Martin F. Arlitt and Carey L. Williamson, Simulation, Special Issue: Modeling and Simulation of Computer Systems and Networks. Vol. 68, No. 1, January, 1997.
6. Performance Characteristics of Mirror Servers on the Internet Andy Myers, Peter Dinda, Hui Zhang. INFOCOM'98.
7. TCP Behavior of a Busy Internet Server: Analysis and Improvements Hari Balakrishnan, Venkata Padmanabhan, Srini Seshan, Mark Stemm and Randy H. Katz, INFOCOM'98.
8. The Content and Access Dynamics of a Busy Web Site: Findings and Implications V. N. Padmanabhan and L. Qiu. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (An earlier version appeared as Microsoft Research Technical Report MSR-TR-2000-13, February 2000)
9. A Performance Monitoring and Capacity Planning Methodology for Web Servers Rodney B. Wallace and Tyrone E. McKoy, Jr. (NCR Corporation)
10. A Self-Scaling and Self-Configuring Benchmark for Web Servers Stephen Manley (Network Appliance), Michael Courage (Microsoft Co.), and Margo Seltzer (Harvard)
11. Connection Scheduling in Web Servers M. E. Crovella, R. Frangioso, and M. Harchol-Balter, Proceedings of the 1999 USENIX Symposium on Internet Technologies.
12. Dynamic Server Selection in the Internet Mark E. Crovella and Robert L. Carter, Computer Science Department, Boston University.
13. Dynamic Server Selection Using Bandwidth Probing in Wide-Area Networks, R. Carter and M. Crovella, INFOCOM, 1997. extended version (TR-96-007).
14. NCSA's World Wide Web Server: Design and Performance Tomas T. Kwan, Robert E. McGrath, and Daniel A. Reed, IEEE Computer, Vol. 28, No. 11, pp. 68-74, November 1995. An earlier version titled: User Access Patterns to NCSA's World Wide Web Server. (Recent Pablo Project)
15. A Scalable and Highly Available Web Server, Mukherjee, Tewari. In Proceedings of the IEEE Computer Conference (COMPCON), Santa Clara, March, 1996.
16. A Scalable HTTP Server: The NCSA Prototype, R. McGrath. Proc. of the 1st Intl. World-Wide Web Conference, May 1994. (HTML)
17. The Power of Two Choices in Randomized Load Balancing, M. Mitzenmacher, PhD. Thesis, 1996.
18. Flash: An Efficient and Portable Web Server Vivek Pai, Peter Druschel and Willy Zwaenepoel, Proceedings of 1999 USENIX Conference,
19. The AFS File System in Distributed Computing Environments: White Paper, Transarc Corporation, May 1996.
20. Apache Server (Apache HTTP Server V1.3 - API notes)
Questions: How do we coordinate the activities of a geographically distributed application?
1. World Wide Web Proxies Ari Luotonen (CERN) and Kevin Altis (Intel) (html)
4. Main Memory Caching of Web Documents, Evangelos P. Markatos. In Proceedings of the Fifth WWW Conference, 1996.
5. Design Considerations for Integrated Proxy Servers S. Sahu, P. Shenoy, D. Towsley. Proc. IEEE NOSSDAV'99 (Basking Ridge, NJ, June 1999).