Attention: The information contained in this page is subject to change.
There will be several background readings assigned each week. The readings will either be handed out a week before or listed on the Web page for required readings. You may also access this information from the course schedule.
Homework/Assignment:
You are expected to read the material each week and write 2-3 paragraphs per
reading giving your impressions and thoughts. The summaries should be informal
and brief, and should consist of your own comments on the readings, NOT a
rehash of the content.
You should email your summaries to the TAs, Aameek Singh AND Mudhakar Srivatsa
({aameek, mudhakar}@cc.gatech.edu),
preferably before each class but no later than
Reading Summary Guidelines:
The summary for each reading assignment is expected to consist of one paragraph
on each of the following three aspects: (1) the positive aspects of the paper;
(2) the negative aspects of the paper; and (3) a brief discussion of how the
idea or method proposed in the paper, or the method used in its evaluation,
may be applied to your own project for the course.
You may want to keep these guidelines in mind when reading papers.
· Problem Statement
  o What is the problem area with which the paper is concerned?
· Contributions/New Ideas
  o Summarize the authors' arguments. What are the authors proposing: a new architecture, algorithm, or methodology?
· Evaluation
  o How did the authors evaluate their new proposals? Did they build a system, run a simulation, collect traces from existing systems, or prove theorems? How was their data collected?
· Weakness
  o Compared with the state-of-the-art research in the problem area, or according to the related work section in the paper, was the proposed idea new? Was the approach novel?
You may find the following short article helpful:
Areas of
· Web Servers
· System Level Issues
  o System Support for Internet Applications
  o Security for Internet Applications
· Additional
  o Sensor, Stream, and Continual Query
Required
You are expected to read the papers in the required reading list, but you need only write a summary for one paper selected from the 2-3 required readings associated with each lecture. Please use the Summary Template to write the reading summaries.
General/Recommended Course Reading List
Questions: How can we build a search engine that scales up as the Web grows?
1. Google: The Anatomy of a Large-Scale Hypertextual Web Search Engine Sergey Brin and Lawrence Page, Proceedings of the 7th International World Wide Web Conference, pages 107-117, April 1998
2. Inktomi: An Investigation of Documents from the World Wide Web Allison Woodruff, Paul M. Aoki, Eric Brewer, Paul Gauthier, and Lawrence A. Rowe
3. Harvest: Scalable Internet Resource Discovery: Research Problems and Approaches C. Mic Bowman (Transarc Corp.), Peter Danzig (Univ. Southern California), Udi Manber (Univ. of Arizona), and Michael Schwartz (Univ. Colorado), appeared in CACM, 1994.
4. Harvest: The Harvest Information Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz, Computer Networks and ISDN Systems, 28 (1995) pp. 119-125
5. Harvest: A Scalable, Customizable Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, and Duane P. Wessels, Technical Report CU-CS-732-94, Department of Computer Science,
6. Customized Information Extraction as a Basis for Resource Discovery Darren R. Hardy and Michael F. Schwartz, ACM Transactions on Computer Systems.
7. Indie: Distributed Indexing of Autonomous Internet Services Peter Danzig, Shih-Hao Li, Katia Obraczka. Journal of Computer Systems, 5(4), 1992. Original description of Indie in 1991 ACM SIGIR.
8. Internet resource discovery services Katia Obraczka, Peter Danzig, and Shih-Hao Li, IEEE Computer, Sept. 1993.
9. Research Problems for Scalable Internet Resource Discovery C. Mic Bowman, Peter B. Danzig, and Michael F. Schwartz, 1993 IEEE Computer.
10. GLIMPSE: A Tool to Search Through Entire File Systems Udi Manber and Sun Wu
11. WebGlimpse--Combining Browsing and Searching Udi Manber, Mike Smith, and Burra Gopal (Univ. of Arizona), to appear in the Proceedings of the 1997 Usenix Technical Conference, January 1997.
12. Mercator: A Scalable, Extensible Web Crawler Allan Heydon and Marc Najork,
13. A technique for measuring the relative size and overlap of public Web search engines Krishna Bharat and Andrei Broder (DIGITAL)
14. The Connectivity Server: fast access to linkage information on the Web Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian, Proceedings of the 7th International World Wide Web Conference,
15. Efficient Crawling through URL Ordering Junghoo Cho, Hector Garcia-Molina, and Lawrence Page, Proceedings of the 7th International World Wide Web Conference, pages 161-172, April 1998
16. Crawling towards Eternity: Building an Archive of the World Wide Web Mike Burner, Web Techniques Magazine, 2(5), May 1997
17. The Truth about the Web: Crawling towards Eternity Z. Smith, Web Techniques Magazine, 2(5), May 1997
18. Measuring Index Quality using Random Walks on the Web Monika Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc A. Najork, Proceedings of the 8th International World Wide Web Conference, pages 213-225, May 1999
19. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery Soumen Chakrabarti, Martin van den Berg, Byron Dom, Proceedings of the 8th International World Wide Web Conference, May 1999
20. Finding What People Want: Experiences with the WebCrawler Brian Pinkerton, Proceedings of the Second International World Wide Web Conference, 1994
21. SPHINX: A Framework for Creating Personal, Site-specific Web Crawlers Robert C. Miller and Krishna Bharat, Proceedings of the 7th International World Wide Web Conference, pages 119-130, April 1998
22. Information Retrieval on the World Wide Web Venkat N. Gudivada, Vijay V. Raghavan, William I. Grosky, and Rajesh Kasangottu, IEEE Internet Computing, vol. 1, number 5, September/October, 1997.
23. GENVL and WWWW: Tools for Taming the Web Oliver McBryan, Proceedings of the First Int'l World Wide Web Conference, CERN, Geneva, May 1994.
24. A World Wide Web Resource Discovery System Budi Yuwono, Savio L. Y. Lam, Jerry H. Ying, Dik L. Lee, Proceedings of the 4th World Wide Web Conference, 1995.
25. A Survey of Information Retrieval and Filtering Methods Christos Faloutsos and Douglas Oard
26. Guidelines for Robot Writers, Martijn Koster, 1993
27. Robots in the Web: threat or treat? Martijn Koster, NEXOR, April 1995, [1997: Updated links and addresses]; A Standard for Robot Exclusion Martijn Koster.
28. Authoritative Sources in a Hyperlinked Environment, J. Kleinberg. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46 (1999). Also appears as IBM Research Report RJ 10076, May 1997. (IBM Clever Searching Project)
29. Results Ranking in Web Search Engines Martin P. Courtois and Michael W. Berry. ONLINE, May 1999 Copyright Online Inc.
30. Evaluation of Web search engines and the search for better ranking algorithms. Mildrid Ljosland (Mildrid.Ljosland@idi.ntnu.no)
31. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. Proceedings of the 7th World-Wide Web conference, 1998. Copyright owned by Elsevier Sciences,
32. D. Gibson, J. Kleinberg, and P. Raghavan. Inferring Web Communities from Link Topologies. Proceedings of The Ninth ACM Conference on Hypertext and Hypermedia, 1998. Copyright owned by ACM.
33. S. Chakrabarti, B. Dom, R. Agrawal, P. Raghavan. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. VLDB Journal, 1998 (invited).
34. S. Chakrabarti, B. Dom and P. Indyk. Enhanced hypertext categorization using hyperlinks. Proceedings of ACM SIGMOD 1998.
35. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Hypersearching the web. Scientific American, June, 1999.
36. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Mining the link structure of the World Wide Web. IEEE Computer.
37. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities. Eighth World Wide Web conference,
38. S. Chakrabarti, M. Van den Berg, B. Dom. Focused crawling: a new approach to topic specific resource discovery. Eighth World Wide Web conference,
39. J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models and methods. Proceedings of the International Conference on Combinatorics and Computing, 1999; invited paper.
40. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large scale knowledge bases from the web. International Conference on Very Large Data Bases (VLDB),
41. D. Gibson, J. Kleinberg and P. Raghavan. Clustering categorical data: an approach based on dynamical systems. Proceedings of the VLDB conference, 1998.
42. Search and Ranking Algorithms for Locating Resources on the World Wide Web B. Yuwono and D. Lee. IEEE conference on Data Engineering, 1996 (pp. 391-400).
43. A Machine Learning Architecture for Optimizing Web Search Engines, J. Boyan, D. Freitag, and T. Joachims. AAAI Workshop on Internet-based Information Systems, 1996.
44. SIBRIS: the
45. Estimating the Usefulness of Search Engines
46. The effectiveness of GlOSS for the Text Database Discovery Problem L. Gravano, H. Garcia-Molina, A. Tomasic. SIGMOD 1994. (GlOSS)
Questions: What are the key technologies for building high-performance, scalable Web servers?
47. Measuring the Capacity of a Web Server Gaurav Banga and Peter Druschel, Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems,
48. Internet Web Servers: Workload Characterization and Performance Implications Arlitt and Williamson, IEEE/ACM Transactions on Networking, 5(5):631-645, Oct. 1997. A short version titled "Web Server Workload Characterization: The Search for Invariants" appeared in ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996.
49. Locating Nearby Copies of Replicated Internet Servers, James D. Guyton and Michael F. Schwartz. ACM SIGCOMM, 1995.
50. HACC: An Architecture for Cluster-Based Web Servers Xiaolan Zhang, Michael Barrientos, J. Bradley Chen, Margo Seltzer
51. Trace-Driven Simulation of Document Caching Strategies for Internet Web Servers Martin F. Arlitt and Carey L. Williamson, Simulation, Special Issue: Modeling and Simulation of Computer Systems and Networks. Vol. 68, No. 1, January, 1997.
52. Performance Characteristics of Mirror Servers on the Internet Andy Myers, Peter Dinda, Hui Zhang. INFOCOM'98.
53. TCP Behavior of a Busy Internet Server: Analysis and Improvements Hari Balakrishnan, Venkata Padmanabhan, Srini Seshan, Mark Stemm and Randy H. Katz, INFOCOM'98.
54. The Content and Access Dynamics of a Busy Web Site: Findings and Implications V. N. Padmanabhan and L. Qiu. Proceedings of ACM SIGCOMM 2000,
55. A Performance Monitoring and Capacity Planning Methodology for Web Servers Rodney B. Wallace and Tyrone E. McKoy, Jr. (NCR Corporation)
56. A Self-Scaling and Self-Configuring Benchmark for Web Servers Stephen Manley (Network Appliance), Michael Courage (Microsoft Co.), and Margo Seltzer (Harvard)
57. Connection Scheduling in Web Servers M. E. Crovella, R. Frangioso, and M. Harchol-Balter, Proceedings of the 1999 USENIX Symposium on Internet Technologies and Systems.
58. Dynamic Server Selection in the Internet Mark E. Crovella and Robert L. Carter, Computer Science Department,
59. Dynamic Server Selection Using Bandwidth Probing in Wide-Area Networks, R. Carter and M. Crovella, INFOCOM, 1997. extended version (TR-96-007).
60. NCSA's World Wide Web Server: Design and Performance Tomas T. Kwan, Robert E. McGrath, and Daniel A. Reed, IEEE Computer, Vol. 28, No. 11, pp. 68-74, November 1995. An earlier version titled: User Access Patterns to NCSA's World Wide Web Server. (Recent Pablo Project)
61. A Scalable and Highly Available Web Server,
62. A Scalable HTTP Server: The NCSA Prototype, R. McGrath. Proc. of the 1st Intl. World-Wide Web Conference, May 1994.
63. The Power of Two Choices in Randomized Load Balancing, M. Mitzenmacher, Ph.D. Thesis, 1996.
64. Flash: An Efficient and Portable Web Server Vivek Pai, Peter Druschel and Willy Zwaenepoel, Proceedings of 1999 USENIX Conference,
65. The AFS File System in Distributed Computing Environments: White Paper, Transarc Corporation, May 1996.
66. Apache Server (Apache HTTP Server V1.3 - API notes)
Questions: How do we coordinate the activities of a geographically distributed application?
67.
68. A Hierarchical Internet Object Cache Anawat Chankhunthod, Peter B. Danzig, Chuck Neerdaels (University of Southern California), Michael F. Schwartz, Kurt J. Worrell (University of Colorado, Boulder) (An implementation of the hierarchical object cache at NetApp)
70. Main Memory Caching of Web Documents, Evangelos P. Markatos. In Proceedings of the Fifth WWW Conference, 1996.
71. Design Considerations for Integrated Proxy Servers S. Sahu, P. Shenoy, D. Towsley. Proc. IEEE NOSSDAV'99 (Basking Ridge, NJ, June 1999).
72. Performance Issues of Enterprise Level Web Proxies C. Maltzahn, K. Richardson and D. Grunwald, Proceedings of ACM SIGMETRICS'97, Seattle, WA, Pages 13-23, June 1997.
73. World Wide Web Cache Consistency, James Gwertzman and Margo Seltzer, Proceedings of the 1996 USENIX Technical Conference, San Diego, CA, Jan 1996.
74. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser, In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP'95), Copper Mountain Resort, CO, December 1995.
75. Volume Leases for Consistency in Large-scale Systems, J. Yin, L. Alvisi, M. Dahlin and C. Lin, IEEE Transactions on Knowledge and Data Engineering Special issue on Web Technologies, Jan 1999.
76. Web proxy caching: the devil is in the details Ramon Caceres, Fred Douglis, Anja Feldmann, Gideon Glass, Michael Rabinovich, Workshop on Internet Server Performance held with Sigmetrics'98.
78. Intelligent Caching for World-Wide Web Objects Duane Wessels
79. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol L. Fan, P. Cao, J. Almeida, and A.Z. Broder, SIGCOMM, 1998, pp 254-265.
80. Improving End-to-End Performance of the Web Using Server Volumes and Proxy Filters Edith Cohen, Balachander Krishnamurthy, Jennifer Rexford, SIGCOMM, 1998.
81. Internet Cache Protocol (ICP), version 2 D. Wessels, K. Claffy, RFC 2186, May 1997.
82. Adaptive Web Caching S. Floyd, V. Jacobson and L. Zhang, Proceedings of the Web Caching Workshop 1997.
83. An Analysis of Geographical Push Caching J. Gwertzman and M. Seltzer.
84. A Caching Relay for the World Wide Web Steven Glassman, First International World-Wide Web Conference, pp 69-76, May 1994.
86. Intelligent Caching for World-Wide Web Objects Duane Wessels (University of Colorado), Proceedings of INET'95, May 1995.
88. Operating System Support for High-Speed Networking, P. Druschel, Communications of the ACM, Vol. 39, No. 9, Pages 41-51, September 1996.
89. Squid Internet Object Cache; Squid Web Proxy Cache
90. Performance of Web Proxy Caches, Feldmann, Caceres, Douglis, Glass, and Rabinovich, Workshop on Internet Server Performance (WISP), 1998.
91. Enhancing the Web's Infrastructure: From Caching to Replication Michael Baentsch, Lothar Baum, Georg Molter, Steffen Rothkugel, and Peter Sturm, IEEE Internet Computing, vol. 1, no. 2, pages 18-27, April 1997 (class handout).
92. Propagation, Replication and Caching from the W3C
93. Caching Proxies: Limitations and Potentials Marc Abrams, Charles R. Standridge, Ghaleb Abdulla, Stephen Williams, Edward A. Fox, Proceedings of the Fourth International World Wide Web Conference, pages 119-133, Boston, MA, December 1995.
94. Improving End to End Performance of the Web using Server Volumes and Proxy Filters, E. Cohen, B. Krishnamurthy and J. Rexford, In Proceedings of ACM SIGCOMM'98, Vancouver, Canada, Pages 241-253, September 1998.
95. The Measured Access Characteristics of World-Wide-Web Client Proxy Caches, B. M. Duska, D. Marwood, and M. J. Feeley, In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, CA, December 1997.
96. A Survey of Proxy Cache Evaluation Techniques Brian D. Davison
97. A Tutorial for Network Caching
Question: How much can prefetching alleviate the latency and bandwidth problems in Web access?
98. Using Predictive Prefetching to Improve World Wide Web Latency, Padmanabhan and Mogul, SIGCOMM, 1996.
99. Alleviating the Latency and Bandwidth Problems in WWW Browsing, Loon and Bharghavan, Usenix Symposium on Internet Technologies and Systems (USITS) 1997.
100. Determining WWW User's Next Access and Its Application to Pre-fetching, Carlos R. Cunha and Carlos F.B. Jaccoud, Proceedings of ISCC'97: The Second IEEE Symposium on Computers and Communications. Alexandria, Egypt, 1-3 July 1997.
101. Potential and Limits of Web Prefetching Between Low-Bandwidth Clients and Proxies Li Fan, Quinn Jacobson and
102. Optimal Prefetching via Data Compression, Vitter and Krishnan, FOCS, 1991.
103. The Network Effects of Prefetching, Crovella and Barford, INFOCOM, 1998.
105. Web Facts and Fantasy, Stephen Manley (Network Appliance), Margo Seltzer (Harvard University), Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, Monterey, CA, December 1997.
106. Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, Mark E. Crovella and Azer Bestavros IEEE/ACM Transactions on Networking, 5(6):835--846, December 1997.
107. Measuring Web Performance in the Wide Area, P. Barford and M. E. Crovella, in Performance Evaluation Review, August, 1999.
108. Changes in Web Client Access Patterns: Characteristics and Caching Implications, P. Barford, A. Bestavros, A. Bradley, and M. E. Crovella, in World Wide Web, Special Issue on Characterization and Performance Evaluation, Vol. 2, pp. 15-28, 1999.
109. Characterizing Browsing Strategies in the World-Wide Web, L. Catledge and J. Pitkow, Journal of Computer Networks and ISDN Systems, vol. 27, no. 6, 1995, p. 1065.
110. Measuring the Web, Tim Bray (Open Text Corporation), Fifth International World Wide Web Conference, May 1996,
111. Generating Representative Web Workloads for Network and Server Performance Evaluation, Barford and Crovella, SIGMETRICS, 1998, pp. 151-160.
112. Characterizing Reference Locality in the WWW, Almeida, Bestavros, Crovella, and de Oliveira, International Conference on Parallel and Distributed Information Systems (ICPDIS), 1996. (The OCEANS Project)
113. Web Traffic Characterization: An Assessment of the Impact of Caching Documents from NCSA's Web Server H. Braun and K. Claffy; Second International Conference on the WWW,