CS8803 Advanced Internet Application Development

Instructor: Professor Ling Liu

Course Readings

Attention: The information contained in this page is subject to change.




Reading Summary Requirement

There will be several background readings assigned each week. The readings will either be handed out a week in advance or listed on the Web page for required readings. You may also access this information from the course schedule.

Homework/Assignment:
You are expected to read the material each week and write 2-3 paragraphs per reading giving your impressions and thoughts. The summaries should be informal and brief, and should consist of your own comments on the readings, NOT a rehash of the content.

You should email your summaries to the TAs, Aameek Singh AND Mudhakar Srivatsa ({aameek, mudhakar}@cc.gatech.edu), preferably before each class but no later than 6:00pm on Friday each week (unless there are no reading assignments for the week). Late assignments will NOT be accepted unless approved in advance by the instructor.

Reading Summary Guidelines:
The summary for each reading assignment is expected to consist of 1 paragraph on each of the following three aspects: (1) the positive aspect of the paper; (2) the negative aspect of the paper; and (3) a brief discussion on how the idea or method proposed or used in evaluation may be applied to your own project for the course.

You may want to keep these guidelines in mind when reading papers.

·        Problem Statement

o       What is the problem area with which the paper is concerned? What are the concrete problems that the authors are trying to solve?

·        Contributions/New Ideas

o       Summarize the authors' arguments. What are the authors proposing: a new architecture, algorithm, or methodology? Are you convinced? Why or why not?

·        Evaluation

o       How did the authors evaluate their new proposals? Did they build a system, run a simulation, collect traces from existing systems, or prove theorems? How was their data collection done? Do you agree with their conclusions and their analysis?

·        Weakness

o       Compared with the state-of-the-art research in the problem area, or according to the related work section in the paper, was the proposed idea new? Was the approach novel? What, in your opinion, should have been evaluated to validate the new proposal but is missing from the evaluation? Are there any alternative ways to conduct the evaluation?

You may find the following short article helpful:

Efficient Reading of Papers in Science and Technology, by Michael J. Hanson, updated by D. McNamee
 

Areas of Readings

·        Search Engine Issues

·        Web Servers

o       Web Server Issues

o       Web Proxies and Web Caching

o       Web Prefetching

o       WWW Workloads

·        Application Server Issues

·        System Level Issues

o       System Support for Internet Applications

o       Naming Issues

o       Security for Internet Applications

·        Additional Reading

o       Peer to Peer Computing

o       Mobile Computing

o       Collaborative Filtering

o       Sensor, Stream, and Continual Query

Required Readings and Dates

You are expected to read the papers in the required reading list, but you need to write a summary for only one paper, selected from the 2-3 required readings associated with each lecture. Please use the Summary Template to write the reading summaries.

General/Recommended Course Reading List

Search Engine Issues

Question: How do we build a search engine that scales up as the Web grows?

1.      Google: The Anatomy of a Large-Scale Hypertextual Web Search Engine Sergey Brin and Lawrence Page, Proceedings of the 7th International World Wide Web Conference, pages 107-117, April 1998

2.      Inktomi: An Investigation of Documents from the World Wide Web Allison Woodruff, Paul M. Aoki, Eric Brewer, Paul Gauthier, and Lawrence A. Rowe  

3.      Harvest: Scalable Internet Resource Discovery: Research Problems and Approaches C. Mic Bowman (Transarc Corp.), Peter Danzig (Univ. Southern California), Udi Manber (Univ. of Arizona), and Michael Schwartz (Univ. Colorado), appeared in CACM 1994 (Download Harvest Indexer) (Harvest Papers)

4.      Harvest: The Harvest Information Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz, Computer Networks and ISDN Systems, 28 (1995) pp. 119-125

5.      Harvest: A Scalable, Customizable Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, and Duane P. Wessels, Technical Report CU-CS-732-94, Department of Computer Science, University of Colorado, Boulder, August 1994 (revised March 1995).

6.      Customized Information Extraction as a Basis for Resource Discovery Darren R. Hardy and Michael F. Schwartz, ACM Transactions on Computer Systems.

7.      Indie: Distributed Indexing of Autonomous Internet Services Peter Danzig, Shih-Hao Li, Katia Obraczka. Journal of Computer Systems, 5(4), 1992. Original description of Indie in 1991 ACM SIGIR

8.      Internet resource discovery services Katia Obraczka, Peter Danzig, and Shih-Hao Li, IEEE Computer, Sept. 1993.

9.      Research Problems for Scalable Internet Resource Discovery C. Mic Bowman, Peter B. Danzig, and Michael F. Schwartz, 1993 IEEE Computer.

10.  GLIMPSE: A Tool to Search Through Entire File Systems Udi Manber and Sun Wu (Univ. of Arizona), Technical Report TR 93-34, Department of Computer Science, University of Arizona, October, 1993. (Glimpse Home Page)

11.  WebGlimpse--Combining Browsing and Searching Udi Manber, Mike Smith, and Burra Gopal (Univ. of Arizona), to appear in the Proceedings of the 1997 Usenix Technical Conference, January 1997. (WebGlimpse Home Page) (WebGLIMPSE Publications)

12.  Mercator: A Scalable, Extensible Web Crawler Allan Heydon and Marc Najork, Compaq Systems Research Center  (Mercator Project) (html)

13.  A technique for measuring the relative size and overlap of public Web search engines Krishna Bharat and Andrei Broder (DIGITAL, Systems Research Center).

14.  The Connectivity Server: fast access to linkage information on the Web Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian, Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, pages 469-477. Elsevier Science, April 1998.

15.  Efficient Crawling through URL Ordering Junghoo Cho, Hector Garcia-Molina, and Lawrence Page, Proceedings of the 7th International World Wide Web Conference, pages 161-172, April 1998

16.  Crawling towards Eternity: Building an Archive of the World Wide Web Mike Burner, Web Techniques Magazine, 2(5), May 1997

17.  The Truth about the Web: Crawling towards Eternity Z. Smith, Web Techniques Magazine, 2(5), May 1997

18.  Measuring Index Quality using Random Walks on the Web Monika Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc A. Najork, Proceedings of the 8th International World Wide Web Conference, pages 213-225, May 1999

19.  Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery Soumen Chakrabarti, Martin van den Berg, Byron Dom, Proceedings of the 8th International World Wide Web Conference, May 1999

20.  Finding What People Want: Experiences with the WebCrawler Brian Pinkerton, Proceedings of the 2nd International World Wide Web Conference, 1994

21.  SPHINX: A Framework for Creating Personal, Site-specific Web Crawlers Robert C. Miller and Krishna Bharat, Proceedings of the 7th International World Wide Web Conference, pages 119-130, April 1998

22.  Information Retrieval on the World Wide Web Venkat N. Gudivada, Vijay V. Raghavan, William I. Grosky, and Rajesh Kasangottu, IEEE Internet Computing, vol. 1, number 5, September/October, 1997.

23.  GENVL and WWW: Tools for Taming the Web Oliver McBryan, Proceedings of the First Int'l World Wide Web Conference, CERN, Geneva, May 1994.

24.  A World Wide Web Resource Discovery System Budi Yuwono, Savio L. Y. Lam, Jerry H. Ying, Dik L. Lee, Proceedings of the 4th World Wide Web Conference, 1995.

25.  A Survey of Information Retrieval and Filtering Methods Christos Faloutsos and Douglas Oard (Univ. of Maryland).

26.  Guidelines for Robot Writers, Martijn Koster, 1993

27.  Robots in the Web: threat or treat? Martijn Koster, NEXOR, April 1995, [1997: Updated links and addresses];  A Standard for Robot Exclusion Martijn Koster.

28.  Authoritative Sources in a Hyperlinked Environment, J. Kleinberg. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46 (1999). Also appears as IBM Research Report RJ 10076, May 1997. (IBM Clever Searching Project)

29.  Results Ranking in Web Search Engines Martin P. Courtois and Michael W. Berry. ONLINE, May 1999 Copyright Online Inc.

30.  Evaluation of Web search engines and the search for better ranking algorithms, Mildrid Ljosland (e-mail: Mildrid.Ljosland@idi.ntnu.no), Norwegian University of Science and Technology. SIGIR'99 Workshop on Evaluation of Web Retrieval, August 19, 1999

31.  S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. Proceedings of the 7th World-Wide Web conference, 1998. Copyright owned by Elsevier Sciences, Amsterdam.

32.  D. Gibson, J. Kleinberg, and P. Raghavan. Inferring Web Communities from Link Topologies. Proceedings of The Ninth ACM Conference on Hypertext and Hypermedia, 1998. Copyright owned by ACM.

33.  S. Chakrabarti, B. Dom, R. Agrawal, P. Raghavan. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. VLDB Journal, 1998 (invited).

34.  S. Chakrabarti, B. Dom and P. Indyk. Enhanced hypertext categorization using hyperlinks. Proceedings of ACM SIGMOD 1998.

35.  S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Hypersearching the web. Scientific American, June, 1999.

36.  S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Mining the link structure of the World Wide Web IEEE Computer.

37.  S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the Web for emerging cyber-communities Eighth World Wide Web conference, Toronto, Canada, May 1999.

38.  S. Chakrabarti, M. Van den Berg, B. Dom Focused crawling: a new approach to topic specific resource discovery Eighth World Wide Web conference, Toronto, 1999.

39.  J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. The web as a graph: Measurements, models and methods. Proceedings of the International Conference on Combinatorics and Computing, 1999; invited paper.

40.  S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Extracting large scale knowledge bases from the web. IEEE International conference on Very Large Databases (VLDB), Edinburgh, Scotland.

41.  D. Gibson, J. Kleinberg and P. Raghavan. Clustering categorical data: an approach based on dynamical systems. Proceedings of the VLDB conference, 1998.

42.  Search and Ranking Algorithms for Locating Resources on the World Wide Web B. Yuwono and D. Lee. IEEE conference on Data Engineering, 1996 (pp391-400).

43.  A Machine Learning Architecture for Optimizing Web Search Engines,  J. Boyan, D. Freitag, and T. Joachims. AAAI Workshop on Internet-based Information Systems, 1996.

44.  SIBRIS: the Sandwich Interactive Browsing and Ranking Information System S. Wade, P. Willett, and D. Bawden. Journal of Information Science, 15, 1989, pp249-260

45.  Estimating the Usefulness of Search Engines W. Meng, K. Liu, C. Yu, W. Wu and N. Rishe. ICDE 1999. (more details)

46.  The effectiveness of GlOSS for the Text Database Discovery Problem L. Gravano, H. Garcia-Molina, A. Tomasic. SIGMOD 1994. (GlOSS)

Web Server Issues

Question: What are the key technologies for building high-performance and scalable Web servers?

47.  Measuring the Capacity of a Web Server Gaurav Banga and Peter Druschel, Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, Monterey, CA, December 1997. 

48.  Internet Web Servers: Workload Characterization and Performance Implications Arlitt and Williamson, IEEE/ACM Transactions on Networking, 5(5):631-645, Oct. 1997. A short version titled "Web Server Workload Characterization: The Search for Invariants" appeared in ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996.

49.  Locating Nearby Copies of Replicated Internet Servers, James D. Guyton and Michael F. Schwartz. ACM SIGCOMM, 1995. 

50.  HACC: An Architecture for Cluster-Based Web Servers Xiaolan Zhang, Michael Barrientos, J. Bradley Chen, Margo Seltzer (Harvard University). In the Proceedings of the 3rd USENIX Windows NT Symposium, July 1999, Seattle, WA, 155-164.

51.  Trace-Driven Simulation of Document Caching Strategies for Internet Web Servers Martin F. Arlitt and Carey L. Williamson, Simulation, Special Issue: Modeling and Simulation of Computer Systems and Networks. Vol. 68, No. 1, January, 1997. 

52.  Performance Characteristics of Mirror Servers on the Internet Andy Myers, Peter Dinda, Hui Zhang. INFOCOM'98.

53.  TCP Behavior of a Busy Internet Server: Analysis and Improvements Hari Balakrishnan, Venkata Padmanabhan, Srini Seshan, Mark Stemm and Randy H. Katz, INFOCOM'98.

54.  The Content and Access Dynamics of a Busy Web Site: Findings and Implications V. N. Padmanabhan and L. Qiu. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (PDF). (An earlier version appeared as Microsoft Research Technical Report MSR-TR-2000-13, February 2000 (PostScript, PDF))

55.  A Performance Monitoring and Capacity Planning Methodology for Web Servers Rodney B. Wallace and Tyrone E. McKoy, Jr. (NCR Corporation)

56.  A Self-Scaling and Self-Configuring Benchmark for Web Servers Stephen Manley (Network Appliance), Michael Courage (Microsoft Co.), and Margo Seltzer (Harvard)

57.  Connection Scheduling in Web Servers M. E. Crovella, R. Frangioso, and M. Harchol-Balter, Proceedings of the 1999 USENIX Symposium on Internet Technologies.

58.  Dynamic Server Selection in the Internet Mark E. Crovella and Robert L. Carter, Computer Science Department, Boston University.

59.  Dynamic Server Selection Using Bandwidth Probing in Wide-Area Networks, R. Carter and M. Crovella, INFOCOM, 1997. extended version (TR-96-007).

60.  NCSA's World Wide Web Server: Design and Performance Tomas T. Kwan, Robert E. McGrath, and Daniel A. Reed, IEEE Computer, Vol. 28, No. 11, pp. 68-74, November 1995. An earlier version titled: User Access Patterns to NCSA's World Wide Web Server. (Recent Pablo Project)

61.  A Scalable and Highly Available Web Server, D.M. Dias, W. Kish, R. Mukherjee, R. Tewari. In Proceedings of the IEEE Computer Conference (COMPCON), Santa Clara, March, 1996.

62.  A Scalable HTTP Server: The NCSA Prototype, R. McGrath. Proc. of the 1st Intl. World-Wide Web Conference, May 1994. (HTML)

63.  The Power of Two Choices in Randomized Load Balancing, M. Mitzenmacher, PhD. Thesis, 1996.

64.  Flash: An Efficient and Portable Web Server Vivek Pai, Peter Druschel and Willy Zwaenepoel, Proceedings of 1999 USENIX Conference, Monterey, CA, June 1999.

65.  The AFS File System in Distributed Computing Environments: White Paper, Transarc Corporation, May 1996.

66.  Apache Server (Apache HTTP Server V1.3 - API notes)

Web Proxies and Web Caching

Question: How do we coordinate the activities of a geographically distributed application?

67.  World Wide Web Proxies Ari Luotonen (CERN) and Kevin Altis (Intel) (html)

68.  A Hierarchical Internet Object Cache Anawat Chankhunthod, Peter B. Danzig, Chuck Neerdaels (University of Southern California), Michael F. Schwartz, Kurt J. Worrell (University of Colorado, Boulder) (An Implementation of the hierarchical Object Cache at netapp)

69.  Beyond Hierarchies: Design Principles for Distributed Caching on the Internet, R. Tewari, M. Dahlin, H. Vin, and J. Kay, Technical report TR98-04, Dept. of Computer Sciences, Univ. of Texas, 1998.

70.  Main Memory Caching of Web Documents, Evangelos P. Markatos. In Proceedings of the Fifth WWW Conference, 1996.

71.  Design Considerations for Integrated Proxy Servers S. Sahu, P. Shenoy, D. Towsley, Proc. IEEE NOSSDAV'99 (Basking Ridge, NJ, June 1999).

72.  Performance Issues of Enterprise Level Web Proxies C. Maltzahn and K. Richardson and D. Grunwald, Proceedings of ACM SIGMETRICS'97, Seattle, WA, Pages 13-23, June 1997.

73.  World Wide Web Cache Consistency, James Gwertzman and Margo Seltzer, Proceedings of the 1996 USENIX Technical Conference, San Diego, CA, Jan 1996.

74.  Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System D. Terry, M. Theimer, K. Petersen, A. Demers, M. Spreitzer, and C. Hauser, In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP'95), Copper Mountain Resort, CO, December 1995.

75.  Volume Leases for Consistency in Large-scale Systems, J. Yin, L. Alvisi, M. Dahlin and C. Lin, IEEE Transactions on Knowledge and Data Engineering, Special Issue on Web Technologies, Jan 1999.

76.  Web proxy caching: the devil is in the details Ramon Caceres, Fred Douglis, Anja Feldmann, Gideon Glass, Michael Rabinovich, Workshop on Internet Server Performance held with Sigmetrics'98.

77.  Making World Wide Web Caching Servers Cooperate Radhika Malpani, Jacob Lorch, David Berger. Proceedings of the 4th WWW, 1995.

78.  Intelligent Caching for World-Wide Web Objects Duane Wessels (University of Colorado), Proceedings of INET'95, May 1995.

79.  Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol L. Fan, P. Cao, J. Almeida, and A.Z. Broder, SIGCOMM, 1998, pp 254-265.

80.  Improving End-to-End Performance of the Web Using Server Volumes and Proxy Filters Edith Cohen, Balachander Krishnamurthy, Jennifer Rexford, SIGCOMM, 1998.

81.  Internet Cache Protocol (ICP), version 2 D. Wessels, K. Claffy, RFC 2186, May 1997. (ICP Working Group Home Page)

82.  Adaptive Web Caching S. Floyd, V. Jacobson and L. Zhang, Proceedings of the Web Caching Workshop 1997.

83.  An Analysis of Geographical Push Caching J. Gwertzman and M. Seltzer.

84.  A Caching Relay for the World Wide Web Steven Glassman, First International World-Wide Web Conference, pp 69-76, May 1994.

85.  Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes Mark Crovella and Azer Bestavros, Proceedings of SIGMETRICS '96.

86.  Intelligent Caching for World-Wide Web Objects Duane Wessels (University of Colorado), Proceedings of INET'95, May 1995.

87.  The Rio File Cache: Surviving Operating System Crashes Peter M. Chen, Wee Teck Ng, Subhachandra Chandra, Christopher Aycock, Gurushankar Rajamani, and David Lowell, Proceedings of the 1996 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 1996.

88.  Operating System Support for High-Speed Networking, P. Druschel, Communications of the ACM, Vol. 39, No. 9, Pages 41-51, September 1996.

89.  Squid Internet Object Cache; Squid Web Proxy Cache

90.  Performance of Web Proxy Caches, Feldmann, Caceres, Douglis, Glass, and Rabinovich, Workshop on Internet Server Performance (WISP), 1998.

91.  Enhancing the Web's Infrastructure: From Caching to Replication Michael Baentsch, Lothar Baum, Georg Molter, Steffen Rothkugel, and Peter Sturm, IEEE Internet Computing, vol. 1, no. 2, pages 18-27, April 1997 (class handout).

92.  Propagation, Replication and Caching from the W3C

93.  Caching Proxies: Limitations and Potentials Marc Abrams, Charles R. Standridge, Ghaleb Abdulla, Stephen Williams, Edward A. Fox, Proceedings of the Fourth International World Wide Web Conference, pages 119-133, Boston, MA, December 1995.

94.  Improving End to End Performance of the Web using Server Volumes and Proxy Filters, E. Cohen, B. Krishnamurthy and J. Rexford, In Proceedings of ACM SIGCOMM'98, Vancouver, Canada, Pages 241-253, September 1998.

95.  The Measured Access Characteristics of World-Wide-Web Client Proxy Caches, B. M. Duska, D. Marwood, and M. J. Feeley, In Proceedings of the USENIX Symposium on Internet Technologies and Systems, Monterey, CA, December 1997

96.  A Survey of Proxy Cache Evaluation Techniques Brian D. Davison

97.  A Tutorial for Network Caching

Web Prefetching

Question: How much can Prefetching alleviate the latency and bandwidth problems in Web access?

98.  Using Predictive Prefetching to Improve World Wide Web Latency,  Padmanabhan and Mogul, SIGCOMM, 1996.

99.  Alleviating the Latency and Bandwidth Problems in WWW Browsing, Loon and Bharghavan, Usenix Symposium on Internet Technologies and Systems (USITS) 1997.

100.    Determining WWW User's Next Access and Its Application to Pre-fetching, Carlos R. Cunha and Carlos F.B. Jaccoud, Proceedings of ISCC'97: The Second IEEE Symposium on Computers and Communications. Alexandria, Egypt, 1-3 July 1997.

101.    Potential and Limits of Web Prefetching Between Low-Bandwidth Clients and Proxies Li Fan, Quinn Jacobson and Pei Cao. To appear in SIGMETRICS'99.

102.    Optimal Prefetching via Data Compression, Vitter and Krishnan, FOCS, 1991.

103.    The Network Effects of Prefetching, Crovella and Barford, INFOCOM, 1998.

 

WWW Workloads

Questions: What are the main causes of World Wide Web traffic? How do we find idle resources if they exist?

105.    Web Facts and Fantasy, Stephen Manley (Network Appliance), Margo Seltzer (Harvard University), Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, Monterey, CA, December 1997.

106.    Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes, Mark E. Crovella and Azer Bestavros, IEEE/ACM Transactions on Networking, 5(6):835-846, December 1997.

107.    Measuring Web Performance in the Wide Area, P. Barford and M. E. Crovella, in Performance Evaluation Review, August, 1999.

108.    Changes in Web Client Access Patterns: Characteristics and Caching Implications, P. Barford, A. Bestavros, A. Bradley, and M. E. Crovella, in World Wide Web, Special Issue on Characterization and Performance Evaluation, Vol. 2, pp. 15-28, 1999.

109.    Characterizing Browsing Strategies in the World-Wide Web, L. Catledge and J. Pitkow, Journal of Computer Networks and ISDN Systems, vol. 27, no. 6, 1995, p. 1065.

110.    Measuring the Web, Tim Bray (Open Text Corporation), Fifth International World Wide Web Conference, May 1996, Paris, France.

111.    Generating Representative Web Workloads for Network and Server Performance Evaluation, Barford and Crovella, SIGMETRICS, 1998, pp. 151-160. 

112.    Characterizing Reference Locality in the WWW, Almeida, Bestavros, Crovella, and de Oliveira, International Conference on Parallel and Distributed Information Systems (ICPDIS), 1996. (The OCEANS Project)

113.    Web Traffic Characterization: An Assessment of the Impact of Caching Documents from NCSA's Web Server H. Braun and K. Claffy; Second International Conference on the WWW,