You are expected to read the material each week and write 2-3 paragraphs per reading giving your impressions and thoughts. The summaries should be informal and brief, and should consist of your own comments on the readings, NOT a rehash of the content.

You should submit your summaries on T-Square. Late assignments will NOT be accepted unless approved in advance by the instructor.

Reading Summary Guidelines:

The summary for each reading assignment is expected to consist of 1 paragraph on each of the following three aspects: (1) the positive aspect of the paper; (2) the negative aspect of the paper; and (3) a brief discussion on how the idea or method proposed or used in evaluation may be applied to your own project for the course.

General/Recommended Course Reading List

Search Engine Technology

Question: How do we build a search engine that scales up as the Web grows?

1. Google, The Anatomy of a Large-scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page. In 7th Int. Conf. WWW, Brisbane, Australia, April 1998.

2. Inktomi: An Investigation of Documents from the World Wide Web Allison Woodruff, Paul M. Aoki, Eric Brewer, Paul Gauthier, and Lawrence A. Rowe

3. Harvest: Scalable Internet Resource Discovery: Research Problems and Approaches. C. Mic Bowman (Transarc Corp.), Peter Danzig (Univ. Southern California), Udi Manber (Univ. of Arizona), and Michael Schwartz (Univ. Colorado), appeared in CACM, 1994 (Download Harvest Indexer) (Harvest Papers)

4. Harvest: The Harvest Information Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber and Michael F. Schwartz, Computer Networks and ISDN Systems, 28 (1995) pp. 119-125

5. Harvest: A Scalable, Customizable Discovery and Access System C. Mic Bowman, Peter B. Danzig, Darren R. Hardy, Udi Manber, Michael F. Schwartz, and Duane P. Wessels, Technical Report CU-CS-732-94, Department of Computer Science, University of Colorado, Boulder, August 1994 (revised March 1995).

6. Customized Information Extraction as a Basis for Resource Discovery Darren R. Hardy and Michael F. Schwartz, ACM Transactions on Computer Systems.

7. Indie: Distributed Indexing of autonomous Internet Services Peter Danzig, Shih-Hao Li, Katia Obraczka. Journal of Computer Systems, 5(4), 1992. Original description of Indie in 1991 ACM SIGIR

8. Internet resource discovery services Katia Obraczka, Peter Danzig, and Shih-Hao Li, IEEE Computer, Sept. 1993.

9. Research Problems for Scalable Internet Resource Discovery , C. Mic Bowman, Peter B. Danzig, and Michael F. Schwartz, 1993 IEEE Computer.

10. GLIMPSE: A Tool to Search Through Entire File Systems Udi Manber and Sun Wu (Univ. of Arizona), Technical Report TR 93-34, Department of Computer Science, University of Arizona, October, 1993. (Glimpse Home Page)

11. WebGlimpse--Combining Browsing and Searching Udi Manber, Mike Smith, and Burra Gopal (Univ. of Arizona), to appear in the Proceedings of the 1997 Usenix Technical Conference, January 1997. (WebGlimpse Home Page) (WebGLIMPSE Publications)

12. Mercator: A Scalable, Extensible Web Crawler Allan Heydon and Marc Najork, Compaq Systems Research Center (Mercator Project) (html)

13. A technique for measuring the relative size and overlap of public Web search engines Krishna Bharat and Andrei Broder (DIGITAL, Systems Research Center).

14. The Connectivity Server: fast access to linkage information on the Web Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar, and Suresh Venkatasubramanian, Proceedings of the 7th International World Wide Web Conference, Brisbane, Australia, pages 469-477. Elsevier Science, April 1998.

15. Efficient Crawling through URL Ordering Junghoo Cho, Hector Garcia-Molina, and Lawrence Page, Proceedings of the 7th International World Wide Web Conference, pages 161-172, April 1998

16. Crawling towards Eternity: Building an Archive of the World Wide Web Mike Burner, Web Techniques Magazine, 2(5), May 1997

17. The Truth about the Web: Crawling towards Eternity Z. Smith, Web Techniques Magazine, 2(5), May 1997

18. Measuring Index Quality using Random Walks on the Web Monika Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc A. Najork, Proceedings of the 8th International World Wide Web Conference, pages 213-225, May 1999

19. Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery Soumen Chakrabarti, Martin van den Berg, Byron Dom, Proceedings of the 8th International World Wide Web Conference, May 1999

20. Finding What People Want: Experiences with the WebCrawler Brian Pinkerton, Proceedings of the 2nd International World Wide Web Conference, 1994

21. SPHINX: A Framework for Creating Personal, Site-specific Web Crawlers Robert C. Miller and Krishna Bharat, Proceedings of the 7th International World Wide Web Conference, pages 119-130, April 1998

22. Information Retrieval on the World Wide Web Venkat N. Gudivada, Vijay V. Raghavan, William I. Grosky, and Rajesh Kasangottu, IEEE Internet Computing, vol. 1, number 5, September/October, 1997.

23. GENVL and WWW: Tools for Taming the Web, Oliver McBryan, Proceedings of the First Int'l World Wide Web Conference, CERN, Geneva, May 1994.

24. A World Wide Web Resource Discovery System Budi Yuwono, Savio L. Y. Lam, Jerry H. Ying, Dik L. Lee, Proceedings of the 4th World Wide Web Conference, 1998.

25. A Survey of Information Retrieval and Filtering Methods Christos Faloutsos and Douglas Oard (Univ. of Maryland).

26. Guidelines for Robot Writers, Martijn Koster, 1993

27. Robots in the Web: threat or treat? Martijn Koster, NEXOR, April 1995, [1997: Updated links and addresses]; A Standard for Robot Exclusion Martijn Koster.

28. Authoritative Sources in a Hyperlinked Environment, J. Kleinberg. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms, 1998. Extended version in Journal of the ACM 46(1999). Also appears as IBM Research Report RJ 10076, May 1997. ( IBM Clever Searching Project)

29. How Search Engines Rank Web Pages Danny Courtois and Sullivan.

30. Evaluation of Web search engines and the search for better ranking algorithms. Mildrid Ljosland e-mail: Mildrid.Ljosland@idi.ntnu.no Norwegian University of Science and Technology. the SIGIR99 Workshop on Evaluation of Web Retrieval, August 19, 1999

31. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Proceedings of the 7th World-Wide Web conference, 1998. Copyright owned by Elsevier Sciences, Amsterdam.

32. Inferring Web Communities from Link Topologies. D. Gibson, J. Kleinberg, and P. Raghavan. Proceedings of The Ninth ACM Conference on Hypertext and Hypermedia, 1998. Copyright owned by ACM.

33. Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies. S. Chakrabarti, B. Dom, R. Agrawal, P. Raghavan. VLDB Journal, 1998 (invited).

34. Enhanced hypertext categorization using hyperlinks. S. Chakrabarti, B. Dom and P. Indyk. Proceedings of ACM SIGMOD 1998.

35. Hypersearching the web. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Scientific American, June, 1999.

36. Mining the link structure of the World Wide Web. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. IEEE Computer.

37. Trawling the Web for emerging cyber-communities. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Eighth World Wide Web conference, Toronto, Canada, May 1999.

38. Focused crawling: a new approach to topic specific resource discovery. S. Chakrabarti, M. Van den Berg, B. Dom Eighth World Wide Web conference, Toronto, 1999.

39. The web as a graph: Measurements, models and methods. J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Proceedings of the International Conference on Combinatorics and Computing, 1999; invited paper.

40. Extracting large scale knowledge bases from the web. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. IEEE International conference on Very Large Databases (VLDB), Edinburgh, Scotland.

41. Clustering categorical data: an approach based on dynamical systems. D. Gibson, J. Kleinberg and P. Raghavan. Proceedings of the VLDB conference, 1998.

42. Search and Ranking Algorithms for Locating Resources on the World Wide Web B. Yuwono and D. Lee. IEEE conference on Data Engineering, 1996 (pp391-400).

43. A Machine Learning Architecture for Optimizing Web Search Engines, J. Boyan, D. Freitag, and T. Joachims. AAAI Workshop on Internet-based Information Systems, 1996.

44. SIBRIS: the Sandwich Interactive Browsing and Ranking Information System S. Wade, P. Willett, and D. Bawden. Journal of Information Science, 15, 1989, pp249-260

45. Estimating the Usefulness of Search Engines W. Meng, K. Liu, C. Yu, W. Wu and N. Rishe. ICDE 1999. (more details)

46. The effectiveness of GlOSS for the Text Database Discovery Problem L. Gravano, H. Garcia-Molina, A. Tomasic. SIGMOD 1994. (GlOSS)

47. Adaptive methods for the computation of PageRank, Sepandar Kamvar, Taher Haveliwala, Gene Golub, Technical Report, Stanford University, 2003

48. Building a Distributed Full-Text Index for the Web, Melnik, Sergey and Raghavan, Sriram and Yang, Beverly and Garcia-Molina, Hector, ACM Transactions on Information Systems 2003

49. Parallel Crawlers, Junghoo Cho, Hector Garcia-Molina, 11th WWW

50. An adaptive model for optimizing performance of an incremental web crawler, Edwards, J., McCurley, K. S., and Tomlin, J. A., In Proceedings of the Tenth Conference on World Wide Web (2001)

51. Focused crawling using context graphs, 26th International Conference on Very Large Databases

52. Effective page refresh policies for web crawlers. Junghoo Cho, Hector Garcia-Molina, ACM Transactions on Database Systems

53. Self-similarity in the web. Stephen Dill et al., ACM Transactions on Internet Technology (TOIT), August 2002

54. Finding replicated web collections. Junghoo Cho, N. Shivakumar, and Hector Garcia-Molina, ACM SIGMOD Record (June 2000)

55. Hilltop: A search engine based on expert documents. K. Bharat and G. A. Mihaila, 9th WWW Conference (Poster), 2000.

56. Topic-Sensitive PageRank, In Proceedings of the Eleventh International World Wide Web Conference

57. Generalizing PageRank: Damping functions for link-based ranking algorithms, Ricardo Baeza-Yates et al., In Proceedings of SIGIR 2002

58. Site Level Noise Removal for Search Engines, Carvalho, Paul-Alexandru Chirita, Edleno Silva de Moura, et al., In 15th WWW

59. Efficient crawling through URL ordering, Junghoo Cho et al., Computer Networks and ISDN Systems, 1998

60. Stuff I've Seen: A System for Personal Information Retrieval and Re-Use, Susan Dumais et al., 26th ACM SIGIR conference on Research and development in information retrieval

61. When experts agree: Using non-affiliated experts to rank popular topics, Krishna Bharat, George A. Mihaila, 10th WWW

62. The stochastic approach for link-structure analysis (SALSA) and the TKC effect, R. Lempel, S. Moran, 9th WWW

63. What is this Page Known for? Computing Web Page Reputations, Davood Rafiei, Alberto Mendelzon, 9th WWW

64. PicASHOW: Pictorial Authority Search by Hyperlinks on the Web, R. Lempel, A. Soffer, 10th WWW

65. Web Search via Hub Synthesis, Dimitris Achlioptas, Amos Fiat, Anna Karlin, Frank McSherry, 42nd IEEE Symposium on Foundations of Computer Science

66. Approximating Aggregate Queries about Web Pages via Random Walks, Ziv Bar-Yossef et al., VLDB 2000

Web Servers
1. Web Servers Issues
2. Web Proxies and Web Caching
3. Web Prefetching
4. WWW Workloads

Application Servers

Questions: How do we build a scalable Internet service located at a single site? Should we replicate to get end-to-end availability? What abstractions should we provide to support scalability to millions of users, and continuous operations 24 hours per day and 7 days per week?

1. Availability and Latency of World Wide Web Information Servers, Charles L. Viles and James C. French (University of Virginia), USENIX, Computing Systems; vol. 8, no. 1; Winter 1995.

2. A Quantitative Study of Differentiated Services S. Sahu, D. Towsley, J. Kurose. Proc. IEEE Global Internet'99 (Rio de Janeiro, Brazil, December 1999). A longer version is available as UMass CMPSCI Technical Report 99-09.

3. A Comparison of Server-Based and Receiver-Based Local Recovery Approaches for Scalable Reliable Multicast S. Kasera, J. Kurose, D. Towsley. Proc. IEEE Infocom'98 (San Francisco, CA, April 1998). A longer version is available as UMass CMPSCI Technical Report 97-69.

4. A Comparison of Sender-Initiated and Receiver-Initiated Reliable Multicast Protocols D. Towsley, J. Kurose, S. Pingali. IEEE Journal on Selected Areas in Communications (JSAC) (April 1997)

5. Exploiting Internetwork Multicast Services Nortel White Paper.

6. "Server-initiated Document Dissemination for the WWW" Azer Bestavros and Carlos Cunha, IEEE Data Engineering Bulletin, September 1996.

7. Middleware Support for Data Mining and Knowledge Discovery in Large-scale Distributed Information Systems, Azer Bestavros, In Proceedings of ACM SIGMOD'96 Data Mining Workshop, Montreal, Canada, June 1996.

8. "Speculative Data Dissemination and Service to Reduce Server Load, Network Traffic and Service Time for Distributed Information Systems", Azer Bestavros, Proceedings of ICDE'96: The 1996 International Conference on Data Engineering, New Orleans, Louisiana. March 1996.

9. Using speculation to reduce server load and service time on the WWW, Azer Bestavros, in Proceedings of CIKM'95: The Fourth ACM International Conference on Information and Knowledge Management, Baltimore, Maryland. November 1995.

10. "Demand-based document dissemination to reduce traffic and balance load in distributed information systems" Azer Bestavros, in Proceedings of the 1995 Seventh IEEE Symposium on Parallel and Distributed Processing, San Antonio, Texas. October 1995.

11. Demand-based Data Dissemination for Distributed Multimedia Applications, Azer Bestavros, in Proceedings of the ACM/ISMM/IASTED International Conference on Distributed Multimedia Systems and Applications, Stanford, CA. August 1995.

12. "Information Dissemination and Speculative Service: Two candidate functionalities for the middleware infrastructure" Azer Bestavros, in Proceedings of SIGCOMM'95 Workshop on Middleware. Cambridge, MA, August 1995.

13. Personalized Information Environments: An Architecture for Customizable Access to Distributed Digital Libraries, James C. French and Charles L. Viles, D-Lib Magazine, June 1999, Volume 5 Number 6

14. Continuous Profiling: Where Have All the Cycles Gone? Jennifer M. Anderson, Lance M. Berc, Jeffrey Dean, Sanjay Ghemawat, Monika R. Henzinger, Shun-Tak A. Leung, Richard L. Sites, Mark T. Vandevoorde, Carl A. Waldspurger, and William E. Weihl. SOSP'97.

15. System Support for Automated Profiling and Optimization, Xiaolan Zhang, Zheng Wang, Nicholas Gloy, J. Bradley Chen, and Michael D. Smith, SOSP'97.

16. Cluster-Based Scalable Network Services. Fox, Gribble, Chawathe, and Brewer, Proceedings of SOSP, 1997.

17. A Case for Networks of Workstations: NOW, T. Anderson, D. Culler, D. Patterson, IEEE Micro Feb. 1995. (The Berkeley NOW Project)

18. Free Transactions with Rio Vista, David E. Lowell and Peter M. Chen, Proceedings of the 1997 Symposium on Operating Systems Principles (SOSP), October 1997. (The Rio project).

19. The Case for Application-Specific Benchmarking, Margo Seltzer, David Krinsky, Keith Smith, Xiaolan Zhang.

20. Frangipani: A Scalable Distributed File System, C. Thekkath, T. Mann and E. Lee, Proceedings of the 1997 Symposium on Operating Systems Principles (SOSP), October 1997.

21. Serverless Network File Systems, Anderson, Dahlin, Neefe, Patterson, Roselli and Wang, SOSP, 1995.

22. A Note on Distributed Computing, Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall, Sun Microsystems Laboratories Technical Report TR-94-29 (November 1994).

23. Application-Level Document Caching in the Internet Azer Bestavros et al., Proceedings of SDNE'95: The second International Workshop on Services in Distributed and Network Environments. Whistler, Canada, June 1995.

24. WWW Media Distribution via Hopwise Reliable Multicast, James E. (Jed) Donnelley Lawrence Livermore National Laboratory, Livermore, California, USA. WWW'95.

25. Scalable Reliable Multicast Using Multiple Multicast Channels, S. Kasera, G. Hjalmtysson, D. Towsley, J. Kurose, To appear in IEEE/ACM Transactions on Networking, 2000.

26. Scalable fair reliable multicast using active services, S.K. Kasera, S. Bhattacharyya, M. Keaton, D. Kiwior, J. Kurose, D. Towsley, S. Zabel. IEEE Networks Magazine.

27. An Internet Multicast System for the Stock Market, N.F. Maxemchuk and D. H. Shur (AT&T Labs - Research)

28. Cooperative Reliable Multicast Protocol with Local Recovery, Young-mi Ohk, Steven H. Low

29. Real-time Applications of the Internet, N. F. Maxemchuk, Johns Hopkins University, April 8, 1999.

30. Operational Information Systems - An example from the Airline Industry, Van Oleson, Greg Eisenhauer, Calton Pu, Karsten Schwan, Beth Plale and Dick Amin. Sept., 2000.

31. Disconnected Operation in the Coda File System, J. Kistler and M. Satyanarayanan, ACM Transactions on Computer Systems, Vol 10, No 1, Pages 3-25, February 1992.

32. Information Monitoring on the Web: A Scalable Solution. Ling Liu, Wei Tang, David Buttler, and Calton Pu. World Wide Web Journal (by Kluwer Academic Publishers), Volume 5, No. 4.

33. InfoFilter: Supporting Quality of Service for Fresh Information Delivery, Ling Liu, Calton Pu, Karsten Schwan, Jon Walpole. New Generation Computing Journal (Vol.18, No.4).

34. Continual Queries for Internet Scale Event-Driven Information Delivery, Ling Liu, Calton Pu, Wei Tang. In: Special issue on Web Technologies, IEEE Transactions on Knowledge and Data Engineering, Vol.11, No.4, July/Aug. 1999. pp610-628.

35. Methodical Restructuring of Complex Workflow Activities, Ling Liu and Calton Pu. IEEE 14th International Conference on Data Engineering, February 23-27, 1998, Orlando, Florida, USA. pp342-350.

36. Support for Data-intensive Applications in Large-scale Systems, Mike Dahlin, University of Texas at Austin E-Commerce White Paper Series

38. "Web content adaptation to improve server overload behavior", T. Abdelzaher and N. Bhatti, International World Wide Web conference, Toronto, Canada, May 1999.

39. "Web server QoS management by adaptive content delivery", T. Abdelzaher and N. Bhatti, International Workshop on Quality of Service, London, UK, June 1999.

40. "Digestor: Device-independent Access to the World Wide Web", T. Bickmore and B. Schilit, The Sixth International World Wide Web Conference, April 1997.

41. "Adapting the Web: An Adaptive Web Browser", K. Henricksen and J. Indulska, User Interface Conference, 2001

42. "HTTP Remote Variant Selection Algorithm --RVSA/1.0", K. Holtman and A. Mutz, RFC2296.

43. Reducing WWW Latency and Bandwidth Requirements by Real-Time Distillation. A. Fox and E. Brewer. Fifth International World Wide Web Conference (Paris, May 1996).

44. "Adaptive Delivery of HTML Contents", Y. Yang, J. Chen, and H. Zhang, 9th International World Wide Web Conference, Amsterdam, May 2000.

45. "Network-Adaptive Control With TCP-Friendly Protocol for Multiple Video Objects", Q. Zhang, Wenwu Zhu, and Y.Q. Zhang, ICME 2000.

46. Gardmon: A Java-based Monitoring Tool for Gardens Non-dedicated Cluster Computing. R. Buyya, B. Koshy, and R. Mudlapur. In Proceedings of Workshop on Cluster Computing Technologies, Environments, and Applications, PDPTA 99, Monte Carlo Resort, Las Vegas, Nevada, USA, 1999.

47. The Condor Distributed Processing System. M. Livny, Dr Dobbs Journal, Feb 1995 pp 40-48.

48. Project Ganglia: Distributed Monitoring and Execution System.

49. PARMON: A Comprehensive Cluster Monitoring System. Rajkumar et al., Proceedings of the Fifth International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'98), Las Vegas, Nevada, USA, CSREA Press, 1998.

50. Building a Resources Monitoring System for SMILE Beowulf Cluster. P. Uthayopas, S. Phaisithbenchapol, and K. Chongbarirux, Proceedings of the Third International Conference/Exhibition on High Performance Computing in Asia-Pacific Region (HPC ASIA'99), Singapore, 1998

Internet Computing System Basics
1. Performance Issues
2. Naming Issues

Advanced Internet Systems
1. Peer to Peer Computing
2. Mobile Computing
3. Sensor, Stream, and Continual Query
4. RFID
5. Geo-Location Based Services and Applications
6. Spatial Indexing and Spatial Mining

Social Network Analysis
1. Social Networks
2. Collaborative Filtering

Cloud Computing

Sanjay Ghemawat, Howard Gobioff and Shun-Tak Leung. The Google file system. In 19th ACM Symposium on Operating Systems Principles, Lake George, NY, October, 2003.
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In Proceedings of OSDI 2004, San Francisco, CA, 2004.
Fay Chang et al. Bigtable: A Distributed Storage System for Structured Data. In Proceedings of OSDI 2006, Seattle, WA, 2006.
http://hadoop.apache.org/
Frank Schmuck, Roger Haskin. GPFS: A Shared-Disk File System for Large Computing Clusters. In Proceedings of the 2002 Conference on File and Storage Technologies (FAST)
Giuseppe DeCandia et al. Dynamo: Amazon's Highly Available Key-value Store. In SOSP '07
Chandramohan A. Thekkath, Timothy Mann, Edward K. Lee. Frangipani: A Scalable Distributed File System. In Proceedings of the 16th ACM Symposium on Operating Systems Principles, 1997
Mike Burrows. The Chubby lock service for loosely-coupled distributed systems. In 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006.
Marcos K. Aguilera, Arif Merchant, Mehul Shah. Sinfonia: A New Paradigm for Building Scalable Distributed Systems. In SOSP 2007
Philip H. Carns, Walter B. Ligon III, Robert B. Ross, Rajeev Thakur. PVFS: A Parallel File System for Linux Clusters. In Proceedings of the 4th Annual Linux Showcase and Conference.
Red Hat Company. Red Hat Global File System
Andrew Pavlo et al. A Comparison of Approaches to Large-Scale Data Analysis. SIGMOD'09.
Cheng-Tao Chu et al. Map-Reduce for Machine Learning on Multicore. NIPS'06.
Azza Abouzeid et al. HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads. In Proceedings of VLDB, 2009.
Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, and Erik Paulson. Efficient Processing of Data Warehousing Queries in a Split Execution Environment. SIGMOD 2011.
Alper Okcan and Mirek Riedewald. Processing Theta-Joins using MapReduce. SIGMOD 2011.
E. Friedman, P. Pawlowski, and J. Cieslewicz. SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions. PVLDB 2009.
C. Yang, C. Yen, C. Tan, and S. Madden. Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database. In ICDE '10, 2010.
H.-c. Yang, A. Dasdan, R.-L. Hsiao, and D. S. Parker. Map-reduce-merge: simplified relational data processing on large clusters. In Proc. of SIGMOD, 2007
Hive: A Petabyte Scale Data Warehouse Using Hadoop. In ICDE, 2010.
S. Blanas, J. M. Patel, V. Ercegovac, J. Rao, E. J. Shekita, and Y. Tian. A comparison of join algorithms for log processing in MapReduce. In Proc. of SIGMOD 2010.
R. Vernica, M. J. Carey, and C. Li. Efficient parallel set-similarity joins using mapreduce. In SIGMOD 2010.
F. N. Afrati and J. D. Ullman. Optimizing joins in a map-reduce environment. In EDBT, 2010.

Security and Privacy for Internet Applications
Question: How can we build secure Internet applications?
1. Security
2. Privacy
3. Location Privacy
4. Trust Management
5. Web Spam and Denial of Service Attacks

CS6675/4675 Advanced Internet Systems and Application Development

Instructor: Professor Ling Liu

Course Readings

29. MINERVA∞: A scalable efficient peer-to-peer search engine, S. Michel, P. Triantafillou, G. Weikum, Lecture Notes in Computer Science, 2005.

Related Courses at other Universities