CS4420/CS6422 Database System Implementation

Instructor: Professor Ling Liu

Course Readings

Attention: The information contained in this page is subject to change.




Reading Summary Requirement

There will be several background readings assigned each week. The readings will either be handed out a week before or listed on the Web page for required readings.

Homework/Assignment:

You are expected to read the material each week and write 2-3 paragraphs per reading giving your impressions and thoughts. The summaries should be informal and brief, and should consist of your own comments on the readings, NOT a rehash of the content.

You should submit your summaries on T-Square, preferably before each class but no later than 11:59 pm on Friday each week (unless there are no reading assignments for the week). Late assignments will NOT be accepted unless approved in advance by the instructor.

Reading Summary Guidelines:

The summary for each reading assignment is expected to consist of one paragraph on each of the following three aspects: (1) the positive aspects of the paper; (2) the negative aspects of the paper; and (3) a brief discussion of how the idea or method proposed, or used in the evaluation, may be applied to your own project for the course.

You may want to keep these guidelines in mind when reading papers.

You may find the following short article helpful:

Efficient Reading of Papers in Science and Technology By Michael J. Hanson and updated by D. McNamee


Areas of Readings

1. Mobile Database Management

2. Spatial Indexing Techniques

3. Data Clustering Algorithms

4. Stream databases

5. RFID data management

6. Web Data Management, Web Search and Big Data

7. Data Mining

8. Privacy Preserving Data Mining

9. Workflow Management

10. Role based Access Control

11. Data Warehouse and OLAP

12. Social Networks

13. Data Storage and Indexing

14. Relational Query Optimization

15. Big Data Processing in the Cloud

16. RDF Data Management and Column Store

17. Peer to Peer Computing

Open Source SW

 


Required Readings and Dates

You are expected to read the papers in the required reading list, but you only need to write a summary for one paper selected from the 2-3 required readings associated with each lecture. Please use the Summary Template to write the reading summaries.

NOTE: Most of the papers listed below are from ACM or IEEE conferences or journals. Online proceedings are accessible through the ACM/IEEE online library link provided by the GT library. Your GT ID and password are required to access the online library.

http://www.library.gatech.edu/research_help/subject/index.php?/computer_science/conferences


General/Recommended Course Reading List

1. Mobile Database Management

1.      MobiEyes: A Distributed Location Monitoring Service Using Moving Location Queries. Bugra Gedik and Ling Liu. IEEE Transactions on Mobile Computing. Vol. 5, No. 10, pp. 1384-1402, October 2006.

2.      Map-matching: Towards Improving Wireless Positioning, Kipp Jones and Ling Liu. to appear in Proceedings of the 4th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous 2007). August 6-10, 2007, Philadelphia, PA.

3.      A SpatioTemporal Placement Model for Caching Location Dependent Queries, Anand Murugappan and Ling Liu. Proceedings of the 4th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous 2007). August 6-10, 2007, Philadelphia, PA.

4.       Lira: Lightweight, Region-aware Load Shedding in Mobile CQ Systems. Bugra Gedik, Ling Liu, Kun-Lung Wu, Philip S. Yu. Proceedings of the IEEE 23rd International Conference on Data Engineering. Istanbul, Turkey; April 17-20, 2007.

5.      Effective Density Queries on Continuously Moving Objects. Christian S. Jensen, Dan Lin, Beng Chin Ooi, Rui Zhang. ICDE 2006

6.      Indexing the past, present, and anticipated future positions of moving objects. Mindaugas Pelanis, Simonas Saltenis, Christian S. Jensen. ACM Trans. Database Syst. 31(1): 255-298 (2006)

7.      Fast Nearest Neighbor Search on Road Networks. Hu, H., Lee, D.L., and Xu, J. Proceedings of the International Conference on Extending Database Technology (EDBT 2006), Munich, Germany, Mar 2006, 186-203.

8.      Distance Indexing on Road Networks. Hu, H., Lee, D.L., and Lee, V.C.S. Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), Seoul, Korea, Sept 2006, 894-905.

9.      Change Tolerant Indexing for Constantly Evolving Data. Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah: ICDE 2005: 391-402 

10.  Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data. Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah, Jeffrey Scott Vitter: VLDB 2004 : 876-887

11.  Trajectory pattern mining, Fosca Giannotti, Mirco Nanni, Fabio Pinelli, Dino Pedreschi. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining KDD '07.

12.  Project Lachesis: Parsing and modeling location histories. Hariharan, Toyama. In: GIScience, 2004.

13.  Extracting places from traces of locations. Kang, Welbourne, Stewart, Borriello. In Proc. WMASH, pages 110--118, New York, NY, USA, 2004.

14.  Monitoring Top-k Query in Wireless Sensor Networks, Xu, Wu, Tang, Lee. Proc. the 22nd IEEE Int. Conf. on Data Engineering (ICDE '06), Atlanta, GA, April 2006.

15.  Geographic Information Science: Defining the Field. David Mark.

16.  A bibliography of temporal, spatial and spatio-temporal data mining research, John F. Roddick , Myra Spiliopoulou , ACM SIGKDD Explorations Newsletter, v.1 n.1, p.34-38, June 1999

17.  Modeling Transportation Routines using Hybrid Dynamic Mixed Networks, Vibhav Gogate, Rina Dechter, Bozhena Bidyuk, James Marca and Craig Rindt. In 21st Conference on Uncertainty in Artificial Intelligence (UAI), 2005.

18.  Tobler's First Law of Geography: A Big Idea for a Small World? Sui D.Z. Annals of the Association of American Geographers 94 (2), 269-277.

19.  Markovian Models for Sequential Data, Y. Bengio, NEURAL COMPUTING SURVEYS, vol 2 1999, 129~162.

20. Using Geospatial Information in Sensor Networks. John Heidemann. Nirupama Bulusu. USC/Information Sciences Institute. September 20, 2000.

21.  Building Personal Maps from GPS Data. Lin Liao and Donald J. Patterson and Dieter Fox and Henry Kautz.

22.  Processing Window Queries in Wireless Sensor Networks, Y. Xu, W.-C. Lee, J. Xu, and G. Mitchel Proc. the 22nd IEEE Int. Conf. on Data Engineering (ICDE '06), Atlanta, GA, April 2006.

23.  Location-Based Activity Recognition using Relational Markov Networks. L. Liao, D. Fox, and H. Kautz. Proc. of the International Joint Conference on Artificial Intelligence (IJCAI-05).

24.  Using GPS to Learn Significant Locations and Predict Movement Across Multiple Users, D. Ashbrook and T. Starner, Personal and Ubiquitous Computing, Vol. 7.5.

25.  Temporal Data Management. C. S. Jensen and R. T. Snodgrass. IEEE TKDE, 11(1): 36--45 (1999).

26.  Learning and Inferring Transportation Routines, Liao, Fox, Kautz, Artificial Intelligence 2007.

27.  Inferring High-Level Behavior from Low-Level Sensors, UBICOMP 2003. ICS 280.

28.  Fundamental Challenges in Mobile Computing, Satyanarayanan, M., Fifteenth ACM Symposium on Principles of Distributed Computing ,   May 1996, Philadelphia, PA, Revised version appeared as: "Mobile Computing: Where's the Tofu?",  Proceedings of the ACM Sigmobile, April 1997, Vol. 1, No. 1.

29.  Multi-Fidelity Algorithms for Interactive Mobile Applications,  Satyanarayanan, M., Narayanan, D. Proceedings of the 3rd International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, August 1999, Seattle, WA

30.  Mobile Data Access, Noble, B. School of Computer Science, Carnegie Mellon University, May 1998, CMU-CS-98-118

31.  Energy-aware adaptation for mobile applications, Flinn J., Satyanarayanan, M., Proceedings of the 17th ACM Symposium on Operating Systems Principles, December, 1999, Kiawah Island Resort, SC.

32.  PowerScope: A Tool for Profiling the Energy Usage of Mobile Applications, Flinn J., Satyanarayanan, M., Proceedings of the Second IEEE Workshop on Mobile Computing Systems and Applications, February, 1999, New Orleans, LA

33.  System Support for Mobile, Adaptive Applications, Noble, Brian, IEEE Personal Communications, Vol. 7, No. 1, February, 2000

34.  Experience with adaptive mobile applications in Odyssey , Noble, B.D. and Satyanarayanan, M., Mobile Networks and Applications, Vol. 4, 1999

35.  Agile Application-Aware Adaptation for Mobility, Noble, B., Satyanarayanan, M., Narayanan, D., Tilton, J.E., Flinn, J., Walker, K. Proceedings of the 16th ACM Symposium on Operating System Principles, October 1997, St. Malo, France

36.  A Research Status Report on Adaptation for Mobile Data Access , Noble, B., Satyanarayanan, M. SIGMOD Record, Vol. 24, No. 4, December 1995

37.  A Programming Interface for Application-Aware Adaptation in Mobile Computing , Noble, B., Price, M., Satyanarayanan, M., Proceedings of the Second USENIX Symposium on Mobile & Location-Independent Computing, Apr. 1995, Ann Arbor, MI

38.  Application-Aware Adaptation for Mobile Computing , Satyanarayanan, M., Noble, B., Kumar, P., Price, M.     Proceedings of the 6th ACM SIGOPS European Workshop,  Sep. 1994, Dagstuhl, Germany.

39.  Mobile Information Access, Satyanarayanan, M. , IEEE Personal Communications, Vol. 3, No. 1, February 1996

40.  Indexing Techniques for Power Management in Multi-Attribute Data Broadcast Qinglong Hu, Wang-Chien Lee, and Dik Lun Lee.

41.  Power conserving And access Efficient Indexes For Wireless Computing Dik Lun Lee, and Qinglong Hu,

42.  Power Conservative Multi-Attribute Queries on Data Broadcast, Qinglong Hu, Wang-Chien Lee, and Dik Lun Lee, ICDE 2000.

43.  Effects of power conservation, wireless coverage and cooperation on data dissemination among mobile devices, Maria Papadopouli and Henning Schulzrinne, ACM SIGMOBILE Symposium on Mobile Ad Hoc Networking & Computing (MobiHoc) 2001, October 4-5, 2001, Long Beach, California. (Extension of the Sarnoff paper.)

44.  Energy-aware Web Caching for Mobile Terminals. Francoise Sailhan, Valérie Issarny. In Proceedings of the ICDCS Workshop on Web Caching Systems. July 2002, Vienna, Austria.

45.  Power-Controlled Data Prefetching/Caching in Wireless Packet Networks, Savvas Gitzenis and Nicholas Bambos, IEEE Infocom 2002, New York.

46.  Sleepers and Workaholics: Caching Strategies in Mobile Environments. Daniel Barbara, Tomasz Imielinski, VLDB Journal 4(4): 567-602 (1995).

47.  Indexing techniques for data broadcast on wireless channels. D.L. Lee, Q. Hu, and W. C. Lee, Proceedings of the Fifth International Conference on Foundations of Data Organization (FODO '98), Kobe, Japan, Nov 11-12, 1998, 175-182.

48.  Indexing Techniques for Wireless Data Broadcast Under Data Clustering and Scheduling, Qinglong Hu, Wang-Chien Lee, and Dik Lun Lee, in Proceedings of ACM International Conference on Information and Knowledge Management (CIKM99), Kansas City, Missouri, Nov. 1999, pp. 351-358.

49.  Location Privacy in Pervasive Computing, A. R. Beresford, F. Stajano. In Proc of IEEE Pervasive Computing 46-55, March 2003

50.  Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. B. Gedik, L. Liu, IEEE Transactions on Mobile Computing, January 2008 (an extended abstract appeared in ICDCS 2005).

51.  Framework for Security and Privacy in Automotive Telematics. S. Duri, M. Gruteser, X. Liu, P. Moskowitz, R. Perez, M. Sing, J. M. Tang. Proc of Intl Workshop on Mobile Commerce WMC, 2002.

52.  Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. M. Gruteser, D. Grunwald. Proc of ACM/USENIX MobiSys, 2003.

53.  Privacy-Aware Mobile Services over Road Networks. T. Wang and L. Liu, Proceedings of the 35th International Conference on Very Large Data Bases (VLDB'09), 2009.

54.  Supporting Anonymous Location Queries in Mobile Environments with PrivacyGrid. Bhuvan Bamba, Ling Liu, Peter Pesti and Ting Wang. Proceedings of 17th International World Wide Web Conference (WWW'08), April 2008.

55.  An Energy Efficient Approach to Processing Spatial Alarms on Mobile Clients. Anand Murugappan and Ling Liu. Proceedings of the ISCA 17th International Conference on Software Engineering and Data Engineering (SEDE-2008), June 30 - July 2, 2008, Los Angeles.

57.  Scalable Processing of Spatial Alarms. Bhuvan Bamba, Ling Liu, Philip S. Yu Proceedings of the 15th Annual IEEE International Conference on High Performance Computing (HiPC 2008), December 17-20, 2008

58.  Distributed Processing of Spatial Alarms: A Safe Region-based Approach. Bhuvan Bamba, Ling Liu, Philip Yu, Arun Iyengar. Proceedings of IEEE Int. Conf. on Distributed Computing (ICDCS 2009), June 22-26, in Montreal, Quebec, Canada.

59.  Map-matching: Towards Improving Wireless Positioning. Kipp Jones, Ling Liu, and Farshid Alizadeh-Shabdiz Proceedings of the 4th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous 2007). August 6-10, 2007, Philadelphia, PA.

60.  What Where Wi: An Analysis of Millions of WiFi Access Points. Kipp Jones and Ling Liu. Proceedings of 2007 IEEE Portable: International Conference on Portable Information Devices. Orlando, FL, March 25-29


2. Spatial Indexing and Spatial Mining Techniques


1.      R-trees: a dynamic index structure for spatial searching. Antonin Guttman , Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts

2.      Indexing the positions of continuously moving objects. S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 331-342, New York, NY, USA, 2000. ACM Press.

3.      Voronoi Diagram, Franz Aurenhammer, Rolf Klein

4.      Spatial Databases: Accomplishments and Research Needs, S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C.T. Liu, IEEE Transactions on Knowledge and Data Engineering, Jan.-Feb. 1999.

5.      Discovering Spatial Co-location Patterns: a Summary of Results, S. Shekhar and Y. Huang, In Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July 2001.

6.      Detecting Graph-based Spatial Outliers: Algorithms and Applications, S. Shekhar, C.T. Lu, P. Zhang, the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.

7.      Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations, S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, Proc. of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.

8.      Modeling Spatial Dependencies for Mining Geospatial Data, S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, First SIAM International Conference on Data Mining, 2001.

9.      Spatial Contextual Classification and Prediction Models for Mining Geospatial Data, S. Shekhar, P.R. Schrater, R. R. Vatsavai, W. Wu, and S. Chawla, IEEE Transactions on Multimedia, 2001.

10.  The Quadtree and Related Hierarchical Data Structures. Finkel and Bentley, ACM Comput. Surv. 1974

11.  An introductory tutorial on kd-trees, A. Moore

12.  Building of Trapezoidal Map from a set of non-intersecting lines, Jukka Kaartinen

13.  Spatial data structures for version management of engineering drawings in CAD database. Y. Nakamura and H. Dekihara. In ICIAP '03: Proceedings of the 12th International Conference on Image Analysis and Processing, page 219, Washington, DC, USA, 2003. IEEE Computer Society.


3. Data Clustering Algorithms

1.             Data Clustering: A Review, A. K. Jain, M.N. Murty and P.J. Flynn, ACM Computing Surveys, Nov 1999.

2.             On Line Clustering, Athman Bouguettaya, IEEE Transaction on Knowledge and Data Engineering Volume 8, No. 2, April 1996.

3.             Similarity Searching in Medical Image Databases, Euripides G.M. Petrakis and Christos Faloutsos, IEEE Transaction on Knowledge and Data Engineering Volume 9, No. 3, MAY/JUNE 1997.

4.             Windows NT Clusters for Availability and Scalability, Rob Short, Rod Gamache, John Vert and Mike Massa, Microsoft Online Research Papers, Microsoft Corporation.

5.             Defining Data Mining, The Hows and Whys of Data Mining, and How It Differs From Other Analytical Techniques, Bruce Moxon, Online Edition of DBMS Data Warehouse Supplement, August 1996.

6.             An Efficient Approach to Clustering in Large Multimedia Databases with Noise. Hinneburg A., Keim D.A. Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, 1998. http://citeseer.ist.psu.edu/hinneburg98efficient.html

7.             Data Clustering: Theory, Algorithms, and Applications, Guojun Gan , Chaoqun Ma , Jianhong Wu

8.             Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling, George Karypis, Eui-Hong Han, and Vipin Kumar, IEEE Computer, Special Issue on Data Analysis and Mining. Vol. 32, No. 8, August 1999.

9.           iVIBRATE: Interactive Visualization Based Framework for Clustering Large Datasets, Keke Chen and Ling Liu. ACM Transactions on Information Systems.

10.         CURE: An efficient clustering algorithm for large databases, S. Guha, R. Rastogi, and K. Shim, In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 73--84, New York, 1998.

11.         BIRCH: An Efficient Data Clustering Method for Very Large Databases, Tian Zhang, Raghu Ramakrishnan, and Miron Livny, In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103--114, Montreal, Canada, 1996.

12.         Bipartite Graph Partitioning and Data Clustering. H. Zha and X. He and C. Ding and M. Gu and H. Simon. Proc. of ACM 10th Int'l Conf. Information and Knowledge Management, pp. 25--31, 2001.

13.         Spectral biclustering of microarray data: coclustering genes and conditions. Y. Kluger and R. Basri and J.T. Chang and M. Gerstein. Genome Research. 13:703-716, 2003. 

14.         Automatic subspace clustering of high dimensional data for data mining applications. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data, Seattle, Washington, June 1998

15.         A divisive information-theoretic feature clustering algorithm for text classification. I.S. Dhillon and S. Mallela and R. Kumar. JMLR, 3:1265-1287, 2003.

16.         Subspace clustering of high-dimensional binary data -- A probabilistic approach. A. Patrikainen and H. Mannila. Proc. Workshop on Clustering High Dimensional Data in SIAM International Conference on Data Mining, 2004.

17.         Segmentation using eigenvectors: a unifying view. Weiss Y. Proceedings IEEE International Conference on Computer Vision p. 975-982 (1999).

18.         Coupled two-way clustering analysis of gene microarray data. G. Getz and E. Levine and E. Domany. Proceedings of the National Academy of Sciences of the United States of America, 94:12079-12084, 2000.

19.         On clusterings - good, bad and spectral, S. Vempala, R. Kannan and A. Vetta, in Proc. 41st Symposium on the Foundation of Computer Science, FOCS, 2000.

20.         Co-clustering documents and words using bipartite spectral graph partitioning. I.S. Dhillon. Knowledge Discovery and Data Mining, pp. 269--274, 2001.

21.         Iterative Double Clustering for Unsupervised and Semi-Supervised Learning, R. El-Yaniv and O. Souroujon. NIPS 14, pp. 1025-1032, 2002.

22.    Clustering by Passing Messages Between Data Points. Frey, B. J. & Dueck, D. Science, 2007, 315, 972-976
      


4. Stream databases


1.       Continuous Queries over Data Streams. S. Babu and J. Widom. In SIGMOD Record, September 2001.

2.       Towards Sensor Database Systems. Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. In Proceedings of the Second International Conference on Mobile Data Management. Hong Kong, January 2001. 

3.      Querying the Physical World. Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. IEEE Personal Communications, Vol. 7, No. 5, October 2000, pages 10-15. Special Issue on Smart Spaces and Environments.

4.       Fjording the Stream: An Architecture for Queries over Streaming Sensor Data, Sam Madden and Michael J. Franklin, ICDE Conference, February, 2002, San Jose.

5.     Streaming Queries over Streaming Data Sirish Chandrasekaran, Michael J. Franklin, VLDB Conference, August 2002, Hong Kong.

6.     Monitoring Streams: A New Class of Data Management Applications. D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik. In proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02), August 20-23, Hong Kong, China.

7.      Gigascope: a stream database for network applications, Chuck Cranor, Theodore Johnson, and Oliver Spatscheck, in Proceedings of SIGMOD 2003.

8.     Query Processing, Approximation, and Resource Management in a Data Stream Management System. R. Motwani et al. CIDR, 2003.

9.     Aurora: A New Model and Architecture for Data Stream Management. D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. in VLDB Journal, 2003.

10.   Issues in Data Stream Management. Golab, L. and Ozsu, M. T. ACM SIGMOD Record. 32(2). 2003.

11. Estimating Clustering Indexes in Data Streams, Luciana Buriol, Gereon Frahling, Stefano Leonardi, Christian Sohler, Proc. 15th European Symposium on Algorithms (ESA), 2007


5. RFID data management

1.             Security and Privacy Issues in ePassport, Ari Juels, David Molnar and David Wagner, In Proceedings of Advances in Cryptology, 2005.

2.             Privacy and Security in Library RFID: Issues, Practices, and Architectures, David Molnar and David Wagner, In Proceedings of ACM CCS, 2004.

3.             High Power Proxies for Enhancing RFID Privacy and Utility, In Proceedings of PET, 2005.

4.             RFID Security and Privacy: A Research Survey, Ari Juels, IEEE Journal on Selected Areas in Communications, 2006.

5.             A Platform for RFID Security and Privacy Administration. Melanie R. Rieback, Vrije Universiteit Amsterdam; Georgi N. Gaydadjiev, USENIX/SAGE Large Installation System Administration conference - LISA'06, December 2006

6.             RFID Privacy: An Overview of Problems and Proposed Solutions, IEEE Security and Privacy, Vol. 3, No. 3, pp. 34-43.

7.             Protocols for RFID tag/reader authentication, Selwyn Piramuthu, Decision Support Systems, Volume 43, Issue 3, April 2007, Pages 897-914

8.      RFID privacy issues and technical challenges. Miyako Ohkubo, Koutarou Suzuki, Shingo Kinoshita. September 2005 Communications of the ACM, Volume 48 Issue 9

9.      Privacy for RFID through trusted computing. David Molnar, Andrea Soppera, David Wagner. November 2005 WPES '05: Proceedings of the 2005 ACM workshop on Privacy in the electronic society.

10.    RFID security and privacy: long-term research or short-term tinkering? Gene Tsudik, Mike Burmester, Ari Juels, Alfred Kobsa, David Molnar, Roberto Di Pietro, Melanie Rieback March 2008 WiSec '08: Proceedings of the first ACM conference on Wireless network security

11.   Mutual authentication in RFID: security and privacy. Radu-Ioan Paise, Serge Vaudenay. March 2008 ASIACCS '08: Proceedings of the 2008 ACM symposium on Information, computer and communications security

12.   Robust, anonymous RFID authentication with constant key-lookup. Mike Burmester, Breno de Medeiros, Rossana Motta. March 2008 ASIACCS '08: Proceedings of the 2008 ACM symposium on Information, computer and communications security


6. Web Data Management, Web Search and Big Data

1. Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows,Tushar Chandra, Andrew Fikes, Robert E. Gruber, 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006

2. MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating System Design and Implementation, 2004

3. Clustering Billions of Images with Large Scale Nearest Neighbor Search, Ting Liu, Charles Rosenberg, Henry A. Rowley, IEEE Workshop on Applications of Computer Vision, 2007

4. Scaling Up All Pairs Similarity Search, Roberto Bayardo, Yiming Ma, Ramakrishnan Srikant, Proc. of the 16th Int'l Conf. on the World Wide Web, 2007

5. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping, Mikhail Bilenko, Sugato Basu, Mehran Sahami, Proceedings of the 5th IEEE International Conference on Data Mining, 2005

6. Evaluating similarity measures: a large-scale study in the orkut social network, Ellen Spertus, Mehran Sahami, Orkut Buyukkokten, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005), 2005

7. Unweaving a web of documents, R. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram, KDD, 2005

8. Mining Optimized Gain Rules for Numeric Attributes, Sergey Brin, Rajeev Rastogi, Kyuseok Shim, IEEE Trans. Knowl. Data Eng., 2003

9. Scalable Techniques for Mining Causal Structures, Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, VLDB, 1998

10.  Query by Semantic Example, Nikhil Rasiwasia, Nuno Vasconcelos, Pedro J. Moreno, CIVR, 2006

11.  Indexing Dataspaces, Xin Dong, Alon Halevy, Proc. ACM SIGMOD, 2007

12.  Query Suspend and Resume, Badrish Chandramouli, Chris Bond, Shivnath Babu, Jun Yang, Proc. ACM SIGMOD, 2007

13.  Web-scale Data Integration: You can only afford to Pay As You Go, Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, Alon Halevy, Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2007

14.  Data integration: the teenage years, Alon Halevy, Anand Rajaraman, Joann Ordille, Proc. 32nd International Conference on Very Large Databases, 2006

15.  Data management projects at Google, Wilson Hsieh, Jayant Madhavan, Rob Pike, SIGMOD Conference, 2006

16.  On-the-fly Sharing for Streamed Aggregation, Sailesh Krishnamurthy, Chung Wu, Michael J. Franklin, SIGMOD Conference, 2006

17.  Principles of dataspace systems, Alon Y. Halevy, Michael J. Franklin, David Maier, PODS, 2006

18.  Structured Data Meets the Web: A Few Observations, Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, Cong Yu, Data Engineering Bulletin, 2006

19.  ULDBs: databases with uncertainty and lineage, Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom, Proc. 32nd International Conference on Very Large Databases, 2006

20.  Web Search for a Planet: The Google Cluster Architecture, Luiz Andre Barroso, Jeffrey Dean, Urs Hölzle, IEEE Micro, 2003

21.  Finding Near-Duplicate Web Pages: A Large-Scale Evaluation of Algorithms, Monika Henzinger, Proc. SIGIR, 2006

22.  Indexing Shared Content in Information Retrieval Systems, Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Michael Herscovici, Ronny Lempel, John McPherson, Runping Qi, Eugene J. Shekita, EDBT, 2006

23.  Introduction to the special issue on XML retrieval, Ricardo Baeza-Yates, Norbert Fuhr, Yoelle Maarek, ACM Transactions on Information Systems, 2006

24.  Retroactive Answering of Search Queries, Beverly Yang, Glen Jeh, Proc. International World Wide Web Conference, 2006

25.  Semantic Search via XML Fragments: A High Precision Approach to IR, Jennifer Chu-Carroll, John Prager, Krzysztof Czuba, David Ferrucci, Pablo Duboue, Proc. 29th ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

26.  Using annotations in enterprise search, Pavel A. Dmitriev, Nadav Eiron, Marcus Fontoura, Eugene Shekita, WWW, 2006

27.  Web mining with search engines: A web-based kernel function for measuring the similarity of short text snippets, Mehran Sahami, Timothy D. Heilman, Proc. 15th International World Wide Web Conference, 2006

28.  Concept-based interactive query expansion, Bruno M. Fonseca, Paulo Braz Golgher, Bruno Possas, Berthier A. Ribeiro-Neto, Nivio Ziviani, CIKM, 2005

29.  Information Discovery--Needles and Haystacks, Carl Lagoze, Amit Singhal, IEEE Internet Computing, 2005

30.  Algorithmic Aspects of Web Search Engines, Monika Rauch Henzinger, ESA, 2004

31.  eBizSearch: a niche search engine for e-business, C. Lee Giles, Yves Petinot, Pradeep B. Teregowda, Hui Han, Steve Lawrence, Arvind Rangaswamy, Nirmal Pal, SIGIR, 2003

32.  Semantic Associations for Contextual Advertising. Massimiliano Ciaramita and Vanessa Murdock and Vassilis Plachouras. Journal of Electronic Commerce Research Special Issue on Online Advertising and Sponsored Search.

33.  The Impact of Caching on Search Engines. Ricardo Baeza-Yates, Aristides Gionis, Flavio Junqueira, Vanessa Murdock, Vassilis Plachouras, Fabrizio Silvestri. 2007. 30th Annual International ACM SIGIR Conference.

34.  Tree revision learning for dependency parsing. G. Attardi and M. Ciaramita. 2007. In Proceedings of HLT-NAACL 2007.

35.  Know your Neighbors: Web Spam Detection using the Web Topology. Carlos Castillo and Debora Donato and Aristides Gionis and Vanessa Murdock and Fabrizio Silvestri. 2007. In Proceedings of SIGIR. ACM Press. (July 2007), Amsterdam, Netherlands, 423--430.

36.  James Caverlee, Steve Webb, and Ling Liu. "Spam-Resilient Web Rankings via Influence Throttling", Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, 2007.

37.  The Self-Organized Web: The Yin to the Semantic Web's Yang. Gary William Flake, David M. Pennock, and Daniel C. Fain. 2003. IEEE Intelligent Systems. 18(4): 75-77.

38.  A content and structure website mining model. Barbara Poblete and Ricardo Baeza-Yates. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web (Edinburgh, Scotland). ACM Press. New York, NY, USA, 957--958

39.  Relationship Between Web Links and Trade. Ricardo Baeza-Yates and Carlos Castillo. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web (Edinburgh, Scotland). ACM Press. New York, NY, USA, 927--928.

40.  Communities from Seed Sets. Reid Andersen and Kevin J. Lang. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web (Edinburgh, Scotland). ACM Press. New York, NY, USA, 223--232

41.  Generating Query Substitutions. Rosie Jones and Benjamin Rey and Omid Madani and Wiley Greiner. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web. ACM Press. New York, NY, USA, 387--396.

42.  Multi-structural databases. Ronald Fagin, R. Guha, Ravi Kumar, Jasmine Novak, D. Sivakumar, Andrew Tomkins. 2005. In PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM Press. New York, NY, USA, 184--195.

43.  Unweaving a web of documents. R. Guha, Ravi Kumar, D. Sivakumar and Ravi Sundaram. 2005. In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM Press. New York, NY, USA, 574--579.

44.  Variable latent semantic indexing. Anirban Dasgupta, Ravi Kumar, Prabhakar Raghavan and Andrew Tomkins. 2005. In KDD '05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM Press. New York, NY, USA, 13--21.

45.  Efficient implementation of large-scale multi-structural databases. Ronald Fagin, Phokion Kolaitis, Ravi Kumar, Jasmine Novak and Andrew Tomkins. 2005. In VLDB '05: Proceedings of the 31st international conference on Very Large Data Bases. VLDB Endowment. 958--969.

46.  Discovering large dense subgraphs in massive graphs. David Gibson, Ravi Kumar and Andrew Tomkins. 2005. In VLDB '05: Proceedings of the 31st international conference on Very Large Data Bases. VLDB Endowment. 721--732.

47.  Query Incentive Networks. Jon M. Kleinberg and Prabhakar Raghavan. 2005. In FOCS '05: 46th Annual IEEE Symposium on Foundations of Computer Science. Pittsburgh, PA, 132--141.

48.  Indie: Distributed Indexing of autonomous Internet Services. Peter Danzig, Shih-Hao Li, Katia Obraczka. Journal of Computer Systems, 5(4), 1992. Original description of Indie in 1991 ACM SIGIR

49.  Measuring Index Quality using Random Walks on the Web Monika Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc A. Najork, Proceedings of the 8th International World Wide Web Conference, pages 213-225, May 1999

50.  Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. Soumen Chakrabarti, Martin van den Berg, Byron Dom, Proceedings of the 8th International World Wide Web Conference, May 1999

51.  Enhanced hypertext categorization using hyperlinks. S. Chakrabarti, B. Dom and P. Indyk. Proceedings of ACM SIGMOD 1998.

52.  Mining the link structure of the World Wide Web. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. IEEE Computer.

53.  Trawling the Web for emerging cyber-communities. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Eighth World Wide Web conference, Toronto, Canada, May 1999.

54.  Extracting large scale knowledge bases from the web. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. IEEE International conference on Very Large Databases (VLDB), Edinburgh, Scotland.

55.  Clustering categorical data: an approach based on dynamical systems. D. Gibson, J. Kleinberg and P. Raghavan. Proceedings of the VLDB conference, 1998.

56.  The effectiveness of GlOSS for the Text Database Discovery Problem L. Gravano, H. Garcia-Molina, A. Tomasic. SIGMOD 1994. (GlOSS)

57.  A Comparison of Approaches to Large-Scale Data Analysis Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, Michael Stonebraker. SIGMOD 2009.

58. Brin, Sergey, and Lawrence Page. "The anatomy of a large-scale hypertextual Web search engine." Computer networks and ISDN systems 30, no. 1 (1998): 107-117.

59. Kleinberg, Jon M. "Authoritative sources in a hyperlinked environment." Journal of the ACM (JACM) 46, no. 5 (1999): 604-632.


7. Data Mining

1.             Uncertain Data Mining: An Example in Clustering Location Data. Michael Chau, Reynold Cheng, Ben Kao, Jackey Ng. PAKDD 2006: 199-204.

2.             Trajectory pattern mining, Fosca Giannotti, Mirco Nanni, Fabio Pinelli, Dino Pedreschi, Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining KDD '07

3.             Geo-word centric association rule mining, Katsumi Takahashi, Iko Pramudiono, Masaru Kitsuregawa, Proceedings of the 6th international conference on Mobile data management MDM '05

4.             Similarity and matching: Distributed spatio-temporal similarity search, Demetrios Zeinalipour-Yazti, Song Lin, Dimitrios Gunopulos, Proceedings of the 15th ACM international conference on Information and knowledge management CIKM '06

5.             Temporal moving pattern mining for location-based service, Journal of Systems and Software, Jun Wook Lee, Ok Hyun Paek, Keun Ho Ryu, Volume 73, Issue 3 (November-December 2004)

6.             Incorporating Prior Knowledge with Weighted Margin Support Vector Machines, Xiaoyun Wu, and Rohini Srihari. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004

7.             Query Chains: Learning to Rank from Implicit Feedback, Filip Radlinski and Thorsten Joachims. Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining 2005

8.             Very Sparse Random Projections, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006

9.             Generating Semantic Annotations for Frequent Patterns with Context Analysis, Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, and ChengXiang Zhai, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, 2006

10.         Ongoing Management and Application of Discovered Knowledge in a Large Regulatory Organization: A Case Study of the Use and Impact of NASD Regulation's Advanced Detection System, Ted Senator. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining 2000

11.         Empirical Bayes Screening for Multi-Item Associations in Massive Datasets, William DuMouchel and Daryl Pregibon. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining 2003

12.         Capturing Best Practice for Microarray Gene Expression, Gregory Piatetsky-Shapiro, Tom Khabaza, and Sridhar Ramaswamy. Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining 2003

13.    Influence and correlation in social networks. Aris Anagnostopoulos, Ravi Kumar, Mohammad Mahdian. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008, Las Vegas, Nevada, USA August 24 - 27, 2008

14.    Mining adaptively frequent closed unlabeled rooted trees in data streams. Albert Bifet, Ricard Gavaldà. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008, Las Vegas, Nevada, USA August 24 - 27, 2008

15.    De-duping URLs via rewrite rules. Anirban Dasgupta, Ravi Kumar, Amit Sasturkar. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008, Las Vegas, Nevada, USA August 24 - 27, 2008

16.    Learning classifiers from only positive and unlabeled data. Charles Elkan, Keith Noto. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008, Las Vegas, Nevada, USA August 24 - 27, 2008

17.    Entity categorization over large document collections. Venkatesh Ganti, Arnd C. König, Rares Vernica. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008, Las Vegas, Nevada, USA August 24 - 27, 2008

18.    Discrimination-aware data mining. Dino Pedreschi, Salvatore Ruggieri, Franco Turini. Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining 2008, Las Vegas, Nevada, USA August 24 - 27, 2008

19.   Show me the money!: deriving the pricing power of product features by mining consumer reviews. Nikolay Archak, Anindya Ghose, Panagiotis G. Ipeirotis.  Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007, San Jose, California, USA August 12 - 15, 2007

20.   Feature selection methods for text classification. Anirban Dasgupta, Petros Drineas, Boulos Harb, Vanja Josifovski, Michael W. Mahoney.  Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007, San Jose, California, USA August 12 - 15, 2007

21.   Tracking multiple topics for finding interesting articles. Raymond K. Pon, Alfonso F. Cardenas, David Buttler, Terence Critchlow.   Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007, San Jose, California, USA August 12 - 15, 2007

22.   Distributed classification in peer-to-peer networks. Ping Luo, Hui Xiong, Kevin Lü, Zhongzhi Shi.  Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining. 2007, San Jose, California, USA August 12 - 15, 2007

23. Fast Algorithms for Mining Association Rules, by R. Agrawal and R. Srikant. In Proc. 20th Int. Conf. on Very Large Databases (VLDB 1994, Santiago de Chile), pp487-499, Morgan Kaufmann, San Mateo, CA, USA, 1994

24. J. Han, Y. Fu, Discovery of multiple-level association rules from large databases, in: Proc. 21st Int. Conf. on Very Large Data Bases, Zurich, Switzerland, pp. 420–431, 1995.

25. Fast Discovery of Association Rules, R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo. In: U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds. Advances in Knowledge Discovery and Data Mining, 307-328, AAAI Press / MIT Press, Cambridge, MA, USA, 1996

26. B. Liu, W. Hsu, Y. Ma, Mining association rules with multiple minimum supports, in: Proc. 1999 Int. Conf. on Knowledge Discovery and Data Mining, San Diego, CA, 1999, pp. 337-341.

27. Efficient Implementations of Apriori and Eclat, by Christian Borgelt. Workshop of Frequent Item Set Mining Implementations (FIMI 2003, Melbourne, FL, USA).

28. M. J. Zaki, C.J. Hsiao, Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure, IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No 4, April 2005, pp. 462-478, 2005.

29. M.C. Tseng , W.Y. Lin, Efficient mining of generalized association rules with non-uniform minimum support, Data & Knowledge Engineering 62, ScienceDirect, pp. 41–64, 2007.

30. Aggarwal, C.C. and Yu, P.S. 2001, A New Approach to Online Generation of Association Rules, IEEE Transactions on Knowledge and Data Engineering. Volume 13, No 4,pp. 527-540.

31. An Improved Algorithm for Mining Association Rules Using Multiple Support Values, by Ioannis N. Kouris, Christos H. Makris, Athanasios K. Tsakalidis

32. Jain, Anil K., M. Narasimha Murty, and Patrick J. Flynn. "Data clustering: a review." ACM computing surveys (CSUR) 31, no. 3 (1999): 264-323.

33. Xu, Rui, and Donald Wunsch. "Survey of clustering algorithms." Neural Networks, IEEE Transactions on 16, no. 3 (2005): 645-678.

34. Safavian, S. Rasoul, and David Landgrebe. "A survey of decision tree classifier methodology." Systems, Man and Cybernetics, IEEE Transactions on 21, no. 3 (1991): 660-674.


8. Privacy Preserving Data Mining

1.   A Privacy-Preserving Index for Range Queries, Bijit Hore, Sharad Mehrotra, Gene Tsudik, VLDB 2004

2.   Auditing Compliance with a Hippocratic Database, Rakesh Agrawal Roberto Bayardo Christos Faloutsos Jerry Kiernan Ralf Rantzau Ramakrishnan Srikant, VLDB 2004

3.   Privacy-preserving data mining. R. Agrawal and R. Srikant. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, pp. 439--450, 2000.

4.   On the design and quantification of privacy preserving data mining algorithms. D. Agrawal and C. C. Aggarwal, In Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Santa Barbara, California, USA, May 21-23 2001. ACM.

5.   Privacy Preserving Indexing of Documents on the Network, Mayank Bawa, Roberto Bayardo Jr., and Rakesh Agrawal. In VLDB, 2003

6.   Top-k Queries Across Multiple Private Databases, L. Xiong, S. Chitti, L. Liu. In Proc of Intl Conf of Distributed Computing Systems ICDCS, 2005

Information Hiding -- A Survey, F. A. P. Petitcolas, R. J. Anderson, M. G. Kuhn. In Proc of IEEE Special Issue on Protection of MultiMedia Content 87(7):1062-1078, July 1999

7.   Mining Multiple Private Databases using a kNN Classifier. L. Xiong, S. Chitti, L. Liu. In ACM Annual Symposium of Applied Computing (SAC), Data Mining Track, Seoul, Korea, March, 2007

8.   Towards Attack-Resilient Geometric Data Perturbation, Keke Chen and Ling Liu. Proceedings of the 7th SIAM (Society for Industrial and Applied Mathematics) International Conference on Data Mining (SDM 2007), to be held in Minneapolis, Minnesota, April 26-28, 2007.

9.   A Random Rotation Perturbation Approach to Privacy Preserving Data Classification, Keke Chen and Ling Liu. Proceedings of the Third IEEE International Conference on Data Mining (ICDM'05), New Orleans, Louisiana, U.S.A., November 27-30, 2005. (full paper).

10. Protecting Privacy when Disclosing Information: k-Anonymity and its Enforcement Through Generalization and Specialization. P. Samarati, L. Sweeney, TechReport SRI-CSL-98-04, SRI Intl., 1998.


9. Workflow Management

1. Process Mining, Discovery and Integration using Distance Measures, Joonsoo Bae, Ling Liu, James Caverlee and William Rouse. Proceedings of IEEE Int. Conf. on Web Services, to be held in Chicago, USA, Sept 18-22.

2. POESIA: An Ontological Workflow Approach for Composing Web Services in Agriculture, Renato Fileto, Ling Liu, Calton Pu, Claudia Bauzer Medeiros, Eduardo Delgado Assad, International Journal of Very Large Database Systems, 12(4): 352-367 (2003). Special issue on Semantic Web, Guest Editors: Vijay Atluri, Anupam Joshi, Yelena Yesha.

3. A Systematic Approach to Flexible Specification, Composition, and Restructuring of Workflow Activities, Ling Liu, Calton Pu, Duncan Dubugras Ruiz, In Journal of Database Management, Vol. 15, No. 1, Jan/March, 2004. pp1-40.

4. Dynamic restructuring of Transactional Workflow Activities : A Practical Implementation Method, Tong Zhou, Calton Pu, Ling Liu, In the Seventh International Conference on Information and Knowledge Management (CIKM'98), November 3-7, 1998, Washington D.C., USA pp378-385.

5. Methodical Restructuring of Complex Workflow Activities. Ling Liu and Calton Pu. IEEE 14th International Conference on Data Engineering, February 23-27, 1998, Orlando, Florida, USA. pp342-350.

6. ActivityFlow: Towards Incremental Specification and Flexible Coordination of Workflow Activities, Ling Liu and Calton Pu, In: The 16th International Conference on Conceptual Modeling (ER'97), Los Angeles, California, USA (3 - 6 November 1997). pp169-182.


10. Role Based Access Control

1.      Role Based Access Control, D.F. Ferraiolo and D.R. Kuhn (1992), 15th National Computer Security

2.      Role Based Access Control: Features and Motivations, D.F. Ferraiolo, J. Cugini, D.R. Kuhn, Computer Security Applications Conference  - extends the 1992 model

3.      An Introduction to Role Based Access Control NIST CSL Bulletin on RBAC (December, 1995)  

4.      Formal Specification for Role Based Access Control User/Role and Role/Role Relationship Management, S. Gavrila, J. Barkley, Third ACM Workshop on Role-Based Access Control.  

5.      Role Based Access Control , D.F. Ferraiolo, D.R. Kuhn, R. Chandramouli, Artech House, 2003.

6.      Mutual Exclusion of Roles as a Means of Implementing Separation of Duty in Role-Based Access Control Systems, D.R. Kuhn, Second ACM Workshop on Role-Based Access Control. 1997 

7.      Role Based Access Control on MLS Systems Without Kernel Changes, D.R. Kuhn, Third ACM Workshop on Role Based Access Control, October 22-23, 1998.

8.      Supporting Relationships in Access Control using Role Based Access Control , J. Barkley, C. Beznosov, Uppal, Fourth ACM Workshop on Role-Based Access Control (1999).  

9.      Managing Role/Permission Relationships Using Object Access Types, J.F. Barkley, A.V. Cincotta, Third ACM Workshop on Role Based Access Control (1998).  

10.  A Resource Access Decision Service for CORBA-based Distributed Systems, Beznosov, Deng, Blakley, Burt, Barkley, ACSAC (Annual Computer Security Applications Conference) 1999. 

11.  The Economic Impact of Role Based Access Control.  Research Triangle Institute.  NIST Planning Report 02-01. 2002  

12.  Comparing Simple Role Based Access Control Models and Access Control Lists, J. Barkley, (1997), Second ACM Workshop on Role-Based Access Control.

13.  The NIST Model for Role Based Access Control: Towards a Unified Standard, R. Sandhu, D. Ferraiolo, R. Kuhn, Proceedings, 5th ACM Workshop on Role Based Access Control, July 26-27, 2000. 

14.  Role Based Access Control Features in Commercial Database Management Systems, R. Chandramouli, R. Sandhu, 21st National Information Systems Security Conference, October 6-9, 1998, Crystal City, Virginia. 

15.  Inheritance Properties of Role Hierarchies, W.A. Jansen, 21st National Information Systems Security Conference, October 6-9, 1998, Crystal City, Virginia

16.  Business Process Driven Framework for defining an Access Control Service based on Roles and Rules, R. Chandramouli, 23rd National Information Systems Security Conference, 2000. 

17.  A Revised Model for Role Based Access Control, W.A. Jansen, NIST-IR 6192, July 9, 1998

18.  Role-Based Access Control Models, R. S. Sandhu, E.J. Coyne, H.L. Feinstein, C.E. Youman, IEEE Computer 29(2): 38-47, IEEE Press, 1996

19.  A Proposed Standard for Role Based Access Control, D. Ferraiolo, R. Sandhu, S. Gavrila, D.R. Kuhn, R. Chandramouli. ACM Transactions on Information and System Security , vol. 4, no. 3 (August, 2001) - draft of a consensus standard for RBAC.

20.  Implementing Role Based Access Control Using Object Technology, J. Barkley, First ACM Workshop on Role-Based Access Control (1995).

21.  Role Based Access Control (book), D.F. Ferraiolo, D.R. Kuhn, R. Chandramouli, Artech House, 2003.

22.  Object Retrieval and Access Management in Electronic Commerce, S. Wakid, J.F. Barkley, M. Skall, IEEE Communications Magazine, September 1999.

23.  A Marketing Survey of Civil Federal Government Organizations to Determine the Need for RBAC Security Product, (SETA Corporation, 1996).

24.  Efficient and Secure Search of Enterprise File Systems, Aameek Singh, Mudhakar Srivatsa, Ling Liu. Proceedings of IEEE International Conference on Web Services (ICWS 2007), July 9-13, 2007, Salt Lake City, Utah, USA.

25.  Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems, Robert Grimm and Brian N. Bershad. ACM Transactions on Computer Systems, Vol. 19, No. 1, February 2001, Pages 36-70.


11. Data Warehouse and OLAP

1.      Lineage Tracing for General Data Warehouse Transformations, Yingwei Cui and Jennifer Widom, VLDB, 2001.

2.      Adapting Materialized Views After Redefinitions: Techniques and a Performance Study, A. Gupta, I. S. Mumick, J. Rao, and K. A. Ross, Information Systems, 2001 (Special issue on Data Warehousing).

3.      Edited synoptic cloud reports from ships and land stations over the globe (1982-1991), C. Hahn, S. Warren, and J. London, 2001.

4.      The UCI KDD archive, S. Hettich and S. D. Bay. University of California, Irvine, 2000.

5.      Olap over uncertain and imprecise data, Doug Burdick, Prasad M. Deshpande, T. S. Jayram, Raghu Ramakrishnan, and Shivakumar Vaithyanathan, The VLDB Journal, 16:1, 123-144, 2006.

6.      Olap solutions: building multidimensional information systems second edition, Erik Thomsen, 2002.

7.      Encoded Bitmap Indexing for Data Warehouses, M.C. Wu and A.P. Buchmann, ICDE, 220-230, 1998.

8.      An Alternative Storage Organization for ROLAP Aggregate Views Based on Cubetrees, Yannis Kotidis and Nick Roussopoulos, ACM SIGMOD, 249-258, 1998.

9.      DocCube: multi-dimensional visualization and exploration of large document sets, Josiane Mothe, Claude Chrisment, Bernard Dousset, and Joel Alaux, Journal of the American Society for Information Science and Technology, 54:7, 650-659, 2003.

10.  On the design and evaluation of a multi-dimensional approach to information retrieval, M. C. McCabe, J. Lee, A. Chowdhury, D. Grossman, and O. Frieder, SIGIR '00, 363-365, 2000.

11.  Modeling, querying and reasoning about olap databases: a functional approach, Ken Q. Pu, DOLAP, 1-8, 2005.

12.  Reconsidering multi-dimensional schemas, Tim Martyn, SIGMOD Rec., 33:1, 83-88, 2004.

13.  Privacy preservation for data cubes, Sam Y. Sung, Yao Liu, Hui Xiong, and Peter A. Ng, Knowledge and Information Systems, 9:1, 38-61, 2006.

14.  A Temporal Query Language for OLAP: Implementation and a Case Study, Alejandro Vaisman and Alberto Mendelzon, DBPL, 2001.

15.  Intelligent rollups in multidimensional OLAP data, Gayatri Sathe and Sunita Sarawagi, The VLDB Journal, 531-540, 2001.

16.  Serving Datacube Tuples from Main Memory, K. A. Ross and K. A. Zaman, SSDBM, 2000.

17.  Hybrid Query and Data Ordering for Fast and Progressive Range-Aggregate Query Answering, International Journal of Data Warehousing and Mining, Cyrus Shahabi, Mehrdad Jahangiri, and Dimitris Sacharidis, 2005.

18.  Space-efficient cubes for olap range-sum queries, Decis. Support Syst, Seok-Ju Chun, Chin-Wan Chung, and Seok-Lyong Lee, 37:1, 83-102, 2004.

19.  pCube: update-efficient online aggregation with progressive feedback and error bounds, Mirek Riedewald, Divyakant Agrawal, and Amr El Abbadi, SSDBM, 95-108, 2000

20.  MM-Cubing: computing iceberg cubes by factorizing the lattice space, Zheng Shao, Jiawei Han, and Dong Xin, Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM), 2004.

21.  The cgmCUBE project: Optimizing parallel data cube generation for ROLAP, Frank Dehne, Todd Eavis, and Andrew Rau-Chaplin, Distributed and Parallel Databases, 19:1, 29-62, 2006.

22.  OLAP: Efficient Parallel Generation and Querying of Terabyte Size ROLAP Data Cubes, Chen, Y., Rau-Chaplin, A., Dehne, F., Eavis, T., Green, D., and Sithirasenan, ICDE'06, 2006.

23.  Evaluation of top-k OLAP queries using aggregate R-trees, N. Mamoulis, S. Bakiras, and P. Kalnis, International Symposium on Spatial and Temporal Databases (SSTD), 2005.

24.  Efficient OLAP Operations for Spatial Data Using Peano Trees, B. Wang, F. Pan, D. Ren, Y. Cui, D. Ding, and W. Perrizo, 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2003.

25.  A Pareto Model for OLAP View Size Estimation, Thomas P. Nadeau and Toby J. Teorey, CASCON, 2001.


12. Social Networks

  1.  Seeking Stable Clusters in the Blogosphere. Bansal, F. Chiang, N. Koudas, and F. Wm. Tompa. VLDB 2007.
  2. Improved Annotation of the Blogosphere via Autotagging and Hierarchical Clustering. C. Brooks and N. Montanez. WWW 2006.
  3. J. Zhang, M. Ackerman, and L. Adamic. Expertise Networks in Online  Communities: Structure and Algorithms. WWW 2007.
  4. L. Backstrom et al. Group Formation in Large Social Networks: Membership, Growth, and Evolution. KDD 2006.
  5. X. Wu, L. Zhang, and Y. Yu. Exploring Social Annotations for the Semantic Web. WWW 2006.
  6. DeRose, W. Shen, F. Chen, A. Doan, R. Ramakrishnan. Building Structured Web Community Portals: A Top-Down, Compositional, and Incremental Approach. VLDB 2007.
  7. S. Abiteboul and N. Polyzotis. The Data Ring: Community Content Sharing. CIDR 2007.
  8. M. Dubinko et al. Visualizing Tags Over Time. WWW 2006.
  9. K. Lawrence and M.C. Schraefel. Bringing Communities to the Semantic Web and the Semantic Web to Communities. WWW 2006.
  10. Mei, C. Liu, and H. Su. A Probabilistic Approach to Spatiotemporal Theme Pattern Mining on Weblogs. WWW 2006.
  11. B. Aleman-Meza et al. Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection. WWW 2006
  12. Y. Matsuo, J. Mori, and M. Hamasaki. POLYPHONET: An Advanced Social  Network Extraction System from the Web. WWW 2006.
  13. Li et al. Towards Effective Browsing of Large Scale Social Annotations. WWW 2007.
  14. X. Ni et al. Exploring in the Weblog Space by Detecting Informative and Affective Articles. WWW 2007.
  15. Q. Mei et al. Topic Sentiment Mixture: Modeling Facets and Opinions in Weblogs. WWW 2007.
  16. H. Halpin, V. Robu, and H. Shepherd. The Complex Dynamics of Collaborative Tagging. WWW 2007.
  17. P.-A. Chirita et al. P-TAG: Large Scale Automatic Generation of  Personalized Annotations TAGs for the Web. WWW 2007.
  18. S. Bao et al. Optimizing Web Search Using Social Annotations. WWW  2007.
  19. L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore Art Thou R3579X?  Anonymized Social Networks, Hidden Patterns, and Structural  Steganography. WWW 2007.
  20. Chi et al. Structural and temporal analysis of the blogosphere through community factorization. KDD 2007.
  21. Tantipathananandh et al. A Framework For Community Identification in  Dynamic Social Networks. KDD 2007.
  22. Y. Liu et al. ARSA: A Sentiment-Aware Model for Predicting Sales  Performance Using Blogs. SIGIR 2007.
  23. IEEE Data Engineering Bulletin: Special Issue on Data Management Issues in Social Sciences
  24. Golder and B. Huberman. The Structure of Collaborative Tagging  Systems.
  25. J. Freyne et al. Collecting Community Wisdom: Integrating Social  Search and Social Navigation. IUI 2007.
  26. A. Sahuguet, R. Hull, D. F. Lieuwen, and M. Xiong. Enter once, share everywhere: User profile management in converged networks. International Conference on Innovative Data Systems Research, 2003.
  27. James Caverlee and Ling Liu. Tamper Resilient Trust Establishment in Online Social Networks. Technical Report, Georgia Institute of Technology, School of Computer Science. July 2007.
  28. James Caverlee. Tamper-Resilient Methods for Intelligent Information Systems. PhD Dissertation, June 2007, Georgia Institute of Technology.
  29. Aleksandra Korolova, Rajeev Motwani, and Shubha U. Nabar. Link Privacy in Social Networks. CIKM'08, October 26-30, 2008, Napa Valley, California, USA.
  30. Jure Leskovec, Daniel Huttenlocher, Jon Kleinberg. Predicting Positive and Negative Links in Online Social Networks. WWW 2010.
  31. Jure Leskovec, Kevin Lang, Michael Mahoney. Empirical Comparison of Algorithms for Network Community Detection. WWW 2010.
  32. Rongjing Xiang, Jennifer Neville, Monica Rogati. Modeling Relationship Strength in Online Social Network. WWW 2010.
  33. Jennifer Neville, Timothy La Fond. Randomization Tests for Distinguishing Social Influence and Homophily Effects. WWW 2010.
  34. Yue Lu, Panayiotis Tsaparas, Alex Ntoulas, Livia Polanyi. Exploiting Social Context for Review Quality Prediction. WWW 2010.
  35. Arun Maiya, Tanya Berger-Wolf. Sampling Community Structure. WWW 2010.
  36. Kristina Lerman, Tad Hogg. Using a Model of Social Dynamics to Predict Popularity of News. WWW 2010
  37. Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, Ben Zhao. Measurement-calibrated Graph Models for Social Network Experiments. WWW 2010.
  38. Kempe, David, Jon Kleinberg, and Éva Tardos. "Maximizing the spread of influence through a social network." In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137-146. ACM, 2003.
  39. Liben-Nowell, David, and Jon Kleinberg. "The link-prediction problem for social networks." Journal of the American society for information science and technology 58, no. 7 (2007): 1019-1031.


13. Data Storage and Indexing

  1. Bayer, Rudolf, and Edward M. McCreight. "Organization and maintenance of large ordered indexes." Acta informatica 1, no. 3 (1972): 173-189.
  2. Comer, Douglas. "Ubiquitous B-tree." ACM Computing Surveys (CSUR) 11, no. 2 (1979): 121-137.
  3. Fagin, Ronald, Jürg Nievergelt, Nicholas Pippenger, and H. Raymond Strong. "Extendible hashing - a fast access method for dynamic files." ACM Transactions on Database Systems (TODS) 4, no. 3 (1979): 315-344.
  4. Nievergelt, Jürg, Hans Hinterberger, and Kenneth C. Sevcik. "The grid file: An adaptable, symmetric multikey file structure." ACM Transactions on Database Systems (TODS) 9, no. 1 (1984): 38-71.
  5. Antonin Guttman. "R-trees: a dynamic index structure for spatial searching." In Proceedings of the 1984 ACM SIGMOD international conference on Management of data (SIGMOD '84). ACM, New York, NY, USA, 47-57.
  6. Sellis, Timos K., Nick Roussopoulos, and Christos Faloutsos. "The R+-Tree: A Dynamic Index for Multi-Dimensional Objects." In Proceedings of the 13th International Conference on Very Large Data Bases, pp. 507-518. Morgan Kaufmann Publishers Inc., 1987.
  7. Norbert Beckmann, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. "The R*-tree: an efficient and robust access method for points and rectangles." In Proceedings of the 1990 ACM SIGMOD international conference on Management of data (SIGMOD '90). ACM, New York, NY, USA, 322-331.
  8. Bentley, Jon Louis. "Multidimensional binary search trees used for associative searching." Communications of the ACM 18, no. 9 (1975): 509-517.
  9. Zhang, Gong, Lawrence Chiu, and Ling Liu. "Adaptive Data Migration in Multi-tiered Storage Based Cloud Environment." In Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on, pp. 148-155. IEEE, 2010.
  10. Sivathanu, Sankaran, Ling Liu, Mei Yiduo, and Xing Pu. "Storage management in virtualized cloud environment." In Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on, pp. 204-211. IEEE, 2010.


14. Relational Query Optimization

  1. Selinger, P. Griffiths, Morton M. Astrahan, Donald D. Chamberlin, Raymond A. Lorie, and Thomas G. Price. "Access path selection in a relational database management system." In Proceedings of the 1979 ACM SIGMOD international conference on Management of data, pp. 23-34. ACM, 1979.
  2. Jarke, Matthias, and Jurgen Koch. "Query optimization in database systems." ACM Computing Surveys (CSUR) 16, no. 2 (1984): 111-152.
  3. Chaudhuri, Surajit. "An overview of query optimization in relational systems." In Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, pp. 34-43. ACM, 1998.
  4. Haas, Peter J., Jeffrey F. Naughton, S. Seshadri, and Lynne Stokes. "Sampling-Based Estimation of the Number of Distinct Values of an Attribute." In Proceedings of the 21st International Conference on Very Large Data Bases, pp. 311-322. Morgan Kaufmann Publishers Inc., 1995.
  5. Seshadri, Sangeetha, Vibhore Kumar, Brian Cooper, and Ling Liu. "A Distributed Stream Query Optimization Framework through Integrated Planning and Deployment." Parallel and Distributed Systems, IEEE Transactions on 20, no. 10 (2009): 1439-1453.


15. Big Data Processing in the Cloud

  1. Chang, Fay, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26, no. 2 (2008): 4.
  2. DeCandia, Giuseppe, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. "Dynamo: amazon's highly available key-value store." In ACM SIGOPS Operating Systems Review, vol. 41, no. 6, pp. 205-220. ACM, 2007.
  3. The Apache Software Foundation. "ZooKeeper: A Distributed Coordination Service for Distributed Applications." 2008.
  4. George, Lars. HBase: The Definitive Guide. O'Reilly Media, 2011.
  5. O’Neil, Patrick, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. "The log-structured merge-tree (LSM-tree)." Acta Informatica 33, no. 4 (1996): 351-385.
  6. Idreos, Stratos, Martin L. Kersten, and Stefan Manegold. "Updating a cracked database." In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pp. 413-424. ACM, 2007.
  7. Graefe, Goetz. "Write-optimized b-trees." In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pp. 672-683. VLDB Endowment, 2004.
  8. Brantner, Matthias, Daniela Florescu, David Graf, Donald Kossmann, and Tim Kraska. "Building a database on S3." In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 251-264. ACM, 2008.
  9. Baker, Jason, Chris Bond, James C. Corbett, J. J. Furman, Andrey Khorlin, James Larson, Jean-Michel Léon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh. "Megastore: Providing scalable, highly available storage for interactive services." In Proc. of CIDR, pp. 223-234. 2011.
  10. Armbrust, Michael, Kristal Curtis, Tim Kraska, Armando Fox, Michael J. Franklin, and David A. Patterson. "PIQL: Success-tolerant query processing in the cloud." Proceedings of the VLDB Endowment 5, no. 3 (2011): 181-192.
  11. Ousterhout, John, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra et al. "The case for RAMClouds: scalable high-performance storage entirely in DRAM." ACM SIGOPS Operating Systems Review 43, no. 4 (2010): 92-105.
  12. Stonebraker, Michael, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem, and Pat Helland. "The end of an architectural era: (it's time for a complete rewrite)." In Proceedings of the 33rd international conference on Very large data bases, pp. 1150-1160. VLDB Endowment, 2007.
  13. Brewer, Eric A. "Towards robust distributed systems." In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, vol. 19, pp. 7-10. 2000.
  14. Abadi, Daniel. "Consistency tradeoffs in modern distributed database system design: CAP is only part of the story." Computer 45, no. 2 (2012): 37-42.
  15. Yabandeh, Maysam, and Daniel Gómez Ferro. "A critique of snapshot isolation." In Proceedings of the 7th ACM european conference on Computer Systems, pp. 155-168. ACM, 2012.
  16. Peng, Daniel, and Frank Dabek. "Large-scale incremental processing using distributed transactions and notifications." In Proceedings of the 9th USENIX conference on Operating systems design and implementation, pp. 1-15. USENIX Association, 2010.
  17. Glendenning, Lisa, Ivan Beschastnikh, Arvind Krishnamurthy, and Thomas Anderson. "Scalable consistency in Scatter." In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 15-28. ACM, 2011.
  18. Chu, Cheng, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. "Map-reduce for machine learning on multicore." Advances in neural information processing systems 19 (2007): 281.
  19. Busch, Michael, Krishna Gade, Brian Larson, Patrick Lok, Samuel Luckenbill, and Jimmy Lin. "Earlybird: Real-time search at Twitter." In Data Engineering (ICDE), 2012 IEEE 28th International Conference on, pp. 1360-1369. IEEE, 2012.
  20. Cheng, Raymond, Ji Hong, Aapo Kyrola, Youshan Miao, Xuetian Weng, Ming Wu, Fan Yang, Lidong Zhou, Feng Zhao, and Enhong Chen. "Kineograph: taking the pulse of a fast-changing and connected world." In Proceedings of the 7th ACM european conference on Computer Systems, pp. 85-98. ACM, 2012.
  21. Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51, no. 1 (2008): 107-113.
  22. Schmuck, Frank, and Roger Haskin. "GPFS: A shared-disk file system for large computing clusters." In Proceedings of the First USENIX Conference on File and Storage Technologies, pp. 231-244. 2002.
  23. Thekkath, Chandramohan A., Timothy Mann, and Edward K. Lee. "Frangipani: A scalable distributed file system." In ACM SIGOPS Operating Systems Review, vol. 31, no. 5, pp. 224-237. ACM, 1997.
  24. Ross, Robert B., and Rajeev Thakur. "PVFS: A parallel file system for Linux clusters." In Proceedings of the 4th Annual Linux Showcase and Conference, pp. 391-430. 2000.
  25. Pavlo, Andrew, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. "A comparison of approaches to large-scale data analysis." In Proceedings of the 35th SIGMOD international conference on Management of data, pp. 165-178. ACM, 2009.
  26. Abouzeid, Azza, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, and Alexander Rasin. "HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads." Proceedings of the VLDB Endowment 2, no. 1 (2009): 922-933.
  27. Bajda-Pawlikowski, Kamil, Daniel J. Abadi, Avi Silberschatz, and Erik Paulson. "Efficient processing of data warehousing queries in a split execution environment." In Proceedings of the 2011 international conference on Management of data, pp. 1165-1176. ACM, 2011.
  28. Okcan, Alper, and Mirek Riedewald. "Processing theta-joins using MapReduce." In Proceedings of the 2011 international conference on Management of data, pp. 949-960. ACM, 2011.
  29. Yang, Christopher, Christine Yen, Ceryen Tan, and Samuel R. Madden. "Osprey: Implementing MapReduce-style fault tolerance in a shared-nothing distributed database." In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pp. 657-668. IEEE, 2010.
  30. Yang, Hung-chih, Ali Dasdan, Ruey-Lung Hsiao, and D. Stott Parker. "Map-reduce-merge: simplified relational data processing on large clusters." In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pp. 1029-1040. ACM, 2007.
  31. Thusoo, Ashish, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu, and Raghotham Murthy. "Hive-a petabyte scale data warehouse using hadoop." In Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pp. 996-1005. IEEE, 2010.
  32. Blanas, Spyros, Jignesh M. Patel, Vuk Ercegovac, Jun Rao, Eugene J. Shekita, and Yuanyuan Tian. "A comparison of join algorithms for log processing in mapreduce." In Proceedings of the 2010 international conference on Management of data, pp. 975-986. ACM, 2010.
  33. Vernica, Rares, Michael J. Carey, and Chen Li. "Efficient parallel set-similarity joins using MapReduce." In SIGMOD conference, pp. 495-506. 2010.
  34. Afrati, Foto N., and Jeffrey D. Ullman. "Optimizing joins in a map-reduce environment." In Proceedings of the 13th International Conference on Extending Database Technology, pp. 99-110. ACM, 2010.


16. RDF Data Management and Column Store

  1. Abadi, Daniel J., Adam Marcus, Samuel R. Madden, and Kate Hollenbach. "Scalable semantic web data management using vertical partitioning." In Proceedings of the 33rd international conference on Very large data bases, pp. 411-422. VLDB Endowment, 2007.
  2. Weiss, Cathrin, Panagiotis Karras, and Abraham Bernstein. "Hexastore: sextuple indexing for semantic web data management." Proceedings of the VLDB Endowment 1, no. 1 (2008): 1008-1019.
  3. Sidirourgos, Lefteris, Romulo Goncalves, Martin Kersten, Niels Nes, and Stefan Manegold. "Column-store support for RDF data management: not all swans are white." Proceedings of the VLDB Endowment 1, no. 2 (2008): 1553-1563.
  4. Neumann, Thomas, and Gerhard Weikum. "The RDF-3X engine for scalable management of RDF data." The VLDB Journal—The International Journal on Very Large Data Bases 19, no. 1 (2010): 91-113.
  5. Atre, Medha, Vineet Chaoji, Mohammed J. Zaki, and James A. Hendler. "Matrix Bit loaded: a scalable lightweight join query processor for RDF data." In Proceedings of the 19th international conference on World wide web, pp. 41-50. ACM, 2010.
  6. Boncz, Peter A., Martin L. Kersten, and Stefan Manegold. "Breaking the memory wall in MonetDB." Communications of the ACM 51, no. 12 (2008): 77-85.
  7. Ivanova, Milena G., Martin L. Kersten, Niels J. Nes, and Romulo AP Gonçalves. "An architecture for recycling intermediates in a column-store." ACM Transactions on Database Systems (TODS) 35, no. 4 (2010): 24.
  8. Broekstra, Jeen, Arjohn Kampman, and Frank Van Harmelen. "Sesame: A generic architecture for storing and querying rdf and rdf schema." The Semantic Web—ISWC 2002 (2002): 54-68.
  9. Stonebraker, Mike, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau et al. "C-store: a column-oriented DBMS." In Proceedings of the 31st international conference on Very large data bases, pp. 553-564. VLDB Endowment, 2005.


17. Peer to Peer Computing

1.    Analysis of the Evolution of Peer-to-Peer Systems. David Liben-Nowell, Hari Balakrishnan, and David Karger, ACM Conf. on Principles of Distributed Computing (PODC), Monterey, CA, July 2002.

2.     Chord: A Scalable Peer-to-peer Lookup Protocol for Internet Applications. Ion Stoica, Robert Morris, David Liben-Nowell, David R. Karger, M. Frans Kaashoek, Frank Dabek, Hari Balakrishnan, IEEE/ACM Transactions on Networking

3.    Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. A. Rowstron and P. Druschel, 18th ACM SOSP'01, Lake Louise, Alberta, Canada, October 2001.

4.    SCAN: A Dynamic, Scalable, and Efficient Content Distribution Network,  Yan Chen, Randy H. Katz, and John D. Kubatowicz in Proceedings of the International Conference on Pervasive Computing, August 2002.

5.    PeerCQ: A Decentralized and Self-Configuring Peer-to-Peer Information Monitoring System. Bugra Gedik and Ling Liu. The 23rd International Conference on Distributed Computing Systems (ICDCS 2003).

6.    Search and replication in unstructured peer-to-peer networks, Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. In the Proceedings of the 16th international conference on Supercomputing, June 2002, New York, USA.

7.    "Freenet: A Distributed Anonymous Information Storage and Retrieval System" Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong, in Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS 2009, ed. by Hannes Federrath. Springer: New York (2001).

8.    "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems". A. Rowstron and P. Druschel, IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November, 2001.

9.    Anonymous Publish/Subscribe in P2P Networks, A. K. Datta, M. Gradinariu, M. Raynal and G. Simon, IPDPS 2003.

10.    "SCRIBE: A large-scale and decentralised application-level multicast infrastructure", M. Castro, P. Druschel, A-M. Kermarrec and A. Rowstron, IEEE Journal on Selected Areas in Communications (JSAC) (Special issue on Network Support for Multicast Communications), 2002.

11.    Scalable application layer multicast. S. Banerjee, B. Bhattacharjee, and C. Kommareddy, In Proceedings of the 2002 ACM SIGCOMM Conference, 2002.

12.    Enabling Conferencing Applications on the Internet using an Overlay Multicast Architecture. Yang-hua Chu, Sanjay G. Rao, Srinivasan Seshan and Hui Zhang, In Proceedings of ACM SIGCOMM 2001.

13.    A Scalable Content Addressable Network. S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker. In Proceedings of the ACM SIGCOMM Conference, 2001.

14.    Gnutella RFC

15.    Making Gnutella-like P2P Systems Scalable. Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker. In Proceedings of the ACM SIGCOMM, 2003.

16.    Phenix: Supporting Resilient Low-diameter Peer-to-Peer Topologies. R. H. Wouhaybi and A. T. Campbell. In Proceedings of IEEE INFOCOM 2004.

17.   Improving Search in Peer-to-Peer Systems. Beverly Yang, Hector Garcia-Molina, In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS), 2002

18.  "Designing a Super-peer Network." Beverly Yang, Hector Garcia-Molina, In Proceedings of the 19th International Conference on Data Engineering (ICDE), Bangalore, India, March 2003

19.  "Routing Indices For Peer-to-Peer Systems." Arturo Crespo, Hector Garcia-Molina. Proceedings of the International Conference on Distributed Computing Systems (ICDCS). July 2002.

20.  Turning Heterogeneity into an Advantage in Overlay Routing. Z. Xu, M. Mahalingam, and M. Karlsson. In Proceedings of INFOCOM, 2003.

21.    Apoidea: A Decentralized Peer-to-Peer Architecture for Crawling the World Wide Web, A. Singh, M. Srivatsa, L. Liu and T. Miller, In the Proceedings of the SIGIR Workshop on Distributed Information Retrieval, August 2003. Also in Lecture Notes in Computer Science (LNCS) series, Springer Verlag.

22. Tracing a large-scale Peer to Peer System: an hour in the life of Gnutella, Evangelos P. Markatos, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2002.

23. Mapping the gnutella network. M. Ripeanu, I. Foster, and A. Iamnitchi. IEEE Internet Computing Journal, 6(1), 2002.

24. Scaling Unstructured Peer-to-Peer Networks with Heterogeneity-Aware Topology and Routing. M. Srivatsa and L. Liu. IEEE Transactions on Parallel and Distributed Systems.

25. Constructing a Proximity-aware Power Law Overlay Network. J. Zhang, L. Liu and C. Pu. In the Proceedings of IEEE Global Telecommunications Conference GLOBECOM, 2005.

26. An Analysis of Internet Content Delivery Systems. Stefan Saroiu, Krishna P. Gummadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy. Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI 2002), December 2002.

27. On the Feasibility of Peer-to-Peer Web Indexing and Search, Jinyang Li, Boon Thau Loo, Joseph M. Hellerstein, M. Frans Kaashoek, Lecture Notes in Computer Science, vol. 2735, 2003, pages 207-215.

28. P2P Content Search: Give the Web Back to the People, Matthias Bender, Sebastian Michel, Peter Triantafillou, Gerhard Weikum, Christian Zimmer, Proceedings of the 5th International Workshop on Peer-to-Peer Systems, 2006.

29. MINERVA∞: A scalable efficient peer-to-peer search engine, S. Michel, P. Triantafillou, G. Weikum, Lecture Notes in Computer Science, 2005.

30. Routing indices for peer-to-peer systems, A Crespo, H Garcia-Molina, In Proceedings of Distributed Computing Systems, 2002.    

31. An Analysis of the Skype Peer-to-Peer Internet Telephony Protocol, S.A. Baset and H.G. Schulzrinne, Infocom 2006, April 2006.

32. An Experimental Study of the Skype Peer-to-Peer VoIP System, Saikat Guha, Neil Daswani, Ravi Jain, IPTPS 06, February 2006.

33. Characterizing and Detecting Skype-Relayed Traffic, K. Suh, D. R. Figueiredo, J. Kurose, D. Towsley, Infocom 2006, April 2006.

34. Rarest First and Choke Algorithms are Enough, Arnaud Legout, G. Urvoy-Keller, P. Michiardi, IMC 2006.

35. The Bittorrent P2P File-sharing System: Measurements and Analysis, J.A. Pouwelse, P. Garbacki, D.H.J. Epema, H.J. Sips, IPTPS 05, February 2005.

36. Incentives Build Robustness in BitTorrent, Bram Cohen, First Workshop on Economics of Peer-to-peer Systems, June 2003.


Open Source Software

1. Search Engines

  1. Apache Hadoop
  2. Apache Lucene
  3. PeerCrawl: peer-to-peer Web Crawler
  4. Apoidea: Decentralized P2P Web Crawling

2. Data Store and Database

  1. Apache HBase
  2. Apache Cassandra
  3. Project Voldemort
  4. MySQL
  5. Apache Hive

3. Stream Processing

  1. Apache Flume
  2. Apache Kafka
  3. Scribe

4. Data Mining

  1. Weka 3: Data Mining Software in Java
  2. Apache Mahout
  3. PEGASUS: Peta-Scale Graph Mining System

5. RDF Store

  1. RDF-3X
  2. Apache Jena
  3. Sesame
  4. Mulgara
  5. Virtuoso Open-Source

6. Graph System

  1. METIS
  2. GPS: A Graph Processing System
  3. GraphChi

 

Related Courses at other Universities

1. Stanford-CS345: Advanced Topics in Database Systems, Instructor: Anand Rajaraman, Jeffrey D. Ullman

2. MIT-6.830: Database Systems, Instructor: Professor Samuel Madden

3. Cornell-CS633: Advanced Database Systems, Instructor: Jayavel Shanmugasundaram




Last updated on Jan. 9, 2013 by Ling Liu