CS4440 Emerging Database Technologies

Instructor: Professor Ling Liu

                               Course Readings

Attention: The information contained in this page is subject to change.




Reading Summary Requirement

There will be several background readings assigned each week. The readings will either be handed out a week before or listed on the Web page for required readings.

Homework/Assignment:

You are expected to read the material each week and write 2-3 paragraphs per reading giving your impressions and thoughts. The summaries should be informal and brief, and should consist of your own comments on the readings, NOT a rehash of the content.

You should email your summaries to the TA, preferably before each class but no later than 11:59 pm on Friday each week (unless there are no reading assignments for the week). Late assignments will NOT be accepted unless approved in advance by the instructor.

Reading Summary Guidelines:

The summary for each reading assignment is expected to consist of 1 paragraph on each of the following three aspects: (1) the positive aspect of the paper; (2) the negative aspect of the paper; and (3) a brief discussion on how the idea or method proposed or used in evaluation may be applied to your own project for the course.

You may want to keep these guidelines in mind when reading papers.

You may find the following short article helpful:

Efficient Reading of Papers in Science and Technology, by Michael J. Hanson, updated by D. McNamee


Areas of Readings

1. Mobile Database Management

2. Spatial Indexing Techniques

3. Data Clustering Algorithms

4. Stream databases

5. RFID data management

6. Web Search and Web IR

7. Data Mining

8. Privacy Preserving Data Mining

9. Workflow Management

10. Role based Access Control

11. Data Warehouse and OLAP


Required Readings and Dates

You are expected to read the papers in the required reading list, but to write a summary for only one paper selected from the 2-3 required readings associated with each lecture. Please use the Summary Template to write the reading summaries.

NOTE: Most of the papers listed below are from ACM or IEEE conferences or journals. Online proceedings are accessible through the ACM/IEEE online library link provided by the GT library. Your GT ID and password are required to access the online library.

http://www.library.gatech.edu/research_help/subject/index.php?/computer_science/conferences


General/Recommended Course Reading List

1. Mobile Database Management

1.      MobiEyes: A Distributed Location Monitoring Service Using Moving Location Queries. Bugra Gedik and Ling Liu. IEEE Transactions on Mobile Computing. Vol. 5, No. 10, pp. 1384-1402, October 2006.

2.      Map-matching: Towards Improving Wireless Positioning, Kipp Jones and Ling Liu. To appear in Proceedings of the 4th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous 2007). August 6-10, 2007, Philadelphia, PA.

3.      A SpatioTemporal Placement Model for Caching Location Dependent Queries, Anand Murugappan and Ling Liu. Proceedings of the 4th Annual International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (Mobiquitous 2007). August 6-10, 2007, Philadelphia, PA.

4.       Lira: Lightweight, Region-aware Load Shedding in Mobile CQ Systems. Bugra Gedik, Ling Liu, Kun-Lung Wu, Philip S. Yu. Proceedings of the IEEE 23rd International Conference on Data Engineering. Istanbul, Turkey; April 17-20, 2007.

5.      Effective Density Queries on Continuously Moving Objects. Christian S. Jensen, Dan Lin, Beng Chin Ooi, Rui Zhang. ICDE 2006

6.      Indexing the past, present, and anticipated future positions of moving objects. Mindaugas Pelanis, Simonas Saltenis, Christian S. Jensen. ACM Trans. Database Syst. 31(1): 255-298 (2006)

7.      Fast Nearest Neighbor Search on Road Networks. Hu, H., Lee, D.L., and Xu, J. Proceedings of the International Conference on Extending Database Technology (EDBT 2006), Munich, Germany, Mar 2006, 186-203.

8.      Distance Indexing on Road Networks. Hu, H., Lee, D.L., and Lee, V.C.S. Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), Seoul, Korea, Sept 2006, 894-905.

9.      Change Tolerant Indexing for Constantly Evolving Data. Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah: ICDE 2005: 391-402 

10.  Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data. Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah, Jeffrey Scott Vitter: VLDB 2004 : 876-887

11.  Trajectory pattern mining, Fosca Giannotti, Mirco Nanni, Fabio Pinelli, Dino Pedreschi. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining KDD '07.

12.   Project Lachesis: Parsing and modeling location histories. Hariharan, Toyama. In: GIScience, 2004.

13.  Extracting places from traces of locations. Kang, Welbourne, Stewart, Borriello. In Proc. WMASH, pages 110--118, New York, NY, USA, 2004.

14.  Monitoring Top-k Query in Wireless Sensor Networks, Xu, Wu, Tang, Lee. Proc. of the 22nd IEEE Int. Conf. on Data Engineering (ICDE '06), Atlanta, GA, April 2006.

15.  Geographic Information Science: Defining the Field. David Mark.

16.  A bibliography of temporal, spatial and spatio-temporal data mining research, John F. Roddick, Myra Spiliopoulou, ACM SIGKDD Explorations Newsletter, v.1 n.1, p.34-38, June 1999

17.  Modeling Transportation Routines using Hybrid Dynamic Mixed Networks, Vibhav Gogate, Rina Dechter, Bozhena Bidyuk, James Marca and Craig Rindt, In 21st Conference on Uncertainty in Artificial Intelligence (UAI), 2005.

18.  Tobler's First Law of Geography: A Big Idea for a Small World? Sui D.Z. Annals of the Association of American Geographers 94 (2), 269-277.

19.  Markovian Models for Sequential Data, Y. Bengio, Neural Computing Surveys, vol. 2, 1999, 129-162.

20. Using Geospatial Information in Sensor Networks. John Heidemann and Nirupama Bulusu. USC/Information Sciences Institute. September 20, 2000.

21.  Building Personal Maps from GPS Data. Lin Liao and Donald J. Patterson and Dieter Fox and Henry Kautz.

22.  Processing Window Queries in Wireless Sensor Networks, Y. Xu, W.-C. Lee, J. Xu, and G. Mitchel. Proc. of the 22nd IEEE Int. Conf. on Data Engineering (ICDE '06), Atlanta, GA, April 2006.

23.  Location-Based Activity Recognition using Relational Markov Networks. L. Liao, D. Fox, and H. Kautz. Proc. of the International Joint Conference on Artificial Intelligence (IJCAI-05).

24.  Using GPS to Learn Significant Locations and Predict Movement Across Multiple Users, D. Ashbrook and T. Starner, Personal and Ubiquitous Computing, Vol. 7, No. 5.

25.  Temporal Data Management. C. S. Jensen and R. T. Snodgrass. IEEE TKDE, 11(1): 36--45 (1999).

26.  Learning and Inferring Transportation Routines, Liao, Fox, Kautz, Artificial Intelligence 2007.

27.  Inferring High-Level Behavior from Low-Level Sensors, UBICOMP 2003. ICS 280.

28.  Fundamental Challenges in Mobile Computing, Satyanarayanan, M., Fifteenth ACM Symposium on Principles of Distributed Computing, May 1996, Philadelphia, PA. Revised version appeared as: "Mobile Computing: Where's the Tofu?", Proceedings of the ACM Sigmobile, April 1997, Vol. 1, No. 1.

29.  Multi-Fidelity Algorithms for Interactive Mobile Applications, Satyanarayanan, M., Narayanan, D. Proceedings of the 3rd International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications, August 1999, Seattle, WA

30.  Mobile Data Access, Noble, B. School of Computer Science, Carnegie Mellon University, May 1998, CMU-CS-98-118

31.  Energy-aware adaptation for mobile applications, Flinn J., Satyanarayanan, M., Proceedings of the 17th ACM Symposium on Operating Systems Principles, December, 1999, Kiawah Island Resort, SC.

32.  PowerScope: A Tool for Profiling the Energy Usage of Mobile Applications, Flinn J., Satyanarayanan, M., Proceedings of the Second IEEE Workshop on Mobile Computing Systems and Applications, February, 1999, New Orleans, LA

33.  System Support for Mobile, Adaptive Applications, Noble, Brian, IEEE Personal Communications, Vol. 7, No. 1, February, 2000

34.  Experience with adaptive mobile applications in Odyssey, Noble, B.D. and Satyanarayanan, M., Mobile Networks and Applications, Vol. 4, 1999

35.  Agile Application-Aware Adaptation for Mobility, Noble, B., Satyanarayanan, M., Narayanan, D., Tilton, J.E., Flinn, J., Walker, K. Proceedings of the 16th ACM Symposium on Operating System Principles, October 1997, St. Malo, France

36.  A Research Status Report on Adaptation for Mobile Data Access, Noble, B., Satyanarayanan, M. SIGMOD Record, Vol. 24, No. 4, December 1995

37.  A Programming Interface for Application-Aware Adaptation in Mobile Computing, Noble, B., Price, M., Satyanarayanan, M., Proceedings of the Second USENIX Symposium on Mobile & Location-Independent Computing, Apr. 1995, Ann Arbor, MI

38.  Application-Aware Adaptation for Mobile Computing, Satyanarayanan, M., Noble, B., Kumar, P., Price, M. Proceedings of the 6th ACM SIGOPS European Workshop, Sep. 1994, Dagstuhl, Germany.

39.  Mobile Information Access, Satyanarayanan, M., IEEE Personal Communications, Vol. 3, No. 1, February 1996

40.  Indexing Techniques for Power Management in Multi-Attribute Data Broadcast Qinglong Hu, Wang-Chien Lee, and Dik Lun Lee.

41.  Power Conserving and Access Efficient Indexes for Wireless Computing, Dik Lun Lee and Qinglong Hu.

42.  Power Conservative Multi-Attribute Queries on Data Broadcast, Qinglong Hu, Wang-Chien Lee, and Dik Lun Lee, ICDE 2000.

43.  Effects of power conservation, wireless coverage and cooperation on data dissemination among mobile devices, Maria Papadopouli and Henning Schulzrinne, ACM SIGMOBILE Symposium on Mobile Ad Hoc Networking & Computing (MobiHoc) 2001, October 4-5, 2001, Long Beach, California. (Extension of the Sarnoff paper.)

44.  Energy-aware Web Caching for Mobile Terminals. Francoise Sailhan, Valérie Issarny. In Proceedings of the ICDCS Workshop on Web Caching Systems. July 2002, Vienna, Austria.

45.  Power-Controlled Data Prefetching/Caching in Wireless Packet Networks, Savvas Gitzenis and Nicholas Bambos, IEEE Infocom 2002, New York.

46.  Sleepers and Workaholics: Caching Strategies in Mobile Environments. Daniel Barbara, Tomasz Imielinski, VLDB Journal 4(4): 567-602 (1995).

47.  Indexing techniques for data broadcast on wireless channels. D.L. Lee, Q. Hu, and W. C. Lee, Proceedings of the Fifth International Conference on Foundations of Data Organization (FODO '98), Kobe, Japan, Nov 11-12, 1998, 175-182.

48.  Indexing Techniques for Wireless Data Broadcast Under Data Clustering and Scheduling, Qinglong Hu, Wang-Chien Lee, and Dik Lun Lee, in Proceedings of ACM International Conference on Information and Knowledge Management (CIKM99), Kansas City, Missouri, Nov. 1999, pp. 351-358.

49.  Location Privacy in Pervasive Computing, A. R. Beresford, F. Stajano. In Proc of IEEE Pervasive Computing 46-55, March 2003

50.  Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. B. Gedik, L. Liu, IEEE Transactions on Mobile Computing, January 2008 (an extended abstract appeared in ICDCS 2005).

51.  Framework for Security and Privacy in Automotive Telematics. S. Duri, M. Gruteser, X. Liu, P. Moskowitz, R. Perez, M. Sing, J. M. Tang. Proc. of Intl Workshop on Mobile Commerce (WMC), 2002.

52.  Anonymous Usage of Location-Based Services Through Spatial and Temporal Cloaking. M. Gruteser, D. Grunwald. Proc. of ACM/USENIX MobiSys, 2003.


2. Spatial Indexing and Spatial Mining Techniques


1.      R-trees: a dynamic index structure for spatial searching. Antonin Guttman , Proceedings of the 1984 ACM SIGMOD international conference on Management of data, June 18-21, 1984, Boston, Massachusetts

2.      Indexing the positions of continuously moving objects. S. Saltenis, C. S. Jensen, S. T. Leutenegger, and M. A. Lopez. In SIGMOD '00: Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 331-342, New York, NY, USA, 2000. ACM Press.

3.      Voronoi Diagram, Franz Aurenhammer, Rolf Klein

4.      Spatial Databases: Accomplishments and Research Needs, S. Shekhar, S. Chawla, S. Ravada, A. Fetterer, X. Liu and C.T. Liu, IEEE Transactions on Knowledge and Data Engineering, Jan.-Feb. 1999.

5.      Discovering Spatial Co-location Patterns: a Summary of Results, S. Shekhar and Y. Huang, In Proc. of 7th International Symposium on Spatial and Temporal Databases (SSTD01), July 2001.

6.      Detecting Graph-based Spatial Outliers: Algorithms and Applications, S. Shekhar, C.T. Lu, P. Zhang, the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001.

7.      Extending Data Mining for Spatial Applications: A Case Study in Predicting Nest Locations, S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, Proc. of the 2000 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (DMKD 2000), Dallas, TX, May 14, 2000.

8.      Modeling Spatial Dependencies for Mining Geospatial Data, S. Chawla, S. Shekhar, W. Wu and U. Ozesmi, First SIAM International Conference on Data Mining, 2001.

9.      Spatial Contextual Classification and Prediction Models for Mining Geospatial Data, S. Shekhar, P.R. Schrater, R. R. Vatsavai, W. Wu, and S. Chawla, IEEE Transactions on Multimedia, 2001.

10.   The Quadtree and Related Hierarchical Data Structures. Finkel and Bentley, ACM Comput. Surv. 1974

11.  An introductory tutorial on kd-trees, A. Moore

12.  Building of Trapezoidal Map from a set of non-intersecting lines, Jukka Kaartinen

13.  Spatial data structures for version management of engineering drawings in CAD database. Y. Nakamura and H. Dekihara. In ICIAP '03: Proceedings of the 12th International Conference on Image Analysis and Processing, page 219, Washington, DC, USA, 2003. IEEE Computer Society.


3. Data Clustering Algorithms

1.             Data Clustering: A Review, A. K. Jain, M.N. Murty and P.J. Flynn, ACM Computing Surveys, Vol. 31, No. 3, Sept. 1999.

2.             On Line Clustering, Athman Bouguettaya, IEEE Transactions on Knowledge and Data Engineering, Volume 8, No. 2, April 1996.

3.             Similarity Searching in Medical Image Databases, Euripides G.M. Petrakis and Christos Faloutsos, IEEE Transactions on Knowledge and Data Engineering, Volume 9, No. 3, May/June 1997.

4.             Windows NT Clusters for Availability and Scalability, Rob Short, Rod Gamache, John Vert and Mike Massa, Microsoft Online Research Papers, Microsoft Corporation.

5.             Defining Data Mining, The Hows and Whys of Data Mining, and How It Differs From Other Analytical Techniques, Bruce Moxon, Online Addition of DBMS Data Warehouse Supplement, August 1996.

6.             An Efficient Approach to Clustering in Large Multimedia Databases with Noise. Hinneburg A., Keim D.A. Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, 1998. http://citeseer.ist.psu.edu/hinneburg98efficient.html

7.             Data Clustering: Theory, Algorithms, and Applications, Guojun Gan , Chaoqun Ma , Jianhong Wu

8.             Chameleon: A Hierarchical Clustering Algorithm Using Dynamic Modeling, George Karypis, Eui-Hong Han, and Vipin Kumar, IEEE Computer, Special Issue on Data Analysis and Mining, Vol. 32, No. 8, August 1999.

9.           iVIBRATE: Interactive Visualization Based Framework for Clustering Large Datasets, Keke Chen and Ling Liu. ACM Transactions on Information Systems.

10.         CURE: An efficient clustering algorithm for large databases, S. Guha, R. Rastogi, and K. Shim, In Proceedings of ACM SIGMOD International Conference on Management of Data, pages 73--84, New York, 1998.

11.         BIRCH: An Efficient Data Clustering Method for Very Large Databases, Tian Zhang, Raghu Ramakrishnan, and Miron Livny, In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, pages 103--114, Montreal, Canada, 1996.

12.         Bipartite Graph Partitioning and Data Clustering. H. Zha and X. He and C. Ding and M. Gu and H. Simon. Proc. of ACM 10th Int'l Conf. Information and Knowledge Management, pp. 25--31, 2001.

13.         Spectral biclustering of microarray data: coclustering genes and conditions. Y. Kluger and R. Basri and J.T. Chang and M. Gerstein. Genome Research. 13:703-716, 2003. 

14.         Automatic subspace clustering of high dimensional data for data mining applications. R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data, Seattle, Washington, June 1998

15.         A divisive information-theoretic feature clustering algorithm for text classification. I.S. Dhillon and S. Mallela and R. Kumar. JMLR, 3:1265-1287, 2003.

16.         Subspace clustering of high-dimensional binary data -- A probabilistic approach. A. Patrikainen and H. Mannila. Proc. Workshop on Clustering High Dimensional Data in SIAM International Conference on Data Mining, 2004.

17.         Segmentation using eigenvectors: a unifying view. Weiss Y. Proceedings IEEE International Conference on Computer Vision p. 975-982 (1999).

18.         Coupled two-way clustering analysis of gene microarray data. G. Getz and E. Levine and E. Domany. Proceedings of the National Academy of Sciences of the United States of America, 97:12079-12084, 2000.

19.         On clusterings - good, bad and spectral, S. Vempala, R. Kannan and A. Vetta, in Proc. 41st Symposium on the Foundations of Computer Science, FOCS, 2000.

20.         Co-clustering documents and words using bipartite spectral graph partitioning. I.S. Dhillon. Knowledge Discovery and Data Mining, pp. 269--274, 2001.

21.         Iterative Double Clustering for Unsupervised and Semi-Supervised Learning, R. El-Yaniv and O. Souroujon. NIPS 14, pp. 1025-1032, 2002.

22.    Clustering by Passing Messages Between Data Points. Frey, B. J. & Dueck, D. Science, 2007, 315, 972-976
      


4. Stream databases


1.       Continuous Queries over Data Streams. S. Babu and J. Widom. In SIGMOD Record, September 2001.

2.       Towards Sensor Database Systems. Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. In Proceedings of the Second International Conference on Mobile Data Management. Hong Kong, January 2001. 

3.      Querying the Physical World. Philippe Bonnet, J. E. Gehrke, and Praveen Seshadri. IEEE Personal Communications, Vol. 7, No. 5, October 2000, pages 10-15. Special Issue on Smart Spaces and Environments.

4.       Fjording the Stream: An Architecture for Queries over Streaming Sensor Data, Sam Madden and Michael J. Franklin, ICDE Conference, February, 2002, San Jose.

5.     Streaming Queries over Streaming Data. Sirish Chandrasekaran, Michael J. Franklin, VLDB Conference, August 2002, Hong Kong.

6.     Monitoring Streams: A New Class of Data Management Applications. D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik. In Proceedings of the 28th International Conference on Very Large Data Bases (VLDB'02), August 20-23, Hong Kong, China.

7.      Gigascope: a stream database for network applications, Chuck Cranor, Theodore Johnson, and Oliver Spatscheck, in Proceedings of SIGMOD 2003.

8.     Query Processing, Approximation, and Resource Management in a Data Stream Management System. R. Motwani et al. CIDR, 2003.

9.     Aurora: A New Model and Architecture for Data Stream Management. D. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. In VLDB Journal, 2003.

10.   Issues in Data Stream Management. Golab, L. and Ozsu, M. T. ACM SIGMOD Record. 32(2). 2003.

11. Estimating Clustering Indexes in Data Streams, Luciana Buriol, Gereon Frahling, Stefano Leonardi, Christian Sohler, Proc. 15th European Symposium on Algorithms (ESA), 2007

5. RFID data management

1.             Security and Privacy Issues in ePassport, Ari Juels, David Molnar and David Wagner, In Proceedings of Advances in Cryptology, 2005.

2.             Privacy and Security in Library RFID: Issues, Practices, and Architectures, David Molnar and David Wagner, In Proceedings of ACM CCS, 2004.

3.             High Power Proxies for Enhancing RFID Privacy and Utility, In Proceedings of PET, 2005.

4.             RFID Security and Privacy: A Research Survey, Ari Juels, IEEE Journal on Selected Areas in Communications, 2006.

5.             A Platform for RFID Security and Privacy Administration. Melanie R. Rieback, Vrije Universiteit Amsterdam; Georgi N. Gaydadjiev, USENIX/SAGE Large Installation System Administration conference - LISA'06, December 2006

6.             RFID Privacy: An Overview of Problems and Proposed Solutions, IEEE Security and Privacy, Vol. 3, No. 3, pp. 34-43, 2005.

7.             Protocols for RFID tag/reader authentication, Selwyn Piramuthu, Decision Support Systems, Volume 43, Issue 3, April 2007, Pages 897-914

8.      RFID privacy issues and technical challenges. Miyako Ohkubo, Koutarou Suzuki, Shingo Kinoshita. September 2005 Communications of the ACM, Volume 48 Issue 9

9.      Privacy for RFID through trusted computing. David Molnar, Andrea Soppera, David Wagner. November 2005 WPES '05: Proceedings of the 2005 ACM workshop on Privacy in the electronic society.

10.    RFID security and privacy: long-term research or short-term tinkering? Gene Tsudik, Mike Burmester, Ari Juels, Alfred Kobsa, David Molnar, Roberto Di Pietro, Melanie Rieback. March 2008 WiSec '08: Proceedings of the first ACM conference on Wireless network security

11.   Mutual authentication in RFID: security and privacy. Radu-Ioan Paise, Serge Vaudenay. March 2008 ASIACCS '08: Proceedings of the 2008 ACM symposium on Information, computer and communications security

12.   Robust, anonymous RFID authentication with constant key-lookup. Mike Burmester, Breno de Medeiros, Rossana Motta. March 2008 ASIACCS '08: Proceedings of the 2008 ACM symposium on Information, computer and communications security

6. Web Search and Web IR

1. Bigtable: A Distributed Storage System for Structured Data, Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, 7th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2006

2. MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean, Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating System Design and Implementation, 2004

3. Clustering Billions of Images with Large Scale Nearest Neighbor Search, Ting Liu, Charles Rosenberg, Henry A. Rowley, IEEE Workshop on Applications of Computer Vision, 2007

4. Scaling Up All Pairs Similarity Search, Roberto Bayardo, Yiming Ma, Ramakrishnan Srikant, Proc. of the 16th Int'l Conf. on the World Wide Web, 2007

5. Adaptive Product Normalization: Using Online Learning for Record Linkage in Comparison Shopping, Mikhail Bilenko, Sugato Basu, Mehran Sahami, Proceedings of the 5th IEEE International Conference on Data Mining, 2005

6. Evaluating similarity measures: a large-scale study in the orkut social network, Ellen Spertus, Mehran Sahami, Orkut Buyukkokten, Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2005), 2005

7. Unweaving a web of documents, R. Guha, Ravi Kumar, D. Sivakumar, Ravi Sundaram, KDD, 2005

8. Mining Optimized Gain Rules for Numeric Attributes, Sergey Brin, Rajeev Rastogi, Kyuseok Shim, IEEE Trans. Knowl. Data Eng., 2003

9. Scalable Techniques for Mining Causal Structures, Craig Silverstein, Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, VLDB, 1998

10.  Query by Semantic Example, Nikhil Rasiwasia, Nuno Vasconcelos, Pedro J. Moreno, CIVR, 2006

11.  Indexing Dataspaces, Xin Dong, Alon Halevy, Proc. ACM SIGMOD, 2007

12.  Query Suspend and Resume, Badrish Chandramouli, Chris Bond, Shivnath Babu, Jun Yang, Proc. ACM SIGMOD, 2007

13.  Web-scale Data Integration: You can only afford to Pay As You Go, Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, Alon Halevy, Proceedings of the Conference on Innovative Data Systems Research (CIDR), 2007

14.  Data integration: the teenage years, Alon Halevy, Anand Rajaraman, Joann Ordille, Proc. 32nd International Conference on Very Large Databases, 2006

15.  Data management projects at Google, Wilson Hsieh, Jayant Madhavan, Rob Pike, SIGMOD Conference, 2006

16.  On-the-fly Sharing for Streamed Aggregation, Sailesh Krishnamurthy, Chung Wu, Michael J. Franklin, SIGMOD Conference, 2006

17.  Principles of dataspace systems, Alon Y. Halevy, Michael J. Franklin, David Maier, PODS, 2006

18.  Structured Data Meets the Web: A Few Observations, Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, Cong Yu, Data Engineering Bulletin, 2006

19.  ULDBs: databases with uncertainty and lineage, Omar Benjelloun, Anish Das Sarma, Alon Halevy, Jennifer Widom, Proc. 32nd International Conference on Very Large Databases, 2006

20.  Web Search for a Planet: The Google Cluster Architecture, Luiz Andre Barroso, Jeffrey Dean, Urs Hölzle, IEEE Micro, 2003

21.  Finding Near-Duplicate Web Pages: A Large-Scale Evaluation of Algorithms, Monika Henzinger, Proc. SIGIR, 2006

22.  Indexing Shared Content in Information Retrieval Systems, Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Michael Herscovici, Ronny Lempel, John McPherson, Runping Qi, Eugene J. Shekita, EDBT, 2006

23.  Introduction to the special issue on XML retrieval, Ricardo Baeza-Yates, Norbert Fuhr, Yoelle Maarek, ACM Transactions on Information Systems, 2006

24.  Retroactive Answering of Search Queries, Beverly Yang, Glen Jeh, Proc. International World Wide Web Conference, 2006

25.  Semantic Search via XML Fragments: A High Precision Approach to IR, Jennifer Chu-Carroll, John Prager, Krzysztof Czuba, David Ferrucci, Pablo Duboue, Proc. 29th ACM SIGIR Conference on Research and Development in Information Retrieval, 2006

26.  Using annotations in enterprise search, Pavel A. Dmitriev, Nadav Eiron, Marcus Fontoura, Eugene Shekita, WWW, 2006

27.  Web mining with search engines: A web-based kernel function for measuring the similarity of short text snippets, Mehran Sahami, Timothy D. Heilman, Proc. 15th International World Wide Web Conference, 2006

28.  Concept-based interactive query expansion, Bruno M. Fonseca, Paulo Braz Golgher, Bruno Possas, Berthier A. Ribeiro-Neto, Nivio Ziviani, CIKM, 2005

29.  Information Discovery--Needles and Haystacks, Carl Lagoze, Amit Singhal, IEEE Internet Computing, 2005

30.  Algorithmic Aspects of Web Search Engines, Monika Rauch Henzinger, ESA, 2004

31.  eBizSearch: a niche search engine for e-business, C. Lee Giles, Yves Petinot, Pradeep B. Teregowda, Hui Han, Steve Lawrence, Arvind Rangaswamy, Nirmal Pal, SIGIR, 2003

32.  Semantic Associations for Contextual Advertising. Massimiliano Ciaramita and Vanessa Murdock and Vassilis Plachouras. Journal of Electronic Commerce Research Special Issue on Online Advertising and Sponsored Search.

33.  The Impact of Caching on Search Engines. Ricardo Baeza-Yates, Aristides Gionis, Flavio Junqueira, Vanessa Murdock, Vassilis Plachouras, Fabrizio Silvestri. 2007. 30th Annual International ACM SIGIR Conference.

34.  Tree revision learning for dependency parsing. G. Attardi and M. Ciaramita. 2007. In Proceedings of HLT-NAACL 2007.

35.  Know your Neighbors: Web Spam Detection using the Web Topology. Carlos Castillo and Debora Donato and Aristides Gionis and Vanessa Murdock and Fabrizio Silvestri. 2007. In Proceedings of SIGIR. ACM Press. (July 2007), Amsterdam, Netherlands, 423--430.

36.  James Caverlee, Steve Webb, and Ling Liu. "Spam-Resilient Web Rankings via Influence Throttling", Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS), Long Beach, 2007.

37.  The Self-Organized Web: The Yin to the Semantic Web's Yang. Gary William Flake, David M. Pennock, and Daniel C. Fain. 2003. IEEE Intelligent Systems. 18(4), 75-77

38.  A content and structure website mining model. Barbara Poblete and Ricardo Baeza-Yates. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web (Edinburgh, Scotland). ACM Press. New York, NY, USA, 957--958

39.  Relationship Between Web Links and Trade. Ricardo Baeza-Yates and Carlos Castillo. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web (Edinburgh, Scotland). ACM Press. New York, NY, USA, 927--928.

40.  Communities from Seed Sets. Reid Andersen and Kevin J. Lang. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web (Edinburgh, Scotland). ACM Press. New York, NY, USA, 223--232

41.  Generating Query Substitutions. Rosie Jones and Benjamin Rey and Omid Madani and Wiley Greiner. 2006. In WWW '06: Proceedings of the 15th international conference on World Wide Web. ACM Press. New York, NY, USA, 387--396.

42.  Multi-structural databases. Ronald Fagin, R. Guha, Ravi Kumar, Jasmine Novak, D. Sivakumar, Andrew Tomkins. 2005. In PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM Press. New York, NY, USA, 184--195.

43.  Unweaving a web of documents. R. Guha, Ravi Kumar, D. Sivakumar and Ravi Sundaram. 2005. In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM Press. New York, NY, USA, 574--579.

44.  Variable latent semantic indexing. Anirban Dasgupta, Ravi Kumar, Prabhakar Raghavan and Andrew Tomkins. 2005. In KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining. ACM Press. New York, NY, USA, 13--21.

45.  Efficient implementation of large-scale multi-structural databases. Ronald Fagin, Phokion Kolaitis, Ravi Kumar, Jasmine Novak and Andrew Tomkins. 2005. In VLDB '05: Proceedings of the 31st international conference on Very Large Data Bases. VLDB Endowment. 958--969.

46.  Discovering large dense subgraphs in massive graphs. David Gibson, Ravi Kumar and Andrew Tomkins. 2005. In VLDB '05: Proceedings of the 31st international conference on Very Large Data Bases. VLDB Endowment. 721--732.

47.  Query Incentive Networks. Jon M. Kleinberg and Prabhakar Raghavan. 2005. In FOCS '05: 46th Annual IEEE Symposium on Foundations of Computer Science. Pittsburgh, PA, 132--141.

48.  Indie: Distributed Indexing of autonomous Internet Services. Peter Danzig, Shih-Hao Li, Katia Obraczka. Journal of Computer Systems, 5(4), 1992. Original description of Indie in 1991 ACM SIGIR

49.  Measuring Index Quality using Random Walks on the Web. Monika Henzinger, Allan Heydon, Michael Mitzenmacher, and Marc A. Najork, Proceedings of the 8th International World Wide Web Conference, pages 213-225, May 1999

50.  Focused Crawling: A New Approach to Topic-Specific Web Resource Discovery. Soumen Chakrabarti, Martin van den Berg, Byron Dom, Proceedings of the 8th International World Wide Web Conference, May 1999

51.  Enhanced hypertext categorization using hyperlinks. S. Chakrabarti, B. Dom and P. Indyk. Proceedings of ACM SIGMOD 1998.

52.  Mining the link structure of the World Wide Web. S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. IEEE Computer.

53.  Trawling the Web for emerging cyber-communities. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Eighth World Wide Web conference, Toronto, Canada, May 1999.

54.  Extracting large scale knowledge bases from the web. S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. IEEE International conference on Very Large Databases (VLDB), Edinburgh, Scotland.

55.  Clustering categorical data: an approach based on dynamical systems. D. Gibson, J. Kleinberg and P. Raghavan. Proceedings of the VLDB conference, 1998.

56.  The effectiveness of GlOSS for the Text Database Discovery Problem. L. Gravano, H. Garcia-Molina, A. Tomasic. SIGMOD 1994. (GlOSS)


7. Data Mining

1.             Uncertain Data Mining: An Example in Clustering Location Data. Michael Chau, Reynold Cheng, Ben Kao, Jackey Ng. PAKDD 2006: 199-204.

2.             Trajectory pattern mining, Fosca Giannotti, Mirco Nanni, Fabio Pinelli, Dino Pedreschi, Proceedings of the 13th ACM SIGKDD international conferenc