CS 7210 Distributed Computing 
Fall 2004

Instructor: Mustaque Ahamad (mustaq@cc.gatech.edu)
Office: CoC 257, Phone: 894-2593
Office Hrs: MW 11-12 Noon


Course Description

Distributed computing systems have become pervasive. From clusters to internetworked computers, distributed systems are being used to support a wide variety of applications. This course will focus on the fundamental concepts in distributed computing systems. The following are the objectives of this course:

We will achieve these goals by covering a set of papers in class that discuss the core topics. The course starts with basics that  include global states of distributed computations, logical and physical clocks, and various failure models. Distributed algorithms for consensus, replicated state management, and resource finding will also be covered. We will also address issues that arise in the context of wide area distributed systems. In addition to the papers covered in class, each student will select a set of recent papers in a certain area, and produce a short term paper describing the problems and results discussed in the papers. These summaries will shared with all students, and if necessary, they will be discussed in class. All students will be required to complete a significant programming project. There is considerable flexibility in defining such a project. Students will be able to formulate their own projects, and depending on the nature of the project, they can work in groups.

After successfully completing this course, it is expected that students will be ready to explore research problems in distributed systems and emerging applications that will be deployed in such systems.  This course is also quite intense  like other systems courses and class participation is a must.


Homework and Project Information

Homework will be assigned periodically. Although students can define their own projects, general guidelines will be made available here.


Homework 1 


Prerequisites

Advanced undergraduate or graduate course in operating systems or permission of instructor.


Topics

The following papers on the listed topics will be discussed in class. Other related papers will be assigned for reading. Homework will include material from papers assigned for reading.

Event Ordering, Global States and Time in Distributed Systems (8/16-8/30)

M. Raynal and M. Singhal. Logical Time: A Way to Capture Causality in Distributed Systems. IRISA Technical Report.

David Mills. Network Time Protocol, RFC 1305.

Chandy, M. and Lamport, L., Distributed Snapshots: Determining Global States of Distributed Systems, ACM Trans. on Computer Systems, February 1985.

Schwarz, R. and Mattern, F.,  Detecting Causal Relationships in Distributed Systems: In Search of the Holy Grain , Distributed Computing, 1994.

Term Paper Topics: Distributed Checkpointing/Recovery, Clock Synchronization Algorithms.

Failures and Distributed Systems (9/1-9/15)

Survey of failures in distributed systems (ch. 2, Mullender)

M. J. Fischer, N. Lynch and M. S. Patterson, Impossibility of distributed consensus with one faulty process, JACM 32, 1985.

Danny Dolev, Cynthia Dwork and Larry Stockmeyer, On the Minimal Synchronism Needed for Distributed Consensus, JACM, January 1987.

The weakest failure detector for solving consensus; Tushar Deepak Chandra, Vassos Hadzilacos and Sam Toueg; J. ACM 43, 4 (Jul. 1996), Pages 685 - 722

Term Paper Topics: Byzantine Failures, Costs of Consensus in Synchronous Systems, Probabilistic Consensus.

Abstractions for Supporting Distributed Applications I: Group Communication (9/17-9/29)

Ken Birman, Andre Schiper and Pat Stephenson, Lightweight Causal and Atomic Multicast, ACM TOCS, August 1991.

David Cheriton and Dale Skeen, Understanding the Limitations of Causally and Totally Ordered Communication, ACM SOSP, December 1993.

K. Birman et. al., Bimodal Multicast, ACM TOCS,

Term Paper Topics: Reliable Multicast Protocols, Virtual Synchrony, Group Communication Systems.

Abstractions for Supporting Distributed Applications II: Replicated Objects (10/1-10/15)

F. B. Schenider,   Implementing Fault-tolerant Services Using the State Machine Approach: A Tutorial ,  Computing Surveys, 1990.

Gifford, D., Weighted Voting for Replicated Data, ACM Symp. on Operating Systems Principles, December 1979.

Malkhi, D. and Reiter,  M.  Byzantine Quorum Systems , Journal of Distributed Computing, 1998.

Danco Davcev and W.A. Burkhard. Consistency and Recovery Control for Replicated Files. In Proc. Tenth ACM Symposium on Operating Systems, Operating Systems Review, 1985.

Mustaque Ahamad, Jim Burns, Phillip Hutto, Prince Kohli and Gil Neiger, Causal Memory, Distributed Computing, 1995.

Terry, D. B. et. al., Session guarantees for weakly consistent replicated data, 1994 PDIS.

Ahamad, M. and Kordale, R. Scalable Consistency Protocols for Distributed Services IEEE Transaction on Parallel and Distributed Systems. 1999.

F. Torres, M. Ahamad and M. Raynal,  Timed Consistency for Shared Distributed Objects , PODC 1999.

H. Yu and A. Vahdat.  The Costs and Limits of Availability of Replicated Services , SOSP 2001.

Term Paper Topics: Scalable Consistency Protocols, Update conflict detection/resolution.

Midterm Examination (10/15)

Naming, Resource Finding and Mobile  & Peer-to-Peer Systems (10/20-11/5)

Mullender, S., Vitany, P., Distributed Match-Making, Algorithmica, No.3, 1988.

Steen, M., Hauck, F., Homburg, P. and Tanenbaum, A. Locating objects in wide-area systems. IEEE Communications Magazine.

Stoica, I., Morris, R., Karger, D., Kaashoek, M. F. and Balakrishnan, H.,  Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications ,
TON.

Badrinath et. al., Designing Distributed Algorithms for Mobile Computing Networks. ICDCS.

M. Satyanarayanan, Fundamental Challenges in Mobile Computing, PODC 1995.

Term Paper Topics: Naming, Distributed Algorithms for Mobile Systems.

Security in Distributed Systems (11/8-11/19)

Lampson, B., Abadi, M. and Burrows, M,  Authentication in Distributed Systems: Theory and Practice , ACM Transactions on Computer Systems, 1992.

M. Kaminsky, G. Saviddes, D. Mazieres and M. F. Kaashoek,  Decentralized User Authentication in a Global File System ,  SOSP 2003.

Case Studies (11/22-12/3)

 Publish/subscribe systems (The IBM Gryphon System,   Georgia Tech Echo Event System)

Middleware systems (CORBA, .Net, Java RMI, Jini, JXTA)

P2P Systems  ( FarsiteSamsara, Eigentrust and Strategyproof Mechanisms )

Dynamic Content Consistency in the WWW

Term Paper Topics: Scalable Middleware, Wide area replication, Web caching.

Textbook and Other Readings

Journal and conference papers listed above will be made available.


Evaluation

Evaluation Item 

Credit 

Homeworks

10%

Examinations

60%

Term Paper

10%

Programming Project

20%

Class Participation

5%