CS 7210 Distributed Computing 
Fall 2002

Instructor: Mustaque Ahamad (mustaq@cc.gatech.edu)
Office: CoC 220
Phone: 894-2593
Office Hrs: TTh 1-2pm
TA: Ivan Tacic
TA Office Hrs:  12:30-2:00, Friday


Course Description

Distributed computing systems have become pervasive. From clusters to internetworked computers, we are using distributed systems to support a wide variety of applications. This course will focus on the fundamental concepts in distributed computing systems. The following are the objectives of this course: We will achieve these goals by covering a set of papers in class that discuss the core topics. The course starts with basics that  include global states of distributed computations, logical and physical clocks, and various failure models. Distributed algorithms for consensus, replicated state management, and resource finding will also be covered. We will also address issues that arise in the context of wide area distributed systems. In addition to the papers covered in class, each student will select a set of recent papers in a certain area, and produce a short term paper describing the problems and results discussed in the papers. These summaries will shared with all students, and if necessary, they will be discussed in class. Finally, all students will be required to complete a significant programming project. There is considerable flexibility in defining such a project, and students will be able to define their own projects, and depending on the nature of the project, students can work in groups.

After successfully completing this course, it is expected that students will be ready to explore research problems in distributed systems, and emerging applications that will be deployed in such systems. Similar to other systems courses, this course is also quite intense and class participation is a must.


Homework and Project Information

Homeworks will be assigned periodically. Although students can define their own projects, general guidelines will be made available here.

 Homework I -- Due Date: September 19, 2002

 Homework II, Due Date: December 3, 2002

Old Midterm Exam

 Old Final Exam


Prerequisites

Advanced undergraduate or graduate course in operating systems or permission of instructor.


Topics

The following papers on the listed topics will be discussed in class. Other related papers will be assigned for reading. Homeworks will include material from papers assigned for reading.

Event Ordering, Global States and Time in Distributed Systems (8/20-8/29)

M. Raynal and M. Singhal. Logical Time: A Way to Capture Causality in Distributed Systems. IRISA Technical Report.

David Mills. Network Time Protocol, RFC 1305.

Chandy, M. and Lamport, L., Distributed Snapshots: Determining Global States of Distributed Systems, ACM Trans. on Computer Systems, February 1985.

Schwarz, R. and Mattern, F.,  Detecting Causal Relationships in Distributed Systems: In Search of the Holy Grain , Distributed Computing, 1994.

Term Paper Topics: Distributed Checkpointing/Recovery, Clock Synchronization Algorithms.

Failures and Distributed Systems (9/3-9-12)

Survey of failure modes in distributed systems (ch. 2, Mullender)

M. J. Fischer, N. Lynch and M. S. Patterson, Impossibility of distributed consensus with one faulty process, JACM 32, 1985.

Danny Dolev, Cynthia Dwork and Larry Stockmeyer, On the Minimal Synchronism Needed for Distributed Consensus, JACM, January 1987.

The weakest failure detector for solving consensus; Tushar Deepak Chandra, Vassos Hadzilacos and Sam Toueg; J. ACM 43, 4 (Jul. 1996), Pages 685 - 722

Term Paper Topics: Byzantine Failures, Costs of Consensus in Synchronous Systems, Probabilistic Consensus.

Abstractions for Supporting Distributed Applications I: Group Communication (9/17-9/26)

Ken Birman, Andre Schiper and Pat Stephenson, Lightweight Causal and Atomic Multicast, ACM TOCS, August 1991.

David Cheriton and Dale Skeen, Understanding the Limitations of Causally and Totally Ordered Communication, ACM SOSP, December 1993.

K. Birman et. al., Bimodal Multicast, ACM TOCS,

Term Paper Topics: Reliable Multicast Protocols, Virtual Synchrony, Group Communication Systems.

Abstractions for Supporting Distributed Applications II: Replicated Objects (10/2-10/15)

F. B. Schenider,   Implementing Fault-tolerant Services Using the State Machine Approach: A Tutorial ,  Computing Surveys, 1990.

Gifford, D., Weighted Voting for Replicated Data, ACM Symp. on Operating Systems Principles, December 1979.

Danco Davcev and W.A. Burkhard. Consistency and Recovery Control for Replicated Files. In Proc. Tenth ACM Symposium on Operating Systems, Operating Systems Review, 1985.

Mustaque Ahamad, Jim Burns, Phillip Hutto, Prince Kohli and Gil Neiger, Causal Memory, Distributed Computing, 1995.

Terry, D. B. et. al., Session guarantees for weakly consistent replicated data, 1994 PDIS.

Ahamad, M. and Kordale, R. Scalable Consistency Protocols for Distributed Services IEEE Transaction on Parallel and Distributed Systems. 1999.

F. Torres, M. Ahamad and M. Raynal,  Timed Consistency for Shared Distributed Objects , PODC 1999.

H. Yu and A. Vahdat.  The Costs and Limits of Availability of Replicated Services , SOSP 2001.

Term Paper Topics: Scalable Consistency Protocols, Update conflict detection/resolution.

Midterm Examination (10/22)

Naming, Resource Finding and Mobile  & Peer-to-Peer Systems (10/22-10/31)

Mullender, S., Vitany, P., Distributed Match-Making, Algorithmica, No.3, 1988.

Steen, M., Hauck, F., Homburg, P. and Tanenbaum, A. Locating objects in wide-area systems. IEEE Communications Magazine.

Stoica, I., Morris, R., Karger, D., Kaashoek, M. F. and Balakrishnan, H.,  Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications ,
TON.

Badrinath et. al., Designing Distributed Algorithms for Mobile Computing Networks. ICDCS.

M. Satyanarayanan, Fundamental Challenges in Mobile Computing, PODC 1995.

Term Paper Topics: Naming, Distributed Algorithms for Mobile Systems.

Security in Distributed Systems (11/5-11/14)

Lampson, B., Abadi, M. and Burrows, M,  Authentication in Distributed Systems: Theory and Practice , ACM Transactions on Computer Systems, 1992.

J. Howell and D. Kotz,  End-to-end authorization ,  OSDI 2000.

Case Studies (11/19-12/5)

 The IBM Gryphon System

 Georgia Tech Echo Event System

 Transis

 Globe

CORBA, .Net, Java RMI and Jini

Pastry

P2P Distributed File Systems

Dynamic Content Consistency in the WWW
 

Term Paper Topics: Scalable Middleware, Wide area replication, Web caching.

Textbook and Other Readings

Journal and conference papers listed above will be made available.


Evaluation

Evaluation Item  Credit 
Two Homeworks 10%
Examinations 50%
Term Paper 10%
Programming Project 20%
Class Participation 10%