Instructor: Mustaque Ahamad (mustaq@cc.gatech.edu)
Office: CoC 221
Phone: 894-2593
Office Hrs: MW 3:00-4:00pm
TA: Francisco Torres-Rojas
Office Hrs: TBA
As internetworked computing becomes pervasive, applications from many different domains are increasingly being supported by distributed computing systems. The following are the primary goals of this course.
We will achieve these goals by covering a set of papers in class that discuss the core topics. These topics include global states of distributed computations, logical and physical clocks, and various failure models. Distributed algorithms for consensus, replicated state management, and resource finding will also be covered. In addition, each student will read a set of related papers and produce a brief summary of the problems and results described in them. These summaries will be shared with all students and, if necessary, discussed in class. Finally, all students will be required to complete a programming project. There is considerable flexibility in deciding the nature of the project, and students will be able to define their own projects.
After successfully completing this course, students are expected to be ready to explore research problems in distributed systems and many other areas. The course is intense, and class participation is a must.
Advanced undergraduate or graduate course in operating systems or permission of instructor.
The following papers on the listed topics will be discussed in class. Other related papers will be assigned for reading. Homeworks will include material from papers assigned for reading.
M. Raynal and M. Singhal. Logical Time: A Way to Capture Causality in Distributed Systems. IRISA Technical Report.
David Mills. Network Time Protocol, RFC 1305.
Chandy, K. M. and Lamport, L., Distributed Snapshots: Determining Global States of Distributed Systems, ACM Trans. on Computer Systems, February 1985.
Schwarz, R. and Mattern, F., Detecting Causal Relationships in Distributed Computations: In Search of the Holy Grail, Distributed Computing, 1994.
Term Paper Topics: Distributed Checkpointing/Recovery, Clock Synchronization Algorithms.
Survey of failures in distributed systems (ch. 2, Mullender)
M. J. Fischer, N. Lynch and M. S. Paterson, Impossibility of Distributed Consensus with One Faulty Process, JACM 32, 1985.
Danny Dolev, Cynthia Dwork and Larry Stockmeyer, On the Minimal Synchronism Needed for Distributed Consensus, JACM, January 1987.
Tushar Chandra and Sam Toueg. Unreliable Failure Detectors for Reliable Distributed Systems. JACM.
Term Paper Topics: Byzantine Failures, Costs of Consensus in Synchronous Systems.
Ken Birman, Andre Schiper and Pat Stephenson, Lightweight Causal and Atomic Group Multicast, ACM TOCS, August 1991.
David Cheriton and Dale Skeen, Understanding the Limitations of Causally and Totally Ordered Communication, ACM SOSP, December 1993.
K. Birman. A Response to Cheriton and Skeen's Criticism of Causally and Totally Ordered Communication. Cornell University Technical Report TR93-1390.
Term Paper Topics: Reliable Multicast Protocols, Virtual Synchrony.
Gifford, D., Weighted Voting for Replicated Data, ACM Symp. on Operating Systems Principles, December 1979.
Danco Davcev and W. A. Burkhard. Consistency and Recovery Control for Replicated Files. In Proc. Tenth ACM Symposium on Operating Systems Principles, Operating Systems Review, 1985.
Terry, D. B. et al., Session Guarantees for Weakly Consistent Replicated Data, PDIS 1994.
Mustaque Ahamad, Jim Burns, Phillip Hutto, Prince Kohli and Gil Neiger, Causal Memory, Distributed Computing, 1995.
Term Paper Topics: Scalable Consistency Protocols, Update conflict detection/resolution.
Mullender, S. and Vitányi, P., Distributed Match-Making, Algorithmica, No. 3, 1988.
van Steen, M., Hauck, F., Homburg, P. and Tanenbaum, A. Locating Objects in Wide-Area Systems. IEEE Communications Magazine.
Badrinath et al., Designing Distributed Algorithms for Mobile Computing Networks. ICDCS.
Term Paper Topics: Naming, Distributed Algorithms for Mobile Systems.
Vahdat, A. et al. WebOS: Operating System Services for Wide Area Applications. Technical Report UCB-CSD-97-938.
Draves et al. Operating Systems Directions for the Next Millennium.
van Steen et al. The GLOBE Project.
Liu, C. and Cao, P. Maintaining Strong Cache Consistency in the World-Wide Web. ICDCS 1997.
Ahamad, M. and Kordale, R. Scalable Consistency Protocols for Distributed Services. Georgia Tech Technical Report, 1997.
Term Paper Topics: CORBA/DCOM Middleware, Wide-Area Replication, Web Caching.
Lampson, B., Abadi, M., Burrows, M. and Wobber, E. Authentication in Distributed Systems: Theory and Practice. DEC SRC Report 83, February 1992.
Belani, E., Vahdat, A., Anderson, T. and Dahlin, M. The CRISIS Wide-Area Security Architecture. USENIX Security Symposium, 1998.
Journal and conference papers. Reference books include Distributed Systems, edited by Sape Mullender, ACM Press.
Zachary Kurmas. A Survey of Tradeoffs Between Guarantees in Reliable Multicast
| Evaluation Item | Credit |
|---|---|
| Two Homeworks | 10% |
| Examinations | 45% |
| Term Paper | 15% |
| Programming Project | 15% |
| Class Participation | 10% |