CS 6210
Advanced Operating Systems

Fall 2005

Course Description

CS 6210 (Advanced Operating Systems) is a graduate-level course that covers advanced topics in operating system design and implementation in detail. It starts with topics such as operating system structuring, multithreading, and synchronization, and then moves on to systems issues in parallel and distributed computing. There is no textbook for this course. Instead, we will read and discuss a number of important research papers related to these topics. For each paper covered in class, students are expected to gain a solid understanding of the problem the paper addresses and the solution the authors propose. Papers listed under "Required Background" will be assigned for self-study. Students must read these papers carefully, because understanding their content may be essential for the papers covered in class. Papers listed under "Optional Readings" are for reference only; they cover topics that extend or supplement the material in the papers covered in class. Students will be expected to have some understanding of the results in these papers but will not be tested on them.

Prerequisites

  • CS4210 or an equivalent undergraduate OS course. A good understanding of the concepts in a standard textbook such as Operating System Concepts, Silberschatz and Galvin (or its equivalent) will be assumed in CS 6210.
  • Good knowledge of UNIX and C programming.
  • CS4210's prerequisites or equivalents (such as CS 2200).

Grading

5% class participation
40% projects (10% each)
25% midterm (Tuesday, October 11th)
30% final

Additional Material

Class Questionnaire

Projects

This course is project intensive and will have a sequence of four projects. Strong programming skills are absolutely essential for completing these projects.   All students must do the first two projects, but after that students can choose to define a project that fits more closely with their individual research goals to replace Projects 3 and 4.  For more information on the special projects, stay tuned.

Each student in the class will get a directory in [/net/hc280/class/cs6210] and have access to Clusters and EDHPC machines. You will use this directory to turn in your projects.  If you have no CoC account, please apply soon so you can have the class directory and access to development resources.

Project submission instructions: please read and follow these instructions for all projects.

Project 1 Due Monday, September 12.

Project 2 Due Wednesday, October 5th.

Project 3 Due Wednesday, November 9th.

Project 4 Due Tuesday, December 6th.

Sample Special Projects

Participation

Your class participation grade is composed of four main parts:

  • Regular attendance at the class lectures.
  • An index card with your name, picture and something unique about yourself (worth 1%; due Sept. 1st)
  • A summary of one of the papers on the reading list (approximately one page; sign up on the swiki). Paper summaries are due by the end of the section in which the paper appears. In other words, if you are covering an "OS Structures" paper, you should post your summary before the first lecture on "Synchronization, Communication, and Scheduling in Parallel Systems."
  • Notes for one class period (sign up on the swiki). These notes are due a week from the day of the lecture.

Link to the sign-up swiki: cs6210 swiki

Syllabus

Optional supplementary reference texts include the following:

  • Operating System Concepts, Silberschatz and Galvin.
  • OS: Advanced Concepts, Maekawa, Oldehoeft. Addison-Wesley.
  • Distributed Systems, Sape Mullender, Addison-Wesley.
  • Distributed Operating Systems, Andrew S. Tanenbaum, Prentice Hall.
  • An Introduction to Programming with Threads, Andrew Birrell.
  • Multithreaded Programming with Pthreads, Chapter 4, Bil Lewis, Daniel J. Berg.

Basics

  1. Course overview and assumptions, which include basics of operating system structure, micro-kernels, user- and kernel-level threads, synchronization, deadlock detection and avoidance. Refer to Operating System Concepts, Silberschatz and Galvin, and Multithreaded Programming with Pthreads, Chapter 4.

Note: the following paper list is tentative and subject to change in the next week or so. Please consider this before printing a large number of papers, especially those from later in the schedule.


OS Structures (4 lectures)

  1. Brian Bershad et al., "Extensibility, Safety and Performance in the SPIN Operating System", Proceedings of the 15th ACM Symposium on Operating Systems Principles, December 1995.
  2. Dawson R. Engler, Frans Kaashoek and James O'Toole, "Exokernel: An Operating System Architecture for Application-Level Resource Management", Proceedings of the 15th ACM Symposium on Operating Systems Principles, ACM, December 1995.
  3. J. Liedtke, "On Micro-Kernel Construction", Proceedings of the 15th ACM Symposium on Operating Systems Principles, ACM, December 1995.
  4. Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt, Andrew Warfield, "Xen and the Art of Virtualization", SOSP 2003.


    Review Questions: link

Synchronization, Communication, and Scheduling in Parallel Systems (4 lectures)

  1. Mellor-Crummey, J. M. and Scott, M., "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors", ACM Transactions on Computer Systems, Feb. 1991.
  2. B. N. Bershad, T. E. Anderson, E. D. Lazowska, and H. M. Levy, "Lightweight Remote Procedure Call", ACM Transactions on Computer Systems, 8(1):37-55, Feb. 1990.
  3. M.S. Squillante and E.D. Lazowska, "Using Processor-Cache Affinity Information in Shared Memory Multiprocessor Scheduling", IEEE Transactions on Parallel and Distributed Systems, Feb. 1993, pgs. 131-143.
  4. Ben Gamsa, Orran Krieger, Jonathan Appavoo, and Michael Stumm, "Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System", 1999 Symposium on Operating Systems Design and Implementation. (Kishore's notes on Tornado)


    Review Questions: link

Communication Mechanisms in Distributed Systems (4 lectures)

  1. Lamport, L., "Time, Clocks, and the Ordering of Events in a Distributed System", Communications of the ACM, 21, 7, pgs. 558-565, July 1978.
  2. C.A. Thekkath and H.M. Levy, "Limits to Low-Latency Communications on High-Speed Networks", ACM Transactions on Computer Systems, May 1993.
  3. David Wetherall, "Active Networks: Vision and Reality: Lessons from a Capsule-based System", 17th ACM Symposium on Operating Systems Principles, OS Review, Volume 33, Number 5, Dec. 1999. (PPT Spring 04)
  4. Liu, Kreitz, van Renesse, Hickey, Hayden, Birman, Constable, "Building Reliable High Performance Communication Systems from Components", 17th ACM Symposium on Operating Systems Principles, OS Review, Volume 33, Number 5, Dec. 1999.


    Review Questions: link

Midterm Exam

The midterm is Tuesday, October 11th.

Here are some example midterms:

Note: the course content has changed somewhat since these midterms were written. Use them as a guide for the type and scope of questions, but don't worry if you are not familiar with certain papers that are no longer covered.

Distributed Objects and Middleware (4 lectures)

  1. Mitchell, J. G., et al., "An Overview of the Spring System", Proceedings of Compcon, Feb. 1994.
  2. Hamilton, G., Powell, M.L., and Mitchell, J.G., "Subcontract: A Flexible Base for Distributed Programming", Proceedings of the Fourteenth ACM SOSP, pgs. 69-79, December 1993. (PPT Fall03)
  3. Wollrath, A., Riggs, R., and Waldo, J., "A Distributed Object Model for the Java System", Usenix Conference on Object Oriented Technologies and Systems, May 1996. (slides from Ada Gavrilovska's presentation)
  4. Emmanuel Cecchet, Julie Marguerite, Willy Zwaenepoel, "Performance and Scalability of EJB Applications", Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications.


    Review Questions: link

Distributed Shared Memory and File Systems (3 lectures)

  1. Feeley, Morgan, Pighin, Karlin, Levy, Thekkath, "Implementing Global Memory Management in a Workstation Cluster", Fifteenth ACM Symposium on Operating Systems Principles, Dec. 1995.
  2. C. Amza, A. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu and W. Zwaenepoel, "TreadMarks: Shared Memory Computing on Networks of Workstations", IEEE Computer, February 1996.
  3. Anderson, T. et al., "Serverless Network File Systems", ACM Transactions on Computer Systems, February 1996.


    Review Questions: link

Multimedia (2 lectures)

  1. Bolosky, Fitzgerald, and Douceur, "Distributed Schedule Management in the Tiger Video Fileserver", Proceedings of the 16th ACM Symposium on Operating Systems Principles, Oct. 1997.
  2. Michael B. Jones, Daniela Rosu and Marcel Rosu, "CPU Reservations and Time Constraints: Efficient, Predictable Scheduling of Independent Activities", Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), St. Malo, France, Oct. 1997.


    Review Questions: link

Failures, Consistency, and Recovery (4 lectures)

  1. Satyanarayanan, M., et al., "Lightweight Recoverable Virtual Memory", Proceedings of the Fourteenth ACM Symposium on Operating Systems Principles, pgs. 146-160, December 1993.
  2. David E. Lowell and Peter M. Chen, "Free Transactions With Rio Vista", Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles, October 1997.
  3. R. Haskin et al., "Recovery Management in QuickSilver", ACM Transactions on Computer Systems, February 1988.
  4. J. N. Gray, P. McJones, M. W. Blasgen, R. A. Lorie, T. G. Price, G. R. Putzolu, and I. L. Traiger, "The Recovery Manager of a Data Management System", ACM Computing Surveys, Vol. 13, No. 2, June 1981, pp. 223-242. (slides from Gregory Eisenhauer's presentation)


    Review Questions: link

System Support for Internet Scale Computing (2 lectures)

  1. Web Technologies (2 short papers)
    1. Curbera, F., Duftler, M., Khalaf, R., Nagy, W., Mukhi, N., Weerawarana, S., "Unraveling the Web services web: an introduction to SOAP, WSDL, and UDDI", IEEE Internet Computing, Volume 6, Issue 2, March-April 2002, pgs. 86-93.
    2. Curbera, F., Khalaf, R., Mukhi, N., Tai, S., Weerawarana, S., "The Next Step in Web Services", Communications of the ACM, Volume 46, Issue 10, October 2003, pgs. 29-34.
  2. Content distribution networks (1 paper)
    1. Freedman, M., Freudenthal, E., Mazières, D., "Democratizing content publication with Coral", Proceedings of the 1st Symposium on Networked Systems Design and Implementation (NSDI 2004), San Francisco, CA, March 2004, pgs. 239-252.


    Review Questions: link

Security (2 lectures)

  1. Saltzer, J.H. and Schroeder, M.D., "Protection and the Control of Information in Computer Systems", Proceedings of the IEEE, 63(9):1278-1308, Sept. 1975.
  2. M. Satyanarayanan, "Integrating Security in Large Scale Distributed Systems", ACM TOCS, Aug. 1989.


    Review Questions: link

Final Exam

Example finals:



Background Papers

Required Background:

    1. Anderson, T.E., "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors", IEEE Transactions on Parallel and Distributed Systems, 1, 1, pgs. 6-16, January 1990.
    2. Birrell and Nelson, "Implementing Remote Procedure Calls", ACM Transactions on Computer Systems, 2, 1, pgs. 39-59, February 1984. Also refer to Operating System Concepts, Silberschatz and Galvin.
    3. Basics on message passing and communication protocols. Refer to Operating System Concepts, Silberschatz and Galvin. Also refer to the web pages of the CoC networking courses.
    4. SUN NFS, Locus, and Sprite, from Operating System Concepts, Silberschatz and Galvin.
    5. Russel Sandberg et al., "Design and Implementation of the Sun Network Filesystem", Proceedings of the Summer 1985 USENIX Conf., pgs. 119-130, 1985.
    6. Nelson, M.N., Welch, B.B., Ousterhout, J.K., "Caching in the Sprite Network File System", ACM Transactions on Computer Systems, 6, 1, pgs. 134-154, February 1988.
    7. Walker et al., "The LOCUS Distributed Operating System", Proceedings of the Ninth ACM Symposium on Operating Systems Principles, pgs. 49-70, December 1983.

Optional Background/Supplemental