PhD CS – Systems Body of Knowledge

This is the reading list for the OS qualifying exam. Students taking the exam are also responsible for the curricular content of CS 6210 and should include it in their preparation.


OS Structures

  • S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, et al., “Corey: An Operating System for Many Cores”, OSDI 2008.
  • Baumann et al., "The Multikernel: a new OS architecture for scalable multicore systems", 22nd Symposium on Operating Systems Principles, SOSP 2009.
  • OS Survey: Mukherjee, B., and Schwan, K., "Survey of Multiprocessor Operating Systems Kernels", Technical Report GIT-CC-92/05. College of Computing, Georgia Institute of Technology, November 1993. (available as schwan/6420/


Shared Memory Systems

  • Anderson, T.E., "The Performance Implications of Spin-Waiting Alternatives for Shared-Memory Multiprocessors", IEEE Transactions on Parallel and Distributed Systems, 1, 1, pgs. 6-16, January 1990.
  • Paul E. McKenney and John D. Slingwine. Read-Copy Update: Using Execution History to Solve Concurrency Problems, Parallel and Distributed Computing and Systems, Oct 1998.

Communication Mechanisms

  • Birrell and Nelson, "Implementing Remote Procedure Calls", ACM Transactions on Computer Systems, 2, 1, pgs. 39-59, February 1984.
  • Threads and communications: Bershad, B.N., Anderson, T.E., Lazowska, E.D., and Levy, H.M., "User-Level Interprocess Communication for Shared Memory Multiprocessors", ACM Transactions on Computer Systems, 9, 2, pgs. 175-198, May 1991.
  • Clark, D.D., "The Structuring of Systems Using Upcalls", Proceedings of Tenth ACM Symposium on Operating Systems Principles, pgs. 171-180, Dec. 1985.
  • Draves, R.P., Bershad, B.N., Rashid, R.F. and Dean, R.W., "Using Continuations to Implement Thread Management and Communication in Operating Systems", Proceedings of the Thirteenth ACM Symposium on Operating System Principles, pgs. 122-136, December 1991.
  • B. N. Bershad, T. E. Anderson, E. D. Lazowska, and H. M. Levy. Lightweight Remote Procedure Call. ACM Transactions on Computer Systems, 8(1):37--55, Feb. 1990.
  • Partial reading: Schroeder, M., and Burrows, M., "Performance of the Firefly RPC", Proceedings of the Twelfth ACM Symposium on Operating Systems Principles, pgs. 83-90, December 1989.

Distributed Systems Principles

  • M. Raynal and M. Singhal. Logical Time: A Way to Capture Causality in Distributed Systems. IRISA Technical Report.
  • K. Mani Chandy and Leslie Lamport, "Distributed Snapshots: Determining Global States of Distributed Systems", ACM Transactions on Computer Systems, Feb. 1985.
  • Reinhard Schwarz and Friedemann Mattern, "Detecting causal relationships in distributed computations: in search of the holy grail", Distributed Computing, 1994.

Group Communication

  • Kenneth Birman, André Schiper, and Pat Stephenson, "Lightweight Causal and Atomic Group Multicast", ACM Transactions on Computer Systems, Aug. 1991, 9, 3.
  • David Cheriton and Dale Skeen, Understanding the Limitations of Causally and Totally Ordered Communication, ACM SOSP, December 1993.
  • K. Birman et al., Bimodal Multicast, ACM Transactions on Computer Systems, 1999.
  • Patrick Eugster et al., The Many Faces of Publish/Subscribe, ACM Computing Surveys, 2003.

Sharing in Distributed Systems

  • Kai Li and Paul Hudak, "Memory coherence in shared virtual memory systems", ACM TOCS, 7(4):321--359, November 1989.
  • P. Keleher, A. Cox, and W. Zwaenepoel. Lazy Release Consistency for Software Distributed Shared Memory, Proc. of the Twentieth Symposium on Computer Architecture, 1993.
  • Mustaque Ahamad, Gil Neiger, Prince Kohli, James Burns, and Phil Hutto, "Causal Memory: Definitions, Implementation and Programming", Distributed Computing, 9(1):37--49, Aug 1995.
  • Ahamad, M. and Kordale, R. Scalable Consistency Protocols for Distributed Services, IEEE Transactions on Parallel and Distributed Systems, 1999.
  • A useful overview appears in: Jelica Protic, Milo Tomasevic, Veljko Milutinovic, “Distributed Shared Memory: Concepts and Systems”, IEEE Parallel & Distributed Technology: Systems & Applications, June 1996.

File Systems and Distributed Shared Memory

  • Nelson, M.N., Welch, B.B., Ousterhout, J.K., "Caching in the Sprite Network File System", ACM Transactions on Computer Systems, 6, 1, pgs. 134-154, Feb. 1988.
  • J.J. Kistler and M. Satyanarayanan, "Disconnected Operation in the Coda File System", ACM Transactions on Computer Systems, Feb. 1992.
  • Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, “The Google File System”, SOSP 2003.

Failures, Consistency, and Recovery

  • J. N. Gray, P. McJones, M. W. Blasgen, R. A. Lorie, T. G. Price, G. R. Putzolu, and I. L. Traiger, "The Recovery Manager of the System R Database Manager", ACM Computing Surveys, Vol. 13, No. 2, June 1981, pp. 223-242.
  • Walker et al., "The LOCUS Distributed Operating System," Proceedings of the Ninth ACM Symposium on Operating Systems Principles, pgs 49-70, December 1983.
  • D. Porter, O. Hofmann, C. Rossbach, A. Benn, E. Witchel, “Operating System Transactions”, SOSP’09.
  • D. Peng, F. Dabek, “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, OSDI’10.
  • M. Castro, B. Liskov, Practical Byzantine Fault Tolerance, OSDI, Feb. 1999.
  • F. B. Schneider, Implementing Fault-tolerant Services Using the State Machine Approach: A Tutorial, Computing Surveys, 1990.
  • H. Yu and A. Vahdat. The Costs and Limits of Availability of Replicated Services, SOSP 2001.

Peer-to-Peer Systems and Middleware

  • David G. Andersen, Hari Balakrishnan, M. Frans Kaashoek, Robert Morris. Resilient Overlay Networks. Proc. 18th ACM SOSP, Banff, Canada, October 2001.
  • Stoica, I., Morris, R., Karger, D., Kaashoek, M. F. and Balakrishnan, H., “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”, SIGCOMM 2001.
  • Kumar, V., Cooper, B., Cai, Z., Eisenhauer, G., Schwan, K., “Resource-Aware Distributed Stream Management Using Dynamic Overlays”, Proceedings of IEEE International Conference on Distributed Computing Systems (ICDCS), June 2005.

Internet-Scale Computing and Data-Intensive Systems

  • Armando Fox, Steven Gribble, Yatin Chawathe, Eric Brewer, and Paul Gauthier, "Cluster-based Scalable Network Services", Sixteenth ACM Symposium on Operating System Principles, Oct. 1997.
  • M. Zaharia, A. Konwinski, A. Joseph, R. Katz, I. Stoica, “Improving MapReduce Performance in Heterogeneous Environments”, OSDI’08.
  • Fay Chang, et al. Bigtable: A Distributed Storage System for Structured Data, OSDI 2006.
  • Marcos K. Aguilera, et al., “Sinfonia: A New Paradigm for Building Scalable Distributed Systems”, SOSP 2007.
  • David Andersen et al. FAWN: A Fast Array of Wimpy Nodes, SOSP’09.
  • Balaji Palanisamy, Aameek Singh, Ling Liu, Bhushan Jain, “Purlieus: Locality-aware Resource Allocation for MapReduce in a Cloud”, ACM/IEEE International Conference on SuperComputing (SC2011), Seattle WA, Nov. 12-18, 2011.
  • Ling Liu; Pu, C.; Wei Tang, “Continual Queries for Internet-Scale Information Delivery”, IEEE Transactions on Knowledge and Data Engineering, Aug. 1999.

Isolation, Safety, Protection

  • Cohen, E., and Jefferson, D., "Protection in the HYDRA Operating System", Proceedings of Fifth ACM Symposium on Operating System Principles, pgs. 141-160, 1975.
  • Saltzer, J.H., "Protection and the Control of Information Sharing in Multics", Communications of the ACM, 17, 7, 1974.
  • R. Wahbe, S. Lucco, T.E. Anderson, S.L. Graham, "Efficient, Software-based Fault Isolation", Proceedings of the 14th ACM Symposium on Operating System Principles, Dec. 1993.
  • Whitaker, Shaw, Gribble, "Scale and Performance in the Denali Isolation Kernel", ACM OSDI 2002.
  • Chiueh, Venkitachalam, Pradhan, "Integrating Segmentation and Paging Protection for Safe, Efficient and Transparent Software Extensions", ACM Symposium on Operating System Principles, Dec. 1999.
  • Helen J. Wang, Xiaofeng Fan, Jon Howell, Collin Jackson, "Protection and Communication Abstractions for Web Browsers in MashupOS", ACM Symposium on Operating System Principles, 2007.
  • S. Tang, H. Mai, S. King, “Trust and Protection in the Illinois Browser Operating System”, OSDI’10.

Real-time Scheduling

  • Dertouzos, Michael L. and Mok, Aloysius Ka-Lau, "Multiprocessor On-Line Scheduling of Hard-Real-Time Tasks", IEEE Transactions On Software Engineering, Dec 1989.
  • Ramamritham, Krithi; Stankovic, John; Zhao, Wei, "Distributed Scheduling of Tasks with Deadlines and Resource Requirements", IEEE Transactions on Computers, vol.38, no.8, Aug.1989.
  • Mercer, Clifford W.; Savage, Stefan; and Tokuda, Hideyuki, "Processor Capacity Reservation for Multimedia Operating Systems", Tech. Report CMU-CS-93-157, Carnegie-Mellon Univ., May 1993, also in IEEE International Conference on Multimedia Computing and Systems, May 1994.
  • Carl A. Waldspurger and William E. Weihl, "Lottery Scheduling: Flexible Proportional-Share Resource Management", First Symposium on Operating Systems Design and Implementation (OSDI '94), Monterey, CA, November 1994.
  • Lu, C.; Stankovic, J.A.; Tao, G.; Son, S.H., "Design and evaluation of a feedback control EDF scheduling algorithm", Proceedings of the 20th IEEE Real-Time Systems Symposium, 1999.
  • Schwan, K.; Zhou, H., “Dynamic Scheduling of Hard Real-time Tasks and Real-time Threads”, IEEE Transactions on Software Engineering, Aug. 1992.
  • Steere, D. C.; Goel, A.; Gruenberg, J.; McNamee, D.; Pu, C.; Walpole, J., “A Feedback-Driven Proportion Allocator for Real-Rate Scheduling”, Third Symposium on Operating System Design and Implementation, Feb. 1999.

Multimedia and Operating Systems

  • D. James Gemmell, Harrick M. Vin, Dilip D. Kandlur, P. Venkat Rangan, and Lawrence A. Rowe, "Multimedia Storage Servers: A Tutorial", IEEE Computer, May 1995.
  • Shahabi, Zimmermann, Fu, and Yao. "Yima: A Second-Generation Continuous Media Server", IEEE Computer Magazine, June 2002.
  • Ashvin Goel, Luca Abeni, Charles Krasic, Jim Snow, Jonathan Walpole, Supporting Time-Sensitive Applications on a Commodity OS, OSDI 2002.

Real-time, Quality of Service, Pervasive Systems

  • Tarek F. Abdelzaher and Kang G. Shin, "End-host Architecture for QoS-Adaptive Communication", IEEE Real-Time Technology and Applications Symposium, Denver, Colorado, June 1998.
  • Pillai, Shin, "Dynamic Voltage Scaling for Low-Power Embedded Operating Systems", ACM SOSP 2001.
  • Ripal Nathuji and Karsten Schwan, "VirtualPower: Coordinated Power Management in Virtualized Enterprise Systems", Symposium on Operating Systems Principles (SOSP), Oct. 2007.
  • John Heidemann, Fabio Silva, Chalermek Intanagonwiwat, Ramesh Govindan (USC/ISI), Deborah Estrin, Deepak Ganesan (UCLA). Building Efficient Wireless Sensor Networks with Low-Level Naming. Proceedings of SOSP 2001.
  • Angelo Corsaro and Douglas C. Schmidt. "Evaluating Real-Time Java Features and Performance for Real-time Embedded Systems", Proceedings of the 8th IEEE Real-Time Technology and Applications Symposium, San Jose, CA, September 2002.
  • Umakishore Ramachandran, Rajnish Kumar, Matthew Wolenetz, Brian Cooper, Bikash Agarwalla, JunSuk Shin, Phillip Hutto, and Arnab Paul. "Dynamic Data Fusion for Future Sensor Networks." ACM Transactions on Sensor Networks, August 2006.

Configurable Systems

  • C. Pu, T. Autrey, A. Black, C. Consel, C. Cowan, J. Inouye, L. Kethana, J. Walpole, and K. Zhang, "Optimistic Incremental Specialization: Streamlining a Commercial Operating System", Fifteenth ACM Symposium on Operating System Principles, ACM SIGOPS Notices, December 1995.
  • Craig A. N. Soules, Jonathan Appavoo, Kevin Hui, Robert W. Wisniewski, Dilma Da Silva, Gregory R. Ganger, Orran Krieger, Michael Stumm, Marc Auslander, Michal Ostrowski, Bryan Rosenburg, Jimi Xenidis: "System Support for Online Reconfiguration", Proceedings of Usenix 2003, pg 141-154.
  • Rosu, D.; Schwan, K.; Yalamanchili, S.; Jha, R., “On Adaptive Resource Allocation for Complex Real-time Applications”, IEEE Real-time Systems Symposium, 1997.
  • Dylan McNamee, Jonathan Walpole, Calton Pu, Crispin Cowan, Charles Krasic, Ashvin Goel, Perry Wagle, Charles Consel, Gilles Muller, Renaud Marlet, "Specialization Tools and Techniques for Systematic Optimization of System Software", ACM Transactions on Computer Systems, May 2001.


Communications & I/O

  • C. Maeda and B. Bershad, "Protocol Service Decomposition for High-Performance Networking", Proceedings of the 14th ACM Symposium on Operating System Principles, Dec. 1993.
  • Bhatti, Hiltunen, Schlichting, Chiu, "Coyote: A System for Constructing Fine-Grain Configurable Communication Services", ACM Transactions on Computer Systems, Nov. 1998.
  • David Kotz, "Disk-directed I/O for MIMD Multiprocessors", Symposium on Operating Systems Design and Implementation (OSDI '94), Nov. 1994, Usenix, ACM SIGOPS.
  • V. Bala, "CCL: A portable and tunable Collective Communication Library for scalable parallel computers", IEEE Transactions on Parallel and Distributed Systems, Vol 6, No 2, Feb 1995.
  • Cox, G., et al., "Interprocess Communication and Processor Dispatching on the Intel 432", Proceedings of the 8th Symposium on Operating System Principles, Asilomar, pgs. 44-53, Dec. 1981.

System Measurement and Modeling

  • Allen D. Malony, Daniel A. Reed, and Harry A. G. Wijshoff, "Performance Measurement Intrusion and Perturbation Analysis", IEEE Transactions on Parallel and Distributed Systems, July 1992, 3, 4.
  • Weiming Gu, Greg Eisenhauer, Eileen Kraemer, Karsten Schwan, John Stasko, and Jeffrey Vetter, "Falcon: On-line Monitoring and Steering of Large-Scale Parallel Programs", Concurrency: Practice and Experience, 1998.
  • Ariel Tamches and Barton P. Miller, "Fine-Grained Dynamic Instrumentation of Commodity Operating System Kernels", Third Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, February 1999.
  • J. Anderson, L. M. Berc, J. Dean, S. Ghemawat, M. R. Henzinger, S. Leung, R. L. Sites, M. T. Vandevoorde, C. A. Waldspurger, and W. E. Weihl, "Continuous profiling: Where have all the cycles gone?", In Proc. of the 16th ACM Symposium on Operating Systems Principles (SOSP 97), pages 1-14, October 1997.
  • Shicong Meng, Srinivas Kashyap, Chitra Venkatramani and Ling Liu, "REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems", Proceedings of IEEE Int. Conf. on Distributed Computing Systems, ICDCS'09.