Wednesday, September 29th. (IN CLASS)
Distributed Systems: Concepts
-
Ricart, G. and Agrawala, A.K., "An Optimal Algorithm for Mutual Exclusion
in Computer Networks", Communications of the ACM, 24, 1, pgs. 9-17, January
1981.
-
Lamport, L., "Time, Clocks, and the Ordering of Events in a Distributed
System", Communications of the ACM, 21, 7, pgs. 558-565, July 1978.
-
David K. Gifford, "Weighted Voting for Replicated Data," Proceedings of
the Seventh ACM Symposium on Operating System Principles, Asilomar, Dec. 1979.
Distributed Systems: Failures, Consistency and Recovery
-
The SUN NFS, Locus, and Sprite - Silberschatz/Galvin, "Operating System Concepts". (self study).
-
Nelson, M.N., Welch, B.B., Ousterhout, J.K., "Caching in the Sprite Network File System", ACM Transactions on Computer Systems, 6, 1, pgs. 134-154, February 1988.
-
Davcev and Burkhard, "Consistency and Recovery Control of Replicated Files", ACM Symposium on Operating System Principles, Dec. 1985.
-
Anderson, T., et al., "Serverless Network File Systems", ACM Transactions on Computer Systems, February 1996.
-
Karlin, A.R., Levy, H.M., and Thekkath, "Implementing Global Memory Management
in a Workstation Cluster", Fifteenth ACM Symposium on Operating System
Principles, Dec. 1995.
%DSM:
- C. Amza, A. Cox, S. Dwarkadas, P. Keleher, H. Lu, R. Rajamony, W. Yu
and W. Zwaenepoel, "TreadMarks: Shared Memory Computing on Networks of
Workstations," IEEE Computer, February 1996.
Multimedia, Real-Time, and Web Services
-
D. James Gemmell, Harrick M. Vin, Dilip D. Kandlur, P. Venkat Rangan, and
Lawrence A. Rowe, "Multimedia Storage Servers: A Tutorial", IEEE Computer,
May 1995.
-
Henry Massalin and Calton Pu, "Threads and Input/Output in the Synthesis Kernel", ACM 12th Symposium on Operating Systems Principles, Dec. 1989.
-
Rangan, P.V. and Vin, H.M., "Designing File Systems for Digital Video and
Audio", Proceedings of the Thirteenth ACM Symposium on Operating System
Principles, pgs. 81-94, December 1991 (self-study).
-
Armando Fox, Steven Gribble, Yatin Chawathe, Eric Brewer, and Paul Gauthier,
"Cluster-based Scalable Network Services", Sixteenth ACM Symposium on
Operating System Principles, Oct. 1997.
- Erik Riedel, Garth Gibson, Christos Faloutsos, "Active Storage For Large-Scale Data
Mining and Multimedia," Proc. of the 24th International Conference on Very Large
Databases (VLDB '98), New York, New York, August 24-27, 1998.
- Clark and Zhang, "Supporting Real-time Applications in an Integrated
Services Packet Network: Architecture and Mechanism", ACM SIGCOMM, 1992.
-
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger and Deborah A. Wallach, "Server Operating Systems," 7th SIGOPS European Workshop: Systems Support for Worldwide
Applications, Connemara, Ireland, September 1996.
Distributed Systems: Failures, Consistency and Recovery
-
Walker et al., "The LOCUS Distributed Operating System," Proceedings of the Ninth ACM Symposium on Operating Systems Principles, pgs. 49-70, December 1983 (self study).
-
R. Haskin et. al., "Recovery Management in QuickSilver", ACM Transactions
on Computer Systems, February 1988.
-
Satyanarayanan, M., et al., "Lightweight Recoverable Virtual Memory", The
Proceedings of Fourteenth ACM Symposium on Operating System Principles,
pgs. 146-160, December 1993.
-
David E. Lowell and Peter M. Chen, "Free Transactions
With Rio Vista", Proceedings of the Sixteenth ACM Symposium on Operating
System Principles, October 1997.
Protection, Object-based Systems and Object Technologies
-
Linden, T.A., "Operating System Structures to Support Security and Reliable
Software", ACM Computing Surveys, 8, 4, pgs. 409-445, 1976. Also see the chapter
on protection in Silberschatz/Galvin, "Operating System Concepts"
(reference only).
-
Cohen, E., and Jefferson, D., "Protection in the HYDRA Operating System",
Proceedings of Fifth ACM Symposium on Operating System Principles, pgs.
141-160, 1975.
-
Mitchell, J. G., et al., "An
Overview of the Spring System".
-
Hamilton, G., Powell, M.L., and Mitchell, J.G., "Subcontract: A Flexible
Base for Distributed Programming", Proceedings of the Fourteenth ACM SOSP,
pgs. 69-79, December 1993.
-
Birrell, A., Nelson, G., Owicki, S., and Wobber, E., "Network
Objects", Digital, SRC Research Report No. 115, Dec. 1995.
-
Wollrath, A., Riggs, R., and Waldo, J., "A Distributed Object Model for
the Java System", Usenix Conference on Object Oriented Technologies and
Systems, May 1996.
- Aldrich, Dooley, et al., "Providing Easier Access to Remote Objects
in Client-Server Systems," 31st Hawaii International Conference on System
Sciences, January 1998.
Advanced Topics in Object Systems: Representations
- Christian Clemencon, Karsten Schwan, and Bodhi Mukherjee, "Distributed
Shared Abstractions (DSA) on Large-Scale Multiprocessors", IEEE
Transactions on Software Engineering, February 1996.
- M. Ahamad and R. Kordale, "Scalable Consistency Protocols for
Distributed Services," IEEE Transactions on Parallel and Distributed
Systems, 1999.
- Ahmed Gheith and Karsten Schwan, "CHAOS-Arc -- Kernel Support for
Atomic Transactions in Real-Time Applications", ACM Transactions on Computer
Systems, April 1993.
More on Object Technologies, Review, and Overflow
Other Information
-
Text: Papers available online (via the links above) or from instructor.
-
Supplementary Materials: Operating systems textbook used in GT undergraduate OS courses: "Operating System Concepts", Silberschatz and Galvin. Advanced operating
systems text: "OS: Advanced Concepts", Maekawa, Oldehoeft, Addison-Wesley. Also: "Distributed Systems", Sape Mullender, Addison-Wesley; "Distributed Operating Systems", Andrew S. Tanenbaum, Prentice Hall.
-
Prerequisites: CS 3431/4431 and its prerequisites or equivalent.
-
Homework assignments and projects will be posted in the newsgroup git.cc.class.6420.
Information related to the course that is of general interest can also
be posted in this newsgroup.
-
Graduating students will be given the final examination in dead week. Consult
the instructor for details.
Grading
40% project/homework (7% for project 1, 11% each for projects
2, 3, and 4)
25% midterm
30% final
5% class participation
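As a rough illustration of how the weights above combine, the sketch below computes a weighted course total. Only the percentages come from the breakdown above; the component names, the function name, and the sample scores are hypothetical and are not part of any official grading procedure.

```python
# Weights (in percent) taken from the grading breakdown above.
# The dictionary keys are illustrative names, not official ones.
WEIGHTS = {
    "project1": 7,
    "project2": 11,
    "project3": 11,
    "project4": 11,
    "midterm": 25,
    "final": 30,
    "participation": 5,
}

# Sanity check: the components account for the whole grade.
assert sum(WEIGHTS.values()) == 100


def course_total(scores):
    """Return the weighted total (0-100) from per-component scores in 0-100."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100.0


if __name__ == "__main__":
    # Hypothetical student: 80 on everything except a 90 on the final.
    sample = {name: 80 for name in WEIGHTS}
    sample["final"] = 90
    print(course_total(sample))  # prints 83.0
```

The four project weights (7 + 11 + 11 + 11) add up to the 40% project/homework share, so the seven components together cover 100% of the grade.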
Instructions for Special Projects
To propose a special project for this class, I would like from you the
following materials:
Brief Project Description
-
at least three different intermediate project steps, with delivery items
and deadlines for each
-
final project deadline sometime during the week before finals
The first step is often background work: compiling a bibliography
of relevant papers, reading them, and designing suitable algorithms or approaches.
The second step, typically around midterm time, is to have written and
debugged much of the necessary software.
The third step must include not only software testing but also
performance evaluation, on whatever platform you choose to use.
The final deliverable includes not only the actual software but also
a report, which is outlined next.
Final Report
You are asked to submit an on-line final report regarding your special
project that consists of the following parts:
-
A statement of your approach to the project and the technique used to solve
it - two typewritten pages minimum, 8 pages maximum, including a list of
references to related work.
-
If applicable, a running program with sufficient documentation so that
someone can understand your program without re-running it. Such documentation
should consist of detailed comments within the program text and of
explanations on a separate piece of paper. You should be prepared to hand
in your program electronically, if requested. Most likely, you will simply
schedule a demo with me (please do so!).
-
A conclusion, stating the main results of your work. This conclusion might
contain a performance evaluation of your program or a list of next steps
concerning it (what might be interesting to do next, how it should be
extended, and what should be done to make it more useful). Maximum 4 pages,
minimum 2 pages.
-
A one page evaluation of what you did: its usefulness in the context of
other work AND in the context of general research (namely, why did you
do this and why was or wasn't it worthwhile doing?)
-
A one page discussion relating the work you did to the topics we studied
in class. Comment on which papers from class relate to what you did
or to extensions of what you did, if applicable.
Pre-packaged Special Projects
Below are some details on possible 6210 'Special Projects'. I can supply more
information if any of you are interested in any of these topics. In each case,
you will be working with some of my other PhD and MSc students and research
scientists. The particular topic areas I am listing below are related to
various research efforts undertaken by my group. I am also willing to entertain
proposals from you that link your 6210 project to research you might be doing
with other faculty, as long as the contents of your proposed projects address
some significant portion of the technical matters taught in this course.
For those who want to do a special project: the topics listed here indicate
the kinds of projects I would find interesting and give you a choice of
areas in which to do them. If you want to proceed,
here is what I would like you to do:
- Send me email with whatever interests/vague ideas you have (schwan@cc)
- Let me look at it, then I can comment via email and/or we can meet to
discuss it
- We work out a plan for the special project that covers the rest
of the semester, typically consisting of several intermediate
steps.
It is my intent to have the special project, if you do one, replace
ALL of the remaining regular class projects. This way, you have time to
do something interesting or relevant.
- Active System Area Networks (ASAN)
This new project (funded by 'just awarded' multi-year grants from NSF and
DOE) is developing a software AND hardware architecture for scalable
cluster computers. The large-scale cluster machines available to this
work include both SMP-Intel-based nodes and SUN Sparc-based cluster machines,
running the Solaris and Linux operating systems. In conjunction with
collaborators in ECE, we are using and then enhancing communication
coprocessors for such cluster nodes and also developing
a software architecture as well as application-level code for these
coprocessors, to enhance the scalability of the resulting cluster computing
engines. Specific topics include:
-
Virtual Communication Machines - an extensible software architecture
for Active System Area Networks (see Marcel Rosu's (Ph.D. student) web
page for information on a previous, ATM-based prototype of the VCM).
Build elements of a VCM on either Myrinet boards or on Intel's I2O-based
communication co-processors.
-
Scalable Cluster Applications - develop web-based and media-based
applications that exhibit improved scalability with the ASAN architecture
-
Quality Management - online quality management for cluster applications
through the development of innovative online resource management algorithms
and methods (see, for instance, Richard West's web page for information on
packet scheduling methods; also see Daniela Rosu's web page for
information on quality management algorithms and infrastructures at the
application level).
- The Adaptive Systems Project
More information can be found at the Adaptive Systems Home Page and the
Critical Systems Lab Home Page.
This project has been ongoing for a number of years. Its results
include methods and algorithms for online adaptation of real-time systems
and applications, and software infrastructures for online quality
management, including online monitoring and program adaptation (at the
kernel level, using Linux, and in middleware, both CORBA and event-based).
NEW DIRECTIONS in this project, for which we are actively seeking students
and which are supported by recent funding from DARPA, include:
- Agile Quality Management Infrastructures
Help develop a software infrastructure that is capable of being
instantiated at runtime in multiple ways: as middleware and with each
application, in the OS kernel, or in the network infrastructure (i.e., on
communication coprocessors). Each instance of such an infrastructure is
capable of managing the resources and resource usage of adaptable
applications in just the way required by those applications.
-
Wireless and mobile systems
Work on applications and/or communication protocols and software that
extend applications across wireless systems, with adaptations
applied at the application, kernel, and network levels. Work with
wireless communications, with media applications like a distributed
version of the DOOM game being enhanced by our group, and with media
streaming applications (e.g., next generation virtual environments).
This effort is funded in part by the new Yamacraw project at Georgia Tech.
-
Internet-based quality management, information flows
Work on Internet applications for which quality management must be
performed for the flows of information between servers and clients. This
effort links with the database-directed efforts of our new CoC faculty
Calton Pu and Ling Liu.
- Computational and Information Spaces
More information can be found at the Distributed Labs Project Home Page
and at the IHPCL Home Page.
This project, funded by the NSF in partnership with the NCSA Alliance
and by large-scale industry funds (Intel Corp.), is developing middleware
for high performance, parallel, and distributed applications. In contrast
to previous work in HPC, our focus is on interactive high performance
codes. Sample efforts for which students are needed include:
-
Develop distributed CORBA-compliant and/or Java-based software
infrastructures with which high performance, interactive applications
are easily assembled, employed, and controlled (see the web pages of
Greg Eisenhauer and Dong Zhou for additional detail).
-
Build 'parallel workbenches' for applications like global atmospheric
modeling, mechanical systems design, parallel optimization for applications
like airline crew scheduling or others (based on students' interests). Or
work on distributed object and event representations suitable for high
performance usage.
-
Create sample applications requiring support like this, including working
with the distributed version of DOOM and other such highly interactive
high performance codes, including web-based applications.
-
Develop communication-level support for high performance computing, across
SAN, LAN, and wireless media.
-
Develop compiler support for interactive high performance computing,
including working with IDL compilers and with Java compilers/runtimes.
The emphasis is on developing online optimization methods for
high performance interactive applications and systems, including online
binary code generation.