SYSTEMS QUALIFYING EXAMINATION
Fall 2003
Instructions: Please answer any 6 out of 9 questions.
Give detailed answers to questions mentioning all the assumptions you
are making and be comprehensive in your answers showing the details of
your approach and understanding of the problem.
1. Distributed Systems
Distributed systems are prone to failures. One of the critical
requirements for building a reliable distributed system is robustness
in the sense that the system is capable of tolerating and recovering
from failures. The big challenge has always been the tradeoff between
optimizing for performance and designing for robustness. Mechanisms
have been proposed ranging from low-level ones at the transport level
such as timeouts and virtual circuits, to high-level ones such as
replication of computation and state. With the growing interest in grid
and peer to peer computing applications, decentralized management of
computing tasks and applications become a popular alternative paradigm.
Imagine you are hired to join a team for building reliable distributed
systems for eCommerce applications. Your job is to develop scalable
mechanisms that can provide acceptable performance while providing
support for fault-tolerance and recovery. Please (1) list at least
three important issues that need to be considered at the system-level
and application level under client-server, grid, or peer-to-peer
infrastructure, (2) explore the space of possible solutions, and (3)
suggest a system architecture that embodies a solution approach, and
argue qualitatively why your architecture will prevail over other
approaches you have listed.
2. Specialization
Specialization of systems software can happen in several ways.
First, we can use partial evaluation (e.g., using the Tempo program
specializer) at compile time to generate specialized versions of a
program, when invariants become known at that time, and remain true
through all executions. Second, if invariants become known at
link time, we can use dynamic specialization techniques (e.g., template
filling and dynamic linking) to create specialized versions.
Third, if invariants become known at run-time, we can invoke a compiler
to compile and generate code for these invariants.
Use concrete scenarios (i.e., concrete system components and
invariants) to compare these three different approaches to create
specialized code in terms of the code generation overhead and quality
of code generated.
3. Scalability
In many papers in the systems area, the authors claim that they propose
a solution that scales well. A precise definition of scalability
depends on the context but it could be (1) as more resources are added
to the system, a proportionately larger number of requests can be
handled, or (2) as more resources are added to the system, the response
time for a fixed set of requests can be decreased. Consider the
following two scenarios: (1) a logically centralized service that
processes requests from a large number of clients, and (2) a scheme for
finding resources in a peer-to-peer system. Discuss how scalability
will be defined for these contexts. For each, suggest algorithms that
scale well and others that exhibit poor scalability. Can you suggest an
algorithm that may scale even better than the ones that exist in the
literature?
4. File Systems
A distributed system implements a core service which makes use of a
heartbeat like protocol to keep track of when nodes fail and recover.
If the failure or recovery of a node is detected, this information is
disseminated to all nodes with the property that near by nodes learn of
a node's failure more quickly than nodes that are far away. The system
also implements a clock synchronization protocol to bound drift across
node clocks.
You have been asked to implement a replicated file system in such a
system. The file system should strive to provide as strong consistency
as possible (readers must get the most recently written data).
The protocols employed by it should tolerate up to a threshold number
of benign failures. They should exploit the information that is
available from the failure detection service and can also use the
synchronized clocks when useful.
Give an outline of a protocol that can provide consistent access to
file data. Discuss how it does or does not benefit from the
services provided by the system.
5. Core OS Synchronization
There have been several proposals for higher-level
interprocess/interthread synchronization (such as monitors, path
expressions, and serializers).
(a) First, give the pros and cons of adopting such proposals in
operating systems from the implementation point of view.
(b) Second, discuss the relative merits of these constructs from the
point of view of an application programmer.
(c) Third, enumerate the mechanisms that are typically found in modern
operating
systems such as Sun Solaris and Microsoft NT for thread
synchronization. In particular discuss why such operating systems
chose to have or not to have such high-level synchronization constructs.
6. Communication Abstractions for Distributed Systems
Interprocess communication and synchronization is an age-old
topic. The Unix socket abstraction is a convenient way for
encapsulating the communication and synchronization needs of a
distributed application, and has been around for a long time.
Yet, we see a number of proposals for new abstractions that encapsulate
the synchronization and communication needs of parallel and distributed
applications.
(a) Why do we need new abstractions given the
ubiquity of sockets?
(b) Give concrete examples of application semantics
that illustrate the limitations of sockets, and warrant the need for
new abstractions (write code fragments to get your point across).
7. Virtualization
A newly rediscovered topic in operating systems is system
virtualization and isolation.
(a) Distinguish isolation from virtualization.
(b) Concerning isolation and virtualization
mechanisms, discuss how this
new work builds on previous research,
including on past hardware and
software for timeshared systems.
(c) Describe two NEW mechanisms you can think of that
would be useful in
implementing virtualization properties
for certain devices (you
Pick the specific devices),
(d) Provide examples as to the importance of this new
topic (at least
three).
8. Real-Time Scheduling
Define what optimality means for static vs. dynamic deadline-based
real-time scheduling, i.e., what is an optimal schedule in each case?
Discuss whether a reservation-based approach can be optimal in the
dynamic case, then describe an adaptive scheduling approach for
communications AND argue about how one might define optimality in this
case and reason about it.
9. O-O Systems
There is a long history of configurable systems in the object-oriented
domain.
(a) Why has the community focused on O-O systems for
operating system design? Come up with two reasons.
(b) What configuration opportunities exist in the O-O
domain for operating systems? Give two examples, one
each for dynamic vs. static configuration.
(c) Come up with an example of online configuration
that is suitable for distributed middleware, rather than operating
systems. Be technical here; don't just say 'security' or some such.