SYSTEMS QUALIFYING EXAMINATION
Fall 2003

Instructions:  Please answer any 6 out of 9 questions.

Give detailed answers to questions mentioning all the assumptions you are making and be comprehensive in your answers showing the details of your approach and understanding of the problem.
1.    Distributed Systems
Distributed systems are prone to failures. One of the critical requirements for building a reliable distributed system is robustness in the sense that the system is capable of tolerating and recovering from failures. The big challenge has always been the tradeoff between optimizing for performance and designing for robustness. Mechanisms have been proposed ranging from low-level ones at the transport level such as timeouts and virtual circuits, to high-level ones such as replication of computation and state. With the growing interest in grid and peer to peer computing applications, decentralized management of computing tasks and applications become a popular alternative paradigm.
Imagine you are hired to join a team for building reliable distributed systems for eCommerce applications. Your job is to develop scalable mechanisms that can provide acceptable performance while providing support for fault-tolerance and recovery. Please (1) list at least three important issues that need to be considered at the system-level and application level under client-server, grid, or peer-to-peer infrastructure, (2) explore the space of possible solutions, and (3) suggest a system architecture that embodies a solution approach, and argue qualitatively why your architecture will prevail over other approaches you have listed.
2.    Specialization

Specialization of systems software can happen in several ways.  First, we can use partial evaluation (e.g., using the Tempo program specializer) at compile time to generate specialized versions of a program, when invariants become known at that time, and remain true through all executions.  Second, if invariants become known at link time, we can use dynamic specialization techniques (e.g., template filling and dynamic linking) to create specialized versions.  Third, if invariants become known at run-time, we can invoke a compiler to compile and generate code for these invariants.

Use concrete scenarios (i.e., concrete system components and invariants) to compare these three different approaches to create specialized code in terms of the code generation overhead and quality of code generated.

3. Scalability

In many papers in the systems area, the authors claim that they propose a solution that scales well. A precise definition of scalability depends on the context but it could be (1) as more resources are added to the system, a proportionately larger number of requests can be handled, or (2) as more resources are added to the system, the response time for a fixed set of requests can be decreased. Consider the following two scenarios: (1) a logically centralized service that processes requests from a large number of clients, and (2) a scheme for finding resources in a peer-to-peer system. Discuss how scalability will be defined for these contexts. For each, suggest algorithms that scale well and others that exhibit poor scalability. Can you suggest an algorithm that may scale even better than the ones that exist in the literature?

4.     File Systems

A distributed system implements a core service which makes use of a heartbeat like protocol to keep track of when nodes fail and recover. If the failure or recovery of a node is detected, this information is disseminated to all nodes with the property that near by nodes learn of a node's failure more quickly than nodes that are far away. The system also implements a clock synchronization protocol to bound drift across node clocks.

You have been asked to implement a replicated file system in such a system. The file system should strive to provide as strong consistency as possible (readers must get the most recently written data).  The protocols employed by it should tolerate up to a threshold number of benign failures.  They should exploit the information that is available from the failure detection service and can also use the synchronized clocks when useful.
 
Give an outline of a protocol that can provide consistent access to file data.  Discuss how it does or does not benefit from the services provided by the system.

5.    Core OS Synchronization

There have been several proposals for higher-level interprocess/interthread synchronization (such as monitors, path expressions, and serializers).

(a) First, give the pros and cons of adopting such proposals in operating systems from the implementation point of view.  
(b) Second, discuss the relative merits of these constructs from the point of view of an application programmer.  
(c) Third, enumerate the mechanisms that are typically found in modern operating
systems such as Sun Solaris and Microsoft NT for thread synchronization.  In particular discuss why such operating systems chose to have or not to have such high-level synchronization constructs.

6.    Communication Abstractions for Distributed Systems

Interprocess communication and synchronization is an age-old topic.  The Unix socket abstraction is a convenient way for encapsulating the communication and synchronization needs of a distributed application, and has been around for a long time.  Yet, we see a number of proposals for new abstractions that encapsulate the synchronization and communication needs of parallel and distributed applications.

(a)     Why do we need new abstractions given the ubiquity of sockets?
(b)    Give concrete examples of application semantics that illustrate the limitations of sockets, and warrant the need for new abstractions (write code fragments to get your point across).

7.    Virtualization

A newly rediscovered topic in operating systems is system virtualization and isolation.  
(a)    Distinguish isolation from virtualization.
(b)    Concerning isolation and virtualization mechanisms, discuss how this
       new work builds on previous research, including on past hardware and
       software for timeshared systems.
(c)    Describe two NEW mechanisms you can think of that would be useful in
       implementing virtualization properties for certain devices (you
       Pick the specific devices),
(d)    Provide examples as to the importance of this new topic (at least
       three).

8.     Real-Time Scheduling

Define what optimality means for static vs. dynamic deadline-based real-time scheduling, i.e., what is an optimal schedule in each case? Discuss whether a reservation-based approach can be optimal in the dynamic case, then describe an adaptive scheduling approach for communications AND argue about how one might define optimality in this case and reason about it.

9.     O-O Systems

There is a long history of configurable systems in the object-oriented domain.
(a)    Why has the community focused on O-O systems for operating system design? Come up with two reasons.
(b)    What configuration opportunities exist in the O-O domain for operating    systems? Give two examples, one each for dynamic vs. static configuration.
(c)    Come up with an example of online configuration that is suitable for distributed middleware, rather than operating systems. Be technical here; don't just say 'security' or some such.