SYSTEMS  AREA PH.D. QUALIFYING  EXAMINATION
SPRING  2004

IMPORTANT INSTRUCTIONS:

1. Please attempt any six out of the following nine questions.  The exam has total of 5 pages and 9 questions.
2. Please attempt all the subparts of a question (marked (A), (B), (C) … etc.) unless otherwise stated explicitly (in these cases follow individual instructions embedded in the given question overriding this general rule).
3. Make and state assumptions wherever necessary.
4. Please keep your answers precise and to-the-point. Clarity is important.
5. Please return the answers in the printed form. If written by hand, please write cleanly and legibly.



1. Distributed Simulation

(A) Consider the Chandy/Misra/Bryant null message algorithm. As originally proposed, this algorithm assumes the topology indicating which logical process can send messages to which other processes is static.  Explain what difficulties arise when the topology is allowed to change dynamically during the execution of the distributed simulation.  Propose extensions to this algorithm to allow dynamic connections between processes.  State what constraints (if any) must be imposed on the application to realize your algorithm.

(B) Are dynamic connections a problem in Jefferson's Time Warp algorithm? Explain why or why not.

2. Protocol Construction

Academia and industry have made intensive efforts to automate protocol construction, typically using some combination of micro-protocols and of formal methods used for micro-protocol composition.
(A) Briefly outline the current state of the art, including commenting on the role of formal techniques.
(B) Talk about two basic problems that arise in micro-protocol construction: 1) there are some specific problems in generating high performance code for composed protocols vs. protocols written and compiled as single pieces of code (e.g., using a standard compiler). Identify at least two and comment on their solution. 2) there are some distinct advantages of using micro-protocol composition vs. other approaches to protocol construction. Describe at least two specific ones.




3. CPU Reservation

The idea of this question is to be a bit creative, taking the next step in the work on CPU reservation (for multimedia) to which you have been exposed. Consider the following: rather than reserving CPU for multimedia, you are asked to build a reservation mechanism that reserves MULTIPLE resources for multimedia and next, you are asked to extend that mechanism from single to multiple machines. After all, for realistic multimedia applications, performance concerns are end to end.
(A) Can you easily generalize the existing CPU reservation mechanism to dealing with multiple devices?
(B) What issues must be addressed for extending it to multiple machines? Identify and describe solutions to multiple such issues.

4. Cluster-based Web Cache System

Your new company, SIM, provides cluster-based software for ISP's that allows them to do simple firewalling and Web caching.  A customer installs a SIM caching cluster according to their expected user population size.  Under normal conditions (typical Web workloads, no undue network congestion effects, etc.), a single cluster node can handle about 60 hits per second.

Some of your more sophisticated customers tell you that they would like to be able to "augment" the caching with additional capabilities. For example, @mobi.com has many users who use PDA's who access the Web through their ISP cluster, so in addition to fetching and caching a page, they would like to be able to perform simple transformations on data, depending on whether the client is a Windows PDA, Palm PDA, or regular desktop browser (IE/Netscape).  So you agree to design an API that will allow them to insert "plug-ins" into the cache stream.

Below is the simplified control flow of your existing architecture.  Your software architecture is concurrent: a single thread handles the incoming request, but after the 'get page from web' procedure returns, the thread forks into two so that the "store in cache" and "deliver to client" code paths can be executed in parallel.

 

Please choose to answer either (A) and (B)  or (A), (C) and (D) from the following.  Write down your answers to the best of your knowledge.

(A) Redraw the diagram to indicate where a customer-supplied transformation module could fit in the control flow.  (There may be more than one place where it would work.)  

(B) In the original logic diagram above, all of your functional blocks correspond to methods inside a single process.  (The "store in cache" and "deliver to client" blocks can be executed by different threads, but they are user-level threads in the same process space.) @Mobi.com wants to know if their plug-in module will run
(a) in the same process space as a separate procedure call that is part of ,
(b) in the same process space but as a separate (forked) thread,
(c) in a different process space.  
Thoroughly enumerate the pros and cons of each approach. Consider performance, ease of programming, capacity provisioning, and failure management.  Which approach (a, b or c) would you ultimately recommend and why?

(C) One day, the recording industry companies have a vision in a dream telling them to put all of their content online for free.  Music fans immediately begin downloading MP3 files. Make and state reasonable quantitative assumptions regarding the relative sizes of a typical MP3 file compared to other Web workloads, and estimate the reduction in capacity of your cluster when serving this workload.  (Hint: Assume that under the new workload, about 80% of the bytes transferred correspond to MP3 files.  You'll need to compare this with a defensible assumption about the old workload, i.e., an assumption based on some recent data.)   Again, clearly state your data sources and path of reasoning.

(D) MP3's are a lot bigger than images and Web pages, and soon enough, @mobi.com's cache disks all fill up.  (Their system is still passing requests through, but the cache is thrashing: every time a new file is brought in from a server, it causes the eviction of an existing file that will probably be referenced again soon.)  @ mobi.com's administrator is upset, and tells you he is ready to just turn off the cluster to save electricity.  "No!" you say.  "There's still some benefit from having the cache operating, even if it is thrashing!" Please describe, to the best of your knowledge, what's the benefit, and who will benefit.

5.Kernel/User-level Interactions:

A lot of papers in the literature have concentrated on improving the interactions between a kernel scheduler and user-level communication (e.g., IPC) or synchronization (e.g., locks). For instance, the scheduler could preferentially schedule the recepient of an IPC call as soon as the thread that started the IPC is blocked. Or a user-level thread library could know that an LWP is near the end of its time slice and adjust its behavior.

Give a short survey of a few techniques along these lines. You should name at least three different techniques, citing the respective publications and explaining the essence of each technique. Discuss why these techniques are representative of other approaches, and present the tradeoffs involved.

6. Distributed  Shared  Memory:

There have been a number of papers in the late 80's and early 90's that were concerned with software distributed shared memory (DSM). DSM has been used as a programming abstraction for clusters, and as a system structuring concept in distributed operating systems such as Clouds.
Answer the following questions regarding DSM:
(A) What is appealing about DSM as a programming abstraction? Discuss the sources of performance advantages and disadvantages. Give code samples to illustrate your points.
(B) What is appealing about DSM as an operating system structuring mechanism? Discuss the sources of performance advantages and disadvantages. Give code samples to illustrate your points.
(C) What is the state of DSM research today?  Give a commentary on the
   current state of DSM research.


7. Distributed Systems

There are many distributed applications where information created at one node, called an "update", has to be disseminated to a large number of other nodes reliably. Obviously, if multicast is supported, an update can be sent as a multicast message. However, reliable multicast suffers from scalability problem because of a number of reasons. One way to improve scalability is to use "epidemic" style protocols where an update a sent to a small number of nodes which forward the update to others and so on. Answer the following questions when such an epidemic dissemination algorithm is used to propogate an update to all nodes.

    (A) If a given node wants to send an update to only a constant number of other nodes which may be chosen randomly, how many rounds will ensure that most nodes have received the update?

    (B) If updates can be injected into the system at multiple nodes, what kind of ordering should be imposed across them? How can such ordering be implemented in a large scale system?

    (C) It is possible that some of the nodes can be compromised and such nodes can try to disrupt dissemination or can even attempt to introduce corrupted updates into the system. Is it possible to handle such faulty nodes when their number is bounded? If your answer is yes, sketch a protocol that would achieve this goal and specify the bound assumed on the nodes that could be faulty.

8. Real-Time Systems

Although the 1997 Mars Pathfinder rover was generally considered a success, it had a series of resets that slowed down the mission for several weeks.  As it turned out, the resets were due to a priority inversion problem.  A high priority task (data distribution that broadcasts collected data through a shared bus) is suspended pending access to a shared resource.  As it happened, a lower priority task also uses the same resource.  When a number of intermediate priority tasks interrupted the low priority task, the high priority data distribution task misses its deadline.  Since the data distribution task is considered critical, missing its deadline causes the system to reset.

(A) Explain the general conditions for a priority inversion problem such as Pathfinder’s to happen in operating systems.

(B) Outline a solution to solve the priority inversion problem.  Explain how your solution will prevent the conditions above (item 1) from happening and how it solves the Pathfinder priority inversion problem.

9. Specialization

It is common for system programmers to optimize the common case (executed many times) by manually writing an optimized path or by specialization using tools such as a partial evaluator.  Specialization differs from manual optimization by identifying explicitly the invariants that hold for the common case and then simplifying the code under the assumption that the invariants are true.  The specialization process is complete when guards are included to detect the situations when the invariants are violated and the specialized code is no longer valid.

Our concrete example is dynamic linking of kernel and library modules, a standard technique supported by most modern operating system kernels.  A naïve implementation of dynamic linking contains inefficient code.  For example, at linking time, the system looks up the module to be linked and loads in the code to be linked.  Each time the module is invoked, the system looks up the module’s location in memory and then branches into the module.  

Explain how you can use specialization to eliminate the sources of inefficiency in the naïve implementation of dynamic linking.  
(A)  First, identify the common case.  
(B) Second identify the focus of specialization by making explicit the invariants being defined.  
(C) Third, explain how the execution path of the common case can be shortened by exploiting the explicit invariants.  
(D) Fourth, explain how you will guard the specialized code by detecting the violation of invariants and switching the execution to a general case.