Systems Area Questions for Qualifier
Spring 2002
April 11, 2002
Answer 6 of 8 questions:
1. Compare and contrast the relative merits of
(a) monolithic kernels,
(b) microkernels,
(c) exokernels,
(d) extensible kernels (e.g. Spin).
Your answers will be judged by the breadth of issues you raise as well as how thoroughly you address each issue.
2. This question concerns resource reservation and its relationship to resource scheduling.
3. System specialization, adaptation, and other dynamic resource management rely on the accurate measurement of currently available system resources and of the current resource needs of system services or applications. Describe in some detail the importance of synchronization for the performance of a parallel program. Then describe how one could instrument the locking/synchronization constructs used by parallel programs, specifically by tracing and by sampling. Discuss the overheads implied by tracing vs. sampling, and describe which program improvements are possible with one technique vs. the other (give specific sample improvements). As a final note, distinguish the overheads of static vs. dynamic program instrumentation with reference to the Paradyn performance tools.
4. (a) Parallel machines exemplified by Cray T3 series, KSR, TMC's CM-5, etc., were the workhorses of choice for high performance computing in the 90s. In the context of shared memory parallel machines, discuss the operating systems issues that are critical for such high performance applications. Discuss the solution space for these issues and your own assessment of these solutions.
(b) This decade is seeing a turn towards cluster machines as the workhorses for high performance computing. What are the reasons for this trend? Are the operating systems issues for this class of machines different from, or the same as, those for the parallel machines of the 90s? Explain with concrete examples of issues and solutions to justify your answer.
5. A distributed file system (DFS) is an important service provided by current operating systems. Recent issues that have been explored in the context of distributed file systems include mobility and disconnection, cooperative caching and security of file data.
Two technologies that have been in the news include peer-to-peer (P2P) and network attached storage. In this question, you should explore a distributed file system design in which some of the nodes are simply "network attached storage" boxes and there are no designated server nodes. Clearly, you cannot assume that file names can be associated with fixed server nodes (using a service like NFS's automounter). Instead, you may have to locate a node that can service requests for the data blocks of a file that is of interest to an application. Also, how would you best optimize the performance of file operations (reads and writes) in such a system? Finally, address the access control problem for files in such a system.
6. OS Specialization Question
Although some operating systems such as Unix strive to have simple interfaces, there are legacy API compatibility reasons that lead some operating systems to have multiple interfaces to the same underlying kernel facilities. Consider a hypothetical situation where multiple APIs have been used to access the same file system. (For concreteness, you may think of the possible paths to access FAT32 from Windows 95, 98, NT and Windows 2000.) Normally, there is only one operating system installed, so there is no interference problem among the multiple APIs. Your design team has been assigned to design a virtual machine operating system on top of which these multiple versions of legacy operating systems will run concurrently.
Subproblem 1: Your first task is to design a synchronization mechanism among the multiple APIs to the same file system, so that file accesses remain consistent. You need to minimize the changes to the individual API implementations, because each API implementation must remain as independent as possible. Hint: think about concurrency control.
Subproblem 2: Your second task is to specialize the execution path when there is only one operating system accessing a given file. (This is the invariant you are going to use.) Explain the code that you will eliminate in this case, how you will guard the invariant, and what actions you will take when the guard is triggered (in case the invariant becomes violated).
7. This question explores scheduling in a multiprocessor for issues other than CPU time. For instance, cache affinity scheduling (Squillante/Lazowska) adds information about cache state. As a concrete running example, the Jedi cluster machines in the systems lab are 8-way multiprocessors, but each is constructed as two 4-way halves in which four processors share a system bus. These busses are often a bottleneck in practice, and the Linux scheduler is oblivious to them (or to cache affinity).
a. Give an example scenario in which a Jedi node performs suboptimally because of the two busses.
b. Propose a scheduling algorithm, e.g. an extension to cache affinity scheduling, that addresses the bus bottleneck issue. What extra information does the scheduler need? What potential drawbacks are there to scheduling in this way?
c. Multiprocessor schedulers can be centralized or distributed, based on whether they rely on global information or only local information. How would you characterize your algorithm? What are the tradeoffs in this scheduling problem between centralized and distributed implementations?
8. The Optimistic Active Message paper proposes executing RPC handlers as Active Message handlers when possible.
a. What is the difference between an Active Message handler and an RPC handler? Why would Active Messages be faster? Why can some RPCs be cast as Active Messages and others can't?
b. The paper covers only certain, restricted cases of RPCs; the authors do not deal with completely general RPC handlers. How could you deal with general RPC handlers in order to deploy their system?