Systems Area Questions for Qualifier
Fall 2001
Answer 6 of 9 questions:
1. Provide two examples of application-level functionality placed into the kernel that can substantially improve performance. Then define the constraints that must be imposed by future operating systems that attempt to make such placements routinely possible, even for end users who are not experts in how to build or protect OS kernels and other users on the same machine. You must be complete for this part of the question and mention related work.
2. Assuming that a NIC (network interface card) does not have any DMA hardware, answer the following question: (a) what is the effect of co-scheduling the NIC's actions with those of the host CPU (i.e., making sure the host's driver runs at the same time the NIC is executing), and (b) for what kinds of application needs is such co-scheduling most appropriate? Outline two classes of applications that can benefit from such co-scheduling, briefly describe the functional roles played by the NIC in these cases, and be sure to be precise about the resulting advantages. Alternatively, if you have strong reservations about this approach, be precise about its disadvantages.
3. This question asks you to outline the design parameters of what it would take to build a real-time object system for distributed platforms. Assuming a Java-based object system, define: (a) real-time object invocation (what parameters beyond the usual ones are needed?) and execution on the same JVM. (b) invocation of remote objects (in other words, real-time RMI). Then, (c), outline what would be needed to ensure real-time operation in a JVM and across multiple JVMs.
4. A simulation is said to be repeatable if multiple executions of the program with the same inputs always yield the same results. Parallel discrete event simulations synchronized using conservative or optimistic protocols may not necessarily be repeatable, even if a sequential execution of the same program is repeatable. Explain why this is the case.
Devise a scheme to guarantee repeatability. State any assumptions you are making, and argue why your scheme does guarantee repeatability. Discuss how you might implement such a scheme, minimizing the additional burden placed on the simulation application developer.
5. (Read the entire question before starting to answer.) Program specialization is similar to partial evaluation. A generic program can be specialized when there are _invariants_ that make some sequence of instructions superfluous, since they always produce the same results at the end. The idea of program specialization is to replace the sequence of instructions with the result, thus improving program performance. The main difficulty in OS specialization is that there are _quasi-invariants_ that remain true almost all of the time, but that could be invalidated in rare situations.
(a) In their SOSP'95 paper, Pu et al describe the specialization of the HP-UX file system. They specialize the read system call using the exclusive sequential access quasi-invariant, which is considered the most common case in Unix file systems. Give an example of quasi-invariant in another OS module that may improve system performance through specialization. Hint: I/O subsystems are easier candidates.
(b) To guard against the rare cases when quasi-invariants may be invalidated, we need to insert _guards_ into the system when applying program specialization. For example, in the specialization of HP-UX read system call, they inserted guards into the open system call, which may invalidate the exclusive access quasi-invariant when a second process opens the same file. A less obvious guard is inserted into the dup system call, which may produce the same result. For the example of quasi-invariant you gave in sub-question (a), give two examples of places you need to guard against quasi-invariant invalidation. If you believe there is only one guard necessary, present an argument that you don't need any other guards.
6. Polling and interrupts can both be used to deliver events, e.g. notification of the arrival of a network message in a multiprocessor. This problem explores the issues arising from using both mechanisms. (a) under what circumstances does each mechanism provide better performance than the other? (b) what does an OS have to do to make polling of an I/O device available as a user-level mechanism? (c) What does an OS have to do to make device interrupts available as a user-level mechanism? Give examples that are as concrete as possible.
7. Transaction processing systems
a. Define informally what a recoverable schedule is. What is the main property that must hold?
b. If two-phase locking is used, when can write locks be released in a recoverable schedule? When is the earliest time that a read lock can be released? Please explain.
c. Let the memory-commit of a transaction T be the point when T 's commit record is written to the in-memory log buffer. Let the disk-commit of T be the point when T 's log record is flushed to disk. With so-called group commit, T 's write locks are released after the in-memory commit. The client that submitted T is not notified of its commit until after the disk commits. Explain why a group commit system does not generate recoverable schedules. Explain why this is not a problem.
d. (advanced part): A transaction T modifies two files, X and Y, atomically. As it runs, T will generate three log records: LX: the log record for the X action, LY: the log record for the Y action, and LC: the log record for the commit record. Below are four scenarios describing what is flushed out to disk and what is not:
ii. Only log record LX is flushed to disk; the new X and Y pages are flushed.
iii. Only log records LX and LY are flushed to disk; the new X page is flushed, but the Y page is not.
iv. Log records LX, LY, LC are flushed to disk; the new X and Y pages are flushed.
For each scenario, state if it could happen under a pure undo value logging scheme, a pure redo value logging scenario, an undo/redo value logging scheme, or if the scenario could never arise.
8. In distributed systems, failures make it difficult and/or impossible to solve many problems. For example, the consensus problem allows a set of nodes to decide on a common outcome of a protocol's execution (e.g., a commit protocol decides if the results of a computation should be made permanent or undone). In a purely asynchronous system, it is impossible to reach consensus. Consider a system in which message delivery time is bounded. Furthermore, processes can execute atomic steps in which they can receive a set of messages and send messages to one or more other processors (broadcast including a message to all participants in a protocol). Can one design a consensus protocol for such a system? If your answer is yes, give the protocol and argue why it is correct. Otherwise, discuss why no such protocol exists.
9. Access control is a key component of a secure computing system. Capabilities or access control lists (ACLs) can be used to determine whether a request should be granted or denied. What are the pros and cons of using capabilities or ACLs. In many systems, ACLs allow both positive and negative access rights to be associated with subjects. Capabilities can have access rights (Hydra capabilities had a variety of such rights) but negative rights have not been found to be useful in the context of capabilities. Explain why negative rights are useful with ACLs but not with capabilities. Finally, give an example of a system where both ACLs and capabilities are used to govern access to resources.