Systems Fall 2002 Qualifier Exam

INSTRUCTIONS:


Please answer any 6 out of 9 questions

Give detailed answers to questions mentioning all the assumptions you are making and be comprehensive in your answers showing the details of your approach and understanding of the problem.

1. Describe a distributed simulation application program where the Time Warp algorithm will perform well, but the Chandy/Misra/Bryant null message algorithm will perform poorly.  Explain your answer.

Similarly, describe a distributed simulation application program where you would expect a conservative synchronization algorithm will likely outperform Time Warp.

2.  Once you remove all the window  dressings, a "parallel machine" such as the Origin 2000, a "cluster" such as our own Jedi, or a "distributed system" such as the set of machines in a geographical area (say the college of computing), each consists of a set of processors glued together by an interconnection network.

    (a) What is fundamentally different about each of these classes of machines?
    (b) What OS issues are common to all of them?
    (c) What OS issues may be relevant in one class of machine and irrelevant in another?
    (d) What are the characteristics of applications that may be appropriate for each of these classes of machines?

3.    Since nodes in distributed systems have independent failures, it is possible to make a service fault-tolerant by replicating it at several nodes.  The replicated state of the service can be kept consistent by using protocols that perform read and write operations with a majority of the replicated instances. Outline the design of a file server that replicates files at multiple nodes to enhance data availability.  Discuss how such a design may impact performance enhancing mechanisms such as caching and log-structured file systems.

Unfortunately, compromises of nodes is a more serious problem than crash failures of server nodes. A compromised node can attempt to corrupt the data as well as meta-data that is stored at the compromised server. Also, a compromise that exploits some vulnerability can use a common attack to compromise multiple instances. Assuming that the number of compromised nodes is bounded by some threshold, is it possible to design read and write protocols that provide consistency semantics similar to file systems such as Sprite and xFS? If your answer is yes, describe the protocols. Otherwise, present arguments why this may not be possible.
 
4.  Real-Time Systems Question: One of the main goals in real-time scheduling is to preserve the predictability of CPU schedules despite the execution of concurrent threads that may interfere with each other.  An important problem in this environment is called priority inversion, where a thread with higher priority may block, waiting for a lower priority thread running in a critical section.  One solution proposed for solving the priority inversion problem is called priority inheritance, where the executing thread inherits the waiting thread's (higher) priority and runs to completion of the critical section.  Explain how priority inversion may be implemented as an addition to a normal kernel (without other real-time support).  Hint: the critical section code and possibly the scheduler code may need enhancements.

5.  The basic definition of an operating system states that its role is to `provide support for applications and insulate them from underlying hardware'. Therefore, since application needs change over time, so will operating systems. A good example is the inclusion of web servers (i.e., the khttp stack) in modern OS kernels. An alternative solution, the sendfile system call, give user level web servers the ability to directly transfer a static file from a disk to a network device, without further involvement of the user-level web server, thereby eliminating unnecessary data copying and user/kernel crossings.

A)  Discuss the advantages/disadvantages of the khttp-based vs. sendfile-based OS support for web services. In particular, arguments have been made that khttp is unnecessary, since sendfile replaces it.

B)  Neither khttp nor sendfile in a server address the increasing importance of symmetric communications, as experienced in peer to peer systems (e.g., Instant Messaging). In the remainder of this question, you will design some OS support for IM:

a)  Imagine a 'transferfile'  command executed jointly by client and server, which directly transfers a file from one IM user to another (e.g., to share an image between both). Describe the necessary components of 'putfile' on the two interacting users' machines.     Note that TWO OS kernels are involved in this transfer, so the two collaborating user programs will both have to make different system calls, with the joint effect being the transfer of a file directly from one machine's disk to another one's.

b) Describe the APIs of both OS's portions of the transfer file command.

c) Discuss when and whether performance advantages can be attained by
    use of your new OS facilities vs. simply taking the same actions at
    user level.

 6.  A basic issue with middleware is its flexibility, that is, its ability to provide the types of services end user applications need. One topic concerning such flexibility is communications. This is why RMI (remote method invocation) in Java, for instance, has an underlying facility called a socket factory. Notwithstanding what exactly Java's socket factory does, describe in your own words the various communication flexibility you consider important for distributed applications written with middleware like Java. However, rather than doing this in general, be specific, by first identifying application domains, then commenting on the necessary flexibility.

 a) Comment on: - multimedia applications (identify at least  three issues)
    - scientific applications (identify two different issues)
    - publish/subscribe applications (identify at least one additional issue).

 b) Comment on how you would implement your socket factory to be sufficiently general to address the various application needs identified in a). No need for design details here, just a discussion of what's easily vs. not easily implemented at that level of abstraction.

7.  All modern computer systems use paged virtual memory with separate address spaces for every process.  Periodically, designers get bored with this pattern and propose alternatives, e.g. placing all the processes into a single address space and protecting them from each other with the per-page protection bits instead of via separate namespaces.

a.  List the advantages and disadvantages of per-process address spaces.

b.  One major issue in a single-address-space machine is how to locate and/or relocate a program in the address space.  How might you solve this problem at build time, at load time and at runtime?

c.  How would a single address space OS simplify or complicate the require hardware support for virtual memory?  I.e. what, if anything, would TLBs, caches, I/O systems, have to or be able to do differently?

8. Cluster-based Web Cache System

Your new company, SIM, provides cluster-based software for ISP's that allows them to do simple firewalling and Web caching.  A customer installs a SIM caching cluster according to their expected user population size.  Under normal conditions (typical Web workloads, no undue network congestion effects, etc.), a single cluster node can handle about 60 hits per second.

Some of your more sophisticated customers tell you that they would like to be able to "augment" the caching with additional capabilities. For example, @mobi.com has many users who use PDA's who access the Web through their ISP cluster, so in addition to fetching and caching a page, they would like to be able to perform simple transformations on data, depending on whether the client is a Windows PDA, Palm PDA, or regular desktop browser (IE/Netscape).  So you agree to design an API that will allow them to insert "plug-ins" into the cache stream.

Below is the simplified control flow of your existing architecture.  Your software architecture is concurrent: a single thread handles the incoming request, but after the 'get page from web' procedure returns, the thread forks into two so that the "store in cache" and "deliver to client" code paths can be executed in parallel.

Please answer Question A and then choose to answer one of Question’s B, C, and D.  Please write down your answers to the best of your knowledge.

Question A:
Redraw the diagram to indicate where a customer-supplied transformation module could fit in the control flow.  (There may be more than one place where it would work.)

Question B:
In the original logic diagram above, all of your functional blocks correspond to methods inside a single process.  (The "store in cache" and "deliver to client" blocks can be executed by different threads, but they are user-level threads in the same process space.) @Mobi.com wants to know if their plug-in module will run

(a) in the same process space as a separate procedure call that is part of ,
(b) in the same process space but as a separate (forked) thread,
(c) in a different process space.  

Thoroughly enumerate the pros and cons of each approach. Consider performance, ease of programming, capacity provisioning, and failure management.  Which approach (a, b or c) would you ultimately recommend and why?

Question C:
One day, the recording industry companies have a vision in a dream telling them to put all of their content online for free.  Music fans are immediately begin downloading MP3 files.

Make and state reasonable quantitative assumptions regarding the relative sizes of a typical MP3 file compared to other Web workloads, and estimate the reduction in capacity of your cluster when serving this workload.  (Hint: Assume that under the new workload, about 80% of the bytes transferred correspond to MP3 files.  You'll need to compare this with a defensible assumption about the old workload, i.e., an assumption based on some recent data.)   Again, clearly state your data sources and path of reasoning.

Question D:
MP3's are a lot bigger than images and Web pages, and soon enough, @mobi.com's cache disks all fill up.  (Their system is still passing requests through, but the cache is thrashing: every time a new file is brought in from a server, it causes the eviction of an existing file that will probably be referenced again soon.)  @ mobi.com's administrator is upset, and tells you he is ready to just turn off the cluster to save electricity.  "No!" you say.  "There's still some benefit from having the cache operating, even if it is thrashing!" Please describe, to the best of your knowledge, what's the benefit, and who will benefit.

9.  Many mechanisms have been proposed for developing Operating Systems in a modular form. Approaches range from micro-kernels and exo-kernels to the more pedestrial Linux modules mechanism.

  -What is the intended advantage of all such modularization mechanisms?
  -What is the advantage of a micro-kernel over something like the Linux modules mechanism?
  -Describe the main challenge of externalizing parts of the kernel into the application space as in the exo-kernels work.

(Hint: it has to do with safety.) What ideas have been proposed for safely externalizing parts of the kernel? How does the kernel ensure the safety of the external extensions and what kinds of safety properties are examined?