SYSTEMS AREA Ph.D. QUALIFYING EXAMINATION FALL 2004

 

IMPORTANT INSTRUCTIONS:

 

  1. Please attempt any six out of the following nine questions. The exam has total of 5 pages and 9 questions.
  2. Please attempt all the subparts of a question (marked (A), (B), (C) É etc.) unless otherwise stated explicitly (in these cases follow individual instructions embedded in the given question overriding this general rule).
  3. Make and state assumptions wherever necessary.
  4. Please keep your answers precise and to-the-point. Clarity is important.
  5. Please return the answers in the printed form. If written by hand, please write cleanly and legibly.

 

1. Sensor networks

 

Sensor networks tend to use data-centric routing as opposed to topological routing. Answer the following questions:

(A) What necessitates data centric routing?

(B) Identify scenarios for which topological routing may be preferred and ones for which data centric routing may be preferred.

(C) Describe in detail at least one scheme for data centric routing. Critique the pros of cons of the scheme you choose to describe.

 

2. Threads

 

You are designing a threads programming library for the next generation SMP. Each processor in the SMP is a chip multiprocessor with multiple CPUs all sharing a level 2 cache. You want your library to provide:

- efficient thread scheduling and switching that takes advantage of the multiple CPUs in each processsor of the SMP

- efficient synchronization support (mutual exclusion locks)

(A) Present your design decisions along with the rationale for your design choices. You can use the many papers in the reading list on shared memory systems as a guide for answering this question.

(B) Present your answers in the form of APIs, data structures, and pseudo code for implementing the APIs.

 

3. Real-Time Systems

 

Although dynamic schedules allow more jobs to complete, static schedules are still widely used in real-time systems due to their predictability.

(A) Give a numerical example of a job schedule that illustrates the differences between a static scheduler and the rate monotonic dynamic scheduler. In the schedule, periodic jobs should have a CPU consumption and periods specified, while aperiodic jobs will have an arrival time, CPU consumption, and deadline. You may use a sporadic job server to handle non-periodic jobs in rate monotonic. In static schedules, aperiodic jobs always have lower priority than periodic jobs.

 

[Hint: One way to show the difference is to construct a job schedule that has infeasible jobs in the static schedule, and all jobs completed in the dynamic schedule.]

 

 

4. Specialization

 

It is common for system programmers to optimize the common case (executed many times) by manually writing an optimized path or by specialization using tools such as a partial evaluator. Specialization differs from manual optimization by identifying explicitly the invariants that hold for the common case and then simplifying the code under the assumption that the invariants are true. The specialization process is complete when guards are included to detect the situations when the invariants are violated and the specialized code is no longer valid.

 

One kind of ÒunnecessaryÓ overhead in kernel calls that can benefit from specialization is the repeated kernel calls within the same context. Examples include sequential file system reads to the same file and sockets writes using TCP connections.

 

Consider the TCP example, where we create a specialized network protocol code for each TCP connection.

(A) First, explain the ÒunnecessaryÓ overhead if the same (or similar) kernel calls are invoked repeatedly.

(B) Second, translate the overhead into an explicit invariant that does not change from one call to the next. (If you have multiple invariants, just pick one for this answer.)

(C) Third, explain the guards that will detect occasional changes to the invariant that would invalidate the specialized code.

 

 

5. Distributed Simulation

 

Consider the implementation of Jefferson's Time Warp algorithm on a distributed memory parallel computer (no shared memory).

(A) Show the data structures necessary to realize the Time Warp algorithm for a single logical process. Specifically, show GRAPHICALLY a detailed drawing of a sample snapshot of this data structure. Include descriptions of all important fields.

(B) Based on this data structure, write pseudo code for the operations that must take place when a positive message is received by the logical process.

(C) Show the pseudo code for the operations that must take place when a negative (anti-) message is received by a logical process.

 

6. Collective Communications

 

Collective communications have a rich history in operating systems, ranging from applications in high performance computing to those in distributed systems (often called group communications in that domain).

(A) Describe two specific examples of collective communications, one in the high performance domain, the other in the domain of group communications in distributed systems.

(B) For each example, describe the performance requirements imposed by applications and by the underlying systems on which they are used. Toward these ends, be sure to outline the hardware platforms to which your communication constructs are mapped.

(C) For each example, outline a reasonable implementation.

(D) Finally, critique your implementation by identifying at least one limitation each and suggesting possible solutions.

 

7. CPU Reservation

 

A new, important domain in real-time or embedded systems is that of online power management. The idea to be pursued in this question is to explore a new scheduling algorithm that uses power consumption as its principal metric, in conjunction with real-time constraints. Specifically, using the idea of CPU reservations as a basis, construct a 'power' scheduling algorithm, making the following assumptions:

- schedule tasks such that their real-time requirements are met (as in CPU reservations);

- also schedule tasks such that their device usage is 'clustered', the idea being that we would like to make sure that devices are turned on or off for as long as possible; if a task uses a device, then it will be turned on (automatically); if a device is idle for some time, it is turned off (also automatically); a power-saving schedule, therefore, is one that (1) extends device on or off times and (2) minimizes the number of transitions between device on/off times.

 

(A) Describe the additional information about each task you require for your energy saving CPU reservation algorithm, specifically identifying at least two devices you will address with your work.

(B) Describe your new reservation algorithm (and the assumptions you are making about the lower-level CPU scheduler that uses it).

(C) Talk about the complexity of your scheduling algorithm.

(D) Discuss your approach's limitations (i.e., the assumptions you had to make), relax at least one of your assumptions, and outline how you might improve your approach and algorithm given that relaxed assumption.

 

8. Middleware

 

Middleware is an increasingly prevalent medium for large-scale program development. The purpose of this question is to explore middleware that diverges from existing object-based solutions (e.g., Websphere, JBOSS, etc.) to address the pervasive domain. In that domain, it is crucial that middleware support the idea of 'overlay networks', meaning that application-level communication end points are not directly connected via some lower-level communication construct (e.g., sockets), but instead, are connected via additional, middleware-level and -provided servers that run certain middleware (e.g., data routing) or application (e.g., data manipulation) functions. The purpose of these overlays, for pervasive systems, is to ensure that end points receive the services they need, despite low-level changes in the computing platform (e.g., inability to reach another machine via a wireless connection).

 

(A) Assuming that the network level will do its best to find some route across communicating machines in an adhoc wireless environment, define a specific problem middleware will experience in that environment when using overlay networks.

(B) For your specific problem, devise a solution that can guarantee some level of service to applications despite underlying platform changes. You must (1) identify the platform changes addressed, (2) identify all of the steps taken by your solution (e.g., process migration?), including the way in which your middleware finds out about platform change, (3) identify the application-level guarantees you are trying to provide with your solution, and (4) identify at least two problems with your solution approach.

(C) Discuss in some detail your solution steps, including technical issues with your approach, such as problems with e.g., process migration. Note that your approach does not have to use process migration; we are simply using that as an example here.

 

9. Distributed File Systems and WWW

 

Network file systems such as the Andrew File System (AFS) and the World Wide Web (WWW) both provide users access to remote files, but the two systems have very different user interfaces. In network file systems, the user only needs to mount the remote file system onto the local machine. Then he or she can access remote files just as if they were local. In WWW, a "browser" sends an "address" (a URL) of the file to a Web Server and displays the result to the user.

 

Please choose to answer any two of the three questions. Write down your answers to the best of your knowledge.

 

(A) List the major differences between the two systems' services. In particular, point out their differences with respect to

 

           (B) Since the two systems provide different services to the user, naturally, their implementations are different.

(i)             Briefly outline the main features of the AFS file system implementation.

(ii)           Suggest an algorithm for implementing the WWW. You do not have to describe

how a browser implements the display of the Web documents, but you should suggest a framework for maintaining a cache of recently-accessed documents to decrease network traffic and for making sure the documents in the cache are up-to-date. If you happen to know the WWW actually does it, you can describe the algorithm. Otherwise, you can make up a plausible algorithm of your own.

 

(C) If AFS were universally available, how would that simplify the task of implementing the WWW?