Special Project Ideas for CS 6210, Fall 2008
To be eligible for a special project, you must have completed your
first project successfully (A) and must demonstrate a strong
background and prior experience in Systems. The list of projects is not
exhaustive. You may discuss ideas with the Systems faculty or PhD
students and then request approval from the CS6210 Instructors for a special
project you propose as a result. If you are already involved in research
or a project for another course (including CS8903) on a systems or
systems-related topic (e.g., architecture, compilers), it may be
possible to define your CS6210 project in a complementary way, but
make sure that your CS6210 effort is separable, with separately
demonstrable deliverables, pending Instructor approval. You may work on
the Special Project alone or in pairs. A group of 3 might be
suitable in exceptional cases, again pending Instructor approval.
For more ideas, see also the Special Project lists from Fall07 and Fall06.
I. Enterprise Middleware
One of the outputs of our group’s research is an efficient middleware
for moving large data volumes across both local and wide area systems.
The lower level of this middleware, termed EVPath, does not offer a
strong programming model. Higher levels offer multiple models, including
publish/subscribe, SQL-like queries, and I/O graphs (specialized for
high performance I/O).
1.a. There is room for creating new models, including:
-
new notions of publish/subscribe, evaluated with representative
applications in the mobile domain, including the use of interesting
end devices such as Linux cellphones, Gumstix (small Linux-based
computers), etc.;
-
efficient implementations of interesting operators, such as MapReduce
built on EVPath and running scalably on cluster machines, reliably
redundant graphs, or others; and
-
efficient implementations of certain wide-area functionality, such as
FTP.
1.b. There is room for improving existing models, such as the iFLOW
query-based programming model of EVPath. The way to start such work is
to construct sample iFLOW applications and then use those experiences to
make selected improvements. One application of particular relevance to
systems researchers is the use of iFLOW for system and/or resource
monitoring: for example, an iFLOW graph that monitors network and system
behavior and, at the same time, interprets that data to deduce
interesting global properties.
1.c. Innovative uses of data streaming middleware like iFLOW, for example
continuously checking data validity to better handle errors introduced
by unreliable system behavior.
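As a starting point for 1.c, a validity check can be as simple as a filter stage that drops records whose fields are not finite or fall outside a plausible range, and counts what it dropped. The C sketch below is hypothetical: `validity_rule`, `is_valid`, and `validity_filter` are illustrative names, not part of iFLOW or EVPath, and the range check stands in for whatever domain-specific rule an application would actually need.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical validity rule for one numeric field in a record stream. */
typedef struct {
    double min;   /* smallest plausible value */
    double max;   /* largest plausible value */
} validity_rule;

/* Returns 1 if v is finite (not NaN or infinity) and inside [min, max]. */
static int is_valid(const validity_rule *rule, double v)
{
    return isfinite(v) && v >= rule->min && v <= rule->max;
}

/* Filter stage: copies valid samples from in[] to out[], preserving order,
 * and returns how many were kept; *rejected receives the number dropped.
 * out[] must have room for n elements. */
static size_t validity_filter(const validity_rule *rule,
                              const double *in, size_t n,
                              double *out, size_t *rejected)
{
    assert(rule != NULL && rejected != NULL);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_valid(rule, in[i]))
            out[kept++] = in[i];
    }
    *rejected = n - kept;
    return kept;
}
```

In a streaming setting the rejected count is itself useful monitoring data: a sudden rise suggests an upstream component is misbehaving.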
2. A key problem with modern systems is that the additional flexibility
introduced by technologies such as service-based computing, J2EE,
and system-level virtualization makes systems increasingly dynamic.
As a result, there is a need for sophisticated methods for interpreting
and understanding the runtime behavior of such systems and, possibly,
controlling that behavior. A concrete example is an external entity
calling your service in unforeseen ways (e.g., high call rates,
unexpected parameters, etc.). Develop methods for modeling such
system behaviors (system characterization, behavior detection), and in
addition create runtime mechanisms, integrated into existing
middleware (e.g., IBM WebSphere, JBoss, ...), that keep behavior
within application-specified constraints (performance isolation,
behavior isolation).
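One simple runtime control mechanism for the high-call-rate case is a token bucket placed in front of the service: a caller may issue bursts of up to `capacity` calls but, over time, no more than `rate` calls per second. The sketch below assumes a single-threaded caller and timestamps passed in explicitly (in seconds, non-decreasing); the names are hypothetical and not part of WebSphere or JBoss, and where such a check would hook into the middleware is part of the project.

```c
#include <assert.h>
#include <stdbool.h>

/* Token bucket rate limiter (hypothetical sketch). */
typedef struct {
    double capacity;   /* maximum stored tokens, i.e. allowed burst size */
    double rate;       /* tokens added per second (sustained call rate) */
    double tokens;     /* tokens currently available */
    double last;       /* time of the last refill, in seconds */
} token_bucket;

static void tb_init(token_bucket *tb, double capacity, double rate, double now)
{
    assert(capacity > 0.0 && rate > 0.0);
    tb->capacity = capacity;
    tb->rate = rate;
    tb->tokens = capacity;   /* start with a full bucket */
    tb->last = now;
}

/* Returns true if a call arriving at time `now` is admitted. false means the
 * caller exceeded its budget, and the middleware could reject, delay, or
 * deprioritize the call. */
static bool tb_admit(token_bucket *tb, double now)
{
    if (now > tb->last) {
        /* Refill in proportion to elapsed time, capped at capacity. */
        tb->tokens += (now - tb->last) * tb->rate;
        if (tb->tokens > tb->capacity)
            tb->tokens = tb->capacity;
        tb->last = now;
    }
    if (tb->tokens >= 1.0) {
        tb->tokens -= 1.0;
        return true;
    }
    return false;
}
```

A token bucket only enforces a fixed budget; the modeling part of the project would decide what `capacity` and `rate` should be for each caller, possibly adjusting them at runtime.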
3. Get involved in ongoing work on efficient data routing and processing
across middleware overlays (1) for mobile environments and (2) for
wide-area systems.
II. High Performance Systems
The HECURA project on high-performance I/O being undertaken by our
group is concerned with how to efficiently move data in and out of
large-scale parallel machines, e.g., those used at the DOE National
Labs. The idea of this project is to create rich methods for operating
on data as it is being moved to or from storage. This is done with an
improved I/O interface (not just file reads/writes) termed DataTaps, with
a data streaming overlay termed I/O graphs, and with interesting
storage backends (e.g., not just a file system but the LWFS (lightweight
file system) object-based store). There is room in this project
for a variety of interesting efforts.
Ideas include:
-
Enrich the implementation of DataTaps to permit online changes to
what data is extracted from or injected into the running program, and
how, to enable dynamic tradeoffs between the amount of data extracted
and the performance implications of these actions.
-
Interface DataTaps with (or use them 'underneath') existing
interfaces like MPI-IO and/or a new interface, termed AIO, developed by
our group jointly with DOE scientists. An interesting opportunity here
is to continuously monitor the program actions that use these
interfaces and use that information to perform I/O better (e.g., don't
do I/O while the program itself performs MPI collective operations).
-
Generate I/O graphs from higher-level descriptions (e.g., the query
graphs described above), focusing on descriptions that are meaningful
to scientists or engineers.
-
Construct I/O graphs that transform data: compress or decompress it
(using methods like difference compression), encrypt it, etc.
-
Create I/O graphs for entirely different purposes, such as monitoring
continuous RSS feeds.
-
Extend I/O graphs across wide-area systems. This requires developing
methods for data staging and prestaging (another student has already
started working on this problem), for network-aware data movement
(using online network monitoring), and/or for network-aware data
routing across distributed overlays.
-
Consider the performance implications of using DataTaps and I/O graphs
on large-scale parallel machines. Specifically, GTC Fusion is a
parallel application that runs on a large number of "compute" nodes and
periodically writes large amounts of data to a much smaller number of
"I/O nodes", which in turn write the data out to storage. The
application runs on a local cluster with an InfiniBand interconnect.
Perform a detailed performance analysis to determine the tradeoffs among
data size, write frequency, write overhead, total throughput, etc.
under the current policy. Then design and implement an algorithm that
schedules the I/O operations of individual "compute" nodes to improve
on the current behavior. Ideally, you want to be able to deduce
statements such as "If I write out x amount of data with frequency y,
system performance will suffer by n%".
-
Consider the lower levels of the DataTaps and I/O graph
implementation, where we use InfiniBand infrastructure and its
accompanying software stacks, which maintain a large amount of
information about the performance of each network link. Your
objective is to develop a distributed platform monitor that augments
this information with performance monitoring data from individual
platform nodes: CPU load, memory utilization, number of tasks, etc. As
a next step, you can use this information to make scheduling or
load-balancing decisions. Target workloads may include some of our
existing local applications, or others.
-
Integrate middleware like this into other infrastructures used by
science end users. An example is Kepler, a Java application that is the
predominant scientific workflow system for high-performance systems.
However, it relies on file operations to perform workflow management.
Your objective is to integrate Kepler with EVPath, our local
event-based middleware, which can replace those file operations. You
would be responsible for determining how to merge EVPath with Kepler's
connector setup as seamlessly as possible, then demonstrating two-way
connections (Kepler into EVPath and EVPath into Kepler) and assessing
the performance benefits of avoiding file-based operation.
Contact: Jay Lofstead (lofstead@cc)
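For the I/O graph idea of compressing data with difference compression, the core transform is delta encoding: keep the first value and replace each later value with its difference from the previous one, so slowly varying output becomes a stream of small numbers. The sketch below is a hypothetical stand-alone C function pair, not DataTaps or I/O graph code. It widens deltas to 64 bits to rule out overflow, so by itself it does not shrink the data; the savings would come from a later stage (not shown) that stores small deltas in fewer bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delta encoding: out[0] = in[0], out[i] = in[i] - in[i-1]. */
static void delta_encode(const int32_t *in, int64_t *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (int64_t)in[i] - prev;   /* 64-bit difference cannot overflow */
        prev = in[i];
    }
}

/* Inverse transform: a running sum of the deltas restores the values. */
static void delta_decode(const int64_t *in, int32_t *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += in[i];
        out[i] = (int32_t)acc;            /* acc equals an original int32 value */
    }
}
```

Because encoding is a single pass with constant state, it fits naturally into a streaming stage applied to data as it moves toward storage, with the decoding stage placed on the read path.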
III. Pervasive Systems
1. The project involves fine-tuning and extending the services
provided by the Chimera key-based routing protocol:
http://current.cs.ucsb.edu/projects/chimera/
It builds on an existing distributed hash table (DHT) layer on top of
the basic Chimera peer-to-peer overlay, which needs to be further
optimized and extended with additional functionality, including
caching and handling of new node arrivals, node/network failures, etc.
The implementation should be entirely in C.
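A DHT layer ultimately has to answer one question: which node is responsible for a given key? As an illustration only, the sketch below uses the Chord-style successor rule from the Chord paper listed below (a key belongs to the first node whose identifier is greater than or equal to the key's hash, wrapping around the ring); it is not Chimera's API or routing algorithm. The names are hypothetical, the node list is assumed to be a sorted array known locally, and the toy FNV-1a hash stands in for a stronger hash such as SHA-1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy 32-bit FNV-1a hash that places keys on the identifier ring. */
static uint32_t ring_hash(const char *key)
{
    uint32_t h = 2166136261u;            /* FNV-1a offset basis */
    for (; *key; key++) {
        h ^= (uint8_t)*key;
        h *= 16777619u;                  /* FNV-1a prime */
    }
    return h;
}

/* Successor rule: returns the index of the first node whose ID is >= id,
 * wrapping to index 0 if id is larger than every node ID.
 * `nodes` must be non-empty and sorted in ascending order. */
static size_t ring_successor(const uint32_t *nodes, size_t n, uint32_t id)
{
    assert(n > 0);
    size_t lo = 0, hi = n;               /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (nodes[mid] < id)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == n ? 0 : lo;             /* past the largest ID: wrap around */
}
```

One property relevant to the failure-handling work: when a node leaves, only its keys move (to its successor), and the rest of the mapping is unchanged.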
Reference reading:
The Chord paper:
http://pdos.csail.mit.edu/chord/
The Pastry paper:
http://research.microsoft.com/~antr/PASTRY/
The PAST paper:
http://freepastry.rice.edu/PAST/default.htm
2. Propose, develop and analyze an application based on the VMedia
platform:
http://www.cc.gatech.edu/~paiankur/vmedia.pdf
3. Propose, develop and analyze an application based on the VStore
platform:
http://www.cc.gatech.edu/~paiankur/vstore.pdf.
For all ideas, contact Ankur (paiankur at gatech . edu)
IV. Kernel Virtualization
-
This project involves analyzing dynamic voltage and frequency
scaling (DVFS) algorithms and their effects on workloads. Many
strategies are used for power management, and it is often unclear
which strategy is optimal for given hardware and workloads. The
decision depends on the power each strategy saves and the performance
degradation it causes. The project will consist of modeling these
effects and using runtime data to decide the best strategy to use.
Linux and/or Xen hacking experience is useful.
Contact: Hrishikesh Amur, (amur at gatech.edu)
-
Multiple projects can be defined around runtime monitoring of
virtualized platforms using a specialized database currently deployed
in one of our labs. For example, you may gather VM- or ACPI-level
runtime information for a rich set of workloads, build profiles, and
use them to drive management actions such as migration. You can
extend the existing SNMP-based interface to the server or build a
lightweight event-based interface based on EVPath.
Contact: Hrishikesh Amur (amur at gatech.edu)
-
Booting Linux as the controller on Cellule. Cellule is a virtualized
software system based on the IBM Cell processor. It includes a
hypervisor (rHype from IBM) running on the Cell processor; a controller
that accepts user commands and boots new guest domains (partitions);
and SEE (Specialized Execution Environment), a thin OS container for
running Cell applications.
Contact Vishaka (vishaka at cc).
-
Developing an I/O partition capable of communicating with the host
(with Cell as the accelerator and an x86 multicore as the host) over
PCIe and/or InfiniBand. Unless it is built from the ground up, this
depends on being able to boot Linux as the controller (the previous
project), so the two tasks can be combined into one project for a
group of 3-4 people.
Contact Vishaka (vishaka at cc).
-
Using a hardware simulator, experiment with various caching
architectures and scheduling policies on realistic workloads.
Contact Min - minlee at cc.
-
Multicore resource management and monitoring in virtualized
environments with Xen and/or VMware ESX.
Contact Mukil - mukil at cc.
-
Investigate fast inter-VM communication using asynchronous RDMA
operations on InfiniBand hardware.
Contact Adit - adit262 at cc.
-
Extend VStore with additional capabilities.
http://www.cc.gatech.edu/~paiankur/vstore.pdf
(contact Ankur - paiankur at gatech.edu)
-
Explore the impact of changing a VM's network bandwidth allocation on
its memory pressure. For example, increasing the bandwidth allocated to
a VM means that more packets must be buffered in the kernel on both the
transmit and receive paths, which increases the VM's memory pressure.
The goal of this project is to quantify the correlation between changes
in bandwidth and changes in memory pressure.
(Contact Mukil: mukil at cc)
-
Investigate guaranteeing a VM an upper bound on its page fault rate.
The guest OS swaps pages out of main memory when it runs low on space.
If memory demand far exceeds the allocated memory, the guest OS is
forced to swap out frequently used pages, which are brought back into
main memory soon afterwards. Our goal is to detect these undesirable
trends by measuring the time between a page's eviction and its
subsequent re-entry on a page fault, and to increase the VM's memory
allocation to keep the page fault rate low.
(Contact Mukil: mukil at cc)
-
Build a tool to convert a Xen VM to a VMware-compatible VM.
Xen VMs are typically paravirtualized, while VMware VMs run unmodified
guests. You will also need to generate an OVF (Open Virtualization
Format) file at the end of the conversion process to aid future
conversions. OVF is designed specifically to describe VM resources in
an open format, which simplifies conversion between different
virtualization technologies.
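For the page fault project above, the measurement it describes (time between a page's eviction and its re-entry on a fault) can be prototyped as a small per-VM table. The C sketch below is hypothetical throughout: the hook names `rt_on_evict` and `rt_on_fault`, the fixed `NPAGES` table, the tick-based clock (assumed monotonic), and the "grow if quick refaults exceed a limit" policy are illustrative assumptions, not an existing Xen or guest-kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPAGES 1024              /* hypothetical number of tracked guest pages */
#define NO_EVICTION UINT64_MAX   /* marks a page that is not currently evicted */

/* Per-VM bookkeeping for pages that are evicted and then faulted back in
 * soon afterwards, a sign that the VM's memory allocation is too small. */
typedef struct {
    uint64_t evicted_at[NPAGES]; /* eviction time per page, or NO_EVICTION */
    uint64_t window;             /* refaults within this many ticks count as "soon" */
    unsigned quick_refaults;     /* quick refaults seen in the current period */
} refault_tracker;

static void rt_init(refault_tracker *rt, uint64_t window)
{
    for (size_t i = 0; i < NPAGES; i++)
        rt->evicted_at[i] = NO_EVICTION;
    rt->window = window;
    rt->quick_refaults = 0;
}

/* Hypothetical hook: the guest swapped page `pfn` out at time `now`. */
static void rt_on_evict(refault_tracker *rt, size_t pfn, uint64_t now)
{
    assert(pfn < NPAGES);
    rt->evicted_at[pfn] = now;
}

/* Hypothetical hook: a page fault brought `pfn` back in at time `now`
 * (timestamps are assumed monotonic, so now >= eviction time). */
static void rt_on_fault(refault_tracker *rt, size_t pfn, uint64_t now)
{
    assert(pfn < NPAGES);
    uint64_t t = rt->evicted_at[pfn];
    if (t != NO_EVICTION && now - t <= rt->window)
        rt->quick_refaults++;    /* page was needed again shortly after eviction */
    rt->evicted_at[pfn] = NO_EVICTION;
}

/* End of a measurement period: returns true if quick refaults exceeded
 * `limit`, i.e. the VM's allocation should grow; then resets the counter. */
static bool rt_should_grow(refault_tracker *rt, unsigned limit)
{
    bool grow = rt->quick_refaults > limit;
    rt->quick_refaults = 0;
    return grow;
}
```

Choosing `window` and `limit`, and deciding how much memory to add once growth is triggered, are the substantive research questions; the table only supplies the measurement.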