CS 4230/6236: Parallel and Distributed Simulation Systems
Fall Semester, 2002
Projects
Preliminaries
Below are descriptions of suitable course projects. You may complete one of these projects, or propose your own. Each of the projects defined below, except the last one, is intended to be completed by a single individual.
There are three deliverables for each project with due dates as indicated below:
… Project Proposal: Thursday, October 17
… Checkpoint Report: Thursday, November 7
… Final Report: Thursday, December 5
The project proposal should be a description of the project you plan to complete. This will serve as a problem statement that will eventually become part of the checkpoint and final reports due later in the semester.
You are free to propose your own project. If you plan to define your own project, you should see me or send me email first to discuss what you want to do. We are relatively flexible with respect to the project, so long as it involves parallel or distributed simulation in some way. In this case, the project selection write-up should describe your project at a level of detail comparable to the project descriptions that follow. Be sure to indicate what milestones you will reach by the checkpoint and by the final report.
More than one person may select the same project. You may (and in fact are encouraged to) discuss how you will approach the project with other persons working on the same project, but all code and reports that you turn in must be entirely your own work.
Most of the projects that follow will require making some modifications to the FDK software. We will provide help and suggestions on how to modify this software to complete the project, but keep in mind that complete solutions have not been worked out.
Computer accounts have been set up for you on (1) a 16-CPU SGI Origin (nighthawk), a shared memory multiprocessor (SMP), and (2) a cluster of eight 4-CPU SMPs (the EDHPC cluster). These are the preferred platforms for project work; however, you are free to use whatever computing facilities you have at your disposal. If you are taking CS 4230, you should complete this project using the SGI Origin. If you are taking CS 6236, you should use the EDHPC cluster. In the latter case, you must take into account the additional complexity of the cluster machine in your project in some way. For example, communication between CPUs within the same SMP is done using shared memory, but communication between SMPs uses TCP/IP, and is much slower. This is an important factor in performance evaluation studies.
The projects discussed below use the FDK software. It is recommended you use FDK Version 3.0 and the BRTI software for all projects except Project 4. Project 4 should use version 4.0 and DRTI.
Project 1: RTI Performance Evaluation
The goal of this project is to complete a thorough performance evaluation of the FDK software. An implementation of the FDK software already runs over (1) shared memory multiprocessors, (2) LAN/WAN networks using TCP, and (3) combinations of the above. An experimental version using high performance Myrinet switches has also been developed.
Specifically, you will characterize RTI performance, in particular the performance of the object management and time management services. The project includes two main elements. The first is the definition of suitable metrics and the creation of benchmark programs to collect data. You should define three or four interesting performance metrics concerning communication performance (e.g., communication latency and bandwidth), time management performance (e.g., number of time advances that can be performed per second of wallclock time), and overall performance (speedup). You need to define small micro-benchmark programs to measure the first two categories of metrics, and a larger benchmark simulation application (e.g., a queueing network simulation) to evaluate overall speedup. The latter should model a relatively large system, e.g., simulate many queues within each HLA federate.
The second part of the project involves experimentation, and discussion and explanation of your results. To explain your results, you may need to formulate a set of hypotheses, construct experiments to test these hypotheses, and conduct a series of controlled experiments. You will be graded on your approach to attacking this problem as much as on your results and conclusions. You should aim to have something interesting to say about the performance of the RTI software, and construct your experiments to provide empirical evidence to support your claims. It is important to provide explanations for the results you obtain in your final report.
If you are taking CS 6236, you should complete this project using the EDHPC cluster. You must take into consideration the architecture of the machine you are working on, and discuss how that impacts your results. For example, communication for messages that stay within the same SMP should be much faster than communication between SMPs. You should collect data to compare performance between the two.
You must complete the following tasks for this project:
… In your project proposal, propose metrics you plan to measure, and discuss the benchmark programs you will develop to measure these quantities. Identify some key parameters you will vary (e.g., message size) in your experiments that you think will have a significant impact on performance.
… Checkpoint: You should have at least some of your benchmark programs running, and should have some preliminary performance measurements to report. Write up a plan describing the experiments you will conduct to complete the project. Emphasize what you anticipate the experiments will show.
… Final report: Your report should emphasize the conclusions you made from this exercise. Keep in mind that the data you collect is to provide evidence to support your conclusions, and should not be viewed as an end in itself. For multi-SMP experiments, be sure to compare performance of communication between processors within the same SMP, and between different SMPs.
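As an illustration of the kinds of metrics mentioned above, the following is a minimal sketch of how raw timing measurements might be turned into latency, bandwidth, time-advance rate, and speedup figures. All function names here are hypothetical and are not part of FDK; the `ping` callable stands in for whatever round-trip operation your own micro-benchmark performs.

```python
import time


def measure_round_trips(ping, repetitions):
    """Time `ping()` repeatedly and return the wallclock duration of each call.

    `ping` is a placeholder for one round-trip exchange in your benchmark.
    """
    samples = []
    for _ in range(repetitions):
        start = time.perf_counter()
        ping()
        samples.append(time.perf_counter() - start)
    return samples


def one_way_latency(round_trip_seconds):
    """Estimate one-way latency as half of the mean round-trip time."""
    mean_round_trip = sum(round_trip_seconds) / len(round_trip_seconds)
    return mean_round_trip / 2.0


def bandwidth(bytes_transferred, seconds):
    """Bytes delivered per second of wallclock time."""
    return bytes_transferred / seconds


def time_advance_rate(num_advances, wallclock_seconds):
    """Time advances completed per second of wallclock time."""
    return num_advances / wallclock_seconds


def speedup(sequential_seconds, parallel_seconds):
    """Sequential execution time divided by parallel execution time."""
    return sequential_seconds / parallel_seconds
```

In practice you would likely discard the first few samples (warm-up) and report variation across runs as well as the mean, since a single average can hide the effects you are trying to explain.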
Project 2: Message Bundling Software
Communication protocols such as TCP/IP executing over local or wide area networks are more efficient in sending large messages than small ones. Applications such as discrete event simulations tend to send many small messages. Such applications are often more efficiently implemented by using a technique called message bundling where several outgoing messages for a specific destination are buffered, and then sent over the network as a single large message rather than several smaller messages. The goal of this project is to implement message bundling in FDK, and to evaluate its performance relative to the current implementation that does not use message bundling.
This project requires the completion of the following tasks:
… For the project proposal, you should include a description of this project, with more details explaining what you plan to do. You should also do a literature survey to find out about other work studying this issue, and the approaches others have used.
… Checkpoint: Design a message bundling scheme, and describe how you will embed it into the FDK software. The bundling software you add will most likely be added to the FM-lib communication layer, making it available for all of the RTI software using these services. You will need to create a simple benchmark program to exercise and evaluate the performance of your scheme.
… Final report: You should complete a thorough performance evaluation of your bundling software implementation and compare its performance to the original software that does not use bundling. Construct synthetic workloads designed to show both where bundling provides maximal performance advantage and where it provides no advantage, or may even perform worse than the non-bundled implementation. If you are taking CS 6236, you must include some experimentation and discussion concerning the efficiency of your scheme for communication within an SMP and compare it to the effectiveness of the bundling software for communication between SMPs.
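To make the bundling idea concrete, here is a minimal sketch of one possible scheme: outgoing messages are buffered per destination and sent as a single length-prefixed bundle once a size threshold would be exceeded. This is only an illustration under assumed names (`Bundler`, `pack_bundle`, `unpack_bundle`, and the `send` callback are all hypothetical); it does not reflect the FM-lib interface, and a real design would also need a policy for flushing on a timer or before a processor blocks.

```python
import struct
from collections import defaultdict


def pack_bundle(messages):
    """Concatenate messages, each preceded by a 4-byte big-endian length."""
    return b"".join(struct.pack("!I", len(m)) + m for m in messages)


def unpack_bundle(bundle):
    """Split a bundle produced by pack_bundle back into its messages."""
    messages = []
    offset = 0
    while offset < len(bundle):
        (length,) = struct.unpack_from("!I", bundle, offset)
        offset += 4
        messages.append(bundle[offset:offset + length])
        offset += length
    return messages


class Bundler:
    """Buffer outgoing messages per destination and send them in bundles."""

    def __init__(self, send, max_bytes=1024):
        self.send = send              # hypothetical callback: send(dest, data)
        self.max_bytes = max_bytes    # payload-byte threshold per bundle
        self.buffers = defaultdict(list)
        self.sizes = defaultdict(int)

    def enqueue(self, dest, payload):
        # Flush first if adding this payload would exceed the threshold.
        if self.buffers[dest] and self.sizes[dest] + len(payload) > self.max_bytes:
            self.flush(dest)
        self.buffers[dest].append(payload)
        self.sizes[dest] += len(payload)

    def flush(self, dest):
        messages = self.buffers.pop(dest, [])
        self.sizes.pop(dest, None)
        if messages:
            self.send(dest, pack_bundle(messages))

    def flush_all(self):
        for dest in list(self.buffers):
            self.flush(dest)
```

The threshold trades latency for efficiency: a larger `max_bytes` amortizes per-message overhead over more messages, but each buffered message waits longer before it is sent, which is one of the effects a performance evaluation would need to expose.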
Project 3: Time Management Software
The goal of this project is to realize a variation on FDK's time management software (TM-Kit), and compare its performance with the existing implementation. Specifically, the current FDK software computes LBTS values (a lower bound on the time stamp of future messages a processor can receive) using reduction computations. If a reduction is completed and there are no transient messages discovered, the LBTS computation is complete. If there are transient messages, the reduction computation is repeated until there are none.
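The repeat-until-no-transient-messages structure described above can be sketched with a toy model. The sketch below assumes that transient messages are detected by comparing global send and receive counters, which is one common approach but is an assumption here, not a description of how TM-Kit actually works. The names `Processor`, `reduce_once`, and `compute_lbts` are hypothetical, and the model ignores the issue of taking the counters at a consistent point in time.

```python
from dataclasses import dataclass


@dataclass
class Processor:
    local_bound: float   # smallest time stamp this processor might still produce
    sent: int = 0        # messages sent so far
    received: int = 0    # messages received so far


def reduce_once(procs):
    """One global reduction: minimum bound plus total send/receive counts."""
    return (
        min(p.local_bound for p in procs),
        sum(p.sent for p in procs),
        sum(p.received for p in procs),
    )


def compute_lbts(procs, in_flight, max_rounds=100):
    """Repeat the reduction until no transient messages remain.

    `in_flight` is a list of (destination_index, time_stamp) pairs modelling
    messages that were sent but not yet received. One message is delivered
    between successive reductions. Returns (lbts, rounds_used).
    """
    for round_number in range(1, max_rounds + 1):
        bound, sent, received = reduce_once(procs)
        if sent == received:
            return bound, round_number
        # A transient message exists: deliver it, then repeat the reduction.
        dest, time_stamp = in_flight.pop(0)
        procs[dest].received += 1
        procs[dest].local_bound = min(procs[dest].local_bound, time_stamp)
    raise RuntimeError("LBTS computation did not converge")
```

For example, with two processors whose local bounds are 10.0 and 12.0 and one in-flight message with time stamp 7.0, the first reduction sees one more send than receive and must be repeated; after the message is delivered, the second reduction yields 7.0 rather than the 10.0 a single naive reduction would have reported.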
You should propose an alternate time management scheme, implement it, and measure its performance. It is recommended you propose a modest variation on the current algorithm that is used (e.g., a different way of dealing with transient messages). If you are a CS 6236 student, you will need to take into account the machine architecture in your evaluation. For example, evaluate performance when the computation is performed on a single SMP, as opposed to when it is spread across multiple SMPs.
To simplify the implementation of this mechanism, you should preserve the existing interface to the time management software as much as possible.
… Checkpoint: You should write a report describing both the current algorithm that is used (in greater detail than the description above!) as well as the algorithm you intend to implement. You should also develop a benchmark program that you will use to test the performance of your implementation. This benchmark program should be operational (over the existing FDK software). Describe the benchmark program in your checkpoint report.
… Final Report: The final report should provide complete documentation of your algorithm, discuss the benchmark program and test set up, as well as all of your performance results. Explain the results (and any anomalies) you observed.
Project 4: Ownership Management in the HLA
The problem of allowing concurrent updates to data in parallel and distributed computing systems is well understood. Protocols for concurrency have been developed to ensure that operations on data conform to some definition of correctness for the computing model. In distributed simulation it is common to refer to the process that is responsible for updating a simulation data item as the owner of a simulation object (attribute). An instance of an object attribute can have only one owner at any given time. This responsibility can be transferred between processes through "ownership transfer" protocols. The goal of this project is to design and implement a high-performance ownership management system in the FDK DRTI. The outcome will be to augment the existing DRTI implementation with a separable "ownership module." For this project, you should use version 4.0 of the FDK software (other projects and the homework use version 3.0).
The FDK RTI software currently does not implement HLA ownership management services. The goal of this project is to add services conforming to the HLA Interface Specification to implement ownership transfer.
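The single-owner rule can be illustrated with a small sketch. The class and method names below (`OwnershipTable`, `register`, `transfer`, `may_update`) are hypothetical and do not correspond to the service names in the HLA Interface Specification or to anything in DRTI; the sketch only shows the invariant that each attribute instance has exactly one owner and that only the current owner can hand it over.

```python
class OwnershipError(Exception):
    """Raised when an operation would violate the single-owner rule."""


class OwnershipTable:
    """Track the unique owner of each attribute instance."""

    def __init__(self):
        self._owner = {}

    def register(self, attribute, owner):
        # A newly registered attribute starts with exactly one owner.
        if attribute in self._owner:
            raise OwnershipError(f"{attribute!r} already has an owner")
        self._owner[attribute] = owner

    def owner_of(self, attribute):
        return self._owner[attribute]

    def may_update(self, attribute, process):
        # Only the current owner is allowed to update the attribute.
        return self._owner.get(attribute) == process

    def transfer(self, attribute, current_owner, new_owner):
        # Only the current owner can give up ownership.
        if self._owner.get(attribute) != current_owner:
            raise OwnershipError(f"{current_owner!r} does not own {attribute!r}")
        self._owner[attribute] = new_owner
```

In a distributed setting the table itself is not held in one place, so the interesting design question is how processes agree on a transfer; the sketch deliberately leaves that protocol out.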
If you are taking CS 4230, it is sufficient to develop an implementation of the ownership management software and demonstrate that it is functional. If you are taking CS 6236, you should also measure the performance of these services and report your findings.
The "customer" for this project is Dr. Thom McLean of GTRI, who is very familiar with the FDK software and its internals, and has specific ideas of how to add ownership management services. You will be working with Thom in the development of this project.
… Checkpoint: You should write a description of the design of the ownership management software. You should also develop some benchmark programs to exercise the software and (CS 6236 students) evaluate its performance.
… Final report: Include a detailed description of your implementation, and report on the testing you conducted to verify that it is functioning correctly. You should also include a description of any performance measurements you made (CS 6236 students).
Project 5: Multi-Player Game
Define and implement an interactive multi-player game (of your choosing) to execute over the FDK RTI software. The game must execute over multiple computers, and involve more than one interactive user. You may develop the game "from scratch" or you may extend some existing game for execution on a distributed computing environment. You may complete this project on an individual basis, or form a small team of up to three individuals.
If you are taking CS 4230, it is sufficient to realize the multi-player game as the output of the project. If you are taking CS 6236, you should use the game to evaluate the performance of the RTI software.
Tasks for this project include:
… Project selection: Give a brief description of the game you plan to create for the course project and indicate any partners if more than one individual will work on the project.
… Checkpoint: Complete a design of the software. You should turn in a report with the software architecture describing the main modules, their function, and briefly describe their interface to other modules.
… Final report: The multi-player game should be completed, and a demo presented showing the pieces you were able to get working. All results should be documented in the final report.