| Sponsor | Zack Kurmas |
| Area | Systems |
Background
In order to evaluate a new disk array, we must
use some workload. In other words, we must give the disk array
something to do. The results of the evaluation depend heavily on the
choice of workload.
In order to get the most accurate results, many people use a trace of a production workload. That is, they collect a trace of some disk array's activity and evaluate the new disk array using the collected trace. These traces are hard to collect, however. They tend to be large and difficult to store. In addition, for privacy reasons, people don't like to have their systems traced.
A potential solution is to use a "fake" or random workload. Such a workload, if designed properly, can be used to accurately evaluate the disk array, without the limitations of a real trace. (For a more complete comparison of real and synthetic workloads, see Generating Representative Synthetic Workloads: An Unsolved Problem Gregory R. Ganger. Appears in the Proceedings of the Computer Measurement Group (CMG) Conference, December 1995, pp. 1263-1269.) The problem is that we don't know how to accurately generate a synthetic workload.
Problem
Given some target workload, it is currently very difficult to generate
a synthetic (i.e. random) workload that behaves the same way when
executed on a disk array. This is because we don't know what
properties the two workloads must share in order to have similar
behavior.
Here are some examples of properties that workloads must share to have similar behavior:
Locality presents a special challenge because, although we know of many several methods of describing locality, we do not have a good method of generating a synthetic workload with the desired degree of locality. Your job is to design and implement a method of generating a synthetic workload with a specified degree of spatial and/or temporal locality.
I use a measure of spatial locality I call "Proximity". This metric measures four values:
Conte and Hwu proposed two measures of locality: The Inter-reference temporal distance, and the Inter-reference spatial distance. See me for a copy of the paper, if you are interested.
You are also welcome to propose your own measure of locality (spatial or temporal) and write the corresponding generator.
One obvious method is to simply brainstorm an algorithm and you are certainly welcome to take this approach. However, since this problem applies to many properties (not just locality), I would suggest trying to solve the problem using a genetic algorithm or genetic program. That way, your solution could be applied to other problems.
Deliverables
For each algorithm or genetic technique you try, you must provide