CS 6210 Advanced Operating Systems
Spring 2003
Project II: Performance Evaluation of Barrier Synchronization

Due: 11:59 p.m., Feb 18, 2003
(One minute prior to midnight on Tuesday, Feb. 18th.)
This project is to be completed in groups of 2-3.


Goal

The goal of this assignment is to introduce MPI/1.1 and  barrier synchronization concepts.  You will implement some barriers using various barrier implementations in the MCS paper (Mellor-Crummey Scott). You will compare these implementations with the MPI_Barrier call, both in terms of performance and scalability. Note that the paper describes distributed shared-memory barriers, but you will implement barrier algorithms for a distributed message-passing machine, without physically shared memory. You will use communication primitives. You may download the code from the FTP site mentioned in the paper, for review, but MAY NOT use any of the code. HINT: You might want to use messages where "shared variables" in the original algorithm are encountered. Reserve machines as before on EDHPC. If EDHPC 5-8 are still in use by CS 6235, you will use the scaling note to scale to the appropriate workload size. There are many common routines between the barrier algorithms and you are encouraged to "brainstorm" these in a group so that final development becomes much easier. You will
provide individual contributions of each team member in the project directory. Grading will be done a team basis.

General Information

Resources

Details

Construct: Use the MPI_Send and MPI_Recv calls to implement 1) MCS - dissemination barrier (Algorithm 9 in the paper),
2) MCS - Tournament (Algorithm 10)  3) MCS - Tree (Algorithm 11) 4) Total Exchange barrier (all2all). Re-implement these using collective communication primitives in MPI, using the MPI_Bcast broadcast primitive only.

Evaluate: Use a driver codelet that embeds 100, 000 iterations of the barrier call. Scale the number of processors from 1 to 32 (there are 4 processors on each of the 8 edhpc machines). You will collect performance results for your point-point implementation (sends and recv calls), collective communication implementation (MPI_bcast calls) for all the four barrier algorithms.

Graph: single Barrier execution cost for all the four barrier algorithms for the 1) point-point implementation 2) Collective communication implementation for 1 to 32 processors. Graph also the system barrier cost MPI_barrier with the point-point and
collective communication implementation curves.

What we need: TWO graphs. 1) Point-Point primitive barriers (four curves) + system barrier performance
                                                  2) Collective communication primitive barriers (four curves) + system barrier performance

Explain the curves based on number of messages sent, complexity of the algorithm and differences between each primitive class (point2point versus collective) and the system barrier.

Scaling Note: Since we would like to see consistent scaling, scale your benchmarks in this manner :
a) If there are N machines available and you have to scale to 4*N processes (say). Start with assigning each process to each
    of the N machines, take the next set of N processes and assign to each of the machines in a round-robin manner. Repeat    this until 4*N processes and make sure that you make measurements at each scaling datapoint ie. for each process size in the
workload.
 
 


Due Date & Turn-In Process
When: Feb 18, before midnight. This is one minute prior to midnight on Tuesday, Feb 18th. No late assignments will be accepted unless prior arrangements have been made.

Where:  /net/hc280/class/cs6210/groups/<group_name>. Please create a README file in each group directory with the names of group members. Email to help@cc if you cannot create your group directory.

What: