Project 2: Barrier Synchronization
Goal
The goal of this assignment is to introduce OpenMP, MPI, and barrier
synchronization concepts. You will implement several barriers using OpenMP and
MPI, and synchronize between multiple threads and machines. You may work in
groups of 2, and will document the individual contributions of each team member
in your project write-up.
General Information
- Read this assignment carefully, in its entirety, before you
start coding - it may save you a lot of time later!
- You should use a RHEL machine, but may use any such machine where you have
access to OpenMP and MPI.
- Use the reference pointers below to find concrete technical information
(e.g., OpenMP and MPI tutorials and examples).
- If you have any questions, ASK! Use the newsgroup for broad questions.
(And if you see a question in the newsgroup and you know the answer -- post
it!)
Resources
Relevant reading on barrier implementation and testing:
- Umakishore Ramachandran, Gautam Shah, S. Ravikumar, and Jeyakumar
Muthukumarasamy, "Scalability Study of the KSR-1", Parallel Computing,
Vol. 22, 1996, pp. 739-759.
- Mellor-Crummey, J. M. and Scott, M., "Algorithms for Scalable Synchronization on
Shared-Memory Multiprocessors", ACM Transactions on Computer
Systems, Feb. 1991.
Usage
Compilers:
- You will need to use Red Hat Enterprise Linux (RHEL) systems for this
assignment. These instructions are specific to RHEL and will not work
completely on Red Hat 9 systems. Also, Intel's icc is Linux-only, and we don't
have mpich built for Solaris. Cluster machines, such as the cities and states
clusters, and CoC remote access machines, such as helsinki and salo, are
all RHEL.
- OpenMP allows communication between processors on a multiprocessor system.
For OpenMP compilation, you should use the icc (/usr/local/bin/icc) compiler.
The most useful option for this project is -openmp. The man pages are
accessible with "man -M /usr/local/intel/cce/9.0/man icc".
- MPI allows communication between different nodes in a distributed system.
For MPI compilation, see the instructions on the MPI introduction page. Use the
script "mpicc" found in
/net/hc280/class/cs6210/materials/mpich-1.2.7-rhel4-gcc.
- Here is a sample Makefile.
- Use the MPI compiler in the following directory: /net/hp48/nova/TA_Resources/MPICH/mpich-1.2.7
To compile the code, execute mpicc code.c. To run it, use mpirun -machinefile [machineNameFile] -np [numberOfProcessors] [executable], where machineNameFile is a file containing the list of machines to run on
and numberOfProcessors is the number of processes to start. Try some example codes before starting; it helps a lot (at least it worked for me!). Good luck.
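Before writing any barrier code, it can help to confirm that the compiler is really building with OpenMP turned on. The following is a minimal sketch, not part of the official course materials; the function names openmp_enabled, count_team_threads, and report_toolchain are hypothetical. It relies only on the standard _OPENMP macro and basic OpenMP pragmas, so it also builds (and reports one thread) when OpenMP is disabled.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if the compiler enabled OpenMP for this build, else 0.
 * _OPENMP is defined by any conforming compiler when OpenMP is on. */
int openmp_enabled(void) {
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Counts how many threads actually execute a parallel region.
 * Without OpenMP the pragmas are ignored and the result is 1. */
int count_team_threads(void) {
    int n = 0;
#pragma omp parallel
    {
#pragma omp atomic
        n++;
    }
    assert(n >= 1);
    return n;
}

/* Hypothetical helper: prints a one-line report about the build. */
void report_toolchain(void) {
    printf("OpenMP %s, %d thread(s) in the team\n",
           openmp_enabled() ? "enabled" : "disabled",
           count_team_threads());
}
```

Calling report_toolchain() from main and compiling once with and once without the -openmp option should show the difference. The thread count printed depends on the machine and on OMP_NUM_THREADS.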
MPI Specifics:
- MPI requires a mechanism for launching processes on remote machines.
Traditionally, it has used rsh/rlogin, but these are being phased out in favor
of ssh/slogin (because rsh/rlogin use plain text passwords).
- You'll need to set up an ssh key for the CoC machines. Execute 'ssh-keygen -t
dsa'. If, at this stage, you provide a password (which is the secure and
recommended option), you can still do passwordless authentication using
ssh-agent. Next, append the contents of your ~/.ssh/id_dsa.pub to the file
~/.ssh/authorized_keys (e.g. cat ~/.ssh/id_dsa.pub >>
~/.ssh/authorized_keys). When you log into a machine, type 'ssh-add' and it
will prompt you for your password. After that you will be able to ssh to any
machine in the CoC from the current session without entering your password.
If 'ssh-add' complains about being unable to connect to your authentication
agent, type 'eval `/usr/bin/ssh-agent -s`' and try again. Test the
password-free connection by ssh-ing to another CoC machine.
Specifics
Construct:
1. Use OpenMP to implement a barrier between threads on a single
machine. Create several threads on one machine using basic OpenMP
threads. Use the OpenMP locking calls to implement barrier
synchronization between them. You can choose any barrier algorithm
you want, but implement at least two distinct algorithms.
2. Use MPI to implement synchronization between different machines
running OpenMP (a barrier where each entity is an individual machine).
There are several machines, and each machine has one OpenMP barrier to
synchronize its threads. After all the threads on an individual machine
reach the OpenMP barrier, the machine will try to achieve the MPI barrier.
Implement a tree-like MPI barrier. Interface the built-in OpenMP barrier
with your MPI barrier call to synchronize a group of machines.
3. Scale the number of machines from 1 to 8.
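As an illustration of item 1, here is a minimal sketch of one possible lock-based barrier: a centralized, sense-reversing barrier of the kind discussed in the Mellor-Crummey and Scott paper. It is only one option, not a required design. The names lock_barrier_t, lock_barrier_wait, and lock_barrier_self_check are hypothetical. To stay strictly within the OpenMP lock calls, waiting threads poll the shared sense flag while holding the lock, which is simple and correct but probably not the fastest choice. The serial stand-ins under #else are an assumption added so the sketch also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds when OpenMP is switched off. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l) { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
static void omp_set_lock(omp_lock_t *l) { *l = 1; }
static void omp_unset_lock(omp_lock_t *l) { *l = 0; }
static int omp_get_thread_num(void) { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define MAX_THREADS 64 /* arbitrary limit for the self-check below */

typedef struct {
    omp_lock_t lock; /* protects count and sense */
    int count;       /* threads that have arrived in this episode */
    int nthreads;    /* threads expected at the barrier */
    int sense;       /* flipped by the last arriver */
} lock_barrier_t;

void lock_barrier_init(lock_barrier_t *b, int nthreads) {
    assert(nthreads >= 1);
    omp_init_lock(&b->lock);
    b->count = 0;
    b->nthreads = nthreads;
    b->sense = 0;
}

void lock_barrier_destroy(lock_barrier_t *b) { omp_destroy_lock(&b->lock); }

/* Each thread keeps its own local_sense, initialised to 0. */
void lock_barrier_wait(lock_barrier_t *b, int *local_sense) {
    int released;
    *local_sense = !*local_sense;
    omp_set_lock(&b->lock);
    if (++b->count == b->nthreads) { /* last arriver resets and releases */
        b->count = 0;
        b->sense = *local_sense;
    }
    omp_unset_lock(&b->lock);
    do { /* poll the flag under the lock until the episode is released */
        omp_set_lock(&b->lock);
        released = (b->sense == *local_sense);
        omp_unset_lock(&b->lock);
    } while (!released);
}

/* Returns the number of times a thread saw a stale stamp after the
 * barrier; 0 means no thread got ahead of the others. */
int lock_barrier_self_check(int requested_threads, int iters) {
    int stamp[MAX_THREADS];
    lock_barrier_t b;
    int errors = 0;
    assert(requested_threads >= 1 && requested_threads <= MAX_THREADS);
#pragma omp parallel num_threads(requested_threads) reduction(+ : errors)
    {
        int tid = omp_get_thread_num();
        int team = omp_get_num_threads();
        int local_sense = 0;
        int it, j;
#pragma omp single
        lock_barrier_init(&b, team); /* single ends with an implicit barrier */
        for (it = 1; it <= iters; it++) {
            stamp[tid] = it;
            lock_barrier_wait(&b, &local_sense);
            for (j = 0; j < team; j++)
                if (stamp[j] != it) errors++;
            lock_barrier_wait(&b, &local_sense); /* keep reads and writes apart */
        }
    }
    lock_barrier_destroy(&b);
    return errors;
}
```

A call such as lock_barrier_self_check(4, 50) exercises the barrier with up to four threads. The second algorithm required by item 1 (for example a tree or dissemination barrier) can reuse the same self-check by swapping the wait function.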
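For item 2, one way a tree-like MPI barrier could be organised is sketched below, assuming a binary tree over MPI ranks rooted at rank 0. The helper names tree_parent, tree_children, tree_barrier, and hybrid_barrier, the tag values, and the USE_MPI macro are all hypothetical. The MPI part is compiled only when USE_MPI is defined (for example with mpicc -DUSE_MPI), so the tree arithmetic can be checked on a machine without MPI installed. The sketch also assumes that only the master thread of each process makes MPI calls.

```c
#include <assert.h>

/* Parent of `rank` in a binary tree rooted at rank 0, or -1 for the root. */
int tree_parent(int rank) {
    assert(rank >= 0);
    return rank == 0 ? -1 : (rank - 1) / 2;
}

/* Writes the children of `rank` into kids[] and returns how many exist. */
int tree_children(int rank, int size, int kids[2]) {
    int n = 0, c;
    assert(rank >= 0 && rank < size);
    for (c = 2 * rank + 1; c <= 2 * rank + 2; c++)
        if (c < size) kids[n++] = c;
    return n;
}

#ifdef USE_MPI /* hypothetical macro: build with mpicc -DUSE_MPI */
#include <mpi.h>

enum { TAG_ARRIVE = 1, TAG_RELEASE = 2 }; /* arbitrary tag values */

/* Arrival messages flow up the tree, release messages flow back down. */
void tree_barrier(MPI_Comm comm) {
    int rank, size, kids[2], nkids, i, token = 0;
    MPI_Status status;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    nkids = tree_children(rank, size, kids);
    /* Arrival: wait for every child subtree, then report to the parent. */
    for (i = 0; i < nkids; i++)
        MPI_Recv(&token, 1, MPI_INT, kids[i], TAG_ARRIVE, comm, &status);
    if (rank != 0) {
        MPI_Send(&token, 1, MPI_INT, tree_parent(rank), TAG_ARRIVE, comm);
        /* Release: block until the parent says everyone has arrived. */
        MPI_Recv(&token, 1, MPI_INT, tree_parent(rank), TAG_RELEASE, comm,
                 &status);
    }
    for (i = 0; i < nkids; i++)
        MPI_Send(&token, 1, MPI_INT, kids[i], TAG_RELEASE, comm);
}

/* Call from every thread inside an OpenMP parallel region. */
void hybrid_barrier(MPI_Comm comm) {
#pragma omp barrier /* all local threads have arrived */
#pragma omp master
    tree_barrier(comm); /* one thread per machine joins the MPI barrier */
#pragma omp barrier /* local threads leave only after the MPI barrier */
}
#endif
```

With 8 ranks this tree has depth 3, so each barrier episode takes on the order of log2(P) message steps in each direction rather than P. Other tree shapes (for example a higher fan-in) fit the same structure by changing the two helper functions.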
Evaluate:
1. Write a testing harness that times many iterations of each
barrier call. Collect performance results for different numbers of
machines (1-8) and different numbers of threads (1-4) per machine.
Explain the results. Try unequal numbers of threads on different
machines. How do the results differ from the case where the threads are
evenly distributed across the machines? You don't have to provide a huge
number of different testing configurations, but justify and document your
choices. Remember to describe your testing hardware
configurations!
2. Compare the different algorithms you used to implement the
OpenMP barrier. Explain the results.
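The timing harness in item 1 could start from something like the sketch below, which measures the average time per call over many iterations after a short warm-up. This is one possible shape, not a prescribed one. The names barrier_fn, time_barrier, and count_calls are hypothetical, and the use of the POSIX clock_gettime with CLOCK_MONOTONIC is an assumption (omp_get_wtime or MPI_Wtime would serve the same purpose).

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Any barrier under test is wrapped to match this signature. */
typedef void (*barrier_fn)(void *ctx);

/* Monotonic wall-clock time in seconds. */
double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Runs `warmup` untimed calls, then times `iters` calls and returns the
 * average number of seconds per call as seen by the calling thread. */
double time_barrier(barrier_fn barrier, void *ctx, int warmup, int iters) {
    double start;
    int i;
    assert(barrier != NULL && warmup >= 0 && iters > 0);
    for (i = 0; i < warmup; i++)
        barrier(ctx);
    start = now_seconds();
    for (i = 0; i < iters; i++)
        barrier(ctx);
    return (now_seconds() - start) / iters;
}

/* Stand-in "barrier" used only to exercise the harness: counts calls. */
void count_calls(void *ctx) { ++*(int *)ctx; }
```

In a real run, every thread (or every MPI process) would call time_barrier on the same barrier with the same iteration count, and the per-thread averages would then be combined, for example by taking the maximum. Timing a single barrier call is usually too noisy to be useful, which is why the loop amortises the clock overhead over many iterations.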
Deliverables
- When is it due: Feb 28, 2008
- What to turn in:
- Your OpenMP barrier implementations (at least two)
- Your MPI barrier implementations (at least a tree barrier)
- Your performance tests and related code
- Makefile
- A write-up documenting your performance results and barrier designs
- Documentation of the contributions of each team member
- A README file including:
- What platform you used.
- How to compile your source and run your program(s).
- Any thoughts you have on the project, including things that
work especially well or that don't work.
- How: See the generic project turnin instructions here