CS4803DGC - Programming Assignment 3

School of Computer Science

Georgia Institute of Technology

CS4803DGC, Spring 2010
Programming Assignment #3
Due: Friday, Feb. 19, 6:00 pm
Hyesoon Kim, Instructor

Introduction
This is an individual assignment.
In this assignment, you will implement a blocked implementation of a matrix convolution. The convolution kernel is a constant 5x5 matrix, but the images can have arbitrary sizes. To simplify the problem, image dimensions are always multiples of the block size, 16.

Matrix convolution is primarily used in image processing for tasks such as image enhancement and blurring. The standard image convolution formula for a 5x5 convolution kernel A applied to matrix B is

C(i,j) = sum (m = 0 to 4) { sum(n = 0 to 4) { A[m][n] * B[i+m-2][j+n-2] } }
where 0 <= i < B.height and 0 <= j < B.width

Elements that are "outside" the matrix B, for this exercise, are treated as if they had value zero.
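When checking device results, a plain host-side version of the formula is a useful reference. The following is a minimal sketch, not the lab's own gold routine: the function name `convolveReference` and the row-major `float` layout (element (i, j) of B stored at `B[i * width + j]`) are assumptions.

```cuda
// Host-side reference for the 5x5 convolution formula above.
// Hypothetical helper: not part of the provided lab3 skeleton.
// Images are stored row-major: element (i, j) is at B[i * width + j].

#define KERNEL_SIZE 5                    // 5x5 convolution kernel A
#define KERNEL_RADIUS (KERNEL_SIZE / 2)  // = 2, the "-2" offset in the formula

void convolveReference(const float *A,   // KERNEL_SIZE x KERNEL_SIZE weights, row-major
                       const float *B,   // input image, height x width
                       float *C,         // output image, height x width
                       int height, int width)
{
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            float sum = 0.0f;
            for (int m = 0; m < KERNEL_SIZE; ++m) {
                for (int n = 0; n < KERNEL_SIZE; ++n) {
                    int row = i + m - KERNEL_RADIUS;
                    int col = j + n - KERNEL_RADIUS;
                    // Elements outside B count as zero, so they are simply skipped.
                    if (row >= 0 && row < height && col >= 0 && col < width)
                        sum += A[m * KERNEL_SIZE + n] * B[row * width + col];
                }
            }
            C[i * width + j] = sum;
        }
    }
}
```

With an all-ones kernel and image, a corner pixel sums only the 3x3 in-bounds neighbors (9), while an interior pixel sums all 25.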

Performance optimization is required in this assignment.
1) Untar lab3.tar.gz into ~/NVIDIA_GPU_Computing_SDK/C/src

Instructions:
    cd ~/NVIDIA_GPU_Computing_SDK/C/src
    tar xzvf lab3.tar.gz
    cd lab3
    make
2) Edit the source files 2Dconvolution.cu and 2Dconvolution_kernel.cu to complete the 2D convolution functionality on the device.
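A possible first version of the device code is a naive kernel in which each thread computes one output pixel, launched on a grid of 16x16 thread blocks; because image dimensions are multiples of 16, the grid covers the image exactly. This is a sketch under assumptions: `convolveNaive`, `runConvolveNaive`, and the flat row-major layout are hypothetical names and choices that will not match the provided skeleton exactly.

```cuda
#include <cuda_runtime.h>

#define KERNEL_SIZE 5
#define KERNEL_RADIUS (KERNEL_SIZE / 2)
#define BLOCK_SIZE 16   // image dimensions are assumed to be multiples of this

// Naive version: every thread reads all 25 inputs it needs from global memory.
__global__ void convolveNaive(const float *A, const float *B, float *C,
                              int height, int width)
{
    int i = blockIdx.y * BLOCK_SIZE + threadIdx.y;  // output row
    int j = blockIdx.x * BLOCK_SIZE + threadIdx.x;  // output column

    float sum = 0.0f;
    for (int m = 0; m < KERNEL_SIZE; ++m) {
        for (int n = 0; n < KERNEL_SIZE; ++n) {
            int row = i + m - KERNEL_RADIUS;
            int col = j + n - KERNEL_RADIUS;
            if (row >= 0 && row < height && col >= 0 && col < width)  // zero outside B
                sum += A[m * KERNEL_SIZE + n] * B[row * width + col];
        }
    }
    C[i * width + j] = sum;  // no bounds check: the grid covers the image exactly
}

// Hypothetical host wrapper: copies inputs to the device, launches, copies the
// result back, and returns whatever cudaGetLastError() reports afterwards.
cudaError_t runConvolveNaive(const float *hA, const float *hB, float *hC,
                             int height, int width)
{
    size_t imgBytes = (size_t)height * width * sizeof(float);
    size_t kerBytes = KERNEL_SIZE * KERNEL_SIZE * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;

    cudaMalloc(&dA, kerBytes);
    cudaMalloc(&dB, imgBytes);
    cudaMalloc(&dC, imgBytes);
    cudaMemcpy(dA, hA, kerBytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, imgBytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(width / BLOCK_SIZE, height / BLOCK_SIZE);  // x = columns, y = rows
    convolveNaive<<<grid, block>>>(dA, dB, dC, height, width);

    cudaMemcpy(hC, dC, imgBytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaError_t err = cudaGetLastError();
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Each output pixel re-reads up to 25 input values from global memory, which is the main inefficiency the optimizations listed under Grading address.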

3) Arguments:
No arguments: default image size (16x16)
Two arguments: image height, image width
You do not need to pass any images. The images are randomly generated inside the code.
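The argument rules above could be handled by a small helper like the one below. The name `parseImageSize` is hypothetical, and rejecting sizes that are not positive multiples of 16 is an assumption based on the block-size requirement stated in the introduction, not necessarily what the provided code does.

```cuda
#include <cstdlib>

#define BLOCK_SIZE 16

// Hypothetical helper mirroring the argument rules:
//   no arguments  -> 16 x 16 image
//   two arguments -> image height, image width
// Returns false for any other argument count, or for sizes that are not
// positive multiples of BLOCK_SIZE.
bool parseImageSize(int argc, char **argv, int *height, int *width)
{
    if (argc == 1) {              // only the program name
        *height = BLOCK_SIZE;
        *width = BLOCK_SIZE;
        return true;
    }
    if (argc != 3)
        return false;

    *height = atoi(argv[1]);      // first argument: height
    *width = atoi(argv[2]);       // second argument: width
    return *height > 0 && *width > 0 &&
           *height % BLOCK_SIZE == 0 && *width % BLOCK_SIZE == 0;
}
```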

4) Explain which optimizations you performed, and measure the performance improvement of each one.
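For the measurements, one common approach is to time the kernel on the GPU with CUDA events. The sketch below times a placeholder kernel (`scaleKernel` and `timeScaleKernel` are hypothetical names); in the lab you would put your convolution launch between the two `cudaEventRecord` calls instead.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel, used here only so there is something to time.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        data[idx] *= factor;
}

// Times one launch with CUDA events. Returns elapsed milliseconds,
// or -1.0f if cudaGetLastError() reports a failure.
float timeScaleKernel(float *dData, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);   // start marker on the default stream
    scaleKernel<<<(n + 255) / 256, 256>>>(dData, n, 2.0f);
    cudaEventRecord(stop, 0);    // stop marker, enqueued after the kernel
    cudaEventSynchronize(stop);  // block the host until the stop event completes

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);  // GPU time between the two events
    cudaEventDestroy(start);
    cudaEventDestroy(stop);

    return cudaGetLastError() == cudaSuccess ? ms : -1.0f;
}
```

Event timing covers only the work enqueued between the two events, so host-device copies are excluded unless they are placed between them. A single launch can be noisy; averaging several launches after a warm-up run usually gives steadier numbers.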

5) Submission:
The lab3.tar.gz file should contain the provided lab3 folder, including all the changes and additions you made to the source code. Please also include your answers to item 4.
Instructions:
    cd ~/NVIDIA_GPU_Computing_SDK/C/src/lab3
    make clean
    cd ..
    tar cvf lab3.tar lab3
    gzip lab3.tar
Then upload the lab3.tar.gz file to T-Square.
6) Grading:
Functionality (50 pts)
Performance optimizations (30 pts)
Speedup competition (20 pts)

Each optimization listed below earns the indicated points if you use it. You may submit different optimizations in separate directories with different names. The total performance optimization score will not exceed 30 pts.

Shared memory usage: 20 pts
Constant memory: 10 pts
Loop unrolling: 20 pts
Texture memory: 20 pts
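The sketch below combines three of the listed optimizations in one kernel: the 5x5 weights live in constant memory, each block stages its 20x20 input region (16x16 outputs plus a 2-pixel halo on every side) in shared memory, and `#pragma unroll` fully unrolls the 5x5 loops. Texture memory is not shown. The names `cA`, `convolveTiled`, and `runConvolveTiled` are hypothetical, and the code assumes a row-major `float` image whose dimensions are multiples of 16.

```cuda
#include <cuda_runtime.h>

#define KERNEL_SIZE 5
#define KERNEL_RADIUS (KERNEL_SIZE / 2)
#define BLOCK_SIZE 16
#define TILE_SIZE (BLOCK_SIZE + 2 * KERNEL_RADIUS)  // 20: output tile plus halo

// Constant memory: all threads of a warp read the same weight at the same time,
// so the constant cache can broadcast it instead of issuing separate loads.
__constant__ float cA[KERNEL_SIZE][KERNEL_SIZE];

__global__ void convolveTiled(const float *B, float *C, int height, int width)
{
    // Shared memory: the block loads its input region once, instead of each
    // thread re-reading up to 25 values from global memory.
    __shared__ float tile[TILE_SIZE][TILE_SIZE];

    int tx = threadIdx.x, ty = threadIdx.y;
    int rowBase = blockIdx.y * BLOCK_SIZE - KERNEL_RADIUS;  // image row of tile[0][*]
    int colBase = blockIdx.x * BLOCK_SIZE - KERNEL_RADIUS;  // image column of tile[*][0]

    // 256 threads cooperatively load 400 elements; out-of-image elements become 0.
    for (int idx = ty * BLOCK_SIZE + tx; idx < TILE_SIZE * TILE_SIZE;
         idx += BLOCK_SIZE * BLOCK_SIZE) {
        int r = idx / TILE_SIZE, c = idx % TILE_SIZE;
        int row = rowBase + r, col = colBase + c;
        tile[r][c] = (row >= 0 && row < height && col >= 0 && col < width)
                         ? B[row * width + col] : 0.0f;
    }
    __syncthreads();  // the whole tile must be loaded before any thread reads it

    float sum = 0.0f;
    // Loop unrolling: KERNEL_SIZE is a compile-time constant, so both loops
    // can be fully unrolled, removing loop counters and branches.
#pragma unroll
    for (int m = 0; m < KERNEL_SIZE; ++m) {
#pragma unroll
        for (int n = 0; n < KERNEL_SIZE; ++n)
            sum += cA[m][n] * tile[ty + m][tx + n];
    }

    int i = blockIdx.y * BLOCK_SIZE + ty;
    int j = blockIdx.x * BLOCK_SIZE + tx;
    C[i * width + j] = sum;
}

// Hypothetical host wrapper; hA holds the 5x5 weights in row-major order.
cudaError_t runConvolveTiled(const float *hA, const float *hB, float *hC,
                             int height, int width)
{
    size_t imgBytes = (size_t)height * width * sizeof(float);
    float *dB = NULL, *dC = NULL;

    cudaMemcpyToSymbol(cA, hA, KERNEL_SIZE * KERNEL_SIZE * sizeof(float));
    cudaMalloc(&dB, imgBytes);
    cudaMalloc(&dC, imgBytes);
    cudaMemcpy(dB, hB, imgBytes, cudaMemcpyHostToDevice);

    dim3 block(BLOCK_SIZE, BLOCK_SIZE);
    dim3 grid(width / BLOCK_SIZE, height / BLOCK_SIZE);
    convolveTiled<<<grid, block>>>(dB, dC, height, width);

    cudaMemcpy(hC, dC, imgBytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaError_t err = cudaGetLastError();
    cudaFree(dB);
    cudaFree(dC);
    return err;
}
```

Because the zero padding is written into the halo while loading, the inner loop needs no boundary checks, which keeps the fully unrolled body to 25 multiply-adds.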

We will also measure the speedup of your implementation and compare it with those of the other students. Your score will be (your speedup) / (0.95 * best speedup) * 20, capped at 20 pts.
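As a worked example of the formula: if the best speedup in the class is 10x, a speedup of 8x scores 8 / (0.95 * 10) * 20 ≈ 16.84 pts, and any speedup of 9.5x or more receives the full 20 pts. The helper below (the name `speedupScore` is hypothetical) just restates that arithmetic.

```cuda
// Hypothetical helper restating the speedup-competition formula:
//   score = (your speedup) / (0.95 * best speedup) * 20, capped at 20.
double speedupScore(double yourSpeedup, double bestSpeedup)
{
    double score = yourSpeedup / (bestSpeedup * 0.95) * 20.0;
    return score > 20.0 ? 20.0 : score;  // the maximum is 20 pts
}
```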