School of Computer Science

Georgia Institute of Technology

CS4803DGC, Spring 2009
Assignment #4
Due: Thursday, March 29, 6:00 pm
Hyesoon Kim, Instructor

Introduction
This is an individual assignment.
In this assignment, you will write a blocked implementation of matrix convolution. The convolution kernel is a constant 5x5, but the images are arbitrarily sized. To simplify the problem, image dimensions are always a multiple of the block size, 16.

Matrix convolution is primarily used in image processing for tasks such as image enhancement and blurring. A standard image convolution formula for a 5x5 convolution kernel A with matrix B is:

C(i,j) = sum (m = 0 to 4) { sum(n = 0 to 4) { A[m][n] * B[i+m-2][j+n-2] } }
where 0 <= i < B.height and 0 <= j < B.width

Elements that are "outside" the matrix B, for this exercise, are treated as if they had value zero.
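The formula and the zero-boundary rule above can be sketched as a plain CPU reference, which may be handy for checking device results. This is only a minimal sketch, not the code shipped in hw4: the function name convolve_reference, the constants KERNEL_SIZE and KERNEL_RADIUS, and the row-major float layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_SIZE 5    /* 5x5 convolution kernel A */
#define KERNEL_RADIUS 2  /* offset used in B[i+m-2][j+n-2] */

/* Hypothetical helper (not part of the provided hw4 sources).
 * a: 5x5 kernel, row-major. b: input image, c: output image,
 * both height x width, row-major. */
void convolve_reference(const float *a, const float *b, float *c,
                        int height, int width)
{
    assert(a != NULL && b != NULL && c != NULL);
    assert(height > 0 && width > 0);

    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            float sum = 0.0f;
            for (int m = 0; m < KERNEL_SIZE; ++m) {
                for (int n = 0; n < KERNEL_SIZE; ++n) {
                    int row = i + m - KERNEL_RADIUS;
                    int col = j + n - KERNEL_RADIUS;
                    /* Elements outside B contribute zero, so skip them. */
                    if (row >= 0 && row < height && col >= 0 && col < width)
                        sum += a[m * KERNEL_SIZE + n] * b[row * width + col];
                }
            }
            c[i * width + j] = sum;
        }
    }
}
```

With an all-ones kernel and an all-ones image, each output element simply counts how many of the 25 taps fall inside the image: 25 in the interior, 15 on an edge away from the corners, and 9 at a corner.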

Performance optimization is required in this assignment.

1) Untar hw4.tar.gz into ~/NVIDIA_CUDA_SDK/projects

Instructions:
cd ~/NVIDIA_CUDA_SDK/projects
tar xzvf hw4.tar.gz
cd hw4
make

2) Edit the source files 2Dconvolution.cu and 2Dconvolution_kernel.cu to complete the functionality of 2D convolution on the device.

3) Arguments:
No arguments: default image size (16x16)
Two arguments: image height, image width
You do not need to pass any images. The images are randomly generated inside the code.
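The argument rules above could be handled on the host roughly as follows. This is a hedged sketch rather than the parsing code provided in hw4: the names parse_dim and parse_image_size, the BLOCK_SIZE and DEFAULT_DIM constants, and the choice to reject sizes that are not a multiple of 16 are all assumptions for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BLOCK_SIZE 16   /* image dimensions are multiples of this */
#define DEFAULT_DIM 16  /* default image is 16x16 */

/* Hypothetical helper: parse one positive dimension that is a
 * multiple of BLOCK_SIZE. Returns 0 on success, -1 otherwise. */
static int parse_dim(const char *text, int *out)
{
    char *end = NULL;
    long value = strtol(text, &end, 10);

    if (end == text || *end != '\0') return -1;   /* not a clean integer */
    if (value <= 0 || value > INT_MAX) return -1; /* out of range */
    if (value % BLOCK_SIZE != 0) return -1;       /* not block aligned */
    *out = (int)value;
    return 0;
}

/* Hypothetical helper: no arguments selects the default size,
 * two arguments give height then width. Returns 0 on success. */
int parse_image_size(int argc, char **argv, int *height, int *width)
{
    assert(height != NULL && width != NULL);

    if (argc == 1) {
        *height = DEFAULT_DIM;
        *width = DEFAULT_DIM;
        return 0;
    }
    if (argc != 3) return -1;
    if (parse_dim(argv[1], height) != 0) return -1;
    return parse_dim(argv[2], width);
}
```

Under these assumptions, running with no arguments yields a 16x16 image, "32 64" yields height 32 and width 64, and "30 64" is rejected because 30 is not a multiple of 16.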

4) Explain which optimizations you performed. Measure the performance improvement for each of your optimizations.

5) Submission:
The hw4.tar.gz file should contain the hw4 folder provided, with all the changes and additions you have made to the source code. Include a PDF file with your answer to question 4.
Instructions:
cd ~/NVIDIA_CUDA_SDK/projects/hw4
make clean
cd ..
tar cvf hw4.tar hw4
gzip hw4.tar
Upload the hw4.tar.gz file to T-square.

6) Grading:
Functionality (50 pts)
Performance optimizations (50 pts):

If you use one of these optimizations, you will receive the optimization points. You may submit different optimizations in separate directories with different names. The performance optimization score is capped at 50 pts.


We will measure the speedup of your assignment. The best-performing files should be in the hw4 directory. The top 5 students will receive an extra 20 points.