Introduction
This is an individual assignment.
In this assignment, you will write a blocked implementation of
matrix convolution. The convolution kernel is a constant 5x5
matrix, but the images are arbitrarily sized. To simplify the
problem, image dimensions are always a multiple of the block size,
16.
Matrix convolution is primarily used in image processing for tasks
such as image enhancement, blurring, etc. The standard formula for
convolving a matrix B with a 5x5 convolution kernel A is
C[i][j] = sum (m = 0 to 4) {
              sum (n = 0 to 4) {
                  A[m][n] * B[i+m-2][j+n-2]
              }
          }
where 0 <= i < B.height and 0 <= j < B.width
For this exercise, elements that are "outside" the matrix B are treated as if they had the value zero.
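As a reference point, the formula above can be written as a straightforward CPU routine. The sketch below is not part of the provided lab code: the name convolve_reference, the flat row-major layout for A, B and C, and the use of float are assumptions made for illustration. A routine like this can be handy for checking device results against a known-good answer.

```c
#include <assert.h>
#include <stddef.h>

#define KERNEL_SIZE   5
#define KERNEL_RADIUS 2   /* (KERNEL_SIZE - 1) / 2 */

/*
 * Hypothetical CPU reference for the formula in this handout.
 * A: 5x5 kernel, row-major, 25 floats.
 * B: input image, row-major, height * width floats.
 * C: output image, same shape as B.
 * Elements outside B contribute zero.
 */
static void convolve_reference(const float *A, const float *B, float *C,
                               int height, int width)
{
    assert(A != NULL && B != NULL && C != NULL);
    assert(height > 0 && width > 0);

    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            float sum = 0.0f;
            for (int m = 0; m < KERNEL_SIZE; ++m) {
                for (int n = 0; n < KERNEL_SIZE; ++n) {
                    int row = i + m - KERNEL_RADIUS;
                    int col = j + n - KERNEL_RADIUS;
                    /* Skip out-of-range elements: they count as zero. */
                    if (row >= 0 && row < height &&
                        col >= 0 && col < width) {
                        sum += A[m * KERNEL_SIZE + n] * B[row * width + col];
                    }
                }
            }
            C[i * width + j] = sum;
        }
    }
}
```

With an all-ones kernel and an all-ones image, an interior element sums 25 ones, while a corner element only sees the 3x3 part of the kernel window that lies inside the image.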
Performance optimization is required in this assignment.
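One optimization that fits a blocked implementation is to stage each 16x16 block of B, together with the 2-element border (halo) that the 5x5 kernel reaches into, in fast local storage before computing. The sketch below emulates that idea in plain C on the host; the tile array stands in for what would be shared memory on the device. It is an illustration under assumptions (the function name convolve_blocked, row-major float arrays), not the expected solution and not the only valid optimization.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE    16
#define KERNEL_SIZE   5
#define KERNEL_RADIUS 2
#define TILE_SIZE     (BLOCK_SIZE + 2 * KERNEL_RADIUS)   /* 20 */

/*
 * Hypothetical host-side emulation of a blocked (tiled) convolution.
 * Each 16x16 output block is computed from a 20x20 tile that holds
 * the block plus its halo. Positions outside B are stored as zero.
 */
static void convolve_blocked(const float *A, const float *B, float *C,
                             int height, int width)
{
    assert(A != NULL && B != NULL && C != NULL);
    assert(height > 0 && width > 0);
    /* The handout guarantees sizes are multiples of the block size. */
    assert(height % BLOCK_SIZE == 0 && width % BLOCK_SIZE == 0);

    float tile[TILE_SIZE][TILE_SIZE];

    for (int by = 0; by < height; by += BLOCK_SIZE) {
        for (int bx = 0; bx < width; bx += BLOCK_SIZE) {
            /* Step 1: load the block and its halo into the tile. */
            for (int ty = 0; ty < TILE_SIZE; ++ty) {
                for (int tx = 0; tx < TILE_SIZE; ++tx) {
                    int row = by + ty - KERNEL_RADIUS;
                    int col = bx + tx - KERNEL_RADIUS;
                    int inside = row >= 0 && row < height &&
                                 col >= 0 && col < width;
                    tile[ty][tx] = inside ? B[row * width + col] : 0.0f;
                }
            }
            /* Step 2: compute the block's outputs from the tile only. */
            for (int y = 0; y < BLOCK_SIZE; ++y) {
                for (int x = 0; x < BLOCK_SIZE; ++x) {
                    float sum = 0.0f;
                    for (int m = 0; m < KERNEL_SIZE; ++m) {
                        for (int n = 0; n < KERNEL_SIZE; ++n) {
                            sum += A[m * KERNEL_SIZE + n] * tile[y + m][x + n];
                        }
                    }
                    C[(by + y) * width + (bx + x)] = sum;
                }
            }
        }
    }
}
```

Because the halo is filled with zeros during the load step, the inner loops need no boundary checks. On a device, the two steps would typically be separated by a barrier so that the whole tile is loaded before any thread reads from it.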
1) Untar lab3.tar.gz into ~/NVIDIA_GPU_Computing_SDK/C/src
Instruction:
cd ~/NVIDIA_GPU_Computing_SDK/C/src
tar xzf lab3.tar.gz
cd lab3
make
2) Edit the source files 2Dconvolution.cu and
2Dconvolution_kernel.cu to complete the functionality of 2D convolution on the device.
3) Arguments:
No arguments: default image size (16x16)
Two arguments: image height, image width
You do not need to pass any images. The images are randomly generated inside the code.
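The argument rules above could be handled along the following lines. This is a hypothetical sketch, not the parsing code shipped in lab3: the function parse_image_size and its return convention are made up for illustration, and rejecting sizes that are not multiples of 16 is an assumption based on the block-size restriction stated in the introduction.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define BLOCK_SIZE   16
#define DEFAULT_SIZE 16

/*
 * Hypothetical helper: fills *height and *width from the command line.
 * Returns 0 on success and -1 if the arguments are not usable.
 *   no arguments  -> default 16x16 image
 *   two arguments -> image height, image width (in that order)
 */
static int parse_image_size(int argc, char **argv, int *height, int *width)
{
    assert(height != NULL && width != NULL);

    if (argc == 1) {
        *height = DEFAULT_SIZE;
        *width = DEFAULT_SIZE;
        return 0;
    }
    if (argc != 3) {
        return -1;
    }

    char *end_h = NULL;
    char *end_w = NULL;
    long h = strtol(argv[1], &end_h, 10);
    long w = strtol(argv[2], &end_w, 10);

    /* Reject trailing garbage such as "32x". */
    if (*end_h != '\0' || *end_w != '\0') {
        return -1;
    }
    /* Reject non-positive or oversized values. */
    if (h <= 0 || w <= 0 || h > INT_MAX || w > INT_MAX) {
        return -1;
    }
    /* Assumed check: sizes must be multiples of the block size. */
    if (h % BLOCK_SIZE != 0 || w % BLOCK_SIZE != 0) {
        return -1;
    }

    *height = (int)h;
    *width = (int)w;
    return 0;
}
```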
4) Explain which optimizations you performed. Measure the performance improvement from each of your optimizations.
5) Submission:
The lab3.tar.gz file should contain the lab3 folder provided, with all the changes and additions you have made to the source code. Please include your answers to item 4 above.
Instruction:
cd ~/NVIDIA_GPU_Computing_SDK/C/src/lab3
make clean
cd ..
tar cvf lab3.tar lab3
gzip lab3.tar
Upload the lab3.tar.gz file to T-square.
6) Grading:
Functionality (50 pts)
Performance optimizations (30 pts):
Speedup competition (20 pts)
If you use one of these optimizations, you will receive the optimization points. To submit several different optimizations, place the files for each one in a directory with a different name. The performance optimization score is capped at 30 pts.