Project 2: Local Feature Matching

Finding Interest Points

To find interest points, I used the Harris corner detection algorithm. The first step was to compute the entries of the second moment matrix. The image derivatives Ix and Iy were found by convolving the image with a Sobel filter and its transpose, and IxIy was found by running the y-axis Sobel filter on Ix. A Gaussian filter was then run over the image derivatives to obtain the smoothed second moment entries. Those values were passed into the Harris cornerness function to obtain the Harris response matrix, whose borders were zeroed out to a width of one feature width. With the Harris matrix in hand, non-maximum suppression was performed by decomposing the thresholded response into connected components and suppressing all but the maximum interest point within each component.
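For illustration, here is a minimal sketch of this pipeline in Python with NumPy/SciPy. The sigma, k, and threshold values are placeholders rather than the values used in the project, and the cross term uses the conventional element-wise product Ix*Iy of the textbook Harris formulation:

    import numpy as np
    from scipy import ndimage

    def harris_corners(image, feature_width=16, k=0.05, threshold=0.01):
        # Sobel derivatives: Ix from the horizontal kernel, Iy from its transpose.
        sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
        Ix = ndimage.convolve(image, sobel_x)
        Iy = ndimage.convolve(image, sobel_x.T)

        # Gaussian-smooth the second moment entries.
        Ixx = ndimage.gaussian_filter(Ix * Ix, sigma=1.0)
        Iyy = ndimage.gaussian_filter(Iy * Iy, sigma=1.0)
        Ixy = ndimage.gaussian_filter(Ix * Iy, sigma=1.0)

        # Harris cornerness: det(M) - k * trace(M)^2.
        response = Ixx * Iyy - Ixy ** 2 - k * (Ixx + Iyy) ** 2

        # Zero the border so descriptor windows fit inside the image.
        pad = feature_width // 2
        response[:pad, :] = 0
        response[-pad:, :] = 0
        response[:, :pad] = 0
        response[:, -pad:] = 0

        # Non-maximum suppression via connected components:
        # keep only the strongest response in each component.
        mask = response > threshold * response.max()
        labels, num = ndimage.label(mask)
        peaks = ndimage.maximum_position(response, labels, np.arange(1, num + 1))
        ys, xs = zip(*peaks) if peaks else ([], [])
        return np.array(xs), np.array(ys)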

Describing Features

The interest points were described with SIFT descriptors. The image was first smoothed with a Gaussian blur. Then, for each interest point, a window of the image was cut out around the point and split into 4x4 cells, each weighted by a Gaussian distribution. For each of the 16 cells, the goal is to build a histogram of gradient orientations over 8 directional bins. This is done by first defining the 8 directions, spaced pi/4 apart. For each pixel of the cell, the gradient magnitude and gradient direction are computed, the direction is assigned to one of the 8 bins, and the gradient magnitude is added to that bin.
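A rough sketch of one such descriptor, again assuming NumPy/SciPy and a 16-pixel feature width; the per-cell Gaussian weighting mentioned above is omitted for brevity, and the keypoint is assumed to be far enough from the border (border points were suppressed during interest point detection):

    import numpy as np
    from scipy import ndimage

    def sift_like_descriptor(image, x, y, feature_width=16):
        # Pre-smooth, then take image gradients.
        blurred = ndimage.gaussian_filter(image, sigma=1.0)
        gy, gx = np.gradient(blurred)
        magnitude = np.hypot(gx, gy)
        # Map orientations into [0, 2*pi) and then into one of 8 bins of width pi/4.
        orientation = np.arctan2(gy, gx) % (2 * np.pi)
        bins = np.floor(orientation / (np.pi / 4)).astype(int) % 8

        # Cut a feature_width x feature_width window centered on the keypoint.
        half = feature_width // 2
        r0, c0 = int(y) - half, int(x) - half
        mag_win = magnitude[r0:r0 + feature_width, c0:c0 + feature_width]
        bin_win = bins[r0:r0 + feature_width, c0:c0 + feature_width]

        cell = feature_width // 4
        descriptor = np.zeros((4, 4, 8))
        for i in range(4):
            for j in range(4):
                m = mag_win[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
                b = bin_win[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
                # Accumulate gradient magnitude into the 8 orientation bins of this cell.
                descriptor[i, j] = np.bincount(b.ravel(), weights=m.ravel(), minlength=8)

        descriptor = descriptor.ravel()
        norm = np.linalg.norm(descriptor)
        return descriptor / norm if norm > 0 else descriptor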

Matching Features

To match features, I computed the Euclidean distances between SIFT descriptors and compared them using the nearest neighbor distance ratio: the distance to a descriptor's nearest neighbor divided by the distance to its second nearest neighbor. If the ratio is close to 1, the feature is not very distinctive and the match is less certain.
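The ratio test can be sketched as follows, assuming NumPy and a placeholder ratio threshold of 0.8:

    import numpy as np

    def match_features(desc1, desc2, ratio_threshold=0.8):
        # Pairwise Euclidean distances between every descriptor in desc1 and desc2.
        dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)

        # For each descriptor in desc1, find its nearest and
        # second-nearest neighbors in desc2.
        order = np.argsort(dists, axis=1)
        nearest = order[:, 0]
        d1 = dists[np.arange(len(desc1)), order[:, 0]]
        d2 = dists[np.arange(len(desc1)), order[:, 1]]

        # A ratio near 1 means the best match is barely better than the runner-up,
        # so the feature is ambiguous; keep only matches below the threshold.
        ratios = d1 / (d2 + 1e-12)
        keep = ratios < ratio_threshold
        matches = np.column_stack([np.nonzero(keep)[0], nearest[keep]])
        return matches, ratios[keep]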

Results

The Gaudi results gave about 81% accuracy with the final code. Without the SIFT descriptors, using the raw feature window as the descriptor gave about 82% accuracy at a 70% feature-matching confidence threshold. The feature window was acceptable for the Gaudi images because there was very little scale or color variance across them. Also, applying a Gaussian weighting across the cells for the SIFT descriptors gave lower accuracy than without, and a smaller standard deviation for the weighting lowered accuracy even more. This is likely because the Gaudi images have feature points that depend more on relative positioning, in which case concentrating the feature or descriptor weight at the center of the interest point would lower the accuracy of the match.

The Rushmore results were more promising than the Gaudi results, with a satisfying 96% accuracy. Although the color variance was somewhat more significant across the Rushmore images than the Gaudi images, the feature matching still did well, mostly thanks to the SIFT descriptors. Although the mountain texture seemed repetitive, the Rushmore features turned out to be more distinct than the Gaudi structures.

The Notredame results came out to a horrendous 0%. These images were drastically harder to match than the previous two because of their high color, scale, and orientation variance. To improve the current code so that it could better match the Notredame images, I would need to add scale selection or keypoint orientation assignment to the interest point detection process.