Project 2: Local Feature Matching

For this project I followed the suggested implementation process given in the project description. The first step was to implement feature matching using the ratio test to determine the confidence of the matches. This was a fairly straightforward implementation. I chose to hardcode a limit of 100 matches since the project grading criteria were based on that number, but in practical use the confidences would need to be thresholded to remove unlikely matches.
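A minimal sketch of this matching step is shown below in NumPy. The function name, the inverse distance ratio used as the confidence score, and the shape of the inputs (two 2-D descriptor arrays) are illustrative assumptions rather than my exact code.

```python
import numpy as np

def match_features(features1, features2, max_matches=100):
    # Pairwise Euclidean distances between every descriptor in image 1
    # and every descriptor in image 2.
    dists = np.linalg.norm(
        features1[:, None, :] - features2[None, :, :], axis=2)

    # Nearest and second-nearest neighbor distances for each feature in image 1.
    order = np.argsort(dists, axis=1)
    rows = np.arange(len(features1))
    d1 = dists[rows, order[:, 0]]
    d2 = dists[rows, order[:, 1]]

    # Ratio test: a small nearest/second-nearest ratio means the best match
    # stands out from the runner-up, so confidence is taken as the inverse ratio.
    confidences = d2 / (d1 + 1e-10)
    matches = np.stack([rows, order[:, 0]], axis=1)

    # Keep only the most confident matches (hardcoded cap, as described above).
    keep = np.argsort(-confidences)[:max_matches]
    return matches[keep], confidences[keep]
```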

For the next step I implemented SIFT feature descriptors. I first preprocessed the image by applying a large Gaussian filter to the entire image. I then computed matrices of gradient directions and magnitudes using Sobel filters in the horizontal and vertical directions. I quantized the direction information into a matrix of values in the range [1, 8], corresponding to the 8 histogram buckets used to describe the patches. For each interest point I took a 16x16 window around the point and broke it into 4x4 patches. For each patch I computed a histogram of directions weighted by magnitude. The magnitudes were down-weighted with another Gaussian filter to place more emphasis on pixels near the center. This left me with features of 128 attributes (16 patches with 8 orientations each). Finally, I post-processed the features in a few ways. First, I normalized each feature to unit length. Then I capped the values at 0.2 and renormalized (this is said to reduce the effect of outliers caused by image lighting). Finally, I raised the entries in each feature to a power slightly lower than one.
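A rough sketch of this descriptor is below, in NumPy/SciPy. The function name, the filter sigmas, and the 0.9 power are illustrative choices rather than my exact values; interest points are assumed to be integer coordinates away from the image border (which the detector described next guarantees).

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def get_features(image, xs, ys, feature_width=16, power=0.9):
    # Blur the whole image, then take gradients with Sobel filters.
    blurred = gaussian_filter(image, sigma=1.0)
    gx = sobel(blurred, axis=1)   # horizontal gradient
    gy = sobel(blurred, axis=0)   # vertical gradient
    magnitude = np.hypot(gx, gy)
    # Quantize gradient directions into 8 orientation bins (0..7 here,
    # equivalent to the 1..8 buckets described above).
    orientation = np.clip(
        np.floor((np.arctan2(gy, gx) + np.pi) / (2 * np.pi) * 8), 0, 7).astype(int)

    # Gaussian weighting mask so pixels near the window center count more.
    coords = np.arange(feature_width) - (feature_width - 1) / 2.0
    wx, wy = np.meshgrid(coords, coords)
    weight = np.exp(-(wx**2 + wy**2) / (2 * (feature_width / 2.0)**2))

    half = feature_width // 2
    features = []
    for x, y in zip(xs, ys):
        # 16x16 window around the interest point, magnitudes down-weighted
        # by the Gaussian mask.
        mag = magnitude[y - half:y + half, x - half:x + half] * weight
        ori = orientation[y - half:y + half, x - half:x + half]
        hist = np.zeros((4, 4, 8))
        for i in range(4):
            for j in range(4):
                cell_mag = mag[4 * i:4 * i + 4, 4 * j:4 * j + 4].ravel()
                cell_ori = ori[4 * i:4 * i + 4, 4 * j:4 * j + 4].ravel()
                # Orientation histogram for this 4x4 cell, weighted by magnitude.
                hist[i, j] = np.bincount(cell_ori, weights=cell_mag, minlength=8)
        f = hist.ravel()
        # Normalize to unit length, cap at 0.2, renormalize, and raise to a
        # power slightly below one.
        f = f / (np.linalg.norm(f) + 1e-10)
        f = np.minimum(f, 0.2)
        f = f / (np.linalg.norm(f) + 1e-10)
        features.append(f ** power)
    return np.array(features)
```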

The last step was to implement the Harris corner detector. My first step here was again to apply a large Gaussian filter to blur the entire image. I then found the gradients in the x and y directions, again using Sobel filters. From those I computed the squared gradients and the x-y gradient product, each smoothed with a Gaussian filter. This was all that was needed to produce the raw Harris "cornerness" values. The cornerness matrix needed further processing to be usable, however. First I suppressed all values near the edge of the image, for simplicity's sake. Next, I set a threshold based on the maximum value in the matrix. Finally, I used the recommended colfilt function to do non-maximum suppression. Although connected component analysis may have been more effective for some images, colfilt performed essentially the same operation with a much easier implementation.
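Below is a condensed sketch of this detector pipeline, again in NumPy/SciPy. The sigmas, the alpha constant, the relative threshold, and the border width are illustrative assumptions, and scipy.ndimage.maximum_filter stands in for MATLAB's colfilt.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def get_interest_points(image, border=16, alpha=0.06, rel_threshold=0.01):
    # Blur the image, then take x/y gradients with Sobel filters.
    blurred = gaussian_filter(image, sigma=1.0)
    ix = sobel(blurred, axis=1)
    iy = sobel(blurred, axis=0)

    # Smoothed gradient products (entries of the second-moment matrix).
    ixx = gaussian_filter(ix * ix, sigma=2.0)
    iyy = gaussian_filter(iy * iy, sigma=2.0)
    ixy = gaussian_filter(ix * iy, sigma=2.0)

    # Harris cornerness: det(M) - alpha * trace(M)^2.
    harris = ixx * iyy - ixy ** 2 - alpha * (ixx + iyy) ** 2

    # Suppress responses near the image border for simplicity.
    harris[:border, :] = 0
    harris[-border:, :] = 0
    harris[:, :border] = 0
    harris[:, -border:] = 0

    # Threshold relative to the maximum response, then keep only local maxima
    # (a sliding-window max, analogous to colfilt with @max).
    harris[harris < rel_threshold * harris.max()] = 0
    local_max = maximum_filter(harris, size=3)
    ys, xs = np.nonzero((harris == local_max) & (harris > 0))
    return xs, ys
```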

I was quite satisfied with my results on the Notre Dame and Mount Rushmore images, with my algorithm scoring 94% on both. The Gaudi image pair did not perform nearly as well, with only 6% accuracy. I tinkered with the parameters to improve performance, but I was never able to do better than 6%.

As part of my analysis I ran the code on the Notre Dame image set with certain features disabled. When the image was not preprocessed with a Gaussian filter, the accuracy dropped to 91%. When the features were not capped at 0.2 and raised to the small power, the accuracy dropped to 93%. When the magnitudes of the image patches were not weighted with a Gaussian, the accuracy dropped to 93%. These were all small reductions, but they show how each implementation step contributes to the aggregate result. It is interesting how much computer vision can be an art more than a science.

Below are some examples of the matches that my code came up with. Scroll to the bottom for some bonus images of my adorable dog.

94% accuracy on Notre Dame images

94% accuracy on Mount Rushmore images

Very, very poor accuracy on Gaudi images. I imagine that more sophisticated preprocessing and feature processing would be needed to make this work. Perhaps some spatial consideration during interest point selection would help as well.

By inspection 22/25 matches are correct on my dog. Two of the three mistakes were due to shadows.