Project 2: Local Feature Matching

For this assignment, I implemented algorithms for detecting interest points and for creating and matching feature descriptors in images. These algorithms are crucial in tasks such as object recognition, image alignment, and 3D reconstruction. The three algorithms I implemented were:

  1. A Harris detector for finding corners in an image
  2. SIFT (scale-invariant feature transform)-like features to describe the patch around each interest point
  3. The nearest neighbor distance ratio for measuring how closely two feature descriptors in different images match

The Harris detector finds corners by locating points in an image where moving in any direction produces a significant change in pixel intensity. This is done by taking the horizontal and vertical gradients Ix and Iy of the image and forming their products Ix^2, Iy^2, and Ix*Iy. A Gaussian filter g is applied to each of these products, and a cornerness score is calculated with the following shorthand formula:

g(Ix^2) * g(Iy^2) - g(Ix*Iy)^2 - alpha * (g(Ix^2) + g(Iy^2))^2,

where alpha = 0.06. If the cornerness score for a particular pixel is above a positive threshold (for example, 0.01) and no pixel in its immediate vicinity has a greater cornerness score, then that pixel is kept as a corner interest point.
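A minimal sketch of this detector, assuming a grayscale image with intensities scaled to [0, 1]; the Gaussian sigma and the 3 x 3 non-maximum suppression window here are illustrative choices, not necessarily the exact values used:

```python
import numpy as np
from scipy import ndimage

def harris_corners(image, alpha=0.06, threshold=0.01, sigma=1.0):
    """Return (row, col) coordinates of Harris corners in a grayscale image."""
    # Horizontal and vertical image gradients.
    Ix = ndimage.sobel(image, axis=1)
    Iy = ndimage.sobel(image, axis=0)

    # Gaussian-filtered products of the gradients.
    gxx = ndimage.gaussian_filter(Ix * Ix, sigma)
    gyy = ndimage.gaussian_filter(Iy * Iy, sigma)
    gxy = ndimage.gaussian_filter(Ix * Iy, sigma)

    # Cornerness score: det(M) - alpha * trace(M)^2.
    score = gxx * gyy - gxy ** 2 - alpha * (gxx + gyy) ** 2

    # Keep pixels above the threshold that are also local maxima
    # within a 3 x 3 neighborhood (non-maximum suppression).
    local_max = ndimage.maximum_filter(score, size=3)
    corners = (score > threshold) & (score == local_max)
    return np.argwhere(corners)
```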

SIFT-like features were then calculated by taking a square patch around each interest point (16 pixels wide in my implementation) and computing horizontal and vertical Sobel derivatives, weighted by a Gaussian with variance equal to half the feature width. A 4 x 4 grid of cells, each holding an 8-bin orientation histogram, was used to organize the gradients: each gradient's magnitude was added to the bin matching its orientation in the cell containing it. All of the histogram bins were then concatenated into a single 128 x 1 vector describing the area around that interest point. The vector was normalized to unit length, and every component greater than 0.2 was clamped to that threshold to reduce the influence of large gradient magnitudes caused by non-linear illumination changes. The vector was then normalized once again.
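A sketch of this descriptor step, assuming interest points lie far enough from the image border that the full 16 x 16 patch fits (border handling omitted for brevity):

```python
import numpy as np
from scipy import ndimage

def sift_like_descriptor(image, y, x, width=16):
    """128-D SIFT-like descriptor for the width x width patch at (y, x)."""
    half = width // 2
    patch = image[y - half:y + half, x - half:x + half]

    # Horizontal and vertical Sobel derivatives over the patch.
    dx = ndimage.sobel(patch, axis=1)
    dy = ndimage.sobel(patch, axis=0)

    # Gaussian weight centered on the patch, variance = half the width.
    r = np.arange(width) - (width - 1) / 2.0
    g = np.exp(-(r[:, None] ** 2 + r[None, :] ** 2) / (2 * (width / 2.0)))

    magnitude = np.hypot(dx, dy) * g
    orientation = np.arctan2(dy, dx)          # in [-pi, pi]

    # 4 x 4 grid of cells, each with an 8-bin orientation histogram.
    cell = width // 4
    descriptor = []
    for i in range(4):
        for j in range(4):
            rs = slice(i * cell, (i + 1) * cell)
            cs = slice(j * cell, (j + 1) * cell)
            hist, _ = np.histogram(orientation[rs, cs], bins=8,
                                   range=(-np.pi, np.pi),
                                   weights=magnitude[rs, cs])
            descriptor.extend(hist)

    v = np.asarray(descriptor)
    v /= np.linalg.norm(v) + 1e-12            # normalize to unit length
    v = np.minimum(v, 0.2)                    # clamp large components
    v /= np.linalg.norm(v) + 1e-12            # renormalize
    return v
```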

I then matched the features between the two images. For each feature descriptor in the first image, I found the two feature descriptors in the second image whose vectors had the smallest Euclidean distances from it. To evaluate the confidence of a match, I divided the nearest neighbor's distance by the second-nearest neighbor's distance to obtain the nearest neighbor distance ratio. The smaller the ratio, the more likely it is that a true match has been found, because a distinctive feature should be much closer to its true match than to any other feature.
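A sketch of the matching step; the 0.8 ratio cutoff below is an illustrative assumption rather than a value from the project, since matches can also simply be ranked by their ratios:

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_features(desc1, desc2, ratio_threshold=0.8):
    """Match rows of desc1 to rows of desc2 using the nearest neighbor
    distance ratio. Returns index pairs sorted from best to worst ratio.
    """
    # Pairwise Euclidean distances between every descriptor pair.
    distances = cdist(desc1, desc2)

    # For each descriptor in image 1, the indices of its two
    # nearest neighbors in image 2.
    nearest_two = np.argsort(distances, axis=1)[:, :2]
    rows = np.arange(len(desc1))
    d_nearest = distances[rows, nearest_two[:, 0]]
    d_second = distances[rows, nearest_two[:, 1]]

    # Smaller ratio => the best match is much closer than the runner-up,
    # so the match is more likely to be correct.
    ratios = d_nearest / (d_second + 1e-12)

    keep = ratios < ratio_threshold
    matches = np.column_stack([rows[keep], nearest_two[keep, 0]])
    order = np.argsort(ratios[keep])          # most confident first
    return matches[order], ratios[keep][order]
```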

This pipeline of algorithms was applied to three image pairs featuring the Notre Dame Cathedral, Mount Rushmore, and the Episcopal Palace of Astorga (designed by Gaudí, and referred to below as the Episcopal Gaudi pair).

The results of the code with the cornerness threshold = 0.01 were as follows (each pair's detected interest points and feature matches were visualized, with good matches shown in green and bad ones in red):

  Image Pair          Matching Accuracy
  Notre Dame          84%
  Mount Rushmore      89%
  Episcopal Palace     2%

The results above make it clear that the pipeline matched features accurately for the Notre Dame and Mount Rushmore images, but performed poorly on the Episcopal Gaudi pair. The Episcopal Gaudi pair did yield some good interest points, but I believe the drastic drop in accuracy is due to the difference in scale between those two images being much larger than the scale differences within the other pairs. Therefore, building feature descriptors from fixed 16 x 16 patches around corresponding interest points in each image yields vastly different vectors. A little work on making the feature descriptors scale-invariant would probably have produced more accurate results.

Another thing to note is that the process took a significant amount of time, usually 5 to 8 minutes. I attempted to speed this up by increasing the cornerness threshold, since fewer interest points means fewer features to match, but this lowered accuracy. For example, raising the threshold to 0.05 gave 80% accuracy for the Notre Dame pair, 47% for the Mount Rushmore pair, and 2% for the Episcopal Gaudi pair. Even though the increased threshold kept only stronger corners, it yielded worse matches.