Project 2: Local Feature Matching

Two images of Notre Dame taken from different perspectives. Green circles represent successful matches and red circles unsuccessful ones.

The above image shows feature matching between two pictures of Notre Dame. The pictures were taken on different days from slightly different perspectives. To detect interest points, I programmed a version of Harris corner detection. For local feature description, I implemented a SIFT-like descriptor, which describes each interest point with a 4x4 grid of cells, each cell converted to a histogram of the gradient binned into 8 orientations. Lastly, feature matching uses the ratio test, which compares, for every feature, the distance to its nearest neighbor against the distance to its second nearest neighbor. This process is referred to as the Nearest Neighbor Distance Ratio (NNDR) test. The process to match an image pair is as follows:

  1. Find interest points (Harris corner detection)
  2. Compute each feature's local description (SIFT-like)
  3. Match feature descriptions (NNDR)
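
To make step 2 concrete, below is a minimal sketch of the SIFT-like descriptor, assuming interest point coordinates x and y (with points near the image border already discarded, as described later) and the default 16-pixel feature width. The variable names are illustrative and this is a simplification, not my exact implementation.

% get_features (sketch)
% hypothetical, simplified version of the SIFT-like descriptor step: each
% 16x16 patch is split into a 4x4 grid of 4x4-pixel cells, and each cell
% contributes an 8-bin gradient orientation histogram (128 values total)
[Gmag, Gdir] = imgradient(image);         % gradient magnitude and direction
bins = ceil((Gdir + 180) / 45);           % bin directions into 8 orientations
bins(bins == 0) = 1;                      % fold the -180 degree edge case in
features = zeros(numel(x), 128);
for k = 1:numel(x)                        % border points already discarded
    rows = y(k) - 8 : y(k) + 7;
    cols = x(k) - 8 : x(k) + 7;
    patch_mag = Gmag(rows, cols);
    patch_bin = bins(rows, cols);
    f = zeros(4, 4, 8);
    for r = 1:4
        for c = 1:4
            cell_mag = patch_mag(4*r - 3 : 4*r, 4*c - 3 : 4*c);
            cell_bin = patch_bin(4*r - 3 : 4*r, 4*c - 3 : 4*c);
            for o = 1:8
                f(r, c, o) = sum(cell_mag(cell_bin == o));
            end
        end
    end
    features(k, :) = f(:)' / max(norm(f(:)), eps);  % normalize to unit length
end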

Implementation decisions

Throughout the coding process of this project, several decisions were made which affect the final result. First, the image pairs were converted to grayscale, both to slightly normalize them and to avoid computing over three color channels. Within get_interest_points.m and match_features.m, the threshold has a large impact on running time and the accuracy of the matches. For the sake of simplicity, match_features mostly used a value of 0.85 for Notre Dame and Mount Rushmore. Whenever that threshold proved ineffective, as when no matches were found, the threshold was instead set to the mean of the nearest neighbor distance ratios. For the interest point detector, as detailed below, the threshold was set to return a constant number of points. I found 15,000 interest points to be a good balance between running time and accuracy of the final matches; this number was tuned so that the code completes in approximately 3 minutes. When generating the feature patches, I decided to omit interest points within one patch width of the image border, also for the sake of simplicity. Another option would have been to pad the image, although the potential gain in accuracy did not seem worth the effort. Additionally, the width of each feature remained at the default value of 16 pixels, and gradient orientations were binned into 8 directions. Some experimentation with different numbers of bins yielded faster running times at the expense of accuracy.
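
As a rough illustration of the NNDR threshold and its fallback described above, the matching step can be sketched as follows. This assumes descriptor matrices features1 and features2 with one descriptor per row; the confidence measure shown is one plausible choice, not necessarily the exact one my code uses.

% match_features (sketch)
% simplified illustration of the NNDR test with the mean-ratio fallback
dists = pdist2(features1, features2);               % all pairwise distances
[nearest, idx] = sort(dists, 2);                    % sort neighbors per feature
ratios = nearest(:, 1) ./ nearest(:, 2);            % nearest / second nearest

threshold = 0.85;
if ~any(ratios < threshold)
    threshold = mean(ratios);                       % fallback: mean of the ratios
end

passed = find(ratios < threshold);
matches = [passed, idx(passed, 1)];                 % matched index pairs
confidences = 1 - ratios(passed);                   % lower ratio = higher confidence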

Results

The images on this page use the above algorithm to match interest points. The number of matched features to be evaluated was either the number of detected matches or 100, whichever was smaller. The matched features were sorted by confidence, so only the most confident features would be evaluated. For the Notre Dame pair above, 91 of the 100 evaluated matches were correct, as shown by the green circles. Due to the number of features from each image, the running time of get_features and match_features was the biggest deciding factor in selecting a value for threshold. The slowest function by far was match_features. I assume this is due to the extraordinary number of distances calculated by pdist2: with 15,000 features per image, there are 15,000^2 = 225 million feature pairs. Early versions of get_interest_points would return upwards of 50,000 points, resulting in feature matching taking more than ten minutes to complete. Forming the local feature descriptions was also slow in proportion to the number of interest points. I hypothesize this could be improved drastically by converting the large number of nested for loops into a more sophisticated sliding window operation. This could possibly be done through the built-in function colfilt, although it seemed out of the scope of this project.
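
For reference, the hypothesized sliding-window speedup could look roughly like the sketch below, which uses the built-in im2col (a close relative of colfilt) to extract every patch in a single call. This is untested speculation under the same 16-pixel feature width, and the memory cost of materializing every sliding patch at once would need to be weighed against the loops it replaces.

% hypothetical vectorization sketch (untested): extract every 16x16 patch
% in one call with im2col instead of nested loops over interest points
patch_size = [16, 16];
patches = im2col(Gmag, patch_size, 'sliding');      % one patch per column
% im2col orders patches column-major by their top-left corner, so the
% column for the patch centered at (y(k), x(k)) can be looked up directly
cols = sub2ind(size(Gmag) - patch_size + 1, y - 8, x - 8);
point_patches = patches(:, cols);                   % 256 rows, one column per point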

Example of interest point detection

The code block below is an overview of how I detect corners using the Harris corner detection algorithm. It uses the algorithm laid out in Szeliski 4.1.1 and the equation har = det(A) - a*trace(A)^2, where A is the second moment matrix, for measuring each pixel's "cornerness". Alpha, a in the equation, was kept at 0.06 as suggested in Szeliski 4.1.1. Non-maximum suppression is performed with a sliding window and a max function utilizing colfilt. To avoid tricky border logic when extracting feature patches, I zero out a border of the har matrix after it is computed. Lastly, the threshold was chosen so that the 15,000 points with the highest cornerness scores are returned. In the below block, extraneous lines of code were removed for readability.

% get_interest_points
% smooth the image, then take gradients in the x and y directions
[Ix, Iy] = imgradientxy(imfilter(image, small_gaussian));

% entries of the second moment matrix A, smoothed with a larger Gaussian
Ix2 = imfilter(Ix .^ 2, large_gaussian);
IxIy = imfilter(Ix .* Iy, large_gaussian);
Iy2 = imfilter(Iy .^ 2, large_gaussian);

% cornerness values: har = det(A) - alpha * trace(A)^2, computed per pixel
alpha = 0.06;
har = (Ix2 .* Iy2 - IxIy .^ 2) - alpha * (Ix2 + Iy2) .^ 2;

% zero out the border so feature patches stay inside the image
har([1:feature_width, end - feature_width + 1:end], :) = 0;
har(:, [1:feature_width, end - feature_width + 1:end]) = 0;

% non-maximum suppression: keep only pixels equal to their local maximum
local_max = colfilt(har, [feature_width / 4, feature_width / 4], 'sliding', @max);
har(har < local_max) = 0;

% set the threshold so that the 15,000 strongest corners are returned
sorted_scores = sort(har(:), 'descend');
threshold = sorted_scores(min(15000, numel(sorted_scores)));

% get corners which meet or exceed the threshold
[y, x] = ind2sub(size(har), find(har >= threshold));

Examples of matching results

Above are several examples of matched image pairs. The first row displays a different view of the evaluated Notre Dame pair, with lines connecting the matched features; it is easy to see where pairs of features were misaligned. The second row shows two images of Mount Rushmore along with their most confident feature matches. With the same threshold as Notre Dame, only 40 matches were found, at 100% accuracy. The third row is two different images of Episcopal Gaudi. This pair is challenging for several reasons, and as a result it had the worst accuracy: 1% of 100 matches. I hypothesize the poor performance is due to the difference in scale and illumination; the left image is smaller and was taken on a cloudier day than the right one. After reviewing the matches, I believe more than one was a proper match but was not evaluated as such. The last row is an arrowed view of Episcopal Gaudi, showing the mess of matches detected.