Project 2: Local Feature Matching

Get Interest Points Algorithm

The first part of this project involved finding interest points using the Harris corner detector. To do this I first found the gradient of the image in the x and y directions. I then computed the three images corresponding to the outer products of these gradients; because the matrix A described in our book is symmetric, only three images are needed. Once I computed these products, I convolved each image with a Gaussian filter. The size of the Gaussian filter and the sigma parameter both greatly affected the results of my algorithm. I found that by using a smaller Gaussian, and thus considering only the gradients of points very close to the point in question, I was able to create more precise features. That said, a larger Gaussian had the benefit of producing more spread-out interest points. By using a filter size of 3 with a sigma of 2 I was able to avoid overfitting to the Notre Dame pair and generalize my algorithm to more images. An interesting future experiment would be to create ground truth points for more complex images and see whether a larger Gaussian filter would improve results overall, even if it decreased my accuracy on the three ground truth pairs we were provided. The steps are shown and consolidated below:


% Image gradients in the x and y directions
[Gx, Gy] = imgradientxy(image);

% Entries of the symmetric second moment matrix A, smoothed with a Gaussian
gaussian_filter = fspecial('Gaussian', 3, 2);
topleft = imfilter(Gx.^2, gaussian_filter);    % Ix^2
botleft = imfilter(Gy.^2, gaussian_filter);    % Iy^2
offdiag = imfilter(Gx.*Gy, gaussian_filter);   % Ix*Iy (off-diagonal entry)

Now that our variables hold the entries of the matrix A at each pixel, we need to compute a scalar interest measure. The measure is the one proposed by Harris, which uses the smoothed A matrix at each point to compute a score from the matrix's determinant and trace, with an alpha value of 0.06 (as specified in the book). Below is the code which computes this scalar interest measure for each point:


alpha = 0.06;

% Harris score at every pixel, built from the 2x2 matrix A at that pixel
harris_interest_values = zeros(size(topleft));
for i = 1:size(topleft, 1)
    for j = 1:size(topleft, 2)
        A = [topleft(i,j) offdiag(i,j); offdiag(i,j) botleft(i,j)];
        harris_interest_values(i, j) = det(A) - alpha * trace(A);
    end
end
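
Since A is a symmetric 2 by 2 matrix at every pixel, the same scores can also be computed without the explicit double loop. The sketch below is just a vectorized form of the loop above, using the identities det(A) = Ixx*Iyy - Ixy^2 and trace(A) = Ixx + Iyy; it reproduces the same det(A) - alpha*trace(A) expression used here.


% Vectorized equivalent of the loop above (same score at every pixel)
det_A   = topleft .* botleft - offdiag.^2;
trace_A = topleft + botleft;
harris_interest_values = det_A - alpha * trace_A;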

Finally we must find the local maximum values of the score map and perform non-maximum suppression so that only points which are a local maximum are kept as interest points. We do this by first using colfilt with a sliding window which computes the maximum value around each point for the specified window size. The window size was also an important parameter I tweaked to get the best results possible. The smaller the window, the more points survive non-maximum suppression; however, a smaller window also means that farther-away points have less influence on whether a point is considered a maximum. I found that too small a window performed very well on Notre Dame but ended up overfitting, performing poorly on the other images. A window size of [3,3] gave me 93% accuracy on Notre Dame, but under 40% on Mount Rushmore. By making my window size [5,5], Notre Dame decreased to 90% but Mount Rushmore increased to 62%! When performing non-maximum suppression I suppressed values that were not the maximum in the specified window, and also applied a threshold so that only points above a certain Harris score are kept. I settled on a threshold of 0.03 by examining my resulting points and deciding how many total points I wanted to sample. Lower thresholds gave too many points, and higher thresholds removed points that had the potential for high-confidence matches. The code is below:


% Maximum Harris score in a sliding 5x5 window around each pixel
local_maxes = colfilt(harris_interest_values, [5,5], 'sliding', @max);

% Keep only values that are the local max in their window and above our threshold
non_maximum_suppr = harris_interest_values .* ...
	(harris_interest_values == local_maxes & harris_interest_values >= 0.03);

% The non-zero coordinates are our interest points
[ypoints, xpoints] = find(non_maximum_suppr);
x = xpoints;
y = ypoints;

Get Features Algorithm

For this portion of the project I implemented a SIFT-like local feature. Initially I started by implementing local normalized patches, but I quickly moved to a SIFT-like implementation for better accuracy. The first step in this implementation was to get the gradient magnitude and direction for each pixel in the image. I then created a histogram for each pixel with 8 buckets, placing the gradient magnitude in the bucket corresponding to its direction; later these per-pixel histograms are accumulated around each interest point. For the get_features portion of the code I performed some experimentation. For instance, I tried using a Sobel filter rotated at 8 orientations to create my features; however, I found that the method used here, with gradient direction and magnitude, performed best. The code is below:


% Gradient magnitude and direction (degrees) at each pixel
[gMag, gDir] = imgradient(image);
% Per-pixel 8-bin orientation histograms
imageHists = zeros(size(image,1), size(image,2), 8);
for i = 1:size(image,1)
    for j = 1:size(image,2)
        dirVal = gDir(i, j);
        magVal = gMag(i, j);
        hist = [];
        if (dirVal >= -180 && dirVal < -135)
            hist = [magVal 0 0 0 0 0 0 0];
        elseif (dirVal >= -135 && dirVal < -90)
            hist = [0 magVal 0 0 0 0 0 0];
        elseif (dirVal >= -90 && dirVal < -45)
            hist = [0 0 magVal 0 0 0 0 0];
        elseif (dirVal >= -45 && dirVal < 0)
            hist = [0 0 0 magVal 0 0 0 0];
        elseif (dirVal >= 0 && dirVal < 45)
            hist = [0 0 0 0 magVal 0 0 0];
        elseif (dirVal >= 45 && dirVal < 90)
            hist = [0 0 0 0 0 magVal 0 0];
        elseif (dirVal >= 90 && dirVal < 135)
            hist = [0 0 0 0 0 0 magVal 0];
        else
            hist = [0 0 0 0 0 0 0 magVal];
        end
        imageHists(i, j, 1:8) = hist;
    end
end
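
For reference, the same 45 degree bucketing can be written more compactly by turning each direction into a bin index. This is only a sketch equivalent to the if/else ladder above, assuming imgradient returns directions in the range -180 to 180 degrees; imageHistsAlt and bins are illustrative names.


% Map each direction to one of 8 bins of 45 degrees, starting at -180
bins = min(floor((gDir + 180) / 45) + 1, 8);   % clamp +180 into the last bin
imageHistsAlt = zeros(size(image,1), size(image,2), 8);
for i = 1:size(image,1)
    for j = 1:size(image,2)
        imageHistsAlt(i, j, bins(i,j)) = gMag(i, j);
    end
end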

Next, I iterated through my interest points and for each one created a window around the interest point based on the feature width. I then split this window into a 4 by 4 grid of 16 cells. For each cell, I accumulated all of its pixels' histograms; once this was complete for all 16 cells, I had 16 histograms of size 8, which together became my 1 by 128 feature! Finally, I normalized each feature. The code is below:


half_width = feature_width / 2;
cell_width = feature_width / 4;
for point = 1:size(x, 1)
    xcoord = round(x(point));
    ycoord = round(y(point));
    % skip interest points whose window would fall outside the image
    if ~(xcoord - half_width > 0 && ycoord - half_width > 0 && ...
            xcoord + half_width + 1 <= size(image,2) && ycoord + half_width + 1 <= size(image,1))
        continue
    end
    % feature_width x feature_width x 8 block of per-pixel histograms around the point
    temp_image = imageHists((ycoord-half_width):(ycoord+half_width-1), ...
                            (xcoord-half_width):(xcoord+half_width-1), :);
    results = zeros(4, 4, 8);
    for i = 1:cell_width:feature_width      % 1 5 9 13 when feature_width = 16
        for j = 1:cell_width:feature_width
            % accumulate the 8-bin histograms of every pixel in this cell
            tempResults = zeros(1, 8);
            for k = i:i+cell_width-1
                for q = j:j+cell_width-1
                    t = temp_image(k, q, :);
                    tempResults = tempResults + t(:)';
                end
            end
            results((i-1)/cell_width+1, (j-1)/cell_width+1, :) = tempResults;
        end
    end
    % concatenate the 4x4 grid of histograms into a 128-d vector and normalize it
    results = results(:);
    features(:,point) = results ./ norm(results);
end

Match Features Algorithm

The final part of this project was matching features. Here I implemented the nearest neighbor distance ratio test to match local features. For each feature in my first set of features, I found the two closest features in my second set based on Euclidean distance. I then scored this match as the distance to the closest feature divided by the distance to the second closest feature. This ratio became the confidence score for matching the feature from the first set to its closest feature in the second set (lower is better). Finally, I sort by confidence so that in the project 2 master code we can look at only the top 100 matches. The code to perform this process is below:


matches = zeros(size(features1, 2), 2);
confidences = zeros(size(features1, 2), 1);

matchCounter = 1;
for feat1 = 1:size(features1, 2)
    % start with the first two features in features2 as the nearest candidates
    minDistance1 = norm(features1(:,feat1) - features2(:,1));
    minDistance2 = norm(features1(:,feat1) - features2(:,2));
    index1 = feat1;
    index2 = 1;
    if (minDistance1 > minDistance2)
        temp = minDistance1;
        minDistance1 = minDistance2;
        minDistance2 = temp;
        index2 = 2;
    end
    % scan the remaining features for the nearest and second nearest neighbors
    for feat2 = 3:size(features2, 2)
        tempDist = norm(features1(:,feat1) - features2(:,feat2));
        if (tempDist < minDistance1)
            minDistance2 = minDistance1;
            minDistance1 = tempDist;
            index2 = feat2;
        elseif (tempDist < minDistance2)
            minDistance2 = tempDist;
        end
    end
    matches(matchCounter,:) = [index1 index2];
    confidences(matchCounter) = minDistance1 / minDistance2;   % NNDR: lower ratio = more confident
    matchCounter = matchCounter + 1;
end

% Sort the matches so that the most confident ones (lowest ratios) are at the
% top of the list.
[confidences, ind] = sort(confidences, 'ascend');
matches = matches(ind,:);
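
As an aside, the same nearest neighbor distance ratio test can be expressed without the inner loop using pdist2, which requires the Statistics and Machine Learning Toolbox. This is only a sketch of that alternative, not the code used for my results; matchesAlt and confidencesAlt are illustrative names.


% NNDR via a full pairwise distance matrix (features are stored as columns)
D = pdist2(features1', features2');           % all pairwise Euclidean distances
[sortedD, nnIdx] = sort(D, 2, 'ascend');      % nearest and second nearest per row
matchesAlt = [(1:size(features1, 2))' nnIdx(:, 1)];
confidencesAlt = sortedD(:, 1) ./ sortedD(:, 2);
[confidencesAlt, ind] = sort(confidencesAlt, 'ascend');
matchesAlt = matchesAlt(ind, :);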

Examining Results:

Let's begin by looking at how well my matching method works on the three image pairs with provided ground truth points. We can then look at how it performs on some other image pairs, and discuss how different modifications and design decisions affected the results.

[Figure] Notre Dame: evaluation (90%), match arrows, match points
[Figure] Mount Rushmore: evaluation (62%), match arrows, match points
[Figure] Episcopal Gaudi: evaluation (9%), match arrows, match points

First we notice that on the whole the matching method does a good job of finding matches, especially for Notre Dame (90%) and Mount Rushmore (62%). I found that by choosing to use SIFT-like features instead of normalized patches, my accuracy on Notre Dame increased from 44% to 72% when using the cheat_interest_points method. Furthermore, once I implemented my own interest point method my accuracy improved from 72% to 93% on Notre Dame. That said, what I found was that my method was initially overfitting to the Notre Dame image. The sliding window that I was using was far too small, and therefore while it did a great job on the central portion of the Notre Dame image, allowing me to get many matches in this area, it did not generalize well to Mount Rushmore, which had 35% accuracy, or to Episcopal Gaudi, which had 7% accuracy. By increasing the size of my sliding window I was able to grab more points that were truly local maxima, increasing Mount Rushmore by 27% and Episcopal Gaudi by 2%, while only losing 3% accuracy on Notre Dame. I also found that using non-maximum suppression greatly increased my accuracy and the quality of interest points compared to simply thresholding the Harris scores. Finally, I found that using imgradient to get gradient magnitude and direction performed better than using a rotated Sobel filter, increasing my accuracy by 7% on the Notre Dame image. I also tried a few different bucketing schemes in my get_features method, separating the directions at different boundaries and even trying slightly larger buckets for certain directions, but the results were inconsistent across images, so I decided to stick with my original bucketing of 45 degree buckets running from -180 to 180 degrees. One final note on these images: we see for Episcopal Gaudi that the first image picks up a bird as a feature. I found that using a larger Gaussian helps eliminate this false positive, but it then does not work as well when matching the rest of the image.

Now let's look at some other image pairs that we don't have ground truth points for.

[Figure] Sleeping Beauty Castle Paris: match arrows, match points

Here we see absolutely phenomenal performance! In fact, by my eye test the matching procedure has matched all 100 points, and the points are relatively well spread out! The reason for this is first that both of these images have relatively clear backgrounds, making it unlikely for the background to be confused for features. Furthermore, the images are taken at relatively similar angles, making the orientation of each feature much easier to match. Regardless, the algorithm performed very well on this pair, but worse on versions of this scene with different orientations and backgrounds.

[Figure] House: match arrows, match points

In this example we also see relatively good matches. A lot of the major features of the house are accurately matched! We see here that the tree boundaries also match up quite well for the most part. However, the first image detects many features at the top boundary of some of the trees, which then become very difficult to match in the second image, which doesn't include this same boundary. We also see that the slightly different orientation makes it very difficult for the procedure to detect features at the boundary between the house and the shrubs in front of it.

[Figure] Capricho Gaudi: match arrows, match points

Here we see some of the poorest performance for our system. When it comes to matching the most obvious portions of the structure, the algorithm does a good job, matching the middle maroon point and the center green point on the tower of the Capricho Gaudi. That said, the trees in both of these images produce a lot of false matches. These images are at quite different orientations and include different amounts of trees. We also see more features detected near the corners and edges of the image. We could consider a method for detecting trees to help remove them as interest points and focus more on the structure itself. Furthermore, since the pillar in the image blends into the background trees, it makes detecting many of these points more difficult.


Extra Credit

For the local feature description I experimented with different SIFT parameters. I tried larger features of size 256 by using smaller orientation buckets, giving a larger histogram (-180 to -157.5, etc.). I also tried only 4 orientation buckets but dividing my image patch into 32 cells. I found that using 32 cells resulted in a far worse feature based on percent accuracy on Notre Dame and Mount Rushmore. The same was true for my 256-dimensional feature, though the results weren't as bad as with 32 cells. This larger feature also caused slower performance, as match_features took much longer to complete.
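
For reference, the 256-dimensional variant just doubles the number of orientation bins in the per-pixel histograms. The sketch below shows the 22.5 degree bucketing, using the same index arithmetic as the 8-bin case and the same per-pixel gradients as get_features; the cell accumulation is unchanged except that each histogram has 16 bins, so results becomes 4 x 4 x 16 and the final feature is 256 dimensional. Variable names here are illustrative.


% 16 orientation bins of 22.5 degrees each (-180 to -157.5, -157.5 to -135, ...)
num_bins  = 16;
bin_width = 360 / num_bins;                                % 22.5 degrees
bins16 = min(floor((gDir + 180) / bin_width) + 1, num_bins);
imageHists16 = zeros(size(image,1), size(image,2), num_bins);
for i = 1:size(image,1)
    for j = 1:size(image,2)
        imageHists16(i, j, bins16(i,j)) = gMag(i, j);
    end
end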

For feature matching I created a lower dimensional descriptor that still gave decent accuracy. To do this I created a 64-dimensional feature by taking the square root of the sum of squares of adjacent histogram buckets. This performed somewhat worse, but not by much, and allowed match_features to run much faster.
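
A sketch of that reduction is below, assuming features is the 128 by N matrix produced by get_features and that "adjacent" means consecutive pairs of dimensions; features64 is an illustrative name.


% Collapse each pair of adjacent descriptor dimensions into sqrt(a^2 + b^2),
% turning the 128-d descriptor into a 64-d one
odd_dims  = features(1:2:end, :);                 % dimensions 1, 3, 5, ...
even_dims = features(2:2:end, :);                 % dimensions 2, 4, 6, ...
features64 = sqrt(odd_dims.^2 + even_dims.^2);    % 64 x num_points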