Computer Vision Project

Project 2: Local Feature Matching

Overview

For this project, a local feature matching algorithm was implemented to determine if two unique images are "the same" as classified by a human. The algorithm can be broken up into three distinct steps:

Implement interest point detection (Harris Corner Detection)

Implement local feature detection (simplified SIFT Algorithm)

Implement feature matching algorithm (nearest-neighbor ratio test)

This basic implementation was able to successfully find 84% matching features in the simplest of image pairs, and performed moderately well on the medium difficulty image pairs.

Interest Point Detection

To detect interest points, the Harris corner detector was implemented (get_interest_points.m). The following steps are used to implement the corner detector:

Convolve image with derivatives of Gaussian filter

Suppress overinfluence of boundaries by zeroing edges

Compute the three images of products of gradients (Ix^2, Iy^2, Ix*Iy)

Convolve all image products with larger Gaussians

Compute the har metric

Suppress non-maxima. A sliding window was used to ensure all interest points were local maxima instead of individual maxima.

Local Feature Detection (simplified SIFT algorithm)

This function (get_features.m) takes in the interest points from the Harris corner detector implementation, and calculates a 128 dimension feature descriptor describing the local behavior at each interest point. The SIFT algorithm chose 128 dimensions as a balance between flexibility and overspecification: lower dimension descriptors perform poorly with changes in lighting, while higher dimension descriptors can become to specified.

The first step in the SIFT algorithm is to detect interest points, or keypoints. A basic Gaussian is convolved with the gradients of the image. Next, the orientation at each keypoint is calculated, along with the gradient magnitude of each keypoint. The code for this step is shown below:

octants = @(x,y) (ceil(atan2(y,x)/(pi/4)) + 4);
[row,col] = size(gx);
orients = zeros(row,col);
for i = 1:row
    for j = 1:col
        orients(i,j) = octants(gx(i,j), gy(i,j));
    end
end
mag = hypot(gx, gy);

Next, each feature is found based off of these keypoints and teir orientations. This part is where the 128 dimension feature descriptor is calculated. Subregions of the image are convolved with a new Gaussian with a sigma equal to half of the feature width. Then, the subregions are collapsed to obtain the 128 dimensions.

gaussianKey = fspecial('Gaussian', [feature_width feature_width], feature_width/2);
c = feature_width/4;
for i = 1:r
    %Weight magnitudes with Gaussian, sigma = 1/2 feature width
    xInd = (x(i) - 2*c): (x(i) + 2*c-1);
    yInd = (y(i) - 2*c): (y(i) + 2*c-1);
    fMag = mag(yInd, xInd).*gaussianKey;
    fOrients = orients(yInd, xInd);

    % Looping through each cell in the frame
    for xx = 0:3
        for yy = 0:3
            cOrients = fOrients(xx*4+1:xx*4+4, yy*4+1:yy*4+4);
            cMag = fMag(xx*4+1:xx*4+4, yy*4+1:yy*4+4);
            for k = 1:8
                val = sum(sum(cMag(cOrients==k)));
                features(i,(xx*32 + yy*8)+k) = val;
            end
        end
    end
end

Finally, the dimensions are normalized to a unit length.

Feature Matching

The final step was to match all interest points, and eliminate any spurious matches (match_features.m). To match interest points, the distance between all points was calculated, then the ratio of the nearest two neighbors to each point is calculated. This nearest-neighbor distance ratio (NNDR) test does a reasonable job of differentiating good matches from bad matches, thus reducing the number of false positives. The basic code to calculate the ratio can be seen below:

r = length(features1(:,1));
c = length(features2(:,1));
distances = zeros(r, 2);
distCalc = @(f1,f2,i,j) sqrt(-dot(f2(j,:)-f1(i,:),f1(i,:)-f2(j,:)));

%O(MN) Complexity
for i = 1:r
    for j = 1:c
        distances(i,j) = distCalc(features1, features2, i, j);
    end
end

[distances, ind] = sort(distances, 2);
confidences = distances(:,2) ./ distances(:,1); %ratio

% Determine which matches are accurate enough to keep
threshold = 1.43;
minConfidenceBound = confidences > threshold;
confidences = confidences(minConfidenceBound);

% Create output matches to be sorted
matches = zeros(size(confidences, 1), 2);
matches(:,1) = find(minConfidenceBound);
matches(:,2) = ind(minConfidenceBound, 1);

% Sort the matches so that the most confident onces are at the top.
[confidences, ind] = sort(confidences, 'descend');
matches = matches(ind,:);

Results

Notre Dame Images: 59 total good matches, 11 total bad matches. 84% accuracy

Mount Rushmore Images: 87 total good matches, 6 total bad matches. 94% accuracy