Project 2: Local Feature Matching

Algorithms

This project implements a local feature matching application, which can be split into three main components: interest point detection, feature description, and feature matching.

Interest Point Detection

For interest point detection, I attempted an implementation of the Harris corner detector. The image is first convolved with the x- and y-derivatives of a Gaussian filter to produce the gradient arrays for those directions. For each pixel, we then build the second moment matrix over a window as large as our desired feature size. Each matrix yields a "cornerness" score for that pixel, and once all scores are computed, the score matrix is scanned with a sliding window to search for local maxima. Non-maximum responses are filtered out, along with maxima below a given threshold, and what remains is the set of interest points for our features.
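
Concretely, with Ix and Iy the Gaussian-derivative responses, the second moment matrix over a window W and the cornerness score used below are

\[
M = \sum_{(u,v)\in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix},
\qquad
R = \det(M) - \alpha\,(\operatorname{trace} M)^2
\]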

% Detector parameters: cornerness threshold, Gaussian sigma, and Harris alpha
thresh = 2.5;
sigma = 11/6;
alpha = 0.05;

% Create derivative-of-Gaussian filters by convolving a Gaussian
% with transposed/plain Prewitt kernels
dx = flip(fspecial('prewitt'))'; dy = dx';
g = fspecial('gaussian', 6*sigma, sigma);
gx = imfilter(g, dx);
gy = imfilter(g, dy);
% Larger Gaussian for smoothing the gradient products below
g2 = fspecial('gaussian', 12*sigma, 2*sigma);

% Compute image gradients in the x and y directions and smooth their products
Ix = imfilter(image, gx);
Iy = imfilter(image, gy);
Ix2 = imfilter(Ix.^2, g2);
Iy2 = imfilter(Iy.^2, g2);
Ixy = imfilter(Ix.*Iy, g2);

R = zeros(size(image));
% For each pixel in the image that does not lie within feature_width/2 of the border,
for r=feature_width/2 + 1:size(image,1)-feature_width/2
    for c=feature_width/2 + 1:size(image,2)-feature_width/2
        % Get the window of intensities centered on the current pixel
        rs = r-feature_width/2:r+feature_width/2-1;
        cs = c-feature_width/2:c+feature_width/2-1;
        Ixs = Ix2(rs, cs);
        Iys = Iy2(rs, cs);
        Ixys = Ixy(rs, cs);
        % Find the second moment matrix and the cornerness scores
        M = [sum(Ixs(:)), sum(Ixys(:)); sum(Ixys(:)), sum(Iys(:))];
        R(r,c) = det(M) - alpha*trace(M)^2;
    end
end

% Find local maxima of the cornerness scores with a sliding max filter
mx = colfilt(R, [feature_width feature_width], 'sliding', @max);
% Keep only pixels that are local maxima and exceed the threshold
S = (R==mx) & (R>thresh);

% Find and return the coordinates of the found interest points
[y, x] = find(S);
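
As a sanity check (the comparison mentioned under Results below), the detected points can be overlaid on the output of MATLAB's built-in detectHarrisFeatures(). This is a minimal sketch, assuming the Computer Vision Toolbox is available and that image, x, and y come from the snippet above:

% Overlay our interest points on the built-in detector's strongest corners
builtin_pts = detectHarrisFeatures(image);
figure; imshow(image, []); hold on;
plot(x, y, 'g+');                        % points from the code above
plot(builtin_pts.selectStrongest(200));  % strongest built-in corners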

Feature Descriptors

The next step of the algorithm is to create the feature descriptors that represent the features located at each interest point. For this step, a SIFT-like descriptor algorithm was used. For each interest point, we take the feature_width-sized frame centered on the point and compute the gradient magnitudes and orientations over that window. A Gaussian falloff is applied to the magnitudes so that pixels closer to the interest point carry more weight in the descriptor. The frame is then split into 16 cells, each a fourth of the feature width per dimension. Within each cell, the gradients are sorted into 8 bins by orientation, each contributing its falloff-weighted magnitude. These 16 sets of 8 bins become the 128-value descriptor for the feature. To improve the descriptor, the values are then normalized, clamped to a maximum of 0.2, and normalized again; finally, each value is raised to a power of 0.6.

% Initialize features array
num_bins = 8;
features = zeros(size(x,1), 16*num_bins);

% For each interest point (x-y pair),
for i=1:length(x)
    xi = round(x(i)); yi = round(y(i));
    % Find the feature_width square around the x-y pair
    A = image(yi-feature_width/2:yi+feature_width/2-1, xi-feature_width/2:xi+feature_width/2-1);
    % Compute gradient magnitudes and orientations for the patch
    [Gmag, Gdir] = imgradient(A);
    % Apply a Gaussian falloff (elementwise weighting, not a blur) so that
    % pixels nearer the interest point carry more weight
    B = Gmag .* fspecial('gaussian', feature_width, feature_width/2);
    % Analyze the grid in 16 feature_width/4-sized squares for histogram generation
    for j = 1:16
        r = mod(j-1, 4)*feature_width/4 + 1;
        c = floor((j-1)/4)*feature_width/4 + 1;
        rs = r:r+feature_width/4-1;
        cs = c:c+feature_width/4-1;
        Y = discretize(Gdir(rs, cs), -180:360/num_bins:180);
        C = B(rs, cs);
        for n=1:num_bins
            features(i, (j-1)*num_bins+n) = sum(sum(C(Y==n)));
        end
    end
    % Normalize, threshold, and normalize again
    features(i,:) = features(i,:)./sum(features(i,:));
    features(i, features(i,:)>0.2) = 0.2;
    features(i,:) = features(i,:)./sum(features(i,:));
    features(i,:) = features(i,:).^0.6;
end
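
As a standalone illustration of the binning step above, discretize() maps each orientation in degrees to one of the 8 bins of width 45 degrees; the angle values here are made up for the example:

% Toy example of the orientation binning (angle values are illustrative)
angles = [-170, -10, 5, 44, 46, 179];
bins = discretize(angles, -180:45:180)
% bins = [1 4 5 5 6 8]; e.g., -10 falls in [-45, 0), which is bin 4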

Feature Matching

The final step is to match the feature descriptors against one another. This is implemented using a nearest-neighbor distance ratio (NNDR) test. For each feature from the first image, we compute its Euclidean distance to every feature from the second image to find its closest and second-closest neighbors. The closest neighbor is taken as its match, and its confidence score is the inverse of the ratio of the closest neighbor's distance to the second-closest's. The top 100 matches are returned in the main body of the code.

% Get sizes of feature arrays, initialize matches and confidences
m = size(features1, 1);
n = size(features2, 1);
matches = zeros(m, 2);
confidences = zeros(m,1);

% For each feature i in image1,
for i = 1:m
    dist = zeros(n,1);
    for j = 1:n
        % Find distances between feature i and neighbors in image2
        dist(j) = sqrt(sum((features1(i,:) - features2(j,:)).^2));
    end
    % Sort distances in ascending order to get index of closest features
    [dist, indices] = sort(dist, 'ascend');
    matches(i, 1) = i;
    matches(i, 2) = indices(1);
    % Confidence is the inverse NNDR: second-nearest distance over nearest
    confidences(i) = dist(2)/dist(1);
end

% Sort the matches so that the most confident ones are at the top of the list
[confidences, ind] = sort(confidences, 'descend');
matches = matches(ind,:);
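
For reference, the same NNDR computation can be written without the inner loop. This is a sketch assuming pdist2() from the Statistics and Machine Learning Toolbox, reusing m, features1, and features2 from above:

% Vectorized alternative: all pairwise Euclidean distances in one call
D = pdist2(features1, features2);            % m-by-n distance matrix
[sortedD, idx] = sort(D, 2, 'ascend');       % sort each row's distances
matches = [(1:m)', idx(:,1)];                % nearest neighbor per feature
confidences = sortedD(:,2) ./ sortedD(:,1);  % inverse NNDR, as above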

Experimentation

Amid troubleshooting different parts of my code, I made various parameter changes in an attempt to maximize accuracy. One such change was reducing the power the descriptor values are raised to from 0.9 to 0.6, which led to a 4% increase in accuracy. Similarly, I adjusted the sigma, threshold, and alpha parameters within my interest point detection code. I found an alpha of 0.05 to work better than 0.04, 0.06, and most other values in between, and sigma, while reasonably decent at values of 1 or 2, yielded the best results at 5/3 and 11/6. The threshold was the trickiest: lowering it allows more points through and raises the overall accuracy, but it also let outlier points appear in the final images. At the cost of some accuracy, the threshold came to rest at 2.5 to ensure a good set of interest points. A sketch of this tuning as a grid search appears below.
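
The tuning itself was done by hand, but it amounts to a small grid search along these lines. Here evaluate_pipeline() is a hypothetical wrapper that runs all three stages and the ground-truth evaluation; the parameter grids are illustrative:

% Hypothetical parameter sweep; evaluate_pipeline() is not part of the code above
alphas  = [0.04 0.05 0.06];
sigmas  = [1 5/3 11/6 2];
threshs = [1 2.5 5];
best = struct('acc', 0, 'alpha', NaN, 'sigma', NaN, 'thresh', NaN);
for a = alphas
    for s = sigmas
        for t = threshs
            acc = evaluate_pipeline(a, s, t);  % hypothetical helper
            if acc > best.acc
                best = struct('acc', acc, 'alpha', a, 'sigma', s, 'thresh', t);
            end
        end
    end
end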

Results

Overall, my code has proven to be far from perfect. Based on testing, my implementation of NNDR appears to be correct, producing the expected ~40% accuracy once patch-based feature descriptors were enabled. I encountered issues programming the SIFT descriptors, however, mainly in not understanding how to correctly use trilinear interpolation, and my best efforts with the ground truths could only produce accuracies ranging from 50% to 60%. Interest points also gave me issues: while the current version produces a reasonable set of interest points, it has not produced any significant increase in accuracy for the code as a whole, and comparing it to the output of detectHarrisFeatures(), I'm led to believe it is still erroneous to some extent.

Most of these problems boil down to poor allocation of time on my part, as well as a lack of understanding of how most of these components work; while I have worked with feature detection in a previous class, this was my first time writing the Harris corner detector and the SIFT descriptors from scratch. Some of the other problems I encountered during the coding process were curious, however, if not frustrating. For instance, within the interest point code, one of my early attempts at creating the derivatives of the Gaussian filter used the gradient() function on the Gaussian itself. That version, however, produced either no interest points or a vast number with seemingly no correlation to the actual image. I fiddled with the function and the Gaussian for a while but saw no definitive improvement no matter what I tried. Eventually, I found better results by convolving the Gaussian with a standard Prewitt filter, despite both old and new derivatives looking reasonably identical on visual inspection (the two constructions are sketched below). I can't help but question whether that function, and in turn many of the other functions I use in my code, behave as they should.

Regardless of whether the language or the programmer is at fault for the faulty code, I am at least somewhat happy to have gotten as high as 62% on my best attempts. It may not be the expected 90% or the target 80%, but given the hard time I had during this assignment, being able to produce something reasonable has left me satisfied.
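
For reference, the two derivative constructions compared above look like this side by side. One point worth noting: they differ by a constant scale factor (the Prewitt kernel sums three difference rows while gradient() halves a single difference), which rescales the cornerness scores relative to a fixed threshold like thresh = 2.5:

% The earlier attempt: derivative of the Gaussian via central differences
g = fspecial('gaussian', 11, 11/6);
[gx_grad, gy_grad] = gradient(g);
% The approach kept: convolve the Gaussian with a Prewitt kernel
dx = flip(fspecial('prewitt'))';
gx_prewitt = imfilter(g, dx);
% The filters look alike, but gx_prewitt is roughly 6x larger in magnitude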

Results for Notre Dame (best accuracy: 62%)

Features matched by the code

Color-coded evaluation of the feature matching's accuracy


Results for Mount Rushmore (best accuracy: 57%)

Features matched by the code

Color-coded evaluation of the feature matching's accuracy


Results for Episcopal Gaudi (best accuracy: 2%)

Features matched by the code

Color-coded evaluation of the feature matching's accuracy