Project 2: Local Feature Matching

Notre Dame with interest points

In project 2, we implement a local feature matching algorithm with the scale-invariant feature transform (SIFT) as its core pipeline. It consists of the following steps:

  1. Interest point detection
  2. Local feature description
  3. Feature matching

Algorithms I

For interest point detection, the image derivatives along both dimensions are computed and then filtered with a difference-of-Gaussians (DoG) filter. The second-moment products dx^2, dy^2, and dx*dy are each filtered with another, larger Gaussian. The corner response, which is critical for detecting interest points, is computed via the Harris formula R = det(M) - alpha*trace(M)^2 = f_xx*f_yy - f_xy^2 - alpha*(f_xx + f_yy)^2, with alpha = 0.04. Finally, non-maximum suppression is performed using two alternative approaches, colfilt() and bwconncomp().

Code


% Difference of Gaussian filter
DOG = fspecial('Gaussian',2) - fspecial('Gaussian',1);
[m,n] = size(image);

% calculating derivatives
dx = [-1,0,1];
dy = [-1;0;1];
% f_x, f_y are the first-order derivatives of the input image
f_x = conv2(image,dx,'same');
f_y = conv2(image,dy,'same');
% apply DOG filter
f_x = imfilter(f_x, DOG);
f_y = imfilter(f_y, DOG);

% calculating second moment derivatives
large_gaussian = fspecial('Gaussian',4);
f_xx = imfilter(f_x.^2, large_gaussian);
f_yy = imfilter(f_y.^2, large_gaussian);
f_xy = imfilter(f_x.*f_y, large_gaussian);
alpha = 0.04;
corner_response = f_xx.*f_yy-f_xy.^2-alpha*((f_xx+f_yy).^2);
% threshold for corner detection
threshold = 0.001;
thresholds = corner_response > threshold;

% two different approaches for non-maximum suppression: 0 means COLFILT
% while 1 means BWCONNCOMP
method = 0;
if method == 0
    % corner_response = corner_response .* thresholds;
    sliding_size = 3;
    comparison = colfilt(corner_response,[sliding_size,sliding_size],'sliding',@max);
    x=[];
    y=[];
    for i = 1:m
        for j = 1:n
            if (corner_response(i,j) == comparison(i,j) && (corner_response(i,j) > threshold))
                x = [x;j];
                y = [y;i];
            end
        end
    end
elseif method == 1
    components = bwconncomp(thresholds);
    num_rows = components.ImageSize(1);
    num_objects = components.NumObjects;
    x = zeros(num_objects, 1);
    y = zeros(num_objects, 1);
    for i=1:num_objects
        pids = components.PixelIdxList{i};
        pvals = corner_response(pids);
        [max_value, max_id] = max(pvals);
        % MATLAB linear indices are 1-based and column-major
        x(i) = floor((pids(max_id) - 1) / num_rows) + 1;
        y(i) = mod(pids(max_id) - 1, num_rows) + 1;
    end
else
    fprintf('method must be either 0 or 1!\n');
end
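The corner-response computation above can be sanity-checked with a short NumPy/SciPy sketch (a sketch only, not the project code: the synthetic test image, smoothing sigmas, and thresholds are my own choices, and scipy.ndimage stands in for fspecial/imfilter/colfilt):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

# Synthetic image with one bright square; its four corners should fire.
img = np.zeros((40, 40))
img[10:30, 10:30] = 1.0

# Image derivatives along rows (y) and columns (x).
fy, fx = np.gradient(img)

# Second-moment entries, each smoothed by a larger Gaussian.
fxx = gaussian_filter(fx * fx, sigma=1.5)
fyy = gaussian_filter(fy * fy, sigma=1.5)
fxy = gaussian_filter(fx * fy, sigma=1.5)

# Harris corner response: R = det(M) - alpha * trace(M)^2
alpha = 0.04
R = fxx * fyy - fxy**2 - alpha * (fxx + fyy) ** 2

# Non-maximum suppression: keep pixels equal to their 3x3 neighborhood max.
local_max = (R == maximum_filter(R, size=3)) & (R > 1e-4)
ys, xs = np.nonzero(local_max)
print(len(xs))  # detections cluster around the four corners of the square
```

Edges and flat regions get non-positive responses, so only the corners survive the threshold, which is exactly the behavior the MATLAB code relies on.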

Algorithms II

For local feature description, a 16*16 patch around each interest point detected in the previous step is divided into a 4*4 grid of cells, each accumulating an 8-bin orientation histogram. After the bins are filled, they are reshaped into a 128-dimensional (4*4*8) vector, the descriptor of the interest point. Finally, the features are normalized, and a threshold is applied to suppress large gradient magnitudes caused by sharp luminance differences.

Code


num_points = size(x,1);
features = zeros(num_points, 128);

gaussian = fspecial('Gaussian');
% smooth the image first
im = imfilter(image, gaussian);
% gradient magnitude and direction (in degrees) of the smoothed image
[mags, dirs] = imgradient(im);
mags = imfilter(mags, gaussian);
dirs = imfilter(dirs, gaussian);

[m,n] = size(image);
% constants definition
c_size = feature_width/4;
num_bins = 8;
N = c_size * ones(1,c_size);
% threshold for extracting features
threshold = 0.6;
% iterate through each interest point
for ii = 1:num_points
    tmp = zeros(c_size, c_size, num_bins);
    % beginning and ending of row
    s_row = y(ii,1) - 2*c_size;
    e_row = y(ii,1) + 2*c_size - 1;
    % beginning and ending of column
    s_col = x(ii,1) - 2*c_size;
    e_col = x(ii,1) + 2*c_size - 1;
    
    if (s_row >= 1 && e_row <= m && s_col >= 1 && e_col <= n)
        % divide 16*16 window into 4*4 quadrant
        dir_cells=mat2cell(dirs(s_row:e_row,s_col:e_col),N,N);
        mag_cells=mat2cell(mags(s_row:e_row,s_col:e_col),N,N);
        for k=1:c_size
            for l=1:c_size
                for r_id=1:c_size
                    for c_id=1:c_size
                        % calculate for 8 bins respectively
                        if (dir_cells{k,l}(r_id,c_id) >= 0)
                            tmp_id = 1 + floor(dir_cells{k,l}(r_id,c_id)/45);
                        else
                            tmp_id = 4 - floor(dir_cells{k,l}(r_id,c_id)/45);
                        end
                        tmp(k,l,tmp_id)=tmp(k,l,tmp_id)+mag_cells{k,l}(r_id,c_id);
                    end
                end
            end
        end
    end
    feature = reshape(tmp, [1, 128]);
    % normalize; guard against zero vectors (e.g. border points skipped above)
    if norm(feature) > 0
        feature = feature ./ norm(feature);
    end
    % clamp large values to reduce the effect of sharp luminance changes
    feature = feature .* (feature <= threshold);
    if norm(feature) > 0
        feature = feature ./ norm(feature);
    end
    % square-root scaling, suggested in the literature to improve performance
    feature = feature.^0.5;
    features(ii,:) = feature;
end
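The 8-bin orientation indexing used above can be verified with a quick Python sketch (the bin formula is transcribed from the MATLAB code; the sample angles are arbitrary):

```python
import math

def orientation_bin(direction_deg):
    """Map a gradient direction in degrees (as returned by imgradient,
    roughly in [-180, 180]) to one of 8 histogram bins (1-8), mirroring
    the MATLAB indexing above."""
    if direction_deg >= 0:
        return 1 + math.floor(direction_deg / 45)
    return 4 - math.floor(direction_deg / 45)

# Each 16x16 patch -> 4x4 cells x 8 orientation bins = 128 dimensions.
assert 4 * 4 * 8 == 128

# Non-negative directions fill bins 1-4, negative directions bins 5-8.
for d, b in [(0, 1), (44.9, 1), (45, 2), (179, 4), (-1, 5), (-90, 6), (-179, 8)]:
    assert orientation_bin(d) == b
```

One caveat this sketch exposes: a direction of exactly 180 degrees falls into bin 5, the same bin as the (-45, 0) range, so that edge case is shared by the MATLAB code as well.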

Algorithms III

For feature matching, we compute the Euclidean distance between every pair of local features from the previous step and match each feature to its nearest neighbour. A threshold on the NNDR (nearest-neighbour distance ratio) is applied to discard ambiguous, low-confidence matches.

Code


threshold = 0.7;
dists = pdist2(features1, features2, 'euclidean');
matches = [];
confidences = [];

for ii = 1:size(features1,1)
    [nearests,indices] = sort(dists(ii,:));
    NN1 = nearests(1);
    NN2 = nearests(2);
    ratio = NN1/NN2;
    
    if(ratio <= threshold)
        matches = [matches;ii,indices(1)];
        confidences = [confidences;1/ratio];
    end
end

[confidences, ind] = sort(confidences, 'descend');
matches = matches(ind,:);
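The ratio test can be illustrated with a few lines of NumPy (a sketch using made-up 2-D "features", not the 128-dimensional project data; nndr_matches is a hypothetical helper name):

```python
import numpy as np

def nndr_matches(f1, f2, threshold=0.7):
    """Return (i, j, confidence) for rows of f1 whose nearest neighbour in f2
    passes the NNDR test: d(NN1) / d(NN2) <= threshold."""
    # Pairwise Euclidean distances, shape (len(f1), len(f2)).
    d = np.linalg.norm(f1[:, None, :] - f2[None, :, :], axis=2)
    matches = []
    for i in range(len(f1)):
        order = np.argsort(d[i])
        nn1, nn2 = d[i, order[0]], d[i, order[1]]
        if nn1 / nn2 <= threshold:
            matches.append((i, order[0], nn2 / nn1))
    # Most confident matches first, as in the MATLAB code.
    return sorted(matches, key=lambda m: -m[2])

f1 = np.array([[0.0, 0.0], [5.0, 5.0], [20.0, 20.0]])
f2 = np.array([[0.1, 0.0], [9.0, 9.0], [5.1, 5.0], [20.1, 20.0], [20.0, 20.1]])
# f1[0] and f1[1] have unambiguous nearest neighbours and are kept;
# f1[2] is equidistant from f2[3] and f2[4] (ratio = 1.0) and is rejected.
print(nndr_matches(f1, f2))
```

The third point shows why the ratio test matters: its nearest distance is tiny, but because the second-nearest is just as close, the match is ambiguous and gets discarded.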

Notre Dame

Three sets of images are produced with different parameters. The first set uses a 0.001 corner response threshold with a sliding window size of 16, a 0.6 feature vector threshold, and a 0.8 NNDR ratio; the result is 62 correct matches and 12 incorrect ones, for 84% accuracy. For the second set, I increase the NNDR ratio to 0.95 and reduce the sliding window size to 3 while keeping the other parameters fixed; many more interest points are detected, but many of them are incorrect, leading to 1412 correct and 1121 incorrect matches and only 56% accuracy. For the third set, I lower the NNDR ratio to 0.7 while keeping the sliding window size at 3, and the performance is excellent: 95 correct matches in total and 100% accuracy!

Mount Rushmore

The first set is produced with a 0.001 corner response threshold, a 0.2 feature vector threshold, and a 0.7 NNDR ratio. Although the accuracy is 100%, there are only 6 correct matches; so few interest points are not useful here. Driven by that, I increase the NNDR ratio to 0.85, and the second set gives 83 correct matches and 3 incorrect ones, for 97% accuracy. To obtain more interest points, we can lower the corner response threshold to 0.0005 and raise the feature vector threshold to 0.6, which yields 134 correct matches and 99% accuracy. Finally, I adjust the corner response threshold back to 0.001 and reach 100% accuracy again with 84 interest points detected. So far, this set of parameters works well for both image pairs.

Episcopal Gaudi

The previous parameters do not fit here. Even when I use a 0.00008 corner response threshold, a 0.6 feature vector threshold, and a 0.9 NNDR ratio, only 6 matches are correct, and 23% accuracy is far too low. Adjusting the parameters brings only limited improvement, because this dataset is very difficult: scale, rotation, and luminance all differ significantly between the two images.