Project 2: Local Feature Matching

The top 100 most confident local feature matches for the Mount Rushmore image pair. In this case the accuracy was 99%: 99 matches were correct (highlighted in green) and 1 was incorrect (highlighted in red).

The implementation contains 3 main functions:

  1. get_interest_points()
  2. get_features()
  3. match_features()

The following is a detailed description of the implementation of each of the above functions, along with an analysis of the different techniques I tried in each of them.

For extra credit, I have implemented PCA and knnsearch(), and have analyzed the effect of varying different SIFT parameters. The results of these extra credit implementations are summarized briefly in the descriptions below, and the implementations themselves are explained later in the report.

get_interest_points()

This function implements the Harris corner detector as described in the lecture slides. I used the imgradientxy() function to compute the image gradients, then applied a Gaussian filter to each of the computed matrices (the squared gradients and their product). The code for the next steps is shown below.
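
For context, the steps leading up to that snippet can be sketched as follows; the Gaussian size, sigma, and alpha value shown here are illustrative assumptions, not necessarily the exact values used:

% Sketch of the gradient and smoothing steps (parameter values are illustrative).
[Gx, Gy] = imgradientxy(image);           % image gradients
g = fspecial('gaussian', 9, 1.5);         % small Gaussian window

f_Gx_sq = imfilter(Gx .^ 2, g);           % smoothed Ix^2
f_Gy_sq = imfilter(Gy .^ 2, g);           % smoothed Iy^2
f_Gxy   = imfilter(Gx .* Gy, g);          % smoothed Ix*Iy

alpha_val = 0.04;                         % typical Harris alpha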


% Harris corner response (f_Gx_sq, f_Gy_sq, f_Gxy are the Gaussian-filtered
% Ix^2, Iy^2 and Ix*Iy images)
har = (f_Gx_sq .* f_Gy_sq - f_Gxy .^ 2) - alpha_val * ((f_Gx_sq + f_Gy_sq) .^ 2);

% Non-maximum suppression: keep only responses that exceed all 8 neighbours
% (ordfilt2 with the 8th order statistic over a 3x3 window, centre excluded)
har_ord = har > ordfilt2(har, 8, [1,1,1; 1,0,1; 1,1,1]);

% Suppress points whose response falls below the threshold
har_ord(har < 0.01 * graythresh(har)) = 0;

% Remove boundary points so that full feature windows fit inside the image
b_p = feature_width * 2;
har(1:b_p, :) = 0;
har(end-b_p:end, :) = 0;
har(:, 1:b_p) = 0;
har(:, end-b_p:end) = 0;

As seen in the above script, I used the ordfilt2() function to find the local maxima in a 3 x 3 window. Once all the local maxima are found, all the key points with a value below a certain threshold are filtered out, and the remaining points are returned as the key interest points. I tried different threshold values, and a multiplier of 0.01 gives the highest accuracy after matching features. All the boundary points are set to 0 as well, so that full feature windows fit inside the image.
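
For completeness, a minimal sketch of how the surviving responses can be turned into the x/y lists returned by the function (illustrative; variable names follow the snippet above):

% Sketch: convert the thresholded, non-maximum-suppressed mask into coordinates.
[y, x] = find(har_ord);                          % rows are y, columns are x
confidence = har(sub2ind(size(har), y, x));      % optional: keep the response as a confidence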

Interest points detected for the Notre Dame image.

get_features()

This function is an implementation of a SIFT-like feature descriptor. The following is a code snippet from the implementation.


% Gradient magnitude and direction over the padded image
[graded_magnitude, graded_direction] = imgradient(padded_image);

step = round(feature_width / 4);     % each cell is (feature_width/4) pixels wide
num_of_bins = 12;                    % number of orientation bins per cell
div_bins = 360 / num_of_bins;        % degrees covered by each bin

for j = 1:step:size_m
    for k = 1:step:size_n
        % Magnitude and direction values inside the current cell
        mag_sub_grid = mag_sub_image(j:j+step-1, k:k+step-1);
        dir_sub_grid = dir_sub_image(j:j+step-1, k:k+step-1);
        flat_mag_sub_grid = mag_sub_grid(:);
        flat_dir_sub_grid = dir_sub_grid(:);

        % Build the orientation histogram for this cell, weighting by magnitude
        bin = zeros(num_of_bins, 1);
        for l = 1:numel(flat_mag_sub_grid)
            bin_num = min(floor((flat_dir_sub_grid(l) + 180) / div_bins) + 1, num_of_bins);
            bin(bin_num) = bin(bin_num) + flat_mag_sub_grid(l);
        end

        features(i, feature_index:feature_index + num_of_bins - 1) = bin(:);
        feature_index = feature_index + num_of_bins;
    end
end

% Normalize the full descriptor for this interest point
normed_sub_image = features(i, :) / norm(features(i, :), inf);
features(i, :) = normed_sub_image(:);

The above snippet is part of the entire implementation. Initially, the entire image is padded by feature_width/2 on each side. Then, using the imgradient() function from MATLAB, I get the gradient magnitude and direction.
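
As an illustration of the padding step (this sketch assumes padarray() with symmetric padding; the actual padding mode used may differ):

% Pad the image by feature_width/2 on each side so that a full window
% fits around interest points near the border (padding mode is an assumption).
half_width = feature_width / 2;
padded_image = padarray(image, [half_width half_width], 'symmetric');
% padded_image is then passed to imgradient(), as in the snippet above.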

num_of_bins is the number of bins, i.e. the number of orientations, in the SIFT implementation. In each of the sub-grids (obtained by dividing the feature_width by 4), each cell contributes to a particular orientation bin. Since imgradient() returns directions from -180 to 180 degrees, the direction is shifted into the 0 to 360 range before computing the bin number. Note: the min() function handles the extreme case of a direction of exactly +180 degrees, which would otherwise map to one bin past the last.
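
As a quick sanity check of this mapping with 8 orientations (so div_bins = 360/8 = 45 degrees per bin), a few representative directions:

% Illustrative check of the bin mapping (num_of_bins = 8 here).
num_of_bins = 8;
div_bins = 360 / num_of_bins;
d = [-180, -1, 0, 179, 180];                            % sample gradient directions
bins = min(floor((d + 180) / div_bins) + 1, num_of_bins)
% bins = [1 4 5 8 8]; without min(), +180 would spill into a ninth bin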

This is done for all the features and then I normalize the features before returning them to the caller function.

match_features()

This function finds the nearest neighbors of each feature in the first feature set and assigns a confidence based on the ratio computation shown in the snippet below. I have implemented this function both by brute force and using the knnsearch() function in MATLAB. The code snippet below shows the implementation using the brute-force technique.


num_features = size(features1, 1);
to_match = size(features2, 1);
confidences = zeros(num_features, 1);
matches = zeros(num_features, 2);

for i = 1:num_features
    % Euclidean distance from feature i in image 1 to every feature in image 2
    dist = zeros(to_match, 2);
    for j = 1:to_match
        dist(j, 1) = sqrt(sum((features1(i, :) - features2(j, :)) .^ 2));
        dist(j, 2) = j;
    end
    dist_sort = sortrows(dist, 1);

    % Nearest Neighbor Distance Ratio (guard against a zero second-nearest distance)
    if dist_sort(2, 1) ~= 0
        confidences(i) = dist_sort(1, 1) / dist_sort(2, 1);
    else
        confidences(i) = dist_sort(1, 1);
    end

    matches(i, 1) = i;
    matches(i, 2) = dist_sort(1, 2);
end

The above implementation is a brute-force technique using two nested for loops. The confidence is computed using the Nearest Neighbor Distance Ratio (NNDR) described in Szeliski 4.1.3, eq. 4.18.

The implementation also handles the rare case in which the second-nearest distance is zero, avoiding a division by zero. The function then returns the matches and the confidences.
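
For reference, the ratio computed above is NNDR = d1 / d2 = ||D_A - D_B|| / ||D_A - D_C||, where D_A is the descriptor being matched and D_B, D_C are its nearest and second-nearest neighbors in the other image; a smaller ratio indicates a more distinctive match.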

The following are the output images after running the implementation on the three image pairs.

Note: some of the parameters used are: feature_width = 16, number of orientations = 8, threshold multiplier for interest point detection = 0.01.

Notre Dame
From top to bottom:
1) Dots visualization 2) Arrows indicating the matching of corresponding interest points 3) Evaluation for accuracy
Accuracy : 95%

Mount Rushmore
From top to bottom:
1) Dots visualization 2) Arrows indicating the matching of corresponding interest points 3) Evaluation for accuracy
Accuracy : 99%

Episcopal Gaudi
From top to bottom:
1) Dots visualization 2) Arrows indicating the matching of corresponding interest points 3) Evaluation for accuracy
Accuracy : 14%

Extra Credit

Experimenting with SIFT parameters

The general patterns observed when tuning different SIFT parameters are:
(Note: these observations are for the Notre Dame image)

1) Increasing the feature width from 16 to 32 increases the accuracy from 95% to 99%. Instead of a 16 x 16 window divided into sixteen 4 x 4 cells, we now have a 32 x 32 window divided into sixteen 8 x 8 cells. A larger number of neighboring pixels therefore contributes to each histogram, which improves the accuracy.
2) Reducing the number of sub-grids reduces the accuracy. When I changed the sub-grid division from sixteen 4 x 4 squares to four 8 x 8 squares, the accuracy dropped from 95% to 90%, whereas changing it to sixty-four 2 x 2 squares increased the accuracy from 95% to 96%.
3) Increasing the number of orientations from 8 to 12 increases the accuracy from 95% to 97%, whereas the accuracy dropped to 93% when the number of bins (orientations) in the histogram was reduced to 2.
4) When I changed the normalization from the L2 norm to the L-infinity norm, the accuracy dropped from 95% to 92%. The accuracy remained the same for the L1 and L2 norms (see the snippet after this list).
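
As a small illustration of the normalization variants compared in point (4), only the second argument of MATLAB's norm() changes between them (the indexing here mirrors the get_features() snippet above):

% Normalization variants tried on each descriptor row (illustrative).
f = features(i, :);
features(i, :) = f / norm(f, 2);      % L2 (baseline, 95%)
% features(i, :) = f / norm(f, 1);    % L1 - same accuracy as L2
% features(i, :) = f / norm(f, inf);  % L-infinity - accuracy dropped to 92%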

knnsearch() implementation

As in the brute-force version, this function finds the nearest neighbors of each feature in the first feature set and assigns a confidence based on the same ratio computation. The following is an implementation using the knnsearch() function:


num_features = size(features1, 1);
matches = zeros(num_features, 2);
matches(:, 1) = 1:num_features;

% For each feature in image 1, find its 2 nearest neighbors in image 2
[dist_index, dist] = knnsearch(features2, features1, 'k', 2, 'NSMethod', 'kdtree');
matches(:, 2) = dist_index(:, 1);          % index of the nearest neighbor
confidences = dist(:, 1) ./ dist(:, 2);    % nearest / second-nearest distance ratio

The time taken for the match_features() function to run, with and without knnsearch:
with knnsearch: 8.312 sec
brute force: 56.327 sec
(Note: the brute-force version can be optimized slightly by using vectorization instead of iterating with for loops; a rough sketch follows.)
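
As a sketch of that vectorization (illustrative only, not the timed implementation), all pairwise distances can be computed at once and each row sorted:

% Illustrative vectorized alternative to the two nested loops.
sq1 = sum(features1 .^ 2, 2);                                    % n1 x 1
sq2 = sum(features2 .^ 2, 2);                                    % n2 x 1
d2  = bsxfun(@plus, sq1, sq2') - 2 * (features1 * features2');   % squared distances, n1 x n2
d2(d2 < 0) = 0;                                                  % guard against round-off
dists = sqrt(d2);

[sorted_d, idx] = sort(dists, 2);                                % sort each row of distances
matches = [(1:size(features1, 1))', idx(:, 1)];                  % nearest neighbor per feature
confidences = sorted_d(:, 1) ./ max(sorted_d(:, 2), eps);        % NNDR, avoiding divide-by-zero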

pca() implementation

For the PCA implementation, I used all the images provided in the extra_images folder. I read every image and preprocess it in the same way as in proj2.m, then get the interest points for each image. Using these interest points I compute the feature vectors, randomly choose 1500 rows per image, and collect them into one array of feature vectors for all the images. I then run PCA on this matrix with num_of_components = 16, 32 or 64 and save the resulting coeff and mean matrices. Finally, in match_features(), I load these matrices, subtract the mean from every row of features1 and features2, and multiply both by the coeff matrix before doing the rest of the computation in that function. The code snippet below shows the PCA implementation.


[m, n] = size(filenames);
num_points = 1500;                               % feature vectors sampled per image
combined_image_features = zeros(m * num_points, 128);
k = 1;                                           % row offset into the combined matrix

for i = 1:m
    image1 = imread(strcat('../data/Images/', filenames(i).name));
    % Preprocessing, following proj2.m
    image1 = single(image1) / 255;
    scale_factor = 0.5;
    image1 = imresize(image1, scale_factor, 'bilinear');
    image1_bw = rgb2gray(image1);
    feature_width = 16;

    [x1, y1] = get_interest_points(image1_bw, feature_width);
    [image1_features] = get_features(image1_bw, x1, y1, feature_width);
    [dim1, dim2] = size(image1_features);

    % Randomly sample num_points feature vectors from this image
    combined_image_features(k : k + num_points - 1, :) = image1_features(randi(dim1, 1, num_points), :);
    k = k + num_points;                          % advance to the next block of rows
    disp(i)
end

[coeff, score, latent, tsquared, explained, mu] = pca(combined_image_features, 'NumComponents', 16);

save('PCA_metrics.mat', 'coeff', 'explained', 'mu');

%% This is from match_features.m
load PCA_metrics.mat coeff mu;

% Center the descriptors with the saved mean, then project onto the PCA basis
features1_mean = bsxfun(@minus, features1, mu);
features2_mean = bsxfun(@minus, features2, mu);

features1 = features1_mean * coeff;
features2 = features2_mean * coeff;

The time taken for the match_features() function to run, with and without PCA (this is the tic/toc time for the call made to the match_features() function):
with PCA (16 components): 1.386 sec, accuracy: 88%
with PCA (32 components): 2.797 sec, accuracy: 90%
with PCA (64 components): 4.892 sec, accuracy: 95%
without PCA (size 128): 7.985 sec, accuracy: 95%
So the PCA implementation does reduce the computation time, at the expense of some accuracy when few components are kept.