Project 2: Local Feature Matching

This project focuses on a local feature matching algorithm intended to work on images of the same physical scene captured from multiple viewpoints. The algorithm can be divided into three major tasks (a sketch of how they fit together follows the list):

  1. Obtaining interest points
  2. Obtaining a representative feature for each interest point
  3. Matching the features to establish correspondences between the interest points
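
A minimal sketch of how the three tasks fit together is shown below. It assumes the code is split into one function per task; get_features.m is referenced in the Discussion, while get_interest_points, match_features, and the image file names are assumed names for illustration.

% Hypothetical top-level pipeline; names other than get_features are assumptions
image1 = im2single(rgb2gray(imread('pair_image_1.jpg')));   % placeholder filename
image2 = im2single(rgb2gray(imread('pair_image_2.jpg')));   % placeholder filename
feature_width = 16;   % side length of the local window around each interest point

[x1, y1] = get_interest_points(image1, feature_width);          % Task 1
[x2, y2] = get_interest_points(image2, feature_width);
features1 = get_features(image1, x1, y1, feature_width);        % Task 2
features2 = get_features(image2, x2, y2, feature_width);
[matches, confidences] = match_features(features1, features2);  % Task 3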

Task 1: Harris Corner Detector

In this task, the goal is to identify interest points that are repeatable, distinctive, and robust to occlusion, even when the orientation or viewing angle of the scene changes. Corners are good interest points because the intensity changes significantly in all directions around them, unlike a flat region, which has no intensity change in any direction, or an edge, whose intensity is unchanged along the edge direction. The Harris Corner Detector was implemented to capture the corners in our image pairs. The algorithm starts by convolving the original image with derivatives of a Gaussian to obtain the image derivatives in the horizontal and vertical directions (Ix and Iy). The derivative filters may need to be scaled up in magnitude if the derivative images turn out to be too low in intensity. The products of these derivatives are then computed to form three images (IxIy, Ix^2, Iy^2), which are filtered with a larger Gaussian with a sigma of 0.6. The cornerness value is given by the Harris response: g(Ix^2)*g(Iy^2) - [g(IxIy)]^2 - alpha*[g(Ix^2) + g(Iy^2)]^2.

A 16-pixel border along the four edges of the image has to be suppressed to avoid mistakenly identifying sharp changes at the image boundary as interest points. The last step is to find the local maxima that are also above a threshold value. The colfilt() MATLAB function sweeps a sliding 5x5 window across the entire image and determines whether the cornerness value of each pixel is the local maximum within the window. If a pixel satisfies the local-maximum requirement and is also above the threshold, which can be tuned by trial and error, it is accepted as a corner interest point.

Some images from this task

[Figure: image derivatives in the x direction; image derivatives in the y direction]
[Figure: Gaussian-filtered products of the derivatives: left - g(Ix^2); center - g(Iy^2); right - g(IxIy)]

Example of code


% Derivative-of-Gaussian filters give the image derivatives Ix and Iy
G = fspecial('Gaussian', [4 3], 1);
dG = diff(G)*20;          % scale up so the derivative images are not too faint
Ix = conv2(image, dG, 'same');
Iy = conv2(image, dG', 'same');

% Products of the derivatives
Ix2 = Ix.^2;
IxIy = Ix.*Iy;
Iy2 = Iy.^2;

% Filter the products with a larger Gaussian
sigma = 0.6;
GL = fspecial('Gaussian', [10 10], sigma);
gIx2 = conv2(Ix2, GL, 'same');
gIxIy = conv2(IxIy, GL, 'same');
gIy2 = conv2(Iy2, GL, 'same');

% Harris cornerness: g(Ix^2)g(Iy^2) - [g(IxIy)]^2 - alpha*[g(Ix^2)+g(Iy^2)]^2
alpha = .05;
corner = gIx2.*gIy2 - gIxIy.^2 - alpha.*(gIx2 + gIy2).^2;

% suppress the features on the four edges to zero value
feature_width = 16;       % width of the suppressed border
row = size(corner,1);
col = size(corner,2);
corner(1:feature_width,:) = 0;
corner(row-feature_width:end,:) = 0;
corner(:,1:feature_width) = 0;
corner(:,col-feature_width:end) = 0;

%% local maxima and threshold
thresh = 9.99e-6;
lmax = colfilt(corner, [5 5], 'sliding', @max);
coord = (corner == lmax) & (corner > thresh);
[y,x] = find(coord);

Task 2: Obtaining a feature vector for each corner point

A simple way of defining a local feature for each interest point is to construct a vector of the normalized neighborhood pixels around it; this was done using a 16x16 window around each interest point. The SIFT descriptor was also implemented as a better alternative to the normalized patch. A 16x16 patch was cut out, and the built-in MATLAB function imgradient() was used to determine the gradient magnitude and direction at each pixel in the patch. The gradient magnitudes were weighted by a Gaussian to emphasize the pixels close to the interest point at the center of the patch. The patch was then divided into sixteen 4x4 windows, and within each window the gradient magnitudes of the 16 pixels were accumulated into eight bins according to their orientations, using the built-in MATLAB function histcounts(). The 8-element column vectors from the sixteen windows were then concatenated to form a 128-element descriptor vector.

Example of code


%% Normalized patch descriptor (use either this block or the SIFT block below)
for k = 1:length(x)
    patch = image(y(k)-feature_width/2 : y(k)+feature_width/2-1, ...
                  x(k)-feature_width/2 : x(k)+feature_width/2-1);
    patch_vec = reshape(patch, [feature_width^2 1]);  % 256x1 for a 16x16 patch
    features(:,k) = patch_vec / norm(patch_vec);      % normalize to unit length
end

%% SIFT descriptor (alternative to the normalized patch above)
G = fspecial('Gaussian', 9, sqrt(feature_width/2));   % weighting Gaussian
anglegap = 45;                 % bin width in degrees (eight bins)
angle_dir = -180:anglegap:180; % bin edges
for k = 1:length(x)
    patch = image(y(k)-feature_width/2 : y(k)+feature_width/2-1, ...
                  x(k)-feature_width/2 : x(k)+feature_width/2-1);
    [mag, angle] = imgradient(patch);  % gradient magnitude and direction (degrees)
    mag_weighted = imfilter(mag, G);   % weight the magnitudes with the Gaussian
    % Split the patch into sixteen 4x4 windows (one window per column)
    mag_Col = im2col(mag_weighted, [feature_width/4, feature_width/4], 'distinct');
    angle_Col = im2col(angle, [feature_width/4, feature_width/4], 'distinct');
    bin = zeros(length(angle_dir)-1, size(mag_Col,2));
    for i = 1:size(mag_Col,2)
        % Assign each pixel to an orientation bin, then accumulate its magnitude
        [~, ~, ind] = histcounts(angle_Col(:,i), angle_dir);
        for j = 1:size(mag_Col,1)
            bin(ind(j),i) = bin(ind(j),i) + mag_Col(j,i);
        end
    end
    vector = reshape(bin, [(length(angle_dir)-1)*size(mag_Col,2) 1]); % 8x16 -> 128x1
    features(:,k) = vector / norm(vector);  % normalize to unit length
end

Task 3: Feature Matching

The distance between each pair of features from the two images was computed as the norm of the difference between their descriptors. The ratio test was then applied: for each feature, the ratio of the distance to its first nearest neighbor over the distance to its second nearest neighbor was compared against a threshold. A feature pair that passes the test is considered a match, with a confidence value of (1 - ratio). The 100 most confident feature pairs are displayed in the results below.
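
Task 3 has no code listing above, so below is a minimal sketch of the ratio test just described. The function signature follows the pipeline sketch at the top of this page; the variable names and the 0.8 ratio threshold are assumptions for illustration, not necessarily the values used.

% Minimal sketch of nearest-neighbor distance ratio matching; the 0.8
% threshold is an assumed value
function [matches, confidences] = match_features(features1, features2)
matches = zeros(0, 2);
confidences = zeros(0, 1);
ratio_thresh = 0.8;
for i = 1:size(features1, 2)
    % Euclidean distance from feature i to every feature in image 2
    % (implicit expansion across columns, R2016b+)
    d = sqrt(sum((features2 - features1(:,i)).^2, 1));
    [d_sorted, idx] = sort(d);
    ratio = d_sorted(1) / d_sorted(2);    % 1st vs. 2nd nearest neighbor
    if ratio < ratio_thresh
        matches(end+1,:) = [i, idx(1)];   % indices of the matched pair
        confidences(end+1,1) = 1 - ratio; % confidence of the match
    end
end
% Sort by confidence so the 100 most confident pairs can be displayed first
[confidences, order] = sort(confidences, 'descend');
matches = matches(order, :);
end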

Results for Notre Dame and Mount Rushmore

[Notre Dame - feature size of 16 pixels: normalized patch, accuracy 84% (left); SIFT, accuracy 94% (right)]
[Mount Rushmore - feature size of 16 pixels: normalized patch, accuracy 80% (left); SIFT, accuracy 92% (right)]

Results for image pair with no ground truth

[Pantheon Paris - feature size of 16 pixels: normalized patch (left); SIFT (right)]

Additional results (i) - Change feature width

[Feature width of 8 pixels; SIFT; accuracy: 60%]
[Feature width of 32 pixels; SIFT; accuracy: 98%]

Additional results (ii) - Change feature width

[Feature width of 64 pixels; SIFT; accuracy: 96% (left); feature width of 128 pixels; SIFT; accuracy: 80% (right)]

Discussion

For the Notre Dame image pair, the threshold I used for the cornerness function had to be increased to ~9.99x10^-7 in order to accept enough interest points and for the matching accuracy of the 100 most confident matches to reach 94%. Alpha in the cornerness function was tuned and eventually settled at 0.06; lower values only reduced the accuracy. For the Mount Rushmore image pair, the threshold I set for the cornerness function was one order of magnitude lower than for Notre Dame; any higher threshold led to too few matches or a matching accuracy below 80%. Alpha also had to be reduced to 0.02 for the best results, compared to the Notre Dame pair. For the Pantheon Paris image pair, there is no ground truth to compare against, so no accuracy evaluation is possible. However, it can be observed that the SIFT descriptor yields noticeably more matching features between the image pair than the normalized patch descriptor.

From the lecture, I learned that a Gaussian filter can be used to blur the image and reduce the effect of high-frequency edges before running the SIFT algorithm. However, it was not particularly useful for either the Notre Dame or the Mount Rushmore image pair, so the blurring step was not included in my get_features.m file. For both image pairs, the normalized patch descriptor performed worse than the SIFT descriptor, but I was still able to achieve above 80% matching accuracy with both descriptors.
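
For reference, a pre-blurring step like the sketch below is what was tested; the kernel size and sigma here are placeholders, since this step did not make it into the final code.

% Hypothetical pre-blur before descriptor extraction; kernel size and
% sigma are placeholder values
Gblur = fspecial('Gaussian', [5 5], 1);
image_blurred = imfilter(image, Gblur);
features = get_features(image_blurred, x, y, feature_width);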

The additional results show the effect of changing the feature width. For the Notre Dame image pair, a feature width of 32 pixels gave the best accuracy (98%) among all the widths I tested. I also changed the number of histogram bins (gradient orientations) via the "anglegap" variable in my get_features.m file: the default of eight bins (45 degrees between bin edges) was changed to six bins (60 degrees) and twelve bins (30 degrees), with little improvement in accuracy in either case.