The aim here is to create a local feature matching pipeline, intended to work for instance-level matching of multiple views of the same physical scene. It involves the following three steps:
The method 'get_interest_points' computes the gradients of the input image along the x and y directions in order to form the second moment matrix M. For a point to be a corner, both eigenvalues of M must be large. We therefore compute the Harris corner response R = det(M) - alpha*trace(M)^2, using alpha=0.06, and threshold the response at each pixel, since corners (good interest points) have large response values, i.e. high cornerness.
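The response computation above can be sketched in NumPy as follows (the project code itself is in MATLAB; function names and the Gaussian window size here are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import sobel, gaussian_filter

def harris_response(image, alpha=0.06, sigma=1.0):
    """Harris corner response R = det(M) - alpha * trace(M)^2 (illustrative sketch)."""
    Ix = sobel(image, axis=1)               # gradient along x
    Iy = sobel(image, axis=0)               # gradient along y
    # Entries of the second moment matrix M, smoothed with a Gaussian window
    Ix2 = gaussian_filter(Ix * Ix, sigma)
    Iy2 = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    det_M = Ix2 * Iy2 - Ixy ** 2
    trace_M = Ix2 + Iy2
    return det_M - alpha * trace_M ** 2
```

Because M is positive semi-definite, det(M) is only large when both eigenvalues are large, which is exactly the corner condition stated above.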
Since we want to keep only the local maxima among the interest points, we perform non-maximum suppression by sliding a 16x16 window over the entire image, keeping the maximum value within the window and setting all other values to zero.
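A minimal NumPy sketch of this sliding-window non-maximum suppression (an illustrative equivalent of the MATLAB step, using a maximum filter to emulate the sliding 16x16 window):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def non_max_suppression(response, window=16):
    """Keep a response value only where it equals the local window maximum."""
    local_max = maximum_filter(response, size=window)
    # Pixels that are not their neighbourhood's maximum are zeroed out
    return np.where(response == local_max, response, 0.0)
```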
The second step is 'get_features'. The simplest, although not very accurate, method of generating the features is to take a normalised patch around each interest point. Here I took a normalised 16x16 patch for the initial implementation of get_features. The final implementation of SIFT-like local feature descriptors computes the gradient at each pixel in a 16x16 window around the detected interest point; for each 4x4 cell, a gradient orientation histogram is formed by adding the gradient magnitude to one of eight orientation bins. In a step-wise implementation, I first used a simple count-based binning, counting the number of gradients falling in each bin; I then weighted each bin by the magnitudes of the 16x16 gradients. A flattened feature of length 4x4x8 = 128 is thus obtained for each interest point.
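The descriptor construction can be sketched as follows (a NumPy illustration of the MATLAB step; it assumes gradient directions in degrees in [-180, 180), as MATLAB's imgradient returns):

```python
import numpy as np

def sift_like_descriptor(gmag, gdir, y, x, feature_width=16):
    """4x4x8 = 128-dim magnitude-weighted orientation histogram (illustrative sketch)."""
    half = feature_width // 2
    patch_mag = gmag[y - half:y + half, x - half:x + half]
    patch_dir = gdir[y - half:y + half, x - half:x + half]
    # Quantise each direction into one of eight 45-degree orientation bins
    bins = ((patch_dir + 180.0) // 45.0).astype(int) % 8
    cell = feature_width // 4
    hist = np.zeros((4, 4, 8))
    for r in range(4):
        for c in range(4):
            cell_bins = bins[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            cell_mags = patch_mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell]
            for b in range(8):
                # Add the gradient magnitudes falling into bin b of this cell
                hist[r, c, b] = cell_mags[cell_bins == b].sum()
    feat = hist.ravel()
    n = np.linalg.norm(feat)
    return feat / n if n > 0 else feat
```

Replacing `cell_mags` with 1 in the sum recovers the earlier count-based binning variant.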
The final step is 'match_features'. We must find the best-matching features between the sets of features obtained from the two (or more) input images. The method computes the Euclidean distance between each pair of features using a double for loop, MATLAB's pdist2 method, as well as k-nearest neighbours, for comparison. Then, for each feature of image1, the ratio of distances to its two closest features in image2 is computed (NNDR = nearest neighbor distance ratio). Confidence for each match is defined as the inverse of the NNDR. We keep the 100 most confident matches (those with the smallest NNDR values).
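The NNDR matching logic can be sketched as follows (an illustrative NumPy version of the MATLAB step; the small epsilon terms are assumptions to guard against division by zero):

```python
import numpy as np

def match_by_nndr(f1, f2, top_k=100):
    """Match features by nearest-neighbor distance ratio; confidence = 1/NNDR."""
    # Pairwise Euclidean distances, shape (n1, n2)
    d = np.linalg.norm(f1[:, None, :] - f2[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    rows = np.arange(len(f1))
    d1 = d[rows, order[:, 0]]            # distance to nearest neighbour
    d2 = d[rows, order[:, 1]]            # distance to second-nearest neighbour
    nndr = d1 / (d2 + 1e-12)
    conf = 1.0 / (nndr + 1e-12)
    keep = np.argsort(nndr)[:top_k]      # most confident = smallest NNDR
    matches = np.stack([keep, order[keep, 0]], axis=1)
    return matches, conf[keep]
```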
%example code- get_interest_points
for i=feature_width/2:ymax-feature_width/2
    for j=feature_width/2:xmax-feature_width/2
        % Harris response R = det(M) - alpha*trace(M)^2, alpha = 0.06
        response(i,j)=(Ix2(i,j)*Iy2(i,j)-Ixy(i,j)^2)-0.06*(Ix2(i,j)+Iy2(i,j))^2;
    end
end
%example code- get_features
[Gmag,Gdir] = imgradient(image);   % gradient magnitude and direction at every pixel
% Bounds of the feature_width x feature_width patch around interest point k
xmin=floor(x(k)-(feature_width/2));
xmax=floor(x(k)+feature_width/2-1);
ymin=floor(y(k)-(feature_width/2));
ymax=floor(y(k)+feature_width/2-1);
temp_gmag=Gmag(ymin:ymax,xmin:xmax);   % patch of gradient magnitudes
temp_gdir=Gdir(ymin:ymax,xmin:xmax);   % patch of gradient directions
%example code- match_features
[Sorted_Distance,Indices]=sort(Distance);   % sort distances to image2 features, per image1 feature
corresp_i=Indices(1:2,:);                   % indices of the two nearest neighbours
matches_all(1,:)=1:size(features1,1);       % each feature of image1 ...
matches_all(2,:)=corresp_i(1,:);            % ... paired with its nearest neighbour in image2
Sacre Coeur: Accuracy = 100% (approximately)
Sleeping Beauty Castle Paris: Accuracy = 92% (approximately)
Notre Dame: Accuracy = 81%
Rushmore: Accuracy = 78%
Accuracy using normalized patches: 57%
Accuracy using SIFT-like features: 77%
Accuracy using SIFT-like features with normalizing and clamping: 81%
Result with normalized patches
Result with SIFT-like features with normalizing and clamping
Threshold = 0.0075: Accuracy = 80%
Threshold = 0.0009: Accuracy = 80%
Threshold = 0.001:  Accuracy = 81%
Threshold = 0.0011: Accuracy = 79%
The aim here is to reduce the feature descriptor to a lower-dimensional one, in order to cut the running time of match_features(). The following steps are required:
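A minimal sketch of PCA-based dimensionality reduction (an illustrative NumPy version; the target dimension of 32 is an assumption, not the report's chosen value):

```python
import numpy as np

def pca_reduce(features, n_components=32):
    """Project descriptors onto the top principal components (illustrative sketch)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # SVD of the centered data; rows of vt are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    return centered @ basis.T, basis, mean
```

The same basis and mean must be applied to both images' features so that distances in the reduced space remain comparable.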
Result with dimension reduction using PCA
The aim here is to use a space-partitioning data structure (k-d tree) to approximate the nearest neighbour of each feature and thereby accelerate matching. The following steps are included:
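A k-d tree replaces the exhaustive pairwise distance computation with logarithmic-time neighbour queries; a minimal sketch using SciPy's cKDTree (an illustrative stand-in for the report's MATLAB implementation):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_with_kdtree(f1, f2, top_k=100):
    """NNDR matching via a k-d tree built over image2's features (illustrative sketch)."""
    tree = cKDTree(f2)
    # Query the two nearest neighbours of every image1 feature at once
    dists, idx = tree.query(f1, k=2)
    nndr = dists[:, 0] / (dists[:, 1] + 1e-12)
    keep = np.argsort(nndr)[:top_k]      # smallest NNDR = most confident
    return np.stack([keep, idx[keep, 0]], axis=1)
```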
Runtime variation