The purpose of this project was to demonstrate local feature matching by implementing a simplified version of the SIFT pipeline. The pipeline consisted of three stages: interest point detection, local feature description, and feature matching.
A SIFT-like local feature descriptor was chosen for stage 2, and for matching the extracted features I chose to keep the 100 most confident matches (more details below).
This is the first stage of the pipeline, where we calculate the cornerness function using the image derivatives and a Gaussian filter. I used the colfilt() function to perform non-maximum suppression, and then found the x and y coordinates of the points detected as interest points.
The steps I followed were:
1. Blur the image slightly and compute its x and y derivatives.
2. Compute the cornerness response from Gaussian-weighted products of the derivatives.
3. Threshold the response image to discard weak responses.
4. Perform non-maximum suppression on the thresholded response using colfilt().
5. Take the x and y coordinates of the surviving local maxima as the interest points.
An observation I made here is that thresholding the response image prior to non-max suppression made a great deal of difference to both the accuracy and the speed of the process, because it controls the number of interest points that are eventually returned. Finding a threshold value that gives back an optimal number of interest points (approximately 2500-3000 points for Notre Dame) sped up the process considerably, while retaining (and in the case of Mount Rushmore, improving) the matching accuracy.
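As a rough illustration, the sketch below shows how this stage could be wired together in MATLAB. The filter sizes, the alpha constant, and the threshold and nms_window values are placeholder assumptions, not the exact values used in my submission.

% Sketch of the interest point detection stage (assumed parameter values).
% 'image' is the grayscale input image.
small_gauss = fspecial('gaussian', 5, 1);
large_gauss = fspecial('gaussian', 9, 2);
sobel_x = [-1 0 1; -2 0 2; -1 0 1];

blurred = imfilter(image, small_gauss);
Ix = imfilter(blurred, sobel_x);          % x derivative
Iy = imfilter(blurred, sobel_x');         % y derivative

% Gaussian-weighted products of derivatives, then the cornerness function
Ixx = imfilter(Ix .* Ix, large_gauss);
Iyy = imfilter(Iy .* Iy, large_gauss);
Ixy = imfilter(Ix .* Iy, large_gauss);
alpha = 0.04;
R = Ixx .* Iyy - Ixy.^2 - alpha * (Ixx + Iyy).^2;

% Threshold weak responses before non-max suppression (see the observation above)
R(R < threshold) = 0;

% Non-maximum suppression with colfilt: keep a response only if it equals
% the maximum within its sliding window
local_max = colfilt(R, [nms_window nms_window], 'sliding', @max);
R(R < local_max) = 0;

[ys, xs] = find(R > 0);   % interest point coordinates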
This is the second stage of the pipeline. Here, we take the interest points detected in stage 1 and extract a local feature descriptor around each one. The steps I followed here were:
1. Create a set of oriented gradient images from the input image, one per orientation bin.
2. For each interest point, cut a square window centred on the point out of each oriented image.
3. Divide the window into a 4x4 grid of cells and sum the absolute filter responses within each cell.
4. Write each cell's sum into the slot for the current orientation, building one feature vector per interest point.
Code sample for creating feature vector:
% Cut the 2*halfWidth x 2*halfWidth descriptor window centred on the
% interest point out of the current oriented image.
feature_matrix = orientedImg(yCord - halfWidth : yCord + halfWidth - 1, ...
                             xCord - halfWidth : xCord + halfWidth - 1);
% Split the window into a 4x4 grid of cells, each cSize pixels square.
cellArr = mat2cell(feature_matrix, [cSize, cSize, cSize, cSize], [cSize, cSize, cSize, cSize]);
for cellRow = 1:4
    for cellCol = 1:4
        currCell = cellArr{cellRow, cellCol};   % current cell of the grid
        val = sum(sum(abs(currCell)));          % total response magnitude in the cell
        featureVec(index) = val;                % store it in this orientation's slot for the cell
        index = index + 8;                      % advance to the next cell's block of orientation slots
    end
end
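Presumably this snippet runs once per oriented image, with index initialised to that orientation's offset, so each of the sixteen cells contributes one value per orientation; the stride of 8 implies eight orientation bins, giving a 4 x 4 x 8 = 128-dimensional descriptor per interest point.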
The ratio test: given features from both images, I first approached matching the brute-force way, computing Euclidean distances between all pairs and then, for each feature, taking the two closest matches and computing the ratio nearest/second_nearest. The confidence for a match was the inverse of this ratio. I tried thresholding the matches, i.e. keeping only those with a ratio under a certain value, but found that taking the 100 most confident matches worked just as well (perhaps even better). So, eventually, I sorted the confidences and kept the top 100 matches. This approach yielded good accuracy (92% on the Notre Dame pair).
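A minimal sketch of this matcher, assuming features1 and features2 hold one descriptor per row (these variable names, and the use of pdist2, are illustrative rather than the exact code I submitted):

dists = pdist2(features1, features2);               % all pairwise Euclidean distances
[sorted_dists, nn_idx] = sort(dists, 2);            % nearest neighbours in image 2 for each feature in image 1
ratios = sorted_dists(:, 1) ./ sorted_dists(:, 2);  % nearest / second-nearest distance
confidences = 1 ./ ratios;                          % confidence is the inverse of the ratio
[confidences, order] = sort(confidences, 'descend');
num_matches = min(100, numel(order));               % keep the 100 most confident matches
matches = [order(1:num_matches), nn_idx(order(1:num_matches), 1)];
confidences = confidences(1:num_matches);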
Results on the third example were noticeably poorer after implementing get_interest_points; the corner detector does not seem to find interest points that correspond well enough to the ground truth points.
Additionally, to speed up feature matching, I used knnsearch to find the nearest neighbours rather than my earlier brute-force distance calculation. This brought the time spent in match_features down from 70.925 seconds to 32.635 seconds, a speedup of more than 2x.
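The knnsearch-based version looks roughly like this (again a sketch with illustrative names); knnsearch returns, for each row of the second argument, the indices and distances of its K nearest neighbours among the rows of the first:

[nn_idx, nn_dists] = knnsearch(features2, features1, 'K', 2);   % two nearest neighbours in image 2
confidences = nn_dists(:, 2) ./ nn_dists(:, 1);                 % inverse of the nearest/second-nearest ratio
[confidences, order] = sort(confidences, 'descend');
num_matches = min(100, numel(order));
matches = [order(1:num_matches), nn_idx(order(1:num_matches), 1)];
confidences = confidences(1:num_matches);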
These are images taken from two floors of a high-rise building in Midtown Atlanta. I did not have ground truth points to evaluate the correspondences, but eyeballing the matches suggests that the algorithm works well on images taken from different angles, though it might not have matched as many points if the images had been scaled differently or the viewing angles had varied more widely.
Corners detected:
Matches: