Project 2: Local Feature Matching

Notre Dame pair with 92.00% accuracy

In this project, I implemented a local feature matching algorithm to match two images. For the Notre Dame pair, I achieved an accuracy of 92%. The algorithm is divided into the following three parts:

  1. Detect interest points
  2. Build local feature descriptor
  3. Match local features

Detect interest points

To detect interest points, I follow the Harris corner detector algorithm:
  1. Compute the image derivatives Ix, Iy using the imgradientxy() function.
  2. Apply a Gaussian filter to get g(Ix^2), g(Iy^2), and g(IxIy).
  3. Calculate the cornerness score (har) for each image window as har = g(Ix^2)g(Iy^2) - [g(IxIy)]^2 - α[g(Ix^2) + g(Iy^2)]^2.
  4. Threshold the values computed in the above step.
  5. Perform non-maximum suppression.
Basically, after calculating the cornerness scores, I run through each point in the image and check whether its har value is greater than the threshold, whether the point is away from the image boundary, and whether its har value is a local maximum. If all three requirements are satisfied, I add the point to the x, y vectors.
In step 3, I set α to 0.04 because I noticed that other α values tend to lower the accuracy. When α=0.04, the accuracy is 92%, but when α=0.05, it drops to 88%.
In step 4, I set the threshold to 0.01. A higher threshold yields fewer interest points and a lower threshold yields more. In my implementation, a threshold of 0.01 produces a reasonable number of interest points and high matching accuracy.
In step 5, I used the colfilt function to run a max() operator over each sliding window.
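For concreteness, here is a minimal sketch of this pipeline, assuming img is a grayscale double image; the Gaussian window size/sigma and the boundary margin are illustrative choices, not the exact values used:

    % Harris detector sketch; img is a grayscale double image.
    alpha     = 0.04;   % cornerness weight (step 3)
    threshold = 0.01;   % cornerness threshold (step 4)

    % Step 1: image derivatives
    [Ix, Iy] = imgradientxy(img);

    % Step 2: Gaussian-weighted products of derivatives
    g    = fspecial('gaussian', 9, 1);    % illustrative window size/sigma
    gIx2 = imfilter(Ix .^ 2, g);
    gIy2 = imfilter(Iy .^ 2, g);
    gIxy = imfilter(Ix .* Iy, g);

    % Step 3: cornerness score
    har = gIx2 .* gIy2 - gIxy .^ 2 - alpha * (gIx2 + gIy2) .^ 2;

    % Step 5: sliding max window via colfilt
    local_max = colfilt(har, [3 3], 'sliding', @max);

    % Steps 4 + 5: above threshold, a local maximum, and away from the boundary
    mask   = (har > threshold) & (har == local_max);
    border = 16;                          % illustrative boundary margin
    mask(1:border, :) = 0;  mask(end-border+1:end, :) = 0;
    mask(:, 1:border) = 0;  mask(:, end-border+1:end) = 0;
    [y, x] = find(mask);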
 

Adaptive non-maximum suppression (ANMS)

Without ANMS, interest points are denser in regions with higher contrast; using ANMS makes the interest points distribute more evenly across the image. However, I did not use ANMS in my final implementation because it did not increase my accuracy; it actually dropped it to about 62%.
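For reference, a sketch of the ANMS variant I experimented with, continuing from the detector sketch above and following the usual suppression-radius formulation; the robustness factor and the number of points kept are illustrative:

    % ANMS sketch: for each point, the suppression radius is the distance
    % to the nearest point with a sufficiently larger cornerness score;
    % keep the points with the largest radii.
    c_robust = 0.9;                           % illustrative robustness factor
    scores   = har(sub2ind(size(har), y, x));
    pts      = [x, y];

    n      = numel(scores);
    radius = inf(n, 1);
    D      = pdist2(pts, pts);                % pairwise distances
    for i = 1:n
        stronger = scores > scores(i) / c_robust;  % points that suppress i
        stronger(i) = false;
        if any(stronger)
            radius(i) = min(D(i, stronger));
        end
    end

    [~, order] = sort(radius, 'descend');
    keep = order(1:min(1500, n));             % illustrative point budget
    x = x(keep);  y = y(keep);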

Below are the results of the two non-maximum suppression approaches; the image on the right uses ANMS.
 

Build local feature descriptor

I implemented a SIFT-like local feature descriptor in this project. I first blurred the image with a Gaussian filter to reduce the effect of noise. I then looped through each interest point and took a 16*16 neighbourhood around it. I divided that area into 16 cells, each a 4*4 array, and built an orientation histogram with 8 bins (0, pi/4, 2pi/4, 3pi/4, pi, 5pi/4, 6pi/4, 7pi/4) for each 4*4 cell, so every point has a histogram of 128 dimensions in total.
To decide which bin a pixel should vote for, I first calculate the gradient angle of that pixel and then choose the bin closest to it.
After computing the descriptor for a point, I normalized it to unit length and raised each element of the feature vector to the power 0.6 to increase accuracy.
In my implementation, I set the feature width to 16 because accuracy seems to drop with a larger feature width. For example, when I set the feature width to 32, the accuracy is only 84%.
By using the SIFT-like descriptor instead of normalized patches, my accuracy increased from 36% to 72%. (Tested with cheat_interest_point())
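A condensed sketch of the descriptor construction, assuming every interest point is at least half a feature width from the boundary; using imgradient for magnitudes and angles (in degrees) and accumarray for the binning are my illustrative choices here:

    % SIFT-like descriptor sketch; assumes img was already blurred with a
    % small Gaussian and every point is >= feature_width/2 from the boundary.
    feature_width = 16;
    [Gmag, Gdir]  = imgradient(img);      % magnitude and angle in degrees
    features      = zeros(numel(x), 128);

    for k = 1:numel(x)
        r = round(y(k));  c = round(x(k));  half = feature_width / 2;
        mag = Gmag(r-half : r+half-1, c-half : c+half-1);
        ang = Gdir(r-half : r+half-1, c-half : c+half-1);

        % Vote for the closest of 8 bins at 0, 45, ..., 315 degrees
        bin = mod(round(ang / 45), 8) + 1;

        desc = zeros(1, 128);
        for i = 1:4
            for j = 1:4
                cell_mag = mag(4*i-3 : 4*i, 4*j-3 : 4*j);
                cell_bin = bin(4*i-3 : 4*i, 4*j-3 : 4*j);
                hist8 = accumarray(cell_bin(:), cell_mag(:), [8 1])';
                desc((i-1)*32 + (j-1)*8 + (1:8)) = hist8;
            end
        end

        desc = desc / (norm(desc) + eps);  % normalize to unit length
        features(k, :) = desc .^ 0.6;      % raise to power 0.6
    end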
 

Match local features

To match local features, I used the pdist2() function to find the 2 nearest points in features2 for every point in features1. I then calculated the nearest-neighbour distance ratio; a lower ratio indicates that the match is more likely to be true. Since we only need the 100 most confident matches, I sorted the confidence vector, obtained as 1/ratio, in descending order and took the top 100 results.
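A sketch of this matching step; the 100-match cap follows the text, while the variable names are illustrative:

    % Nearest-neighbour distance ratio matching sketch.
    % features1 (N1 x 128) and features2 (N2 x 128) are descriptor matrices.
    dists = pdist2(features1, features2);     % N1 x N2 pairwise distances
    [sorted_d, idx] = sort(dists, 2);         % each row sorted ascending

    ratio      = sorted_d(:, 1) ./ sorted_d(:, 2);  % nearest / second nearest
    confidence = 1 ./ ratio;                        % lower ratio => higher confidence

    % Keep the 100 most confident matches
    [conf_sorted, order] = sort(confidence, 'descend');
    num_matches = min(100, numel(order));
    matches     = [order(1:num_matches), idx(order(1:num_matches), 1)];
    confidence  = conf_sorted(1:num_matches);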
 
 

Results in a table

Notre Dame with 92% accuracy.

Mount Rushmore with 97% accuracy.

Episcopal Gaudi with 3% accuracy. Since I didn't implement interest point detection at multiple scales, matching performs very poorly on image pairs with a scale difference.