Project 2: Local Feature Matching

In this assignment our task was to implement a local feature matching pipeline from the ground up. Local feature matching is used in a wide variety of computer vision applications such as object recognition and panorama stitching. Our goal is a pipeline that matches points between two pictures, where the matched points correspond to the same points in the real-world scene. The difficulty is that the two pictures of the same scene are taken from different angles, with different lighting and contrast, and may contain miscellaneous objects such as people or vehicles that appear in one picture but not the other; these objects can also occlude parts of the scene.

The pipeline is as follows:

  1. Keypoint detection
  2. Local feature extraction
  3. Feature matching

We have done the following work for this project:

  1. Harris corner detection with two different methods for non-maxima suppression (connected components and sliding window), and extraction of keypoints with scale and orientation.
  2. Local 16x16 patch features, and SIFT-like gradient features with a 4x4 grid of cells and 8 gradient orientations, giving 128-dimensional features. Features are built at different scales and with rotational invariance.
  3. Feature matching with the ratio test.

Keypoint detection

We implemented a simple Harris corner detection algorithm as described on p. 214 of Szeliski's book (algorithm 4.1). For non-maxima suppression we first implemented the connected-components method, which can be seen commented out in our code (get_interest_points.m). The main problem with this method was that it produced around thirty times more interest points in the second image of the Gaudi Episcopal pair than in the first, whereas the counts for the two Notre Dame pictures were roughly the same. We then implemented the sliding-window method, which we kept for the rest of the project because it produced only about eight times more interest points in the second Gaudi picture than in the first. Accuracy was not affected by interchanging the two methods, but the second one is faster because it leads to fewer comparisons between features.
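
A minimal sketch of this stage, assuming a double-valued grayscale image `image` in [0, 1]; the parameter values (sigma, alpha, the threshold and the window size) are illustrative, not the ones we tuned.

    % Harris cornerness map followed by sliding-window non-maxima suppression.
    sigma     = 1;      % blur for the gradients (assumed value)
    alpha     = 0.06;   % Harris constant (assumed value)
    threshold = 0.01;   % cornerness threshold (assumed value)
    win       = 15;     % side of the sliding NMS window (assumed value)

    % Image gradients on a slightly blurred image.
    small_blur = fspecial('gaussian', 2 * ceil(2 * sigma) + 1, sigma);
    [Ix, Iy]   = gradient(imfilter(image, small_blur, 'symmetric'));

    % Structure tensor entries, blurred with a larger Gaussian.
    big_blur = fspecial('gaussian', 2 * ceil(4 * sigma) + 1, 2 * sigma);
    Ixx = imfilter(Ix .* Ix, big_blur, 'symmetric');
    Iyy = imfilter(Iy .* Iy, big_blur, 'symmetric');
    Ixy = imfilter(Ix .* Iy, big_blur, 'symmetric');

    % Harris cornerness: det(A) - alpha * trace(A)^2, thresholded.
    har = Ixx .* Iyy - Ixy .^ 2 - alpha * (Ixx + Iyy) .^ 2;
    har(har < threshold) = 0;

    % Sliding-window NMS: keep a pixel only if it is the maximum of its
    % win-by-win neighbourhood.
    local_max = ordfilt2(har, win * win, ones(win));
    [y, x] = find(har > 0 & har == local_max);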

We implemented scaled feature points as follows:

  1. Compute the Harris cornerness map for each scale (blurring with Gaussians of variance sigma * mult_factor ^ scale).
  2. Compute the Laplacian of these Harris corner maps. We did this by filtering the Harris corner maps one by one with a Laplacian filter; we also tried blurring the image twice with the Gaussian appropriate for the scale and then subtracting the twice-blurred image from the once-blurred one.
  3. Find the points that are maxima in the resulting scale space (the Laplacian space, see the note at the end).
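
The sketch below illustrates this scale selection step, assuming a hypothetical helper harris_map(image, sigma) that returns the cornerness map for a given blur (e.g. the code above); sigma0, mult_factor and the number of scales are assumed values.

    % Build Harris maps at several scales, filter each with a Laplacian and
    % keep points that are maxima across the resulting scale stack.
    sigma0      = 1;      % base scale (assumed value)
    mult_factor = 1.4;    % scale step (assumed value)
    num_scales  = 5;      % number of scales (assumed value)

    lap  = fspecial('laplacian');                     % 3x3 Laplacian kernel
    logs = zeros([size(image), num_scales]);
    for s = 1:num_scales
        sigma_s       = sigma0 * mult_factor ^ (s - 1);
        har           = harris_map(image, sigma_s);   % hypothetical helper
        logs(:, :, s) = imfilter(har, lap, 'symmetric');
    end

    % A point is a scale-space maximum if its response beats the two
    % neighbouring scales (interior scales only, for brevity).
    is_max = false(size(logs));
    for s = 2:num_scales - 1
        is_max(:, :, s) = logs(:, :, s) > logs(:, :, s - 1) & ...
                          logs(:, :, s) > logs(:, :, s + 1);
    end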

We found the orientation of each feature point by taking a 16x16 neighbourhood around the point and computing the gradient direction and magnitude of each pixel in this neighbourhood. Each of these 256 gradients votes for one of 8 orientation bins with a weight equal to the gradient magnitude. The bin with the largest accumulated magnitude gives the orientation of the feature point. This information is stored and later passed on to the feature extraction function.
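
A short sketch of this orientation estimate for one keypoint at (x, y), assuming a double-valued grayscale image and a point far enough from the border for a full 16x16 neighbourhood; the binning details are illustrative.

    % 16x16 neighbourhood around the keypoint.
    patch = image(y - 7 : y + 8, x - 7 : x + 8);

    [gx, gy] = gradient(patch);
    mag   = sqrt(gx .^ 2 + gy .^ 2);
    angle = atan2(gy, gx);                                 % in (-pi, pi]

    % Each of the 256 gradients votes for one of 8 orientation bins,
    % weighted by its magnitude.
    bins  = min(floor(mod(angle + pi, 2 * pi) / (pi / 4)) + 1, 8);
    hist8 = accumarray(bins(:), mag(:), [8, 1]);

    % The bin with the largest accumulated magnitude gives the orientation
    % (taken here as the bin centre, in radians).
    [~, best_bin] = max(hist8);
    orientation = (best_bin - 0.5) * (pi / 4) - pi;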

Feature extraction

For this part we followed the instructions to build a 16x16 local patch feature, which can be found commented out in our code.
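
A minimal sketch of such a patch feature, assuming keypoints (x, y) at least 8 pixels away from the image border; the unit-length normalisation is an assumption rather than a description of the commented code.

    % Plain 16x16 patch features, one row per keypoint.
    num_points = numel(x);
    features   = zeros(num_points, 256);
    for i = 1:num_points
        r = round(y(i));  c = round(x(i));
        patch = image(r - 7 : r + 8, c - 7 : c + 8);   % 16x16 window
        v = patch(:);
        features(i, :) = (v / norm(v))';               % normalise to unit length
    end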

We then implemented a SIFT-like feature that, for each keypoint, computes the magnitude and orientation of the gradients in a 16x16 neighbourhood around the keypoint. Each pixel votes in a histogram as described previously, except that it now votes inside one of 4x4 cells. Each of these cells ends up with 8 histogram bins (one per orientation), and the resulting 128 bins are concatenated to form our feature vector.
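
A sketch of this descriptor for a single keypoint, assuming a 16x16 patch `patch` already centred on it (as above); the cell and bin bookkeeping is illustrative.

    % SIFT-like descriptor: 4x4 cells x 8 orientation bins = 128 values.
    [gx, gy] = gradient(patch);
    mag   = sqrt(gx .^ 2 + gy .^ 2);
    angle = atan2(gy, gx);
    obin  = min(floor(mod(angle + pi, 2 * pi) / (pi / 4)) + 1, 8);   % 1..8

    feature = zeros(1, 128);
    for cr = 1:4                          % 4x4 grid of 4x4-pixel cells
        for cc = 1:4
            rows = (cr - 1) * 4 + (1:4);
            cols = (cc - 1) * 4 + (1:4);
            cell_bins = obin(rows, cols);
            cell_mag  = mag(rows, cols);
            hist8 = accumarray(cell_bins(:), cell_mag(:), [8, 1]);
            feature(((cr - 1) * 4 + cc - 1) * 8 + (1:8)) = hist8;
        end
    end
    feature = feature / max(norm(feature), eps);       % unit-normalise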

We implemented scale invariance by extracting the feature points at the appropriate scales (which are passed on to the function in an array). We blur the image to the appropriate scale and then extract the feature vector.
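
As a sketch of this step, assuming each keypoint carries the index of the scale at which it was detected (an array `scales` passed to the feature function) and assumed values for sigma0 and mult_factor:

    % Blur the image once per scale, then extract descriptors for the
    % keypoints detected at that scale from the blurred image.
    sigma0 = 1;  mult_factor = 1.4;                    % assumed values
    for s = reshape(unique(scales), 1, [])
        sigma_s = sigma0 * mult_factor ^ (s - 1);
        g       = fspecial('gaussian', 2 * ceil(3 * sigma_s) + 1, sigma_s);
        blurred = imfilter(image, g, 'symmetric');
        % ... descriptors for keypoints with scales == s are computed from
        % `blurred` exactly as in the SIFT-like sketch above ...
    end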

To implement rotation invariance, for each point we extracted a 24x24 patch around the point, rotated the patch by the point's orientation, and then extracted the 16x16 patch in the middle.
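
A sketch of this rotation-normalisation step for one keypoint, reusing the orientation computed earlier; the index arithmetic is illustrative.

    % 24x24 patch around the keypoint.
    big_patch = image(y - 11 : y + 12, x - 11 : x + 12);

    % Rotate so the dominant orientation is cancelled out; 'crop' keeps the
    % output at 24x24.
    rotated = imrotate(big_patch, -orientation * 180 / pi, 'bilinear', 'crop');

    % The central 16x16 window always stays inside the valid rotated data
    % (24 / sqrt(2) > 16), so the descriptor can be extracted from it.
    patch = rotated(5:20, 5:20);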

Feature Matching

We implemented ratio-test matching, which ranks the matches according to the ratio of the distance to the nearest neighbour over the distance to the second-nearest neighbour. This improved the match accuracy compared to a naive nearest-neighbour matching algorithm.
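
A sketch of this matching step, assuming features1 (N1 x 128) and features2 (N2 x 128) hold the descriptors of the two images; the 0.8 threshold is an assumed value.

    % Ratio-test matching between two sets of descriptors.
    ratio_threshold = 0.8;                              % assumed value

    % Pairwise Euclidean distances (N1 x N2).
    dists = sqrt(max(sum(features1 .^ 2, 2) + sum(features2 .^ 2, 2)' ...
                     - 2 * (features1 * features2'), 0));

    [sorted_d, idx] = sort(dists, 2);                   % nearest first, per row
    ratios = sorted_d(:, 1) ./ max(sorted_d(:, 2), eps);

    keep        = find(ratios < ratio_threshold);
    matches     = [keep, idx(keep, 1)];                 % [index in image 1, image 2]
    confidences = 1 - ratios(keep);

    % Rank matches: smaller ratio (higher confidence) first.
    [confidences, order] = sort(confidences, 'descend');
    matches = matches(order, :);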

Performance

After fine-tuning the blur parameters and the cornerness function's threshold and parameters, we achieved high accuracy on the first two pairs of pictures.

We obtained 92% accuracy for 100 matches on one of these pairs and 98% accuracy on the other.

We did not obtain good results for the Gaudi pair: only 3% accuracy.

It is hard to conduct a rigorous study with this pipeline because modifying a parameter can increase or decrease the number of interest points detected. We observed that below 500 interest points accuracy drops by tens of percentage points, so during our trials we did our best to stay around 2000 interest points.

One of the bigger issues is that neither the rotation invariance nor the scale invariance improved the performance of the algorithm on the Gaudi pair, which should benefit from these additions, especially scale invariance, because the two pictures are not at the same scale.

This is curious because we verified that the algorithm was indeed extracting features at different scales and orientations.

Even after long hours modifying parameters (variance of the Gaussians, number of scales) and double-checking the code, we did not achieve a breakthrough: the best we obtained was 5% accuracy.

Note: We first tried finding the maxima in the space spanned by the scaled Harris maps convolved with a Laplacian filter. This did not work, so, suspecting that we had misunderstood what was meant by scale-space, we then tried finding the maxima in the space spanned by the scaled images (obtained by blurring) convolved with a Laplacian filter. This did not give better results on Gaudi either. Both methods give good results on the other images.