Project 2: Local Feature Matching

In this project, I implement a SIFT-like local feature matching algorithm to match points of interest between multiple views of an object. The pipeline for the algorithm, sketched in code after the list, is:

  1. Find interest points (here, corners).
  2. Extract features from these points (here, histogram of oriented gradients).
  3. Match points based on similarity of features (here, Euclidean distance).
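As a rough sketch (not the project code itself), the three stages could be wired together as below. The stage functions `get_interest_points`, `get_features`, and `match_features` are the ones sketched in the sections that follow; their names and signatures are illustrative assumptions, not something prescribed by this write-up.

```python
def match_pair(image1, image2, feature_width=16):
    """Run the three-stage pipeline on a pair of grayscale float images."""
    # 1. Find interest points (corners) in each image.
    x1, y1 = get_interest_points(image1)
    x2, y2 = get_interest_points(image2)

    # 2. Extract a SIFT-like descriptor around each interest point.
    features1 = get_features(image1, x1, y1, feature_width)
    features2 = get_features(image2, x2, y2, feature_width)

    # 3. Match descriptors by Euclidean distance, sorted by confidence.
    matches, confidences = match_features(features1, features2)
    return matches, confidences
```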

Images of interest

The three pairs of images I use are shown below: Notre Dame, Mount Rushmore, and Episcopal Gaudi.

Algorithm

Finding interest points

I use the Harris detector to find corners. First, I compute the image derivatives by filtering the image with a difference-of-Gaussians filter. I then form the second moment matrix by smoothing the products of the image derivatives (Ix², Iy², IxIy) with a Gaussian, and compute the cornerness score from this matrix. Finally, I threshold the cornerness scores and take the maximal point within each of the resulting connected components as an interest point.
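A minimal sketch of this detector in Python with NumPy/SciPy, assuming a grayscale float image. It uses Gaussian-derivative filters in place of the difference-of-Gaussians approximation described above, and the `alpha` and threshold values are illustrative choices, not the ones used in the project.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label, maximum_position

def get_interest_points(image, sigma=1.0, alpha=0.06, threshold_frac=0.01):
    """Harris corner detector: returns arrays of x and y coordinates."""
    # Image derivatives via Gaussian-derivative filtering.
    ix = gaussian_filter(image, sigma, order=(0, 1))
    iy = gaussian_filter(image, sigma, order=(1, 0))

    # Second moment matrix entries: smooth the derivative products.
    sxx = gaussian_filter(ix * ix, 2 * sigma)
    syy = gaussian_filter(iy * iy, 2 * sigma)
    sxy = gaussian_filter(ix * iy, 2 * sigma)

    # Standard Harris cornerness score: det(M) - alpha * trace(M)^2.
    cornerness = sxx * syy - sxy ** 2 - alpha * (sxx + syy) ** 2

    # Threshold, then keep the maximal point of each connected component.
    mask = cornerness > threshold_frac * cornerness.max()
    labels, num = label(mask)
    peaks = maximum_position(cornerness, labels, index=np.arange(1, num + 1))
    ys, xs = zip(*peaks) if peaks else ((), ())
    return np.array(xs), np.array(ys)
```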

Extracting local features

For each interest point, I represent the local feature as a histogram of gradient magnitudes in eight directions. To find these magnitudes, I filter the image with eight different filters, each a difference of Gaussians whose difference is aligned with one of the eight directions of interest. I then loop over the interest points. For each point, I form a 16-by-16 patch centered on the point and divide it into a 4-by-4 grid of 4-by-4-pixel subpatches. For each subpatch, I sum the gradient magnitudes in each of the eight directions over its pixels, giving an 8-dimensional vector. Concatenating the vectors from the 16 subpatches yields a 128-dimensional descriptor, which becomes a row of the feature matrix.
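A sketch of the descriptor under those assumptions. It steers a pair of Gaussian-derivative filters to eight directions rather than using eight difference-of-Gaussians filters, and the border handling and final normalization are additions of the sketch, not steps described above.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def get_features(image, xs, ys, feature_width=16):
    """SIFT-like descriptors: 4x4 grid of 8-direction gradient histograms."""
    # Oriented gradient responses: steer x/y Gaussian derivatives to eight
    # directions, keeping only positive responses so each direction gets
    # its own channel.
    gx = gaussian_filter(image, 1.0, order=(0, 1))
    gy = gaussian_filter(image, 1.0, order=(1, 0))
    responses = []
    for k in range(8):
        theta = k * np.pi / 4
        oriented = np.cos(theta) * gx + np.sin(theta) * gy
        responses.append(np.maximum(oriented, 0))
    responses = np.stack(responses, axis=-1)  # H x W x 8

    half = feature_width // 2
    cell = feature_width // 4
    features = []
    for x, y in zip(xs, ys):
        x, y = int(round(x)), int(round(y))
        patch = responses[y - half:y + half, x - half:x + half, :]
        if patch.shape[:2] != (feature_width, feature_width):
            continue  # skip points too close to the border (sketch choice)
        vec = []
        for i in range(4):
            for j in range(4):
                sub = patch[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell, :]
                vec.append(sub.sum(axis=(0, 1)))  # 8 values per subpatch
        feat = np.concatenate(vec)                # 128-dimensional descriptor
        feat = feat / (np.linalg.norm(feat) + 1e-8)  # L2 normalization (common SIFT step, not described above)
        features.append(feat)
    return np.array(features)
```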

Matching points

I then loop over each feature in one image and find its nearest neighbor in the other image in terms of Euclidean distance. The confidence of each match is the ratio of the distance to the second-best match to the distance of the chosen match, so higher distance ratios correspond to more confident matches. Finally, I sort the matches by decreasing confidence and return them.
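A sketch of this ratio-test matching in the same NumPy style; the brute-force pairwise distance matrix is one straightforward way to realize the Euclidean-distance comparison described above.

```python
import numpy as np

def match_features(features1, features2):
    """Nearest-neighbor matching with ratio-of-distances confidence."""
    # Pairwise Euclidean distances between all descriptors.
    dists = np.linalg.norm(features1[:, None, :] - features2[None, :, :], axis=2)

    # Nearest and second-nearest neighbors in image 2 for each feature in image 1.
    order = np.argsort(dists, axis=1)
    nearest = order[:, 0]
    best = dists[np.arange(len(features1)), nearest]
    second = dists[np.arange(len(features1)), order[:, 1]]

    # Confidence: second-best distance over best distance (higher is better).
    confidences = second / (best + 1e-8)

    matches = np.column_stack([np.arange(len(features1)), nearest])
    idx = np.argsort(-confidences)  # sort by decreasing confidence
    return matches[idx], confidences[idx]
```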

Results

The results are shown visually in the images: green and red outlines indicate correct and incorrect correspondences, respectively. The accuracies for the top, middle, and bottom pairs are 89, 94, and 4 percent, respectively. The low accuracy on the last pair results from the scale difference between the two images; handling that scale difference is an obvious extension of this project.