Project 2: Local Feature Matching

Harris Corner Detector

To obtain interest points in the images, a Harris corner detector is implemented, which assigns a corner score to every pixel.
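A commonly used form of the Harris corner score is given here for reference. It is an assumption that this is the form used in the project, and the constant k (typically chosen between 0.04 and 0.06) is likewise assumed rather than taken from this report. Here g(.) denotes Gaussian smoothing, as in the process flow below.

```latex
M = \begin{bmatrix} g(I_x^2) & g(I_x I_y) \\ g(I_x I_y) & g(I_y^2) \end{bmatrix},
\qquad
R = \det(M) - k \,\operatorname{tr}(M)^2
```

R is large and positive at corners, negative along edges, and close to zero in flat regions.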

Process Flow

  1. Compute image gradient Ix and Iy with the Sobel operator.
  2. Compute Ix^2, Iy^2 and IxIy.
  3. Compute g(Ix^2), g(Iy^2) and g(IxIy) with a Gaussian filter.
  4. Compute the corner score from the smoothed terms and discard points whose score falls below a threshold.
  5. Use the BWCONNCOMP function in Matlab to group connected points together.
  6. Take only the point with the maximum score in each group.
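The process flow above can be sketched in plain Python as follows. This is a minimal illustration and not the project's Matlab code: the names `filter2d`, `gaussian_kernel` and `harris_points`, the 5x5 Gaussian with sigma 1, the constant `k = 0.05`, the default threshold and the 8-connected flood fill (standing in for BWCONNCOMP) are all assumptions made for this example.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def filter2d(img, kernel):
    """Correlate a list-of-lists image with a kernel, zero-padding the border."""
    h, w, kh, kw = len(img), len(img[0]), len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - kh // 2, x + i - kw // 2
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += img[yy][xx] * kernel[j][i]
    return out

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized size x size Gaussian kernel (size and sigma are assumed values)."""
    c = size // 2
    k = [[math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def product(a, b):
    """Element-wise product of two equally sized images."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def harris_points(img, k=0.05, threshold=10.0):
    """Return one (x, y) interest point per connected group of strong responses."""
    h, w = len(img), len(img[0])
    # Step 1: image gradients with the Sobel operator.
    ix, iy = filter2d(img, SOBEL_X), filter2d(img, SOBEL_Y)
    # Steps 2-3: gradient products, smoothed with a Gaussian filter.
    g = gaussian_kernel()
    sxx = filter2d(product(ix, ix), g)
    syy = filter2d(product(iy, iy), g)
    sxy = filter2d(product(ix, iy), g)
    # Corner score det(M) - k * trace(M)^2 (assumed form of the score).
    score = [[sxx[y][x] * syy[y][x] - sxy[y][x] ** 2
              - k * (sxx[y][x] + syy[y][x]) ** 2
              for x in range(w)] for y in range(h)]
    # Step 4: keep only pixels whose score exceeds the threshold.
    mask = [[score[y][x] > threshold for x in range(w)] for y in range(h)]
    # Steps 5-6: group 8-connected pixels and keep the best-scoring one per group.
    seen = [[False] * w for _ in range(h)]
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            stack, best = [(y, x)], (score[y][x], y, x)
            seen[y][x] = True
            while stack:
                cy, cx = stack.pop()
                best = max(best, (score[cy][cx], cy, cx))
                for ny in (cy - 1, cy, cy + 1):
                    for nx in (cx - 1, cx, cx + 1):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            points.append((best[2], best[1]))
    return points
```

A suitable threshold depends on the intensity range of the image, so the default here is only a placeholder. On real images a vectorized implementation would be far faster than these nested loops.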

Result - Interest Points


Interest points on Notre Dame images by Harris corner detector.

SIFT-like Local Feature

The figure below gives a schematic explanation of SIFT.

Process Flow

  1. Create a 16x16 window around each keypoint.
  2. Compute the gradient magnitude and direction at each pixel in the window.
  3. Weight the magnitudes according to their distance from the center pixel by using a Gaussian filter.
  4. Break the window up into 16 cells of size 4x4.
  5. Form a histogram for each cell by dividing the gradients into 8 buckets, each covering 45 degrees.
  6. Sum the weighted magnitudes of the gradients in each bucket.
  7. Append each cell's histogram to form the 1x128 feature descriptor vector.
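The steps above can be sketched as a single function. This is a minimal illustration under stated assumptions: the name `sift_like_descriptor`, the central-difference gradient, the Gaussian width `sigma = 8.0` and the final unit-length normalization are choices made for this example and are not taken from the process flow.

```python
import math

def sift_like_descriptor(img, cx, cy, sigma=8.0):
    """Build a 128-element descriptor from the 16x16 window around (cx, cy).

    img is a list-of-lists grayscale image. The window must lie at least
    one pixel inside the image so that the gradients can be computed.
    """
    desc = [0.0] * 128
    for wy in range(16):
        for wx in range(16):
            y, x = cy - 8 + wy, cx - 8 + wx
            # Gradient by central differences (an assumed choice of operator).
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Gaussian weight based on the distance from the window center.
            dx, dy = wx - 7.5, wy - 7.5
            weight = math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
            # One of 16 cells (4x4 pixels each), one of 8 buckets (45 degrees each).
            cell = (wy // 4) * 4 + (wx // 4)
            bucket = int(angle / (math.pi / 4)) % 8
            desc[cell * 8 + bucket] += weight * magnitude
    # Normalize to unit length (an assumption, not one of the listed steps).
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc
```

For an image whose intensity increases only from left to right, every gradient points along the x axis, so only the first bucket of each cell is filled.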

Ratio Test Matching

Process Flow

  1. Set a threshold (roughly 0.7 to 0.8).
  2. For each feature, compute the ratio d1/d2, where d1 is the distance to its nearest neighbor and d2 is the distance to its second nearest neighbor.
  3. Accept the match if the ratio is less than the threshold.
  4. Sort the accepted matches by confidence.
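A minimal sketch of the ratio test described above, using Euclidean distance between descriptors. The function name `match_features` and the confidence measure (one minus the ratio) are assumptions made for this example, since the process flow does not define how confidence is computed.

```python
import math

def match_features(desc1, desc2, threshold=0.8):
    """Match descriptors in desc1 to desc2 with the nearest-neighbor ratio test.

    Returns a list of (index1, index2, confidence), most confident first.
    desc2 must contain at least two descriptors.
    """
    matches = []
    for i, a in enumerate(desc1):
        # Distances from descriptor a to every descriptor in desc2, nearest first.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc2))
        (d1, j), (d2, _) = dists[0], dists[1]
        # A zero second distance means the two best candidates are
        # indistinguishable, so the match is treated as ambiguous.
        ratio = d1 / d2 if d2 > 0 else 1.0
        if ratio < threshold:
            # Confidence defined as 1 - ratio (an assumed measure).
            matches.append((i, j, 1.0 - ratio))
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches
```

A low ratio means the best candidate is much closer than the runner-up, so the match is unambiguous. A ratio near 1 means two candidates are about equally good, and the match is rejected.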

Results


Notre Dame (top 100 points) : 93/100, 93% accuracy


Notre Dame (total 278 points) : 255/278, 92% accuracy


Mount Rushmore (top 100 points) : 96/100, 96% accuracy


Mount Rushmore (total 344 points) : 325/344, 94% accuracy


Episcopal Gaudi (total 69 points) : 59/69, 86% accuracy after scaling the second image by a factor of 0.6 (before scaling : 25 to 40% accuracy)


Statue of Liberty


Sleeping Beauty Castle Paris


House


Pantheon Paris (an example of the scale effect : only a small number of matching points)



Extra Credit : MSER (interest point detection)

Based on the paper by Matas et al. [1], this technique looks for regions that remain stable over a certain number of thresholds. If a region Qi+Δ is not significantly larger than the region Qi-Δ, the region Qi is taken as a maximally stable region. Each stable region is assumed to be elliptical in shape for ease of computation. The interest point is taken as approximately the center point of a maximally stable region above a certain threshold value, so that the selected points lie on the objects. Since I used the same local descriptor, the points near the image edges were suppressed. Because MSER places each interest point at the center of a stable region, the current feature descriptor, which uses the adjacent pixels, may have a lower matching accuracy, as shown below.
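The stability criterion can be illustrated with a short sketch. Following Matas et al. [1], a region Qi is maximally stable where q(i) = |Qi+Δ \ Qi-Δ| / |Qi| has a local minimum. Because the regions are nested, the numerator equals the difference of the two areas. The function name `stable_thresholds` and the input format (a list of areas of one growing region, indexed by threshold) are assumptions made for this example, and the sketch does not show how the regions themselves are extracted.

```python
def stable_thresholds(areas, delta=1):
    """Return the threshold indices i at which q(i) has a strict local minimum.

    areas[i] is the pixel count |Qi| of one nested region at threshold index i,
    so areas must be positive and non-decreasing.
    """
    # q(i) = (|Q(i+delta)| - |Q(i-delta)|) / |Q(i)|, defined where both neighbors exist.
    q = {i: (areas[i + delta] - areas[i - delta]) / areas[i]
         for i in range(delta, len(areas) - delta)}
    # Keep indices whose q is strictly smaller than at both adjacent thresholds.
    return [i for i in q
            if i - 1 in q and i + 1 in q and q[i] < q[i - 1] and q[i] < q[i + 1]]
```

For example, a region whose area barely changes over several consecutive thresholds and then grows rapidly is reported as stable within the flat stretch.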

Results


Interest points with elliptical stable regions on Notre Dame images by MSER


Notre Dame via MSER: 19/26, 73% accuracy

[1] Matas, Jiri, et al. "Robust wide-baseline stereo from maximally stable extremal regions." Image and Vision Computing 22.10 (2004): 761-767.