This script showcases self-implemented local feature matching functions and applies them to test cases to match image pairs.

Part 1 - Baseline Implementation using SIFT-Like Features

These are the baseline local feature matching functions, implemented by following the prompts in the assignment exactly.

  • Harris corner detection using Algorithm 4.1 in the Szeliski book.
  • SIFT-like features using gradient histograms.
  • Feature matching with the ratio test from Eq. 4.18 in the Szeliski book.

Loading the Test Cases

% Load the test image pair and the ground-truth evaluation file
image1 = imread('../data/Notre Dame/921919841_a30df938f2_o.jpg');
image2 = imread('../data/Notre Dame/4191453057_c86028ce1f_o.jpg');
eval_file = '../data/Notre Dame/921919841_a30df938f2_o_to_4191453057_c86028ce1f_o.mat';

% Cast to single and normalize to [0,1]
image1 = single(image1)/255;
image2 = single(image2)/255;

% Downscale to speed up processing
scale_factor = 0.5;
image1 = imresize(image1, scale_factor, 'bilinear');
image2 = imresize(image2, scale_factor, 'bilinear');

% Convert to grayscale
image1_bw = rgb2gray(image1);
image2_bw = rgb2gray(image2);

Harris Corner Detection

Extracts corner keypoints from the image gradients, without any scale or orientation information.
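As an illustration, here is a minimal script-style sketch of such a detector. The Gaussian window size, the constant k, and the threshold below are placeholder assumptions, not the exact values behind the results in this report.

% Hedged sketch of a Harris detector in the spirit of Algorithm 4.1
feature_width = 16;                        % window size, as used later in the report
[Gx,Gy] = imgradientxy(image1_bw);

% Smoothed second-moment matrix entries
w = fspecial('gaussian',9,1.5);            % assumed smoothing window
Ixx = imfilter(Gx.^2,w);
Iyy = imfilter(Gy.^2,w);
Ixy = imfilter(Gx.*Gy,w);

% Harris response: det(M) - k*trace(M)^2
k = 0.06;                                  % assumed Harris constant
R = Ixx.*Iyy - Ixy.^2 - k*(Ixx+Iyy).^2;

% Zero out the borders so full feature windows fit, then threshold
R([1:feature_width end-feature_width+1:end],:) = 0;
R(:,[1:feature_width end-feature_width+1:end]) = 0;
R(R < 0.01*max(R(:))) = 0;

% Non-maximum suppression: keep the remaining local maxima as keypoints
[y1,x1] = find(imregionalmax(R) & R > 0);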

Feature Description

Spatially bins the local neighbourhood into a 4x4 grid of cells over a 16x16 window.

% Spatial Bins for Cells
[XX,YY] = meshgrid(-0.49:0.25:3.5);
splitMap = round(XX)+4*round(YY);
splitMap = splitMap*8;   % offset each cell by 8 to leave room for the 8 orientation bins

imagesc(splitMap)
axis image

Bins the orientation space into 8 principal directions.

% Calculate Global Gradient
[Gx,Gy] = imgradientxy(image1_bw);

% Find Magnitude and Angle
magG = sqrt(Gx.^2 + Gy.^2);
phiG = atan2(Gy,Gx);

% Gradient Bins
phiBinEdges = -9/8*pi:pi/4:9/8*pi;
Y = discretize(phiG(:),phiBinEdges);
Y(Y == 9) = 1;   % the wrap-around bin maps back to the first direction

tempanglehist = accumarray(Y(:),magG(:),[],@sum);
polarhistogram('BinEdges',phiBinEdges(1:9),'BinCounts',tempanglehist);

Feature Matching

Finds the 2 nearest neighbors using MATLAB's built-in KNN search, then applies the nearest-neighbor distance ratio test to return a match list ordered by confidence.
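As a sketch, and assuming the built-in function in question is knnsearch (Statistics and Machine Learning Toolbox) with one descriptor per row, the matcher boils down to:

% 2 nearest neighbors in image2_features for every descriptor in image1_features
[idx,dist] = knnsearch(image2_features,image1_features,'K',2);

% Nearest neighbor distance ratio (Eq. 4.18); smaller ratio = more distinctive match
nndr = dist(:,1)./dist(:,2);
matches = [(1:size(image1_features,1))' idx(:,1)];
confidences = 1./nndr;

% Order the match list by confidence, most confident first
[confidences,order] = sort(confidences,'descend');
matches = matches(order,:);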

Results

Here I run the baseline framework on all 3 test image pairs and evaluate performance.

for testcase = 1:3

    tic;

    switch testcase
        case 1
            % Notre Dame
            image1 = imread('../data/Notre Dame/921919841_a30df938f2_o.jpg');
            image2 = imread('../data/Notre Dame/4191453057_c86028ce1f_o.jpg');
            eval_file = '../data/Notre Dame/921919841_a30df938f2_o_to_4191453057_c86028ce1f_o.mat';

        case 2
            % Mount Rushmore
            image1 = imread('../data/Mount Rushmore/9021235130_7c2acd9554_o.jpg');
            image2 = imread('../data/Mount Rushmore/9318872612_a255c874fb_o.jpg');
            eval_file = '../data/Mount Rushmore/9021235130_7c2acd9554_o_to_9318872612_a255c874fb_o.mat';

        case 3
            % Episcopal Gaudi
            image1 = imread('../data/Episcopal Gaudi/4386465943_8cf9776378_o.jpg');
            image2 = imread('../data/Episcopal Gaudi/3743214471_1b5bbfda98_o.jpg');
            eval_file = '../data/Episcopal Gaudi/4386465943_8cf9776378_o_to_3743214471_1b5bbfda98_o.mat';

    end
    % Load, Cast as Single, Scale, Convert to Grayscale
    image1 = single(image1)/255;
    image2 = single(image2)/255;
    scale_factor = 0.5;
    image1 = imresize(image1, scale_factor, 'bilinear');
    image2 = imresize(image2, scale_factor, 'bilinear');
    image1_bw = rgb2gray(image1);
    image2_bw = rgb2gray(image2);

    % Feature Window Size
    feature_width = 16;

    % Execute Framework
    % Extract Features from First Image
    [x1, y1] = get_interest_points(image1_bw, feature_width);
    [image1_features] = get_features(image1_bw, x1, y1, feature_width);

    % Extract Features from Second Image
    [x2, y2] = get_interest_points(image2_bw, feature_width);
    [image2_features] = get_features(image2_bw, x2, y2, feature_width);

    % Match the Features from the Pair
    [matches, confidences] = match_features(image1_features, image2_features);

    % Evaluate Performance
    num_pts_to_evaluate = 100;
    figure;
    warning('off','images:initSize:adjustingMag');
    evaluate_correspondence(image1, image2, eval_file, scale_factor, ...
                            x1(matches(1:num_pts_to_evaluate,1)), ...
                            y1(matches(1:num_pts_to_evaluate,1)), ...
                            x2(matches(1:num_pts_to_evaluate,2)), ...
                            y2(matches(1:num_pts_to_evaluate,2)));

    toc;

end
91 total good matches, 9 total bad matches.  91.00% accuracy
Elapsed time is 1.170672 seconds.
93 total good matches, 7 total bad matches.  93.00% accuracy
Elapsed time is 1.937586 seconds.
5 total good matches, 95 total bad matches.  5.00% accuracy
Elapsed time is 1.290708 seconds.

Part 2 - Implementing Scaling and Orientation

Here I implement simple workarounds to make my features scale- and orientation-invariant, or at least more so than before.

Scale Invariance

Many sources use complicated Gaussian pyramids to cast a very wide net over scale space. For time and practicality purposes, I instead use a very simple Gaussian ladder: a 1-octave, 4-level "pyramid", which should be able to accommodate the cases in this project.

I can simply wrap a for loop around the usual workflow, once per scale level. I then rescale the resulting x and y coordinates so that they point to locations in the original resolution.
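A sketch of this wrapper for the first image, with an illustrative set of 4 levels spanning one octave (the exact levels and bookkeeping in my implementation may differ):

levels = [1 2^(-1/3) 2^(-2/3) 0.5];    % assumed 1-octave, 4-level ladder

x1 = []; y1 = []; image1_features = [];
for s = levels
    scaled = imresize(image1_bw,s,'bilinear');
    [xs,ys] = get_interest_points(scaled,feature_width);
    feats = get_features(scaled,xs,ys,feature_width);

    % Map coordinates back to the original resolution
    x1 = [x1; xs/s];
    y1 = [y1; ys/s];
    image1_features = [image1_features; feats];
end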

Results - Scale

Notice that while there is minimal improvement for the first 2 cases, there is a giant leap in accuracy for the third case, since that image pair has a significant difference in scale.

92 total good matches, 8 total bad matches.  92.00% accuracy
Elapsed time is 1.289978 seconds.
97 total good matches, 3 total bad matches.  97.00% accuracy
Elapsed time is 2.300146 seconds.
62 total good matches, 38 total bad matches.  62.00% accuracy
Elapsed time is 1.430205 seconds.

Orientation Invariance

The book suggests that a stable method for this is to bin the gradients in the local neighbourhood of a keypoint, during identification, into an orientation histogram and find a dominant direction. I have opted for a much simpler version here as well: instead of the 36 bins in the source materials, I use 18 bins, and I find the dominant direction by simply taking the bin with the maximum histogram count.
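As an illustration of this step for a single keypoint, assuming hypothetical integer coordinates x and y that are far enough from the border, and reusing the global magG and phiG from earlier:

half = 8;                                  % half of the 16-pixel window
patchMag = magG(y-half:y+half-1, x-half:x+half-1);
patchPhi = phiG(y-half:y+half-1, x-half:x+half-1);

% 18 orientation bins over [-pi, pi], counts weighted by gradient magnitude
edges = linspace(-pi,pi,19);
bins = discretize(patchPhi(:),edges);
oriHist = accumarray(bins,patchMag(:),[18 1]);

% Dominant direction = center of the strongest bin
[~,maxBin] = max(oriHist);
dominantAngle = (edges(maxBin)+edges(maxBin+1))/2;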

The implementation of the feature descriptor is a bit trickier. To do this fast, I use the built-in imrotate function and find the maximum size the window can reach when a 16x16 matrix is rotated without cropping, which happens at 45 degrees (16*sqrt(2) is about 22.6, so a 23x23 window).

Now I can pad and rotate the spatial and orientation bins in the feature space, while keeping the magnitude signal in the original frame.

% Rotation Mask
rotateMask = padarray(ones(16),[4 4],0,'post');
rotateMask = padarray(rotateMask,[3 3],0,'pre');

% Spatial Bin Map
splitMap = padarray(splitMap,[4 4],0,'post');
splitMap = padarray(splitMap,[3 3],0,'pre');

% Rotate
currentRotateMask = imrotate(rotateMask,60,'nearest','crop');
currentSplitMap = imrotate(splitMap,60,'nearest','crop');

Here is what the spatial binning looks like now for an example rotation:

imagesc(currentSplitMap)
axis image

I can then simply follow the same framework as before without any changes, as the histogram can accommodate any orientation fairly.
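For illustration only, one rotated descriptor could be accumulated roughly as follows for a hypothetical keypoint at (x, y) with dominant orientation theta in degrees; the exact window centring and normalization in my implementation may differ.

currentRotateMask = imrotate(rotateMask,theta,'nearest','crop');
currentSplitMap = imrotate(splitMap,theta,'nearest','crop');

% Magnitudes stay in the original frame; angles are measured relative to theta
patchMag = magG(y-11:y+11, x-11:x+11);
patchPhi = phiG(y-11:y+11, x-11:x+11) - deg2rad(theta);

% Combine the rotated spatial bins (0,8,...,120) with 8 orientation bins (0..7)
oriBin = mod(round(patchPhi/(pi/4)),8);
bins = currentSplitMap + oriBin + 1;

% Accumulate magnitudes into the 128 bins, ignoring the padded corners
valid = currentRotateMask > 0;
descriptor = accumarray(bins(valid),patchMag(valid),[128 1]);
descriptor = descriptor/norm(descriptor);  % normalize to unit length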

Results - Orientation

While there is improvement in the last 2 cases, the first case loses some accuracy as a result of rotation invariance. This is not surprising, as the church has many repeated patterns at different angles. The fact that the photos were taken from a similar viewpoint was coincidentally helping to disambiguate these patterns, but rotational invariance dissolved this effect.

Also note that there are some points that are incorrectly counted as non-matches by the evaluation. This is somewhat evident in the Notre Dame image, but extremely evident in Mount Rushmore, where the misidentified point clearly corresponds to the same location in both pictures.

87 total good matches, 13 total bad matches.  87.00% accuracy
Elapsed time is 6.379788 seconds.
99 total good matches, 1 total bad matches.  99.00% accuracy
Elapsed time is 15.893006 seconds.
77 total good matches, 23 total bad matches.  77.00% accuracy
Elapsed time is 7.355163 seconds.


Published with MATLAB® R2016b