Project 2: Local Feature Matching



Motivation:

Interest point detection, feature description, and feature matching are fundamental tasks in computer vision. Features are used for image alignment (e.g., building panoramas or composite mosaics), object recognition, 3D reconstruction from 2D images, motion tracking, robot navigation, and more. To do this we need distinctive features that can be reliably tracked and compared. Take a look at the image below and try to match features A, B, C, D, E, and F:


Image 1: Try to Match the Features

It is pretty difficult to match features A and B. A could be anywhere in the sky and B could be anywhere on the right side of the building; there is no way to pin down exactly where these features are. E and F are much easier to find: they are the left and right corners of the left side of the building. C and D look like edges of the building, but it is unclear exactly where along those edges they lie.

Corner Detection: Intuitively, corners are junctions of contours: places where two edges meet, so the image gradient is large in two directions. They are generally more stable under changes of viewpoint and show large intensity variations in all directions around the point. If we place a small window over a corner and shift it in any direction, the contents of the window change noticeably, which makes the point easy to distinguish. Corners are therefore good features to match; looking back at the image above, the corner points were the easiest to identify. We can classify image windows based on their gradient statistics: flat regions (little change in any direction), edges (little change along the edge direction), and corners (significant change in all directions).

Below we see how the window changes for the above categories when we shift it:

The corner pieces make great interest points, so we will try to extract these points. To do so we can use a method called the Harris Corner Detector.

Harris Corner Detector: The Moravec corner detection algorithm was one of the earliest corner detectors. With this algorithm, each image pixel is tested for a corner by considering how self-similar a patch around the pixel is to nearby overlapping patches. Moravec measured similarity with the sum of squared differences (SSD) between the pixels of the two patches: the lower the SSD, the greater the similarity. Harris and Stephens improved Moravec's detector by using image gradients to approximate the SSD response for any small shift, rather than testing a fixed set of shifts. The mathematics of their algorithm is as follows:
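
The equations were originally presented as an image; the standard Harris formulation below is a reconstruction, consistent with the cornerness computed in get_interest_points() later in this report. For a window $W$ around a pixel, with image gradients $I_x$ and $I_y$ and a Gaussian weighting $w$, form the second moment matrix

$M = \sum_{(u,v) \in W} w(u,v) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

and the corner response

$R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = I_{xx} I_{yy} - I_{xy}^2 - \alpha (I_{xx} + I_{yy})^2$

where $I_{xx}$, $I_{yy}$, and $I_{xy}$ denote the Gaussian-weighted sums of $I_x^2$, $I_y^2$, and $I_x I_y$, and $\alpha$ is an empirical constant (typically 0.04-0.06; the implementation later in this report uses 0.06). When both eigenvalues of $M$ are large, $R$ is large and positive and the window is a corner; when only one is large, $R$ is negative and the window is an edge; when both are small, $|R|$ is small and the window is flat.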

Feature Detection: Once we have found our interest points through a method such as the Harris corner detector, we can describe the region around each point with a feature descriptor. This descriptor will help us identify our interest points from image 1 in image 2. For our features to work well they should follow a few guidelines: they should be repeatable (invariant to translation, rotation, and illumination changes, and robust to changes in scale and viewpoint), distinctive enough to tell different points apart, and compact and local so that they can be computed and compared efficiently.

There are many methods to compute feature descriptors for our interest points. We will examine the SIFT pipeline for our purposes of feature description.

SIFT Pipeline: SIFT, or the scale-invariant feature transform, is an algorithm that we will use to detect and describe local features. David Lowe introduced the algorithm in 1999 and described it in detail in his paper Distinctive Image Features from Scale-Invariant Keypoints. After we have gathered our interest points, the SIFT feature descriptor builds a vector for each point. The key ideas SIFT brings forward are scale-space keypoint detection, orientation assignment for rotation invariance, and a descriptor built from histograms of gradient orientations over a 4 x 4 grid of cells around the keypoint (4 x 4 cells x 8 orientation bins = 128 dimensions), followed by normalization.

Once we have these features, we are ready to match them.

Image Matching: We have now extracted feature descriptors from our two images; we next want a mechanism to match these features between the two images. The easiest way is to set a fixed threshold on the distance between descriptors, using a metric such as the Euclidean distance, and return all matches within that threshold. However, this threshold is difficult to set and can be inaccurate: a correct match can sometimes have a larger distance than an incorrect match. A better strategy is to use the nearest neighbor distance ratio.

Nearest Neighbor Distance Ratio: The nearest neighbor distance ratio (NNDR), or ratio test, takes the distance from a feature descriptor to its nearest neighbor and the distance to its second nearest neighbor and divides the two. We can observe the diagram below and conclude with a formula:

From this diagram we compute the formula for the NNDR as:

$NNDR = \frac{d_1}{d_2}$, where $d_1$ is the nearest neighbor distance and $d_2$ is the second nearest neighbor distance.
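
As a quick illustrative example (the distances here are invented for concreteness, not taken from the project data): if the nearest descriptor lies at distance $d_1 = 0.4$ and the second nearest at $d_2 = 0.8$, then $NNDR = 0.5$ and the match is distinctive; if instead $d_1 = 0.75$ and $d_2 = 0.8$, then $NNDR \approx 0.94$ and the match is ambiguous. Lower ratios therefore indicate more reliable matches, which is why we will later either threshold the ratio (with values such as 0.6-0.9) or rank matches by $1/NNDR$ and keep only the most confident ones.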

We now have all the information to proceed and start implementing these features.

Steps and Strategy:

There are three major steps in a local feature matching algorithm: detecting interest points (get_interest_points()), describing the region around each point with a feature vector (get_features()), and matching those features between the two images (match_features()).

Our implementation strategy is as follows: start with the provided cheat_interest_points() and placeholder features to get the pipeline running end to end, then implement match_features(), then a simple image-patch version of get_features(), then a SIFT-like get_features(), and finally replace cheat_interest_points() with a Harris corner detector in get_interest_points().

As we progress from step to step we will discuss how the accuracy improved or worsened and provide some information as to why.

Implementation:

Our implementation is as follows:

cheat_interest_points(): This function cheats by generating interest points from known correspondences, so it will only work for image pairs with known correspondences. In our data set we have three such image pairs. The function was created by Dr. James Hays and is as follows:

function [x1, y1, x2, y2] = cheat_interest_points(eval_file, scale_factor)
load(eval_file)
x1 = x1 .* scale_factor;
y1 = y1 .* scale_factor;
x2 = x2 .* scale_factor;
y2 = y2 .* scale_factor;
end

The parameters of this function are eval_file and scale_factor: eval_file is the file path to the list of known correspondences, and scale_factor is needed to map from the original image coordinates to the resolution being used for the current experiment. The outputs are $[x_1, y_1, x_2, y_2]$, where $x_1$ and $y_1$ are $n \times 1$ vectors of the $x$ and $y$ coordinates of interest points in the first image and $x_2$ and $y_2$ are $m \times 1$ vectors of the coordinates of the corresponding interest points in the second image. In this function, eval_file is loaded, each of $x_1$, $y_1$, $x_2$, $y_2$ is multiplied element-wise by scale_factor, and the vectors $[x_1, y_1, x_2, y_2]$ are returned.

Let's run proj2.m with cheat_interest_points() and take a look at the output below:

Notre Dame


Dots


Arrows


Evaluation

There were 149 total matches, 1 good match, and 148 bad matches for the Notre Dame image with 1% accuracy.

Mount Rushmore


Dots


Arrows


Evaluation

There were 126 total matches, 2 good matches, and 124 bad matches for the Mount Rushmore image with 2% accuracy.

Episcopal Gaudi


Dots


Arrows


Evaluation

There were 147 total matches, 2 good matches, and 145 bad matches for the Episcopal Gaudi image with 1% accuracy.

As we can see, the accuracy is not very good when we use cheat_interest_points(). This is because the placeholder get_features() and match_features() still return essentially random features and random matches, so we can expect near-zero accuracy. We will implement a new match_features() function next.

match_features(): There are two ways that we can implement match_features: one implementation that filters matches by a threshold and one that returns the 100 most confident matches. Both methods will suffice and each has its advantages. We will implement the version that returns the 100 most confident matches, but will also include code that filters the matches by a threshold.

  • Parameters: The parameters of our match_features function are features1 and features2, the descriptors from image1 and image2.
  • (Optional) Threshold: A variable $t$ to hold our threshold. We will initially set $t = 0.7$ and experiment with this value later. The purpose of the threshold is to filter out bad matches by their confidence, which we construct below.
  • Distance: Calculate the Euclidean distance between every descriptor in features1 and every descriptor in features2. We can do this easily with the MATLAB function pdist2 with the 'euclidean' option passed in along with features1 and features2. We could also do this with two nested for loops, but that is computationally expensive, while pdist2 is very fast.
  • Sort and keep indices: Sort each row of the distance matrix and keep the sorted indices. To do this we can use the MATLAB function sort.
  • Nearest neighbors: Create a variable called confidences that holds the nearest neighbor distance ratio. The nearest neighbor distance is the first column of the sorted matrix and the second nearest neighbor distance is the second column. We divide column 1 by column 2 and later use 1/confidences as the confidence (equivalently, we could divide column 2 by column 1 directly).
  • Indices: Get the indices whose ratio is less than the threshold. To do this we can use MATLAB's find function.
  • Matches vector: Create a vector called matches with one row per surviving index and 2 columns, initialized with zeros.
  • Column 1: Column 1 of matches holds the indices (into features1) that survived the threshold.
  • Column 2: Column 2 of matches holds the corresponding nearest neighbor indices (into features2) taken from the sorted index matrix.
  • Truncate confidences: We do not want confidences to be longer than matches, so we keep only the confidences for the indices that survived the threshold.
  • Descending sort: Invert the ratios (1/confidences) so that larger values mean more confident matches, then sort confidences and its indices in descending order.
  • Matches: Reorder the rows of matches using the sorted indices.
  • The code for returning the first 100 matches is as follows:

    
    function [matches, confidences] = match_features(features1, features2)
    %Pairwise Euclidean distances between every descriptor in features1 and features2
    D = pdist2(features1, features2, 'euclidean');
    %Sort each row: column 1 is the nearest neighbor, column 2 the second nearest
    [B, I] = sort(D, 2);
    nearest_neighbor = B(:,1);
    second_nearest_neighbor = B(:,2);
    %Nearest neighbor distance ratio for every descriptor in features1
    confidences = nearest_neighbor ./ second_nearest_neighbor;
    %Keep every index (no thresholding in this version)
    i = find(confidences);
    s = size(i);
    matches = zeros(s(1), 2);
    matches(:,1) = i;
    matches(:,2) = I(i, 1);    %index of the nearest neighbor in features2
    %Invert the ratio so that larger values mean more confident matches
    confidences = 1 ./ confidences(i);
    [confidences, ind] = sort(confidences, 'descend');
    matches = matches(ind, :);
    %Keep only the 100 most confident matches
    matches = matches(1:100, :);
    confidences = confidences(1:100, :);
    end
    

    The code for thresholding is as follows:

    
    function [matches, confidences] = match_features(features1, features2)
    %Ratio-test threshold: keep matches whose NNDR is below t
    t = 0.7;
    D = pdist2(features1, features2, 'euclidean');
    [B, I] = sort(D, 2);
    nearest_neighbor = B(:,1);
    second_nearest_neighbor = B(:,2);
    confidences = nearest_neighbor ./ second_nearest_neighbor;
    %Indices of descriptors in features1 that pass the ratio test
    i = find(confidences < t);
    s = size(i);
    matches = zeros(s(1), 2);
    matches(:,1) = i;
    matches(:,2) = I(i, 1);    %nearest neighbor in features2
    confidences = 1 ./ confidences(i);
    [confidences, ind] = sort(confidences, 'descend');
    matches = matches(ind, :);
    end
    

    Let us make sure the function is working. We will let our features from get_features.m be some random points (this code is in get_features_cheat.m). Here we should expect the accuracy to still be very low. Let's take a look at the output below:

    Episcopal Gaudi: t=0.7


    Dots


    Arrows


    Evaluation

    There were 67 total matches, 2 good matches, and 65 bad matches for the Episcopal Gaudi image with 3% accuracy.

    Let us test this on an easier image below:

    Mount Rushmore: t=0.7


    Dots


    Arrows


    Evaluation

    There were 60 total matches, 1 good match, and 59 bad matches for the Mount Rushmore image with 2% accuracy.

    Let us now look at the accuracy with the function that returns 100 matches:

    Notre Dame: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 2 good matches, and 98 bad matches for the Notre Dame image with 2% accuracy.

    We see that these matches are not very good even though we have implemented our match_features() function, because we are currently using a placeholder get_features() that returns random values (we changed it from returning all zeros so that the matches would not all collapse onto a single point). Let us implement a real version of get_features() and see how our accuracy improves with the match_features() function we have written:

    get_features() Part 1: What we now want to do is cut image patches out of our images to use as features. The approach is simple: for each interest point, cut out the feature_width x feature_width patch centered on it (16 x 16 by default), reshape that patch into a single row vector of length feature_width^2 (256 for a 16 x 16 patch), and fall back to an all-zero vector for interest points that lie too close to the image boundary.

    Below is the code for the steps above:

    
    function [features] = get_features(image, x, y, feature_width)
    f = feature_width/2;
    s = size(x);                 %number of interest points is s(1)
    si = size(image);
    image_rows = si(1);
    image_cols = si(2);
    features = [];
    for i = 1:s(1)
            x_coord = uint16(x(i));
            y_coord = uint16(y(i));
            %x indexes columns and y indexes rows of the image
            x_left = x_coord - f;
            x_right = x_coord + f - 1;
            y_top = y_coord - f;
            y_bottom = y_coord + f - 1;
            if x_left >= 1 && x_right <= image_cols - f && y_top >= 1 && y_bottom <= image_rows - f
                %Cut out the feature_width x feature_width patch and flatten it into a row vector
                im_square = image(y_top:y_bottom, x_left:x_right);
                feat_vec = reshape(im_square, [1, feature_width*feature_width]);
                features(i,:) = feat_vec;
            else
                %Interest points too close to the border get an all-zero feature
                features(i,:) = zeros(1, feature_width*feature_width);
            end
    end
    end
    

    Let's take a look at the output when we run proj2.m:

    Notre Dame t = 0.9


    Dots


    Arrows


    Evaluation

    There were 58 total matches, 36 good matches, and 22 bad matches for the Notre Dame image with 62% accuracy.

    Mount Rushmore t = 0.9


    Dots


    Arrows


    Evaluation

    There were 37 total matches, 16 good matches, and 21 bad matches for the Mount Rushmore image with 43% accuracy.

    Episcopal Gaudi t = 0.9


    Dots


    Arrows


    Evaluation

    There were 34 total matches, 5 good matches, and 29 bad matches for the Episcopal Gaudi image with 14.7% accuracy.

    Notre Dame: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 43 good matches, and 57 bad matches for the Notre Dame image with 43% accuracy.

    Mount Rushmore: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 23 good matches, and 77 bad matches for the Mount Rushmore image with 23% accuracy.

    Episcopal Gaudi: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 10 good matches, and 90 bad matches for the Episcopal Gaudi image with 10% accuracy.

    We see that we are getting much better accuracy, but it can still be improved. Let us take the next step and revise get_features():

    get_features() Part 2: We will now finish our get_features() function by implementing a SIFT-like feature. The process is as follows: filter the image with eight oriented (Sobel-based) filters spaced 45 degrees apart and blur the responses with a Gaussian; then, for each interest point, split the surrounding feature_width x feature_width patch into a 4 x 4 grid of cells, sum each oriented response within each cell to fill a 4 x 4 x 8 = 128-bin descriptor, and normalize the resulting vector.

    The code for the above is as follows:

    
    function [features] = get_features(image, x, y, feature_width)
    f = feature_width/2;
    f1 = feature_width/4;        %cell size for the 4 x 4 grid
    s = size(x);
    si = size(image);
    image_rows = si(1);
    image_cols = si(2);
    features = zeros(s(1), 128);

    %Rotating the Sobel filter to get 8 orientations spaced 45 degrees apart
    filters = [];
    m = fspecial('sobel');
    filters(:,:,1) = m;
    for i = 2:8
        %Rotate the outer ring of the 3 x 3 kernel by one position (45 degrees)
        m = [m(4) m(7) m(8); m(1) m(5) m(9); m(2) m(3) m(6)];
        filters(:,:,i) = m;
    end

    %Constructing the new image: one response channel per orientation, then blur
    im_size = size(image);
    new_image = zeros(im_size(1), im_size(2), 8);
    for i = 1:8
        new_image(:,:,i) = imfilter(image, filters(:,:,i));
    end
    blur = fspecial('gaussian', [feature_width/2, feature_width/2], feature_width/2);
    new_image = imfilter(new_image, blur);

    for i = 1:s(1)
        x_coord = uint16(x(i));
        y_coord = uint16(y(i));
        x_left = x_coord - f;
        x_right = x_coord + f - 1;
        y_top = y_coord - f;
        y_bottom = y_coord + f - 1;
        bins = zeros(1, 128);
        if x_left >= 1 && x_right <= image_cols - f && y_top >= 1 && y_bottom <= image_rows - f
            for j = 1:8
                %Responses for orientation j over the patch around this interest point
                oriented = new_image(:,:,j);
                patch = oriented(y_top:y_bottom, x_left:x_right);
                %Split the patch into a 4 x 4 grid of f1 x f1 cells
                small_square = mat2cell(patch, [f1, f1, f1, f1], [f1, f1, f1, f1]);
                c = j;
                for row = 1:4
                    for col = 1:4
                        block = cell2mat(small_square(row, col));
                        %Each cell occupies 8 consecutive bins, one per orientation,
                        %so orientation j of cell k lands in bin j + 8*(k-1)
                        bins(:,c) = sum(block(:));
                        c = c + 8;
                    end
                end
            end
            %Normalize the 128-dimensional descriptor
            bins = bins ./ norm(bins);
            features(i,:) = bins(1,:);
        end
    end
    end
    

    Let us now examine the output with our newly improved get_features() function:

    Notre Dame: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 75 good matches, and 25 bad matches for the Notre Dame image with 75% accuracy.

    Mount Rushmore: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 41 good matches, and 59 bad matches for the Mount Rushmore image with 41% accuracy.

    Episcopal Gaudi: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 10 good matches, and 90 bad matches for the Episcopal Gaudi image with 10% accuracy.

    Notre Dame: t = 0.9


    Dots


    Arrows


    Evaluation

    There were 97 total matches, 72 good matches, and 25 bad matches for the Notre Dame image with 74% accuracy.

    Mount Rushmore: t = 0.9


    Dots


    Arrows


    Evaluation

    There were 76 total matches, 39 good matches, and 37 bad matches for the Mount Rushmore image with 51% accuracy.

    Episcopal Gaudi: t = 0.9


    Dots


    Arrows


    Evaluation

    There were 56 total matches, 8 good matches, and 48 bad matches for the Episcopal Gaudi image with 14% accuracy.

    The accuracy here is much better. We see that thresholding by 0.9 in general gives us higher accuracy than returning the 100 most confident matches. Up to this point we were still using cheat_interest_points() so let us implement a new get_interest_points() function to bump up our accuracy.

    get_interest_points(): We are now going to stop using cheat_interest_points() and implement get_interest_points() as a Harris corner detector. The steps are as follows: lightly blur the image and compute the gradients $I_x$ and $I_y$; form the products $I_{xx}$, $I_{yy}$, and $I_{xy}$ and smooth each with a larger Gaussian; compute the cornerness $h = I_{xx} I_{yy} - I_{xy}^2 - \alpha (I_{xx} + I_{yy})^2$; zero out the image border and any response below a threshold; and apply non-maximum suppression over a sliding window, returning the surviving coordinates with their responses as confidences.

    The code is as follows:

    
    function [x, y, confidence, scale, orientation] = get_interest_points(image, feature_width)
    %scale and orientation are part of the expected interface but are not computed here

    %Set Up: Create the large and small Gaussians
    large_gaussian = fspecial('gaussian', 9, 2);
    small_gaussian = fspecial('gaussian', 3, 1);

    %Lightly blur the image, then compute its gradients
    new_im = imfilter(image, small_gaussian);
    [Ix, Iy] = imgradientxy(new_im);

    %Useful parameters for later
    s = size(image);
    alpha = 0.06;

    %Gradient products Ixx, Iyy, Ixy, each smoothed with the larger Gaussian
    Ixy = Ix .* Iy;
    Ixy = imfilter(Ixy, large_gaussian);
    Ixx = Ix .* Ix;
    Ixx = imfilter(Ixx, large_gaussian);
    Iyy = Iy .* Iy;
    Iyy = imfilter(Iyy, large_gaussian);

    %Harris value calculation: h = det(M) - alpha*trace(M)^2
    h = Ixx .* Iyy - Ixy.^2 - alpha .* (Ixx + Iyy) .* (Ixx + Iyy);

    %Zero out a border of width feature_width so we do not keep interest
    %points whose feature window would fall outside the image
    border = zeros(s);
    border(feature_width+1 : end - feature_width, feature_width+1 : end - feature_width) = 1;

    %Threshold the response
    h = h .* border;
    t = .0005;
    h = h .* (h > t);

    %Non-maximum suppression with a sliding max filter
    B = colfilt(h, [feature_width/2, feature_width/2], 'sliding', @max);
    C = (h == B);
    h = C .* h;

    %find returns rows (y), columns (x), and the surviving responses (confidence)
    [y, x, confidence] = find(h);
    end
    

    We have now completed the entire local feature matching pipeline. Let us look at our final results:

    Results:

    Notre Dame: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 86 good matches, and 14 bad matches for the Notre Dame image with 86% accuracy.

    Mount Rushmore: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 96 good matches, and 4 bad matches for the Mount Rushmore image with 96% accuracy.

    Episcopal Gaudi: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 4 good matches, and 96 bad matches for the Episcopal Gaudi image with 4% accuracy.

    Notre Dame: t = 0.9


    Dots


    Arrows


    Evaluation

    There were 772 total matches, 304 good matches, and 468 bad matches for the Notre Dame image with 39% accuracy.

    Mount Rushmore: t = 0.9


    Dots


    Arrows


    Evaluation

    There were 952 total matches, 492 good matches, and 460 bad matches for the Mount Rushmore image with 52% accuracy.

    Episcopal Gaudi: t = 0.9


    Dots


    Arrows


    Evaluation

    There were 184 total matches, 7 good matches, and 177 bad matches for the Episcopal Gaudi image with 4% accuracy.

    The above illustrates how important the threshold is. We set the threshold slightly too high and it let in hundreds more matches. These matches were not filtered down by confidence the way the 100-match version was, so many points were matched incorrectly. Let's take a look at Notre Dame with a lower threshold, say 0.7, and see how the accuracy plays out:

    Notre Dame: t = 0.7


    Dots


    Arrows


    Evaluation

    There were 214 total matches, 159 good matches, and 55 bad matches for the Notre Dame image with 74% accuracy.

    By lowering the threshold from 0.9 to 0.7 we have gone from 39% to 74% accuracy.

    Let's take a look at a threshold of 0.6 on this same image:

    Notre Dame: t = 0.6


    Dots


    Arrows


    Evaluation

    There were 111 total matches, 93 good matches, and 18 bad matches for the Notre Dame image with 84% accuracy.

    Let us drop the threshold a little more and then draw some conclusions:

    Notre Dame: t = 0.5


    Dots


    Arrows


    Evaluation

    There were 45 total matches, 42 good matches, and 3 bad matches for the Notre Dame image with 93% accuracy.

    By changing the threshold we have now achieved 93% on the Notre Dame image. We see that as the threshold becomes lower and lower, the accuracy improves but the number of matched points decreases. We want to set a threshold that gives us both enough points and good accuracy. We also note that returning the 100 most confident matches (86%) gives slightly better accuracy than a reasonable threshold of 0.6 (84%).

    Bells & Whistles:

    Kd-Tree: A kd-tree is a space-partitioning data structure (a spatial data structure) for organizing points in a $k$-dimensional space. At a high level, you can think of a kd-tree as a generalization of a binary search tree that stores points in a $k$-dimensional space. All data stored in a single kd-tree must have the same dimension; for instance, you can't store points from a two-dimensional space in the same kd-tree as four-dimensional points. The easiest way to understand this data structure is to look at an example:
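
    The example figure from the original page is not reproduced here; the tree below is an illustrative stand-in (the points are invented for this write-up) with the structure described next:

                                  (5, 4, 7)
                                /           \
                       (3, 6, 2)             (8, 2, 9)
                      /         \           /         \
              (2, 3, 5)     (4, 8, 1)   (7, 1, 6)   (9, 5, 3)

    The root splits on the first coordinate, its children split on the second, their children split on the third, and so on, cycling through the coordinates.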

    Each data point in the above kd-tree represents a three-dimensional point. We can divide this tree into a left subtree and a right subtree. If we look at every node in the left subtree, we notice that the first component of its point is less than the first component of the root of the tree. If we look at the right subtree, every first component is at least as large as the first component of the root. At the next level down, the same pattern holds for the second component, and so on, cycling through the coordinates. We can use kd-trees for efficient querying. Say we want to see whether a point $P$ is stored in our kd-tree. We start at the root of the tree; if the root equals $P$ we return that node. If the first component of $P$ is less than the first component of the root we descend into the left subtree, otherwise into the right subtree, and at the next level we compare the second component of $P$, and so on. We keep continuing this process until we have found the node that equals $P$ (or run out of tree, in which case $P$ is not stored), and we then return this node.

    Now that we have some intuition about what a kd-tree is, we can talk about the nearest-neighbor lookup. With this we have a point in space and we want to know which point in the kd-tree is closest to our point. This closest point is called the nearest neighbor. Given a guess about which node is the nearest neighbor, we construct a hypersphere centered at our point that runs through our guess point. The nearest neighbor to our point must lie inside this hypersphere. If this hypersphere is fully to one side of a splitting hyperplane, then all points on the other side of the hyperplane cannot be in our sphere and thus cannot be the nearest neighbor.

    The algorithm for the nearest neighbor search works as follows: descend the tree as in a normal lookup until reaching the leaf region that contains the query point, and take that node as the initial guess; then unwind the recursion, updating the best guess whenever a closer point is found, and descend into the other side of a splitting hyperplane only when the hypersphere centered at the query point with radius equal to the current best distance crosses that hyperplane.

    The kd-tree nearest neighbor query runs in $O(\log n)$ expected time, which can make matching our features very efficient. MATLAB has built-in functions for constructing and querying a kd-tree, so let us use them to construct a new version of match_features.m called match_features_kd_tree.m.

    The code is as follows:

    Implementation

    
    function [matches, confidences] = match_features_kd_tree(features1, features2)
    %Build a kd-tree over the descriptors of the second image
    mdl = KDTreeSearcher(features2);
    %For each descriptor in features1, find its 2 nearest neighbors:
    %I holds the neighbor indices and B the corresponding distances
    [I, B] = knnsearch(mdl, features1, 'K', 2);
    %Nearest neighbor distance ratio
    confidences = B(:,1) ./ B(:,2);
    i = find(confidences);
    s = size(i);
    matches = zeros(s(1), 2);
    matches(:,1) = i;
    matches(:,2) = I(i, 1);    %nearest neighbor in features2
    confidences = 1 ./ confidences(i);
    [confidences, ind] = sort(confidences, 'descend');
    matches = matches(ind, :);
    %Keep only the 100 most confident matches
    matches = matches(1:100, :);
    confidences = confidences(1:100, :);
    end
    

    Results

    We can use the MATLAB functions tic and toc to time our function and compare its speed to the nearest neighbor method we previously wrote. We will also examine the output below:

    Notre Dame: 100 matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 86 good matches, and 14 bad matches for the Notre Dame image with 86% accuracy. The elapsed time of our function with the implemented kd-tree was 0.572821 seconds.

    Let us now take a look at our previous match_features function:

    Notre Dame: 100 Matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 86 good matches, and 14 bad matches for the Notre Dame image with 86% accuracy. The elapsed time of our function with the implemented match_features was 0.547863 seconds.

    We see that we have achieved the same accuracy with very similar speed. From the discussion above, we might expect knnsearch to be faster; however, MATLAB's built-in pdist2 function is heavily optimized and runs very quickly. Note also that we were using a small number of points, roughly 3,000, which is a very minor computation for pdist2. In any case, pdist2 is an optimized way to compute the Euclidean distances between the points in two matrices, and it is much faster than computing the distances with a nested for loop.

    Let us create a MATLAB script called speedtest.m that compares the speeds of pdist2, the nested for loop, and knnsearch. The program can be constructed as follows:

    
    %Compare the speed of pdist2, nested for loops, and knnsearch for
    %finding the two nearest neighbors between two sets of random descriptors
    num_points = 1000;
    features1 = rand(num_points, 128);
    features2 = rand(num_points, 128);

    %pdist2 test
    disp('Pdist2 Results')
    tic
    D = pdist2(features1, features2, 'euclidean');
    [B, I] = sort(D, 2);
    new_B = [];
    new_B(:,1) = B(:,1);
    new_B(:,2) = B(:,2);
    toc

    %for loop test
    disp('For loop Results')
    tic
    D = zeros(length(features1), length(features2));
    for i = 1:length(features1)
        for j = 1:length(features2)
            D(i,j) = norm(features1(i,:) - features2(j,:));
        end
    end
    [B, I] = sort(D, 2);
    new_B = [];
    new_B(:,1) = B(:,1);
    new_B(:,2) = B(:,2);
    toc

    %knnsearch test
    disp('Knnsearch Results')
    tic
    mdl = KDTreeSearcher(features2);
    [D, I] = knnsearch(mdl, features1, 'K', 2);
    toc
    

    Let us run the above code and examine the output for the variable num_points in the table below:

    num_points     pdist2()        loops            knnsearch()
    1000           0.191885 s      1.934858 s       0.258514 s
    5000           6.113349 s      56.002508 s      8.700869 s
    10000          31.046381 s     251.661761 s     44.678936 s

    We can examine the speed for the case of num_points = 5000 in the chart below:

    We see that pdist2 is the fastest, followed closely by knnsearch; the nested for loop approach is much slower. We could have used nested for loops when we designed our match_features() function, but we chose pdist2 for its speed. pdist2 is approximately 10 times faster than the nested loops for the smaller input and about 8 times faster for the largest one, while knnsearch is about 7.5 times faster than the loops for the small input and about 6 times faster for the larger ones. As the data gets larger and larger it is wiser to move to either pdist2 or knnsearch to compute the nearest neighbors. Comparing the two, pdist2 is about 1.3 times faster than knnsearch for the small data set and about 1.4 times faster for the larger ones. In either case we should choose pdist2 or knnsearch over nested for loops. Note that for knnsearch we included the time it takes to build the tree (mdl = KDTreeSearcher(features2);); excluding this time, knnsearch is even closer to the speed of pdist2 and can even be faster when the data set is very large.

    A kd-tree is a great approach for computing nearest neighbors and definitely speeds up the search time for matching features.

    PCA: In computer vision we deal with many high-dimensional data sets. These data sets are computationally expensive to work with, but we can use dimensionality reduction to derive a smaller set of degrees of freedom that reproduces most of the variability of the data. The reduced data set is more compact and provides a low-dimensional encoding of the higher-dimensional data. This applies directly to our case of feature matching: we were passing in features1 and features2, each thousands of rows by 128 columns, and we could definitely speed up feature matching by reducing the dimension of these matrices. There are numerous schemes to approximate or accelerate feature matching through dimensionality reduction; one such method is Principal Component Analysis (PCA).

    PCA Made Easy: Let us say we have a bunch of bottles of wine sitting on a table. Each bottle can be described by the strength of its taste, its age, its color, its bottle shape, and so on; every bottle has many characteristics. However, we might not need all of them to describe a bottle. PCA aims to describe the wine bottles with fewer characteristics, but it does not simply discard redundant ones: it creates new characteristics out of the old ones to summarize the bottles. These new characteristics are linear combinations, so a new characteristic could be wine color minus wine age. PCA looks for the best possible characteristics to summarize all of the bottles, and in particular for combinations that allow us to reconstruct the original wine features; if a newly created attribute can't be used to rebuild a description of the bottles then it is not useful to us. Suppose we look at two characteristics, wine color on the $x$ axis and wine acidity on the $y$ axis, and assume for the purpose of this example that they are linearly correlated (although they probably are not). Let each bottle be represented as a blue dot on the diagram below:

    Above, the rotating line is attempting to find the line of best fit to the data, and the red points are the projections of the blue points onto that line. When the line passes straight through the data (at roughly 45 degrees) we have maximum variance of the projections and minimum reconstruction error. This is the line that corresponds to the newly constructed feature from PCA: the first principal component. The direction of the first principal component is given by the first eigenvector of the covariance matrix.

    Mathematical Construction: PCA can be constructed as follows:
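
    The construction from the original page is not reproduced here; a standard outline of PCA, consistent with the covariance and eigenvector description above, is as follows (with $X$ an $n \times d$ matrix holding $n$ descriptors of dimension $d$):

      • Center the data: $\tilde{X} = X - \bar{X}$, subtracting the mean of each column.
      • Form the covariance matrix $C = \frac{1}{n-1} \tilde{X}^{T} \tilde{X}$.
      • Compute the eigendecomposition $C = V \Lambda V^{T}$ and sort the eigenvectors by decreasing eigenvalue; the first eigenvector gives the direction of the first principal component.
      • Keep the first $k$ eigenvectors $V_k$: the projection $\tilde{X} V_k$ is the $n \times k$ reduced representation, and $\tilde{X} V_k V_k^{T} + \bar{X}$ is the low-rank reconstruction of the original data.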

    Implementation: Let us now implement a new match features function called match_features_pca.m. To do this we need a basis that can be applied to both images. One option is to run PCA over a collection of roughly 100 images, compute the basis once, and then reuse it for features1 and features2. Since SIFT features are fairly universal, a basis learned from many images would let us compare reduced features across large data sets, whereas a basis recomputed for every pair of images would not. Computing such a basis takes time up front, but once we have it we no longer need to run PCA per image pair (though loading and applying it might not end up faster than simply matching the full feature vectors). The other option is to use MATLAB's built-in PCA functions with a small workaround. Either approach is sufficient, and we will use MATLAB's built-in functions for our implementation. We can't run PCA on the two feature matrices separately, because the two reductions would not necessarily keep comparable components and the reduced features would no longer be directly comparable; our images are not identical, so we need a workaround. What we can do is concatenate features1 and features2 and run MATLAB's pcares on the concatenated matrix. We specify the desired dimensionality for the reconstructed PCA matrix, split the rows belonging to features1 and features2 back into their respective matrices, and then use pdist2 with the Euclidean metric to compute the nearest neighbors as we did previously, returning the 100 most confident matches. The code is as follows:

    
    function [matches, confidences] = match_features_pca(features1, features2)
    num_feats = 64;
    %Stack both descriptor sets so they share the same principal components
    C = vertcat(features1, features2);
    %pcares reconstructs C using only the first num_feats principal components
    [residuals, reconstructed] = pcares(C, num_feats);
    %Split the reconstruction back apart and keep the first num_feats columns
    %as the reduced descriptors
    features1 = reconstructed(1:size(features1,1), 1:num_feats);
    features2 = reconstructed(size(features1,1)+1:end, 1:num_feats);
    %Nearest neighbor distance ratio matching, as before
    D = pdist2(features1, features2, 'euclidean');
    [B, I] = sort(D, 2);
    nearest_neighbor = B(:,1);
    second_nearest_neighbor = B(:,2);
    confidences = nearest_neighbor ./ second_nearest_neighbor;
    i = find(confidences);
    s = size(i);
    matches = zeros(s(1), 2);
    matches(:,1) = i;
    matches(:,2) = I(i, 1);
    confidences = 1 ./ confidences(i);
    [confidences, ind] = sort(confidences, 'descend');
    matches = matches(ind, :);
    matches = matches(1:100, :);
    confidences = confidences(1:100, :);
    end
    

    If we wanted to store the basis so that we do not have to recompute PCA on our features every time, we could create the following function:

    
    function [Basis] = get_basis(features1, features2)
    C = vertcat(features1, features2);
    %The right singular vectors of the stacked descriptors form the basis;
    %the 'econ' option avoids building the enormous full-size U
    [U, S, Basis] = svd(C, 'econ');
    csvwrite('Basis.dat', Basis);
    end
    

    We could then load this basis into match_features and do something like the following so that we do not have to compute the basis each time:

    
    function [matches, confidences] = match_features_basis_load(features1, features2)
    num_feats = 64;
    Basis = csvread('Basis.dat');
    C = vertcat(features1,features2);
    reconstructed = C * Basis; 
    features1 = reconstructed(1:size(features1,1), 1:num_feats); 
    features2 = reconstructed(size(features1,1)+1:end, 1:num_feats);
    D = pdist2(features1, features2, 'euclidean');
    [B,I] = sort(D,2); 
    nearest_neighbor = B(:,1);
    second_nearest_neighbor = B(:,2);
    confidences = nearest_neighbor ./ second_nearest_neighbor;                
    i = find(confidences);
    s = size(i);
    matches = zeros(s(1),2);
    matches(:,1) = i; 
    matches(:,2) = I(i);
    confidences = 1./confidences(i);
    [confidences, ind] = sort(confidences, 'descend');
    matches = matches(ind,:);
    matches = matches(1:100,:); 
    confidences = confidences(1:100,:);
    end
    

    This will save us a few seconds on each run of our program, but that is not a huge computational expense for this project, so we will let MATLAB do the work. However, make sure that the basis has been computed before running match_features_basis_load() so that the matches are accurate. The basis is currently saved as Basis.dat for the Mount Rushmore image.

    Results:

    Our original descriptors had 128 dimensions. Let us reduce this to 100 dimensions and display the accuracy for the Mount Rushmore image, which at the full 128 dimensions had an accuracy of 96%:

    Mount Rushmore: 100 matches with PCA


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 95 good matches, and 5 bad matches for the Mount Rushmore image with 95% accuracy. By reducing the dimensionality from 128 to 100 we only lost 1% accuracy. Our match_features() function without PCA took approximately 2.265843 seconds and our match_features_pca() function keeping 100 components took 1.565106 seconds. We see that this is definitely quicker.

    Let us now reduce our dimensionality to 96 dimensions:

    Mount Rushmore: 100 matches with PCA


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 94 good matches, and 6 bad matches for the Mount Rushmore image with 94% accuracy. By reducing the dimensionality from 128 to 96 we only lost 2% accuracy. The matching step also took 1.512612 seconds, which is roughly 2/3 of the time it takes using 128 dimensions. We get exactly the same accuracy when we use match_features_basis_load, except that match_features_basis_load is slightly quicker.

    Let us now reduce our dimensionality to 64 dimensions:

    Mount Rushmore: 100 matches with PCA


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 85 good matches, and 15 bad matches for the Mount Rushmore image with 85% accuracy. By reducing the dimensionality from 128 to 64 we only lost 11% accuracy. The matching step also took 1.147239 seconds, which is roughly half the time it takes using 128 dimensions. We get exactly the same accuracy when we use match_features_basis_load, except that match_features_basis_load is slightly quicker.

    Let us now reduce our dimensionality to 32 dimensions:

    Mount Rushmore: 100 matches with PCA


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 34 good matches, and 66 bad matches for the Mount Rushmore image with 34% accuracy. By reducing the dimensionality from 128 to 32 we lost 62% accuracy. The matching step also took 0.857582 seconds, which is a bit over a third of the time it takes using 128 dimensions. We get exactly the same accuracy when we use match_features_basis_load, except that match_features_basis_load is slightly quicker.

    We notice that as we keep fewer and fewer components our descriptor loses more and more information and our accuracy may drop. We always want to prioritize accuracy, so we have a tradeoff of speed versus accuracy. For the image above, reducing to 64 dimensions cut the feature matching time roughly in half and dropped our accuracy only 11%; given the extra speed, that accuracy is acceptable. However, I would not drop down to 32 dimensions for this image: the accuracy is not justified by the improvement in matching speed. If we run the Notre Dame image with 32 components we arrive at roughly 60% accuracy, which is better but still low for the sake of shaving a few seconds off the matching phase. If it were, say, 75% or better it might be justifiable, but to me that is still low; 80% would be acceptable for me to drop to this dimension. PCA dimensionality reduction is a great way to reduce the computational expense of the feature matching stage. We can comfortably reduce our dimensionality to half the original dimension and speed up our program without worrying too much about a large drop in accuracy.

    Thresholding: Throughout the project we implemented both thresholding and returning the 100 most confident matches so that we could compare the two methods; the results from both are reported side by side above. Based on our results, thresholding at a low value lets fewer points be matched but generally provides higher accuracy than simply returning the 100 most confident matches, while thresholding at a value that is too high lets many points through but can reduce accuracy because more of those matches are incorrect. Returning the 100 most confident matches was a solid approach and gave results comparable to thresholding at reasonable values such as 0.6-0.7.

    Parameter Estimation: Let us experiment with some SIFT parameters. Let us first look at get_interest_points(). If we let in fewer interest points by raising the corner response threshold, we speed up the program a great deal, since we spend less time in get_features() and match_features(). A reasonable number of points was between 1,800 and 2,800 interest points; tuning this threshold, I found that a value of $t = 0.0005$ gave me that many points. Using the mean value of the response as the threshold instead, I was getting about 4,500 interest points, which added significant time to get_features() and match_features(). I also achieved better accuracy with a threshold of 0.0005: high 90s (percent) on Mount Rushmore and mid 80s on Notre Dame.

    I also experimented with the feature_width parameter. In all the previous results, my feature_width was 16. I now doubled it in length and width to be 32. The results are as follows:

    Notre Dame: 100 matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 75 good matches, and 25 bad matches for the Notre Dame image with 75% accuracy. This was about 11% lower than using feature_width = 16. This makes sense: the window is now larger, so it pulls in pixels that can throw off our feature. For instance, the extra area gained in going from 16 x 16 to 32 x 32 could contain more flat surface, which is harder to match and leads to inaccurate matches.

    Below we use a feature_width = 8:

    Notre Dame: 100 matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 69 good matches, and 31 bad matches for the Notre Dame image with 69% accuracy. This was about 17% lower than using feature_width = 16. This could be attributed to the window being too small to capture enough information to form a distinctive feature for the interest point.

    After experimenting with different feature widths, I found feature_width = 16 to give me the best results.

    Next, I changed the scale factor from 0.5 to 0.25, which produced fewer interest points and the following results:

    Notre Dame: 100 matches


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 72 good matches, and 28 bad matches for the Notre Dame image with 72% accuracy. This was about 14% lower than using scale_factor = 0.5.

    Now, I changed the scale factor from 0.5 to 0.75. Here I got more interest points and the following results:

    Notre Dame: 100 matches, PCA 100


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 91 good matches, and 9 bad matches for the Notre Dame image with 91% accuracy. This was about 10% higher than using a scale_factor = 0.5 with PCA 100. Scaling the image up definitely made a difference.

    I now changed the number of orientations of my Sobel filters from 8 to 6. This gives a feature vector with 4 x 4 x 6 = 96 columns instead of 128. The reduced dimensionality makes our program run faster, but let's look at the output below:

    Notre Dame: 100 matches, PCA 100


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 84 good matches, and 16 bad matches for the Notre Dame image with 84% accuracy. This was about 3% higher than using 8 Sobel orientations.

    Let us now halve our orientations to 4 and examine the output:

    Notre Dame: 100 matches, PCA 100


    Dots


    Arrows


    Evaluation

    There were 100 total matches, 86 good matches, and 14 bad matches for the Notre Dame image with 86% accuracy. This was 5% higher than using 8 Sobel orientations with PCA at 100 components.

    All in all, after experimenting with the various SIFT parameters, I found the best combination to be a scaling factor of 0.75, a feature_width of 16, 8 Sobel orientations, and returning the 100 most confident matches, for an accuracy of 91%.

    Conclusion: This project was great. I learned a lot about choosing proper interest points, creating useful features, dimensionality reduction, matching features, and kd-trees. I also realized how important proper threshold values are: if we set a value too low we get very few matches, and if we set it a little too high we can get terrible accuracy. The process of image feature matching is quite complicated and there are tons of options to optimize your program for speed, better accuracy, better features, and better interest points. I found the hardest step to be the implementation of get_features(). I had read about Sobel filters and decided to use 8 orientations, rotated 45 degrees counterclockwise each time, to compute the directions of the gradient magnitudes. This became tedious because, using this method, the magnitudes were being accumulated vertically by direction instead of horizontally across all directions, so I had to implement a small segment of code to reshape these bins and keep the directions consistent. All in all, I thought the algorithm performed very well on the Notre Dame and Mount Rushmore images. I was very surprised by the Mount Rushmore image: the accuracy was much better than I would have thought. Implementing PCA and kd-trees was a good choice because I learned a lot doing it; PCA really does make a difference in speed, and so do kd-trees, so I definitely recommend using these methods for this project. By using SIFT-like features, a Harris corner detector, and the nearest neighbor distance ratio, my accuracy became better and better.
