Project 2: Local Feature Matching

In this project we attempted to use harris corner detection, a variant of SIFT, and feature matching based on nearest neighbor ratio. Interestingly, I had the biggest problems with the simplest part: feature matching cost me hours of my life that I will never be able to get back, however my trials also revealed something quite interesting: a 1 dimensional feature can get you over 50% accuracy in some cases. That was fairly surprising.

Pictures

Here are the pictures made from the sift pipeline, showing the feature matches.

And some highlights from the code. First, the harris corner detector. Given a basic harris matrix "har", this will do a few things, first it removes any key points from near the edges of the image. Second, it uses thresh_har to cut off weak signals, and corners is the result after non-maximal supression.

thresh_har = har .* (har > (mean2(har)));
corners = colfilt(thresh_har, [feature_width/2, feature_width/2], 'sliding', @max);
edge_mask = zeros(size(thresh_har));
edge_mask(fw+1:w-fw,fw+1:h-fw) = 1;
corners = (thresh_har == corners) .* thresh_har .* edge_mask;

Next up is the actual code for SIFT. This involves cutting out a 16x16 area around a given interest point, then dividing that 16x16 up into 16 4x4 squares and doing some fun histogramming on them. The most important part though is once feature is constructed, some strange things happen. First, the feature is put to the 1/3 power, and then the "normalize, threshhold, normalize" sequence is used. These lead to about a 10% accuracy improvement over solely normalization (87% to 98% on Notre Dame.

bincount = 8;
features = zeros(size(x, 1), 16*bincount);
[gmag, gdir] = imgradient(image);
fw = feature_width/2;
for i = 1:size(x, 1)
    feature = zeros(1, 16*bincount);

    % double transpose because #matlab

    mags = gmag(y(i)-fw:y(i)+fw-1,x(i)-fw:x(i)+fw-1);
    dirs = gdir(y(i)-fw:y(i)+fw-1,x(i)-fw:x(i)+fw-1);
    for xc = 0:3
        for yc = 0:3
            bins = zeros(1,bincount);
            ms = reshape(mags(xc*4+1:xc*4+4,yc*4+1:yc*4+4), 1, []);
            ds = reshape(dirs(xc*4+1:xc*4+4,yc*4+1:yc*4+4), 1, []);
            for n = 1:16
                pos = floor((ds(n) + 180)/(360/bincount));
                bins(pos + 1) = bins(pos + 1) + ms(n);
                % This should make the features rotation invariant, but it
                % doesn't actually help us at all.
                % [M, I] = max(bins);
                % bins = circshift(bins', 1 - I)';
            end
            feature(bincount*(xc*4+yc)+1:bincount*(xc*4+yc)+bincount) = bins;
        end
    end
    
    feature = feature.^.35;
    % normalize -> threshhold -> normalize
    feature = feature/norm(feature,2);
    feature(feature > .2) = .2;
    feature = feature/norm(feature,2);
    
    features(i,:) = feature;

The final, and most aggravating component of this assignemnt was the feature matching portion. For speed, I used a knn based approach instead of a list sorting or list traversing one. This was fine, however I introduced an error in my original implementation. First the correct code

[idx, D] = knnsearch(features1,features2, 'K', 2);
for i = 1:size(idx, 1)
    da = features2(i,:);
    db = features1(idx(i,1),:);
    dc = features1(idx(i,2),:);
    nndr = norm(db - da) / (norm(dc - da));
    confidences(i) = 1 / nndr;
    matches(i,:) = [idx(i,1), i];
end

And the version with an error:

[idx, D] = knnsearch(features1,features2, 'K', 2);
for i = 1:size(idx, 1)
    da = features2(i);
    db = features1(idx(i,1));
    dc = features1(idx(i,2));
    nndr = norm(db - da) / (norm(dc - da));
    confidences(i) = 1 / nndr;
    matches(i,:) = [idx(i,1), i];
end

da, db, and dc are scalars instead of vectors in this second version. Interestingly, this version still manages to get over 50% accuracy, despite only haveing a single dimension to inform the readings. This surprised me greatly, and this bug was difficult to debug because it hid well, both in terms of reasonable accuracy and being very difficult to see in the first place. Incidentally, it also meant that I completed the project fairly out of the intended order, and as a result comparisons between various stages were not possible, I went from about 1% to 50% to 98% accuracy as I fixed bugs in my implementation of match_features, but SIFT and Harris worked perfectly.

This implementation gets 98% on Notre Dame and Mt. Rushmore, and about 10% accuracy on Gaudi, because the features are not yet truly scale invariant, as I am not creating them at multiple levels in a gaussian pyramid.