Project 3: Camera Calibration and Fundamental Matrix Estimation with RANSAC

Overview

The main goal of this project was to use camera and scene geometry to better match images. Specifically, a camera projection matrix was used to map world coordinates to image coordinates. Then, a fundamental matrix was used to determine epipolar lines based on two sets of image points. Below is a summary of the three steps taken:

  • Estimate the camera projection matrix from ground-truth point correspondences
  • Estimate the fundamental matrix from ground-truth point correspondences
  • Implement RANSAC to filter SIFT feature matches

After implementing the above, matching of simple images was possible. Normalizing the coordinates during the fundamental matrix calculation significantly improved matching across all image pairs, and is discussed later.

    Camera Projection Matrix

    When using camera data to reason about the real world, one must have a way to relate the two spaces. The camera projection matrix does this, converting 3D world coordinates to 2D image coordinates, which allows us to determine locations and angle changes between images. To complete this conversion, we must solve for the 3x4 matrix relating the two, which can be done with a system of linear equations. After solving this system, we can determine various facts about the camera, one of which is the camera location. Info on how to calculate this location, along with my calculated projection matrix, can be found below.
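
    As an illustration, below is a minimal sketch of this estimation in MATLAB. It assumes n ground-truth correspondences, with 3D points X (n-by-3) and 2D image points x (n-by-2), and fixes the scale ambiguity by setting M(3,4) = 1 before solving the resulting least-squares system; the function and variable names are illustrative, not the exact project code.

    function M = estimateProjectionMatrix(X, x)
    % Hypothetical helper: X holds n world points (n-by-3) and x the
    % corresponding image points (n-by-2). Fixing M(3,4) = 1 removes the
    % scale ambiguity and leaves 11 unknowns in a 2n-by-11 linear system.
    n = size(X,1);
    A = zeros(2*n, 11);
    rhs = zeros(2*n, 1);
    for i = 1:n
        Xi = X(i,:);
        u = x(i,1);
        v = x(i,2);
        % Each correspondence yields two equations, one per image coordinate
        A(2*i-1,:) = [Xi, 1, 0, 0, 0, 0, -u*Xi];
        A(2*i,:)   = [0, 0, 0, 0, Xi, 1, -v*Xi];
        rhs(2*i-1) = u;
        rhs(2*i)   = v;
    end
    m = A \ rhs;                  % least-squares solution for the 11 unknowns
    M = reshape([m; 1], 4, 3)';   % reassemble the 3x4 matrix with M(3,4) = 1
    end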

    
    Projection Matrix, M:
        0.4583   -0.2947   -0.0140    0.0040
       -0.0509   -0.0546   -0.5411   -0.0524
        0.1090    0.1783   -0.0443    0.5968
    
    
    The total residual is: 0.0445
    
    Estimated Camera Location: (-1.5127, -2.3517, 0.2826)
    Camera Location Equation: Location = -inv(M(:,1:3)) * M(:,4)
    
    

    This projection matrix is the negative of the given projection matrix; since the projection matrix is only defined up to a scale factor, the two produce identical projections, and the recovered camera location matches the given location exactly. A simple image demonstrating the strength of this projection matrix can be seen below.

    Image showing the correct positions (lines) and the estimated positions after using the projection matrix (circles). Note that all of these lines and circles match up.
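
    For reference, here is a brief sketch of how the total residual above can be computed, reusing the illustrative X, x, n, and M names from the projection-matrix sketch (this assumes MATLAB R2016b+ for implicit expansion and R2017b+ for vecnorm):

    % Project the 3D points with the estimated M and compare against the
    % ground-truth 2D points (illustrative variable names)
    proj = ([X, ones(n,1)] * M')';          % 3-by-n homogeneous projections
    uv = (proj(1:2,:) ./ proj(3,:))';       % divide by w to get n-by-2 pixels
    residual = sum(vecnorm(uv - x, 2, 2));  % total reprojection error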

    Fundamental Matrix

    The next step is to find epipolar lines between the two images: for each point in one image, the epipolar line is the line in the other image on which the corresponding point must lie. To find these lines, we calculate the fundamental matrix for the two images, a rank-2 matrix F that relates corresponding points x_a and x_b through the constraint x_b' * F * x_a = 0. To calculate this matrix, we used the normalized eight-point algorithm described below:

  • Normalize the data (described in detail below)
  • Solve for f from Af = 0
  • Enforce the rank-2 constraint (zero out the smallest singular value, the bottom-right entry of S)
  • Transform the matrix back to the original coordinates: F = T_b' * Fnorm * T_a

    Both the second and third steps are accomplished using singular value decomposition. The code below summarizes these steps.

    
    [T_a,a] = scaleData(a);
    [T_b,b] = scaleData(b);
    
    % Use the eight-point algorithm to compute F from the normalized points
    N = size(a,1);
    arr = zeros(N,9);
    for i = 1:N
        % Each correspondence contributes one row of A, built from the
        % outer product of the homogeneous points, so that arr*f = 0
        % encodes the epipolar constraint U' * F * V = 0
        V = [a(i,:), 1]';
        U = [b(i,:), 1]';
        matrix = transpose(U * V');
        arr(i,:) = matrix(1:end);
    end
    % f is the right singular vector for the smallest singular value
    [~,~,V] = svd(arr);
    F_matrix = transpose(reshape(V(:,end),3,3));
    
    % Enforce the rank-2 constraint
    % Take SVD of F and throw out the smallest singular value
    [U,S,V] = svd(F_matrix);
    S(3,3) = 0;
    F_matrix = U * S * V';
    
    % Transform fundamental matrix back to original units
    F_matrix = T_b' * F_matrix * T_a;
    
    

    Normalization for Fundamental Matrix

    To normalize the coordinates, two different transformations were applied. First, the points were shifted so that their mean lies at the origin (shift transformation). Second, the points were scaled so that the farthest point is at most sqrt(2) from the origin, i.e., the maximum squared distance is 2. To accomplish these transformations, a simple helper function was implemented within my fundamental matrix calculation code. This basic helper function can be seen below:

    
    function [T, Points] = scaleData(Points)
    
    %Find Shift: ensure mean value is at origin
    meanCoordinates = mean(Points);
    offset_u = meanCoordinates(1);
    offset_v = meanCoordinates(2);
    
    Points = Points - ones(size(Points,1),1) * meanCoordinates;
    
    % Find Scale: ensure the farthest point is no more than sqrt(2) from the origin
    distanceList = sqrt(sum(Points.^2,2));
    scale = sqrt(2) / max(distanceList);
    
    %Calculate transformation matrices
    Tscale  = eye(3);
    Tscale([1 5]) = scale;
    Toffset = eye(3);
    Toffset(1,3) = -offset_u;
    Toffset(2,3) = -offset_v;
    T = Tscale * Toffset;
    
    % Apply scale to all points
    Points = Points * scale;
    end
    
    

    RANSAC Method Implementation

    Finally, the work from the past two sections was used to filter SIFT feature matches for more complex photo pairs. For each image pair given, 300+ proposed feature matches can be found using the basic algorithm from Project 2. However, many of these matches are inaccurate, due to changes in intensity or to objects being obscured. To rectify this, RANSAC was used to solve for the best fundamental matrix for each image pair, and the best K inliers were kept.

    Random Sample Consensus (RANSAC) works by iteratively guessing parameters to fit a mathematical model and judging each guess by the number of inliers it produces. To fit the fundamental matrix, eight random point pairs were selected and a fundamental matrix was calculated from them; then the number of point pairs consistent with the resulting epipolar lines was counted. The basic code for counting inliers is given below:

    
    inliers_a = [];
    inliers_b = [];
    for i = 1:size(a,1)
        U = [b(i,:), 1];
        V = [a(i,:), 1];
        % U*F*V' should equal zero, allow slight tolerance
        if abs( U * F * V') < tolerance
             inliers_a = [inliers_a; a(i,:)];
             inliers_b = [inliers_b; b(i,:)];
        end
    end
    
    

    After either reaching a target percentage of inliers or running for 2000 iterations, the method was halted, the final epipolar lines were generated, and the top inliers were kept. The results of this method are summarized below in the results section.
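    For context, below is a minimal sketch of the surrounding RANSAC loop. It reuses the inlier test shown above and assumes a hypothetical estimate_fundamental_matrix helper implementing the eight-point algorithm from the previous section; the tolerance and early-exit ratio are illustrative, not the exact project values.

    tolerance = 0.05;   % illustrative threshold on the epipolar constraint
    best_F = eye(3);
    best_count = 0;
    N = size(a,1);
    for iter = 1:2000
        % Fit a candidate F to eight randomly sampled correspondences
        sample = randperm(N, 8);
        F = estimate_fundamental_matrix(a(sample,:), b(sample,:));
    
        % Count point pairs consistent with the candidate epipolar geometry
        count = 0;
        for i = 1:N
            if abs([b(i,:), 1] * F * [a(i,:), 1]') < tolerance
                count = count + 1;
            end
        end
    
        % Keep the model with the most inliers seen so far
        if count > best_count
            best_count = count;
            best_F = F;
        end
    
        % Stop early once a high fraction of the matches are inliers
        if best_count > 0.9 * N
            break
        end
    end
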

    Results

    Rushmore Images: Of the 100 closest inliers for both images, all pairs were matched correctly. The epipolar lines found are also shown for reference.

    All SIFT matches
    100 closest inliers found from RANSAC algorithm
    Epipolar lines for both images. One can clearly see the similar lines between both images.


    Notre Dame Images: Of the 100 closest inliers for both images, about 8 pairs were mismatched, and four of these pairs had a point obscured in one of the two images.

    All SIFT matches
    100 closest inliers found from RANSAC algorithm


    Gaudi Images: For this image pair, two different methods were compared: a standard fundamental matrix calculation and a normalized fundamental matrix calculation. The normalization produces far better results, with consistent epipolar lines and many more correct matches.

    All SIFT matches
    RANSAC algorithm resulting pairs, no normalization
    RANSAC algorithm resulting pairs, normalization of fundamental matrix