Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC

Example of epipolar lines.

The goal of this project is to introduce you to camera and scene geometry. Specifically we will estimate the camera projection matrix, which maps 3D world coordinates to image coordinates, as well as the fundamental matrix, which relates points in one scene to epipolar lines in another. The camera projection matrix and the fundamental matrix can each be estimated using point correspondences. To estimate the projection matrix (camera calibration), the input is corresponding 3d and 2d points. To estimate the fundamental matrix the input is corresponding 2d points across two images. The author will describe the implementations and their results in the following order:

  1. Estimate the projection matrix and camera location,
  2. Estimate the fundamental matrix, and
  3. Estimate the fundamental matrix using the point correspondences from SIFT.

Part 1: Estimate the projection matrix

The goal of this part is to estimate the projection matrix that converges a world cordinate to a camera cordinate as follows:

The author uses a homogeneous linear system (Ax = 0) to obtain the projection matrix. The x vector in this part consists of the 12 entries of the projection matrix as follows:

The world coordinates (Points_3D) and the camera coordinates (Points_2D) are given in this problem; thus, the above A matrix can be solved in two steps. First, build the A matrix above using the given n pairs of the world and camera coordinates. Second, solve the homogeneous linear system using the singular value decomposition. The codes are shown below:


% solve Ax = 0

% build A, see 13.pdf page 36
n = size(Points_3D, 1);
if(n ~= size(Points_2D, 1))
    disp('Sizes of Points 2D and 3D do not match');
end
A = zeros(2 * n, 12);
for row = 1:2 * n
    world_pos = Points_3D(ceil(row / 2), :);
    camera_pos = Points_2D(ceil(row / 2), :);
    u = camera_pos(1, 1);
    v = camera_pos(1, 2);
    if(mod(row, 2)) % odd row
        A(row, :) = [world_pos 1 0 0 0 0 -u * world_pos -u];
    else % even row
        A(row, :) = [0 0 0 0 world_pos 1 -v * world_pos -v];
    end
end

% see 13.pdf page 36
[U, S, V] = svd(A);
M = V(:, end);
M = reshape(M, [], 3)';

The computed projection matrix with the above method was

Note that the projection matrix is ambiguous up to a constant scale; thus, the projection matrix can not uniquely derived. However, it does not affect the derivation of the camera center. In order to evaluate the accuracy of the projection matrix, the projection pre-multiplies itself with the given world coordinates and compute the expected camera coordinates (u' and v'). The performance criterion, residual, is computed as follows:

The total residual with the given coordinates was 0.0445, and we can tell that the estimated projection matrix is good enough. Once the projection matrix is computed, computing the camera location is straightforward. Look at the slide 13.pdf page 45. The camera center can be computed as shown in the code below:

Derivation of a camera center. 13.pdf page 45.

% Computes a camera center from a projection matrix
function [Center] = compute_camera_center(M)
% see 13.pdf page 45
Q = M(1:3, 1:3);
m4 = M(:, 4);
Center = -inv(Q) * m4;
end

The computed camera center was <-1.5127, -2.3517, 0.2826>. Figures below summarizes the results of the part 1. The figure on the left hand side shows the projected 3D coordinates as well as the given 2D coordinates. Since all the projected points are located near the 2D points, we can tell the estimated projection matrix is computed correctly. The figure on the right hand side shows the relationship between the 2D and 3D coordinates and the camera location.

Projected 3D coordinates and the given 3D coordinates
3D view of the 2D and 3D locations, and the camera location
The table below summarizes the results:
camera center-1.5127-2.35170.2826
residual0.0445

Part 2: Estimate a fundamental matrix

TPart 2 works on the estimation of a fundamental matrich, which maps a point in one image to the line in the other image. The fundamental matrix is a 3 by 3 matrix; thus, we estimate 9 entries of the matrix. Since the fundamental matrix is not full ranked, we only need 8 pairs of the points. This means that we estimate the 9 parameters with 8 inputs and 1 constraint. The author uses a similar method to the one used in the Part 1 to estimate the entries in the target matrix. Two main differences to the Part 1 is

  1. Use of the normalized points to improve the poor numerical condition (graduate requirement) and
  2. Use of the constraint to force the fundamental matrix to be a rank 2 matrix.
First, we normalize the given pairs of the points to fit between -sqrt(2) to sqrt(2). This means that the mean squared distance between the origin and the data point is at most 2 pixels. This is done in two steps: shifting and scaling. All the points are shifted by their mean and scaled to fit to the specified range. The normalization is done with the following code:

% normalization
use_normalize = 1;
if(use_normalize)
    % naive implementation of the eight point algorithm has a poor numerical
    % condition. Use the normalized eight point algorithm. See the 14.pdf page
    % 40.

    % Center the image data at the origin, and scale it so the mean squared
    % distance between the origin and the data points is 2 pixels. -> which
    % means the points are restricted to -sqrt(2) ~ +sqrt(2)

    % The points are translated so that their centeroid is at the origin.
    % Piazza, instructor's response
    c = [mean(Points_a(:, 1)) mean(Points_a(:, 2))];
    c_prime = [mean(Points_b(:, 1)), mean(Points_b(:, 2))];
    Points_a_shift = [Points_a(:, 1) - c(1), Points_a(:, 2) - c(2)];
    Points_b_shift = [Points_b(:, 1) - c_prime(1), Points_b(:, 2) - c_prime(2)];
    m_shift = [1, 0, -c(1);
               0, 1, -c(2);
               0, 0, 1];
    m_shift_prime = [1, 0, -c_prime(1);
                     0, 1, -c_prime(2);
                     0, 0, 1];


    % compute mean squared distances
    msq = mean(Points_a_shift(:, 1) .^ 2 + Points_a_shift(:, 2) .^ 2);
    msq_prime = mean(Points_b_shift(:, 1) .^ 2 + Points_b_shift(:, 2) .^ 2);

    % compute scale s,
    ssq = 2/msq;
    ssq_prime = 2/msq_prime;
    s = sqrt(ssq);
    s_prime = sqrt(ssq_prime);
    m_scale = [s, 0, 0;
               0, s, 0;
               0, 0, 1];
    m_scale_prime = [s_prime, 0, 0;
                     0, s_prime, 0;
                     0, 0, 1];

    %[u', v', 1]^T = [s 0 0; 0 s 0; 0 0 1][1 0 -c_u; 0 1 -c_v; 0 0 1][u v 1]^T
    for i = 1:n
        tmp(i, :) = (m_scale * m_shift * [Points_a(i, :) 1]')';%'
        tmp_prime(i, :) = (m_scale_prime * m_shift_prime * [Points_b(i, :) 1]')';%'
    end
    Points_a = tmp(:, 1:2);
    Points_b = tmp_prime(:, 1:2);
end
You can enable/disable the normalization with the switch 'use_normalize'. These normalized points are used to build a A matrix, and the fundamental matrix is a solution of Ax = 0. The singularity constraint is achieved by letting the smallest eigenvalues to zero. Finally, the fundamental matrix is reconfigured to the original mean and scale by premultiplying and postmultiplying the shiting and scaling matrix. The following code does these operations:

% rename to the notations used in the slides.
u = Points_b(:, 1);
v = Points_b(:, 2);
u_prime = Points_a(:, 1);
v_prime = Points_a(:, 2);

% build an A matrix. See the slide 14.pdf page 33
A = [u .* u_prime, u .* v_prime, u, v .* u_prime, v .* v_prime, v,...
    u_prime, v_prime, ones(n, 1)];

% solve Af = 0, See the slide 14.pdf page 34
[U, S, V] = svd(A);
f = V(:, end);
F = reshape(f, [3 3])';%'

% use the singularity constraint, see the slide 14.pdf page 36
[U, S, V] = svd(F);
S(3, 3) = 0;
F = U * S * V';

if(use_normalize)
    F = (m_scale_prime * m_shift_prime)' * F * (m_scale * m_shift);
end
F_matrix = F;
The below tables and figures summarizes the results of the estimation with and without the nomalization. As we can see from the figures, the normalization improved the accuracy. (The epipolar lines got closer to the interest points with normalization.)
Fundamental matrix without normalization
F=-5.3626e-077.9036e-06-1.8860e-03
8.8354e-061.2132e-061.7233e-02
-9.0738e-04-2.6423e-029.9950e-01
Fundamental matrix with normalization
F=-1.1725e-07 1.6082e-06 -4.0198e-04
1.1121e-06 -2.7344e-07 3.2332e-03
-2.3640e-05 -4.4440e-03 1.0346e-01
Epipolar lines without normalization
Epipolar lines with normalization

Part 3: Estimate a fundamental matrix using SIFT and RANSAC

Part 3 derives the fundamental matrix in more realistic sense than Part 2. In Part 2, the corresponding pairs are given, and the fundamental matrix was derived based on these correspondence. However, the correspondence problem is one of the biggest challenges to find a fundamental matrix. Part 3 consists of solving the correspondence problem. Here, the author uses the vl_feat library (http://www.vlfeat.org/matlab/matlab.html) to extract interest points, compute SIFT features, and solve the correspondence problem. This is a fast library, but the accuracy of the matching is not very great. Therefore, the key part of Part 3 is to improve the matching quality using the RANSAC algorithm.

RANSAC (RANdom SAmple Consensus) consists of the three steps below:

  1. Sample (randomly) the number of points required to fit the model
  2. Solve for model parameters using samples
  3. Score by the fraction of inliers within a preset threshold of the model

For the derivation of the fundamental matrix, this can be more concretely explained as follows:

  1. Sample at least 8 pairs of the matches out of the raw matches computed based on the SIFT features
  2. Compute the fundamental matrix using the points picked up in the step one
  3. Count how many pairs satisfy x^TFx' < threshold
The above three steps are repeated N times, which is preset based on the number of potential matches and some other parameters. The codes shown below implements the RANSAC algorithm.

    max_iteration = 30000;
    s = 8; % min # needed for model estimation, 8 point estimation
    samp = 8;
    if(samp < s)
        disp('SAMP TOO LOW');
    end
    delta = 0.03;
    n = size(matches_a, 1);
    if(size(matches_b, 1) ~= n)
        disp('matches a and b need to have the same size');
    end
    max_score = -1;

    for i = 1:max_iteration
%       index = randi(n, s, 1); % randi (max_int, row, col)
        index = randperm(n, samp)';%'
        
        x_prime = matches_a(index, :);
        x = matches_b(index, :);

        F = estimate_fundamental_matrix(x_prime, x);

        % converts to the homogeneous coordinate
        x_n = [matches_a, ones(size(matches_a, 1), 1)];
        x_prime_n = [matches_b, ones(size(matches_b, 1), 1)];

        error_fundamental = zeros(size(x_n, 1), 1);
        for j = 1:size(error_fundamental, 1)
            % note the 14.pdf 29, and everywhere on the slide uses x^TFx_prime
            % = 0. This is because their expression uses a column vector, but
            % we store data as a row vector. The transpose locations,
            % therefore, are different

            % this should be zero if a perfect fundamental matrix xFx = 0
            error_fundamental(j) = abs(x_n(j, :) * F * x_prime_n(j, :)');  %'
        end
        ok_index = [];
        score = 0;
        for j = 1:size(error_fundamental, 1)
            if(error_fundamental(j, 1) < delta)
                ok_index(end+1) = j;
                score = score + 1;
            end
        end
        fprintf('[@%d] mean_error = %f\n', i, mean(error_fundamental));

        if(score > max_score)
            max_score = score;
            Best_Fmatrix = F;
            inliers_a = matches_a(ok_index, :);
            inliers_b = matches_b(ok_index, :);
        end
    end
    fprintf('Best score %d out of %d\n', max_score, size(matches_a, 1));
    
The results of Part 3 are summarized in the tables and figures below:
Mount Rushmore
F=1.7323e-08 2.0813e-06 -3.4155e-03
-2.1899e-06 -4.2035e-08 2.1237e-03
3.4402e-03 -1.9433e-03 -5.4517e-02
inliers560/825
threshold 0.03
iterations10000
Raw matches
Ransac matches
Epipolar lines
Notre Dame
F= 4.4423e-08 -9.4723e-06 3.2315e-03
9.6060e-06 5.0777e-07 -3.9273e-03
-3.2518e-03 3.3673e-03 1.1556e-01
inliers585/851
threshold 0.09
iterations12000
Raw matches
Ransac matches
Epipolar lines
Gaudi
F= -3.3673e-07 2.7461e-06 -1.0551e-03
-2.0541e-06 8.9169e-07 -4.0309e-04
1.2551e-03 -1.6282e-03 6.5840e-01
inliers152/1062
threshold 0.01
iterations15000
Raw matches
Ransac matches
Epipolar lines