Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC

Project 3 comes in 3 parts.

Part 1: Camera Projection Matrix

Given 3D world coordinates and their corresponding 2D image coordinates, the camera projection matrix M can be found. The matrix has 12 entries but only 11 unknowns, since it is defined only up to scale; with each point correspondence giving 2 equations, a total of 6 points is enough to solve for the matrix, as derived below. The code follows.
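
Writing the entries of M as m_ij, the homogeneous projection gives u and v after the perspective divide; each correspondence then yields two equations (a standard derivation, in the notation of the code below):

    u = \frac{m_{11}X + m_{12}Y + m_{13}Z + m_{14}}{m_{31}X + m_{32}Y + m_{33}Z + m_{34}}, \qquad
    v = \frac{m_{21}X + m_{22}Y + m_{23}Z + m_{24}}{m_{31}X + m_{32}Y + m_{33}Z + m_{34}}

Cross-multiplying and fixing m_{34} = 1 makes both equations linear in the remaining 11 parameters; these are exactly the rows assembled into A, with u and v collected into the right-hand side Ax.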


        A = zeros([12, 11]);   % 12 equations in the 11 unknowns of M (m34 is fixed later)
        Ax = zeros([12, 1]);   % right-hand side: the observed image coordinates
        for i = 1:6
            p2D = Points_2D(i,:);
            p3D = Points_3D(i,:);
            X = p3D(1);
            Y = p3D(2);
            Z = p3D(3);
            u = p2D(1);
            v = p2D(2);
            A(2*i-1, :) = [X Y Z 1 0 0 0 0 -u*X -u*Y -u*Z]; %first equation (u row)
            A(2*i, :)   = [0 0 0 0 X Y Z 1 -v*X -v*Y -v*Z]; %second equation (v row)
            Ax(2*i-1) = u;
            Ax(2*i) = v;
        end

        % Build the Moore-Penrose pseudoinverse of A from its SVD
        [U,S,V] = svd(A);
        S1 = (S ~= 0);                    % mask of the nonzero singular values
        S = ones(size(S)).*(S == 0) + S;  % replace zeros with ones to avoid 0/0
        S1 = S1./S;                       % 1/sigma where sigma is nonzero, 0 elsewhere
        x = V*S1'*U'*Ax;                  % least-squares solution, i.e. pinv(A)*Ax
        x = [x; 1];                       % append the fixed parameter m34 = 1
        x = x./norm(x);                   % normalize; M is only defined up to scale
        M = reshape(x, [4,3])';           % row-major 3x4 projection matrix
        

After building the matrix A, which stacks the 12 equations in the 11 unknown parameters of M, a singular value decomposition can be used to form the pseudoinverse of A. Multiplying this pseudoinverse by the right-hand-side vector of observed image coordinates gives the least-squares solution for the 11 unknowns. This solution is defined relative to the fixed last parameter; setting m34 = 1, the entries of M are the 11 solved values followed by 1, which are then normalized.
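
Equivalently, MATLAB's built-in pinv (or the backslash operator) produces the same least-squares solution; a minimal sketch, reusing A and Ax from the code above:

    x = pinv(A) * Ax;               % same solution as the manual SVD pseudoinverse
    M = reshape([x; 1], [4, 3])';   % append the fixed m34 = 1 and reshape
    M = M ./ norm(M(:));            % normalize; M is only defined up to scale

The projection matrix for the set of normalized points is: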

        0.4610   -0.2945   -0.0030   -0.0014
       -0.0507   -0.0549   -0.5431   -0.0529
        0.1092    0.1796   -0.0311    0.5935
        

The results of the projection matrix can be seen below:

This gives a total residual of 0.2703.
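
For reference, a minimal sketch of how such a residual can be computed, reusing M, Points_2D, and Points_3D from above (this assumes the residual is the summed 2D distance between projected and observed points, which may differ slightly from the starter code's exact definition):

    % Project the 3D points with M and sum the 2D reprojection distances
    n = size(Points_3D, 1);
    proj = (M * [Points_3D, ones(n, 1)]')';   % n-by-3 homogeneous image points
    proj = proj(:, 1:2) ./ proj(:, [3 3]);    % perspective divide
    total_residual = sum(sqrt(sum((proj - Points_2D).^2, 2)));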

The camera center can be estimated by taking the inverse of the first three columns Q of the camera projection matrix M and multiplying its negative by the last column: Center = -inv(Q) * M(:,4).

         
    Q = M(:,1:3);           % left 3x3 block of M
    M4 = M(:,4);            % last column of M
    Center = -inv(Q)*M4;    % camera center
         
         
The estimated location of the camera is: <-1.4948, -2.3472, 0.2793>
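
As a quick sanity check (a sketch reusing M and Center from above): the homogeneous camera center lies in the null space of M, so projecting it should give a vector of near-zeros.

    % M * [C; 1] = 0 holds exactly for the true center; expect ~0 up to round-off
    err = norm(M * [Center; 1]);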

Rendering of the scene:


Part 2: Fundamental Matrix Estimation

Given 8 or more point correspondences, it is possible to estimate the fundamental matrix, which maps a pixel in one image to a line in the other. Each correspondence contributes one equation, so 8 or more points yield 8 or more equations. The parameters of the fundamental matrix form the non-trivial homogeneous solution to these equations, expanded below; the code follows.
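
For a correspondence (u, v) in the first image and (u', v') in the second, the epipolar constraint expands into one linear equation in the nine entries of F (a standard derivation, matching the row layout in the code below):

    \begin{pmatrix} u' & v' & 1 \end{pmatrix} F \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = 0
    \quad\Longrightarrow\quad
    uu'f_{11} + vu'f_{12} + u'f_{13} + uv'f_{21} + vv'f_{22} + v'f_{23} + uf_{31} + vf_{32} + f_{33} = 0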


        [n, ~] = size(Points_a);
        A = zeros([n, 9]);       % one epipolar-constraint row per correspondence
        for i=1:n
            u  = Points_a(i, 1);
            v  = Points_a(i, 2);
            u_ = Points_b(i, 1);
            v_ = Points_b(i, 2);
            A(i, :) = [u*u_ v*u_ u_ u*v_ v*v_ v_ u v 1];
        end
        [U, S, V] = svd(A);
        f = V(:, end);           % singular vector of the smallest singular value
        f = f ./ norm(f);
        F_matrix = reshape(f, [3 3])';

        % Enforce the rank-2 constraint by zeroing the smallest singular value
        [U,S,V] = svd(F_matrix);
        [r,c] = size(S);
        S(r,c) = 0;
        F_matrix = U*S*V';
        

Note: This is the code before the implementation of extra credit. The full implementation is shown later.

To find the homogeneous solution to the equations, singular value decomposition is applied to the matrix A. The last column of V, the singular vector corresponding to the smallest singular value, is the unit vector that brings Af closest to 0, so its entries are the parameters of the fundamental matrix. The fundamental matrix is then decomposed again using SVD, and its smallest singular value is zeroed, to enforce the rank-2 constraint. The resulting fundamental matrix for the base image pair is shown below:

        -0.0000    0.0000   -0.0007
         0.0000   -0.0000    0.0053
        -0.0000   -0.0073    0.1687
        

Epipolar lines in the two images

It can be seen that every epipolar line passes through its corresponding point in both images.
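
This can be quantified by measuring each point's distance to its epipolar line; a minimal sketch, where (u_a, v_a) and (u_b, v_b) are illustrative names for one matched pair:

    % Epipolar line in image b induced by the point in image a
    pa = [u_a; v_a; 1];                  % point in image a (hypothetical values)
    pb = [u_b; v_b; 1];                  % its match in image b
    l = F_matrix * pa;                   % line coefficients [a; b; c]
    d = abs(l' * pb) / norm(l(1:2));     % point-to-line distance in image b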


Part 3: Fundamental Matrix with RANSAC

SIFT feature matches are examined with RANSAC to judge their accuracy. Each RANSAC iteration follows these steps:

  1. Randomly sample 8 or more points. I chose 9 points.
  2. Find the corresponding fundamental matrix using the sampled points.
  3. Find the number of inliers using a preselected threshold (0.01).

The number of iterations required to draw a sample in which all 9 points are correct, with 99.9% confidence, is a classic Poisson CDF problem. Assuming each SIFT match is correct with probability p and the matches are statistically independent, the probability that all 9 sampled points are correct is p^9. The expected number of samples needed to obtain one all-correct sample is therefore 1/(p^9). The Poisson inverse CDF, with this expectation as the mean, then gives the number of iterations needed to reach a given confidence:
        p      # of iterations
        0.9    9
        0.8    17
        0.7    42
        0.6    131
        0.5    583
        0.4    4007
        0.3    51503
        0.2    1957445
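
The table can be reproduced (up to rounding) with the same poissinv construction used in the RANSAC code below; a minimal sketch:

    % Iterations needed to draw at least one all-correct 9-point sample
    % with 99.9% confidence, under the Poisson model described above
    k = 9;
    for p = [0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2]
        n = poissinv(0.999, ceil(1 / p^k));
        fprintf('p = %.1f: %d iterations\n', p, n);
    end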

From testing, the inlier/total ratio is around 30-40%, indicating that roughly 40% of the points from SIFT are indeed true matches. As a result, p = 0.4 is chosen, and a total of 4007 iterations are run for each image pair. The code can be seen below:


    [num_matches, ~] = size(matches_a);
    k = 9;                          % sample size per iteration
    threshold = 0.01;               % inlier threshold on the epipolar residual
    p_correct = 0.4;                % assumed probability that a match is correct
    p_all_correct = p_correct^k;
    max_iterations = poissinv(0.999, ceil(1/p_all_correct));
    best_ratio = 0;
    for i = 1:max_iterations
        % Fit a fundamental matrix to a random sample of k matches
        index = randsample(num_matches, k);
        F = estimate_fundamental_matrix(matches_a(index,:), matches_b(index,:));
        i_a = zeros([num_matches, 2]);
        i_b = zeros([num_matches, 2]);
        counter = 0;
        for p = 1:num_matches
            p1 = matches_a(p, :);
            p2 = matches_b(p, :);
            residual = [p2 1]*F*[p1 1]';    % epipolar constraint value
            if( abs(residual) < threshold )
                counter = counter + 1;
                i_a(counter, :) = p1;
                i_b(counter, :) = p2;
            end
        end
        % Keep the model with the highest inlier ratio seen so far
        if((counter/num_matches) > best_ratio)
            best_ratio = counter/num_matches;
            Best_Fmatrix = F;
            best_i_a = i_a(1:counter, :);
            best_i_b = i_b(1:counter, :);
        end
    end
        

The following are test image pairs.

Mount Rushmore

Notre Dame

Woodruff


Extra credit: Coordinate normalization

The previous code does not work well with the Episcopal Gaudi image pair.

Episcopal Gaudi

It is obvious that the fundamental matrix is way off, as indicated by the epipolar lines. To mitigate this, the input points can be normalized so that each image's points are centered at the origin with a mean distance of 2 from it. Each image is normalized individually, so each image has its own transformation matrix T. The final fundamental matrix is T_b' * F_matrix * T_a, where T_a is the transformation matrix of the first image, T_b is that of the second image, and F_matrix is the fundamental matrix estimated from the normalized points; the short derivation below shows why.
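
Substituting the normalized points \hat{x}_a = T_a x_a and \hat{x}_b = T_b x_b into the epipolar constraint shows why the denormalization takes this form:

    \hat{x}_b^\top \hat{F} \hat{x}_a = (T_b x_b)^\top \hat{F} (T_a x_a) = x_b^\top (T_b^\top \hat{F} T_a) x_a = 0

so the fundamental matrix in the original pixel coordinates is F = T_b^\top \hat{F} T_a, which is the last line of the code below.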


    [n, ~] = size(Points_a);
    o = ones([n, 1]);

    % Normalize image a: translate the centroid to the origin, then scale so
    % that the mean distance of the points from the origin is 2
    cu_a = mean(Points_a(:,1));
    cv_a = mean(Points_a(:,2));
    s_a = 2/mean(((Points_a(:, 1) - o * cu_a).^2 + (Points_a(:,2) - o*cv_a).^2).^0.5);
    T_a = [s_a 0 0; 0 s_a 0; 0 0 1]*[1 0 -cu_a; 0 1 -cv_a; 0 0 1];
    Points_a = T_a*[Points_a, o]';
    Points_a = Points_a';
    Points_a = Points_a(:, 1:end-1);

    % Normalize image b in the same way
    cu_b = mean(Points_b(:,1));
    cv_b = mean(Points_b(:,2));
    s_b = 2/mean(((Points_b(:, 1) - o * cu_b).^2 + (Points_b(:,2) - o*cv_b).^2).^0.5);
    T_b = [s_b 0 0; 0 s_b 0; 0 0 1]*[1 0 -cu_b; 0 1 -cv_b; 0 0 1];
    Points_b = T_b*[Points_b, o]';
    Points_b = Points_b';
    Points_b = Points_b(:, 1:end-1);

    % Eight-point estimation on the normalized points (same as Part 2)
    A = zeros([n, 9]);
    for i=1:n
        u  = Points_a(i, 1);
        v  = Points_a(i, 2);
        u_ = Points_b(i, 1);
        v_ = Points_b(i, 2);
        A(i, :) = [u*u_ v*u_ u_ u*v_ v*v_ v_ u v 1];
    end
    [U, S, V] = svd(A);
    f = V(:, end);
    f = f ./ norm(f);
    F_matrix = reshape(f, [3 3])';

    % Enforce the rank-2 constraint
    [U,S,V] = svd(F_matrix);
    [r,c] = size(S);
    S(r,c) = 0;
    F_matrix = U*S*V';

    % Undo the normalization to get F in the original pixel coordinates
    F_matrix = T_b' * F_matrix * T_a;
        

The improvement from the normalization is obvious.

Episcopal Gaudi with normalized points