Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC



Motivation:

In project 2, we used the Harris corner detector to detect interest points, a SIFT-like approach to compute feature descriptors, and the nearest neighbor distance ratio to match features. This approach worked very well for some image pairs, such as Notre Dame and Mount Rushmore, where we achieved up to 97% feature matching accuracy. However, upon matching features for Episcopal Gaudi, we were left with a poor 5% matching accuracy. How can we achieve a near perfect score for Episcopal Gaudi? We can follow a new approach: use RANSAC to find the fundamental matrix with the most inliers, then filter away the spurious matches. This gives us near perfect point-to-point matching.

Procedure:

To complete our goal we will take the following steps:

Part 1: Camera Projection Matrix

Goal: The goal here is to compute the projection matrix that maps from three dimensional world coordinates to two dimensional image coordinates. We write an expression for this using homogeneous coordinates. Recall that homogeneous coordinates are a way of representing N-dimensional coordinates with N+1 numbers; this is how we represent two dimensional image points with three numbers. Say we have a Cartesian coordinate $(x_1,y_1)$. We would denote this point as $(x_1,y_1,1)$ in homogeneous coordinates. To convert from a homogeneous coordinate $(x,y,w)$ to Cartesian coordinates, we simply divide $x$ and $y$ by $w$, giving the Cartesian coordinate $\left(\frac{x}{w},\frac{y}{w}\right)$. Let us look at an example. Suppose we have the homogeneous coordinate $(1,2,3)$; the corresponding Cartesian coordinate is $\left(\frac{1}{3},\frac{2}{3}\right)$. If instead we had the homogeneous coordinate $(2,4,6)$, we would get the Cartesian coordinate $\left(\frac{2}{6},\frac{4}{6}\right) = \left(\frac{1}{3},\frac{2}{3}\right)$. We see that both homogeneous coordinates represent the same point in Cartesian space: homogeneous coordinates are scale invariant.
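
As a quick sanity check on these conversions, here is a minimal MATLAB sketch (the anonymous function names are our own, purely for illustration):

% Convert a 2D Cartesian point to homogeneous coordinates by appending w = 1.
to_homogeneous = @(p) [p(:); 1];

% Convert a homogeneous point (x, y, w) back to Cartesian by dividing by w.
to_cartesian = @(p) p(1:2) / p(3);

% Both of these represent the same Cartesian point (1/3, 2/3):
to_cartesian([1; 2; 3])   % returns [0.3333; 0.6667]
to_cartesian([2; 4; 6])   % returns [0.3333; 0.6667]

Using homogeneous coordinates, the equation for moving from three dimensional world coordinates to two dimensional image coordinates is: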

$\begin{bmatrix}u \\v \\1\end{bmatrix} \cong \begin{bmatrix}u*s \\v*s \\s\end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\m_{21} & m_{22} & m_{23} & m_{24} \\m_{31} & m_{32} & m_{33} & m_{34}\end{bmatrix} \begin{bmatrix}X \\Y \\Z \\ 1\end{bmatrix} $

We can rewrite the above after performing matrix multiplication on the right side to get:

$u = \frac{m_{11}X+m_{12}Y+m_{13}Z+m_{14}}{m_{31}X+m_{32}Y+m_{33}Z+m_{34}}$

$\Rightarrow$

$0 = m_{11}X+m_{12}Y+m_{13}Z+m_{14}-m_{31}uX-m_{32}uY-m_{33}uZ-m_{34}u$

and

$v = \frac{m_{21}X+m_{22}Y+m_{23}Z+m_{24}}{m_{31}X+m_{32}Y+m_{33}Z+m_{34}}$

$\Rightarrow$

$ 0 = m_{21}X+m_{22}Y+m_{23}Z+m_{24}-m_{31}vX-m_{32}vY-m_{33}vZ-m_{34}v$

Once we set this up in our MATLAB code, we can fix the last element $m_{34}$ to $1$ and solve for the remaining eleven coefficients.
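
Concretely, stacking the two equations above for each of the $n$ point correspondences yields a $2n \times 11$ linear system in the eleven remaining unknowns, which we can solve in a least-squares sense (this is what MATLAB's backslash operator does in the code below):

$\begin{bmatrix} X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1X_1 & -u_1Y_1 & -u_1Z_1 \\ 0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1X_1 & -v_1Y_1 & -v_1Z_1 \\ \vdots & & & & & & & & & & \vdots \end{bmatrix} \begin{bmatrix} m_{11} \\ m_{12} \\ \vdots \\ m_{32} \\ m_{33} \end{bmatrix} = \begin{bmatrix} u_1 \\ v_1 \\ \vdots \end{bmatrix}$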

Non-homogeneous Implementation: Let us now implement the MATLAB function calculate_projection_matrix():

The code is as follows:


function M = calculate_projection_matrix( Points_2D, Points_3D )
A = [];  % 2n x 11 coefficient matrix
B = [];  % 2n x 1 vector of observed image coordinates
for i = 1:size(Points_2D,1)
    u_i = Points_2D(i,1);
    v_i = Points_2D(i,2);
    x_i = Points_3D(i,1);
    y_i = Points_3D(i,2);
    z_i = Points_3D(i,3);
    % One row for the u equation and one for the v equation per point
    A_vec_1 = [x_i y_i z_i 1 0 0 0 0 -u_i*x_i -u_i*y_i -u_i*z_i];
    A_vec_2 = [0 0 0 0 x_i y_i z_i 1 -v_i*x_i -v_i*y_i -v_i*z_i];
    A(end+1,:) = A_vec_1;
    A(end+1,:) = A_vec_2;
    B(end+1,:) = u_i;
    B(end+1,:) = v_i;
end
% Least-squares solve for the 11 unknowns, append m_34 = 1, and
% reshape the 12-vector into the 3 x 4 projection matrix
M = transpose(reshape([A\B;1],[],3));
end

Non-homogeneous Results: Let us run proj3_part1.m with some easy normalized points. The results are as follows:

The projection matrix is:

$\begin{bmatrix} 0.7679 & -0.4938 & -0.0234 & 0.0067 \\ -0.0852 & -0.0915 & -0.9065 & -0.0878 \\ 0.1827 & 0.2988 & -0.0742 & 1.0000 \end{bmatrix}$

The total residual is: 0.0445.

A plot of the visualization of the points can be found below (hover to zoom):


Visualize Points

In the graph above, we see that the actual points and the projected points align very well. The total residual is a small 0.0445, well under 1, which indicates a good fit.

Let us check our matrix. When we solve for M using the normalized points, we should get a matrix that is a scaled equivalent of the following:

$\begin{bmatrix} -0.4583 & 0.2947 & 0.0139 & -0.0040 \\ 0.0509 & 0.0546 & 0.5410 & 0.0524\\ -0.1090 & -0.1784 & 0.0443 & -0.5968 \end{bmatrix}$

If we look at our matrix we see that:

$-0.5968*\begin{bmatrix} 0.7679 & -0.4938 & -0.0234 & 0.0067 \\ -0.0852 & -0.0915 & -0.9065 & -0.0878 \\ 0.1827 & 0.2988 & -0.0742 & 1.0000 \end{bmatrix} = \begin{bmatrix} -0.4583 & 0.2947 & 0.0139 & -0.0040 \\ 0.0509 & 0.0546 & 0.5410 & 0.0524\\ -0.1090 & -0.1784 & 0.0443 & -0.5968 \end{bmatrix}$

Thus, we see that we do in fact have a scaled version of the matrix above.
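
This check is easy to script. Here is a small illustrative snippet (M_expected is the target matrix above, hard-coded by hand; M is our computed projection matrix):

% Check that our M is a scaled version of the expected matrix.
M_expected = [-0.4583  0.2947  0.0139 -0.0040;
               0.0509  0.0546  0.5410  0.0524;
              -0.1090 -0.1784  0.0443 -0.5968];
scale = M_expected(3,4) / M(3,4);      % = -0.5968, since M(3,4) = 1
max(abs(scale*M(:) - M_expected(:)))   % ~0 up to rounding error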

We also note that we could solve the system as a homogeneous linear system. Let us take a look at the code for this method:

Homogeneous Implementation: The steps are very similar to those above; however, we append $-u_i$ in the last column of the odd rows and $-v_i$ in the last column of the even rows. We can then use singular value decomposition (SVD) on matrix A, take V(:,end), and reshape it into a 3 x 4 matrix. The code is as follows:


function M = calculate_projection_matrix( Points_2D, Points_3D )
A = [];  % 2n x 12 coefficient matrix for the homogeneous system A*m = 0
for i = 1:size(Points_2D,1)
    u_i = Points_2D(i,1);
    v_i = Points_2D(i,2);
    x_i = Points_3D(i,1);
    y_i = Points_3D(i,2);
    z_i = Points_3D(i,3);
    % Same rows as before, with -u_i and -v_i appended as a 12th column
    A_vec_1 = [x_i y_i z_i 1 0 0 0 0 -u_i*x_i -u_i*y_i -u_i*z_i -u_i];
    A_vec_2 = [0 0 0 0 x_i y_i z_i 1 -v_i*x_i -v_i*y_i -v_i*z_i -v_i];
    A(end+1,:) = A_vec_1;
    A(end+1,:) = A_vec_2;
end
% The null-space solution is the right singular vector with the
% smallest singular value, i.e. the last column of V
[U,S,V] = svd(A);
M = V(:,end);
M = transpose(reshape(M,[],3));  % reshape the 12-vector into 3 x 4
end

Let us look at the results of this method:

Homogenous Results:

The projection matrix is:

$\begin{bmatrix} 0.4583 & -0.2947 & -0.0140 & 0.0040 \\ -0.0509 & -0.0546 & -0.5410 & -0.0524\\ 0.1090 & 0.1784 & -0.0443 & 0.5968 \end{bmatrix}$

The total residual is: 0.0445.

We see that this is exactly -1 times the target matrix for the normalized points, which is good: the projection matrix is only defined up to scale.

A plot of the visualization of the points can be found below (hover to zoom):


Visualize Points

We see that we get exactly the same output as with the non-homogeneous method.

Now that we have a working version of calculate_projection_matrix(), let us implement compute_camera_center().

Part 1.1: Camera Center

Goal: What we now want to do is compute the camera center in the MATLAB function compute_camera_center(). We will be estimating one particular extrinsic parameter: the camera center in world coordinates. We take our projection matrix M, let Q denote the 3x3 matrix formed by the first three columns of the 3x4 matrix, and let $m_4$ denote the fourth column. We can write

$M = (Q|m_4)$

Now, we can compute the center of the camera, denoted $C$ as:

$C=-Q^{-1}m_4$
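
To see why this formula holds, recall that the camera center is the one world point whose projection is undefined; it lies in the null space of $M$:

$M\begin{bmatrix} C \\ 1 \end{bmatrix} = QC + m_4 = \mathbf{0} \quad \Rightarrow \quad C = -Q^{-1}m_4$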

Implementation: Let us now implement the MATLAB function compute_camera_center():

The code is as follows:


function [ Center ] = compute_camera_center( M )
Q = M(:,1:3);        % left 3 x 3 block of the projection matrix
m_4 = M(:,4);        % fourth column
Center = -Q \ m_4;   % C = -Q^(-1) * m_4, solved without an explicit inverse
end

Let us run proj3_part1.m now and examine the output:

Results:

The estimated location of the camera is: <-1.5126, -2.3517, 0.2827>

The graph below visualizes the actual 3D points and the estimated 3D camera center:

Let us check our results. We used the normalized 3D points to get M, so we should arrive at a camera center near <-1.5125,-2.3515,0.2826>. We see that we are very close to this point.

Let us now move on to part 2, the fundamental matrix estimation.

Part 2: Fundamental Matrix Estimation

Goal: We now want to estimate the mapping of points in one image to lines in another by means of the fundamental matrix. The definition of the fundamental matrix is as follows:

$\begin{bmatrix} u' & v' & 1 \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{bmatrix} u \\ v\\ 1 \end{bmatrix} = 0$

We can expand this out as:

$f_{11}uu'+f_{12}vu'+f_{13}u'+f_{21}uv'+f_{22}vv'+f_{23}v'+f_{31}u+f_{32}v+f_{33}=0$

We need to follow a similar approach to part 1, fixing the scale and solving the regression (or, as in our implementation below, solving the homogeneous system with SVD). The least squares estimate of F is generally full rank, but a true fundamental matrix has rank 2, so we need to reduce the rank. To do this, we decompose F using singular value decomposition (SVD) as $U\Sigma V^T=F$. We then estimate a rank 2 matrix by setting the smallest singular value in $\Sigma$ equal to 0, producing $\Sigma_2$, and recompute the fundamental matrix as $F=U\Sigma_2 V^T$.

Implementation: We will now implement estimate_fundamental_matrix() as follows:

The code is as follows:


function [ F_matrix ] = estimate_fundamental_matrix(Points_a,Points_b)
a_rows = size(Points_a,1);
b_rows = size(Points_b,1);
vec = [1 1 1];
new_a = horzcat(Points_a, ones(a_rows,1)); % Add a column of ones
new_b = horzcat(Points_b, ones(b_rows,1)); % Add a column of ones
% Build the n x 9 system: each row is
% [u*u' v*u' u' u*v' v*v' v' u v 1] for one correspondence
hc = horzcat(new_a,new_a,new_a);
M = kron(new_b,vec).*hc;
% The null-space solution is the last column of V
[U,S,V] = svd(M);
last_col = size(V,2);
F = transpose(reshape(V(:,last_col),3,3));
% Enforce rank 2 by zeroing the smallest singular value
[F_U,F_D,F_V] = svd(F);
F_D(3,3) = 0;
F_matrix = F_U*F_D*transpose(F_V);
end

Let us now examine the results:

Results:

After running proj3_part2.m with our estimate_fundamental_matrix() function we arrive at an F_matrix of:

$$\begin{bmatrix} -0.0000 & 0.0000 & -0.0019 \\ 0.0000 & 0.0000 & 0.0172 \\ -0.0009 & -0.264 & 0.9995 \end{bmatrix}$$

The epipolar lines for the corresponding two images are as follows:


Left Image


Right Image

We see that we have very straight epipolar lines and thus a good fundamental matrix estimation. Before moving on, let us sketch how these lines are actually drawn from F.
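
For a point $(u,v)$ in one image, the constraint $\begin{bmatrix} u' & v' & 1 \end{bmatrix} F \begin{bmatrix} u & v & 1 \end{bmatrix}^T = 0$ says that the corresponding point $(u',v')$ in the other image must lie on the line with coefficients $l = F\,[u\;\, v\;\, 1]^T$. A minimal sketch of plotting one such line (the variable names and img_width are illustrative assumptions, not the project's plotting code):

% Epipolar line in the other image for a point (u, v).
% l = [a; b; c] holds the coefficients of the line a*x + b*y + c = 0.
l = F_matrix * [u; v; 1];

% Sample the line across the width of the image and plot it
% (assuming the line is not vertical, i.e. l(2) ~= 0).
x = 1:img_width;
y = -(l(1)*x + l(3)) / l(2);   % solve a*x + b*y + c = 0 for y
plot(x, y, 'b');

Let us now move on to the final part of our project.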

Part 3: Fundamental Matrix with RANSAC

Goal: For two photographs of a scene, it is unlikely that we would have perfect point correspondences with which to do the regression for the fundamental matrix. So, we will compute the fundamental matrix from unreliable point correspondences computed with SIFT. Least squares regression is not appropriate here due to the presence of many outliers. We can use RANSAC in conjunction with our fundamental matrix estimation to estimate the fundamental matrix from this noisy data.

RANSAC: Random Sample Consensus (RANSAC) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers, when the outliers are to be accorded no influence on the values of the estimates. We can think of RANSAC as an outlier detection method. Let us take a look at a simple example: fitting a line.

Let us look at fitting a line in two dimensions to a set of observation points that contains both inliers and outliers. If we use a simple least squares method for line fitting, we get a line that fits the inliers poorly, because it is optimally fitted to all points, including the outliers. If we instead fit the points with RANSAC, we get a line determined only by the inliers.

RANSAC randomly samples the observed data and uses a voting scheme to find the optimal result: elements in the data set vote for one or more candidate models. The two steps of the algorithm, repeated every iteration, are as follows: first, fit a model to a randomly selected minimal sample of the data; second, test all of the other data points against the fitted model and count those that agree with it (the inliers) as votes. After all iterations, the model with the most votes wins. A sketch of this procedure applied to line fitting is given below.
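
Here is a minimal MATLAB sketch of RANSAC line fitting (the trial count and threshold are illustrative, not tuned values):

% Minimal RANSAC line fit y = a*x + b to points in an n x 2 matrix pts.
n = size(pts, 1);
best_inliers = 0;
for i = 1:1000                          % number of trials (illustrative)
    s = pts(randsample(n, 2), :);       % step 1: fit to a minimal sample
    a = (s(2,2) - s(1,2)) / (s(2,1) - s(1,1));  % (a vertical sample would need special handling)
    b = s(1,2) - a*s(1,1);
    resid = abs(pts(:,2) - (a*pts(:,1) + b));
    inliers = sum(resid < 0.1);         % step 2: count agreeing points (votes)
    if inliers > best_inliers           % keep the model with the most votes
        best_inliers = inliers;
        best_a = a; best_b = b;
    end
end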

An animation of the process can be found below:

Now that we understand the algorithm, let us proceed to our implementation of it in the MATLAB function ransac_fundamental_matrix():

Implementation:

The code for the above is as follows:


function [ Best_Fmatrix, inliers_a, inliers_b] = ransac_fundamental_matrix(matches_a, matches_b)
iterations = 3500;
rows = size(matches_a,1);
err_threshold = 0.005;
best = 0;
for i=1:iterations
    update = 0;
    % Fit a fundamental matrix to a random minimal sample of 8 matches
    rand_index = randsample(rows,8);
    rand_a = matches_a(rand_index,:);
    rand_b = matches_b(rand_index,:);
    F_matrix = estimate_fundamental_matrix(rand_a,rand_b);
    a_inliers = [];
    b_inliers = [];
    % Count the matches that agree with this model
    for j=1:rows
        err = [matches_a(j,:) 1]*transpose(F_matrix)*transpose([matches_b(j,:),1]);
        abs_error = abs(err);
        if abs_error < err_threshold
            a_inliers(end+1,:) = matches_a(j,:);
            b_inliers(end+1,:) = matches_b(j,:);
            update = update+1;
        end
    end
    % Keep the model with the most inliers seen so far
    if update > best
        best = update;
        Best_Fmatrix = F_matrix;
        inliers_a = a_inliers;
        inliers_b = b_inliers;
    end
end
% Return at most 30 randomly sampled inlier pairs for visualization
inlier_a_rows = size(inliers_a,1);
if inlier_a_rows > 30
    rand_index = randsample(inlier_a_rows,30);
    inliers_a = inliers_a(rand_index,:);
    inliers_b = inliers_b(rand_index,:);
end
end

Let us now take a look at our results.

Results: The results are as follows:

Let us take a look at the Mount Rushmore image with 5000 iterations and an inlier threshold of 0.005:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 5000 iterations and an inlier threshold of 0.0055:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 5000 iterations and an inlier threshold of 0.01:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 500 iterations and an inlier threshold of 0.01:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 50 iterations and an inlier threshold of 0.01:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 50 iterations and an inlier threshold of 0.0055:

Mount Rushmore


Left Image


Right Image


Matches

Let us now draw some conclusions about the above results:

We see that our results are better when we use more iterations and a lower (stricter) inlier threshold, which is to be expected. However, we do get fewer matches and fewer inliers the stricter we are with the threshold. I found that a threshold of 0.005 with 5000 iterations worked very well: we got very straight lines through the centers of our dots and a good number of accurate matches, as seen in the images with 30 randomly sampled pairs. The number of iterations is important, but if I had to choose between the two, I would rather give up iterations than loosen the inlier threshold: running many trials with different combinations of iterations and inlier thresholds, the inlier threshold made the most difference in accuracy.
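
The standard RANSAC sample-count analysis supports this. If a fraction $w$ of the matches are inliers and each trial samples 8 points, the number of iterations $N$ needed to draw at least one all-inlier sample with confidence $p$ is:

$N = \frac{\log(1-p)}{\log(1-w^8)}$

For example, with $w = 0.5$ and $p = 0.99$ this gives $N \approx 1180$, so 5000 iterations leaves a comfortable margin, while the inlier threshold directly controls which matches count as inliers at all.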

Let us take a look at some of the other images:

Let us now take a look at the Notre Dame image with 5000 iterations and an inlier threshold of 0.0055:

Notre Dame


Left Image


Right Image


Matches

Let us now take a look at the Notre Dame image with 5000 iterations and an inlier threshold of 0.01:

Notre Dame


Left Image


Right Image


Matches

We see that for the Notre Dame, we get nearly perfect accuracy. Our lines are very straight and from our sample of matches, we see that a majority of them are correct.

Let us now take a look at the Episcopal Gaudi image with 5000 iterations and an inlier threshold of 0.01:

Episcopal Gaudi


Left Image


Right Image


Matches

Let us now take a look at the Episcopal Gaudi image with 5000 iterations and an inlier threshold of 0.005:

Episcopal Gaudi


Left Image


Right Image


Matches

We see that the Episcopal Gaudi image does not perform as well. There are a few blue lines that do not pass through the centers of the circles and a few of the matching lines look more crooked. However, the performance is not bad and is definitely better than the performance in project 2 for the Episcopal Gaudi image.

Let us now take a look at our own image, St Basil's Cathedral:

We will use 5000 iterations and an inlier threshold of 0.005.

We selected 20 inlier points and the result is the following:

St Basil's Cathedral


Left Image


Right Image


Matches

We see that there are a few points in the left and right images that do not fall exactly on the line, and looking at the matches confirms this: a few of the lines are crooked. The matching here is not amazing.

Let us take a look at another image of a basketball sneaker:

LeBron Shoe


Left Image


Right Image


Matches

Here we see that the features are matched much better and that the inlier points lie directly on the blue lines. Our matching accuracy was very good for this image.

We note that the matching was great for Notre Dame, Mount Rushmore, and the LeBron sneaker. Let us implement some extra credit to see if we can perform better with the Episcopal Gaudi image.

Extra Credit:

Goal: Our estimate of the fundamental matrix can be improved by normalizing the coordinates before computing it. We perform the normalization through linear transformations, described below, to make the mean of the points zero and the average magnitude 1.0 or some other small number, such as the square root of two. The normalization through linear transformations is as follows:

$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -c_u \\ 0 & 1 & -c_v \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

The transform matrix T is the product of the scale and offset matrices, where $c_u$ and $c_v$ are the mean coordinates. To compute a scale $s$, we can estimate the standard deviation after subtracting the means and set $s$ to the reciprocal of whatever estimate of the scale we use; alternatively, we could use the reciprocal of the maximum absolute value, or scale the coordinates such that the average squared distance from the origin, after centering, is two. Whichever we choose, we must use these scaling matrices to adjust the fundamental matrix so that it can operate on the original pixel coordinates:

$F_{\text{orig}} = T_b^T \, F_{\text{norm}} \, T_a$

Let us proceed to implement this in the function estimate_fundamental_matrix():

Implementation: Let us now implement the extra credit portion of the project:

The code is as follows:


function [ F_matrix ] = estimate_fundamental_matrix(Points_a,Points_b)
a_rows = size(Points_a,1);
b_rows = size(Points_b,1);

% Mean (centroid) of each point set
a_cu = mean(Points_a(:,1));
a_cv = mean(Points_a(:,2));
b_cu = mean(Points_b(:,1));
b_cv = mean(Points_b(:,2));

% Scale: reciprocal of the mean distance from the centroid
s = a_rows/sum(((Points_a(:,1)-a_cu).^2 + (Points_a(:,2)-a_cv).^2).^(1/2));
Ta_pre = [s 0 0; 0 s 0; 0 0 1];        % scale matrix
Ta_o = [1 0 -a_cu; 0 1 -a_cv; 0 0 1];  % offset matrix
Ta = Ta_pre*Ta_o;
s = b_rows/sum(((Points_b(:,1)-b_cu).^2 + (Points_b(:,2)-b_cv).^2).^(1/2));
Tb_pre = [s 0 0; 0 s 0; 0 0 1];
Tb_o = [1 0 -b_cu; 0 1 -b_cv; 0 0 1];
Tb = Tb_pre*Tb_o;
% Apply the normalizing transforms to the homogeneous points
Points_a = transpose(Ta*transpose([Points_a ones(size(Points_a,1),1)]));
Points_b = transpose(Tb*transpose([Points_b ones(size(Points_b,1),1)]));

% Build the n x 8 system with f_33 fixed to 1
mtx = [];
for i=1:a_rows
    u = Points_a(i,1);
    u_prime = Points_b(i,1);
    v = Points_a(i,2);
    v_prime = Points_b(i,2);
    mtx(end+1,:) = [u*u_prime v*u_prime u_prime u*v_prime v*v_prime v_prime u v];
end

% Solve the least-squares system, reshape into F_norm (transposed),
% denormalize via F_orig = Tb' * F_norm * Ta, and enforce rank 2
[U,S,V] = svd(transpose(transpose(Ta)*(reshape([-1*mtx\(ones(a_rows,1));1],[],3))*Tb));
S(3,3) = 0;
F_matrix = U*S*transpose(V);
end
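
As a quick sanity check on the normalization, consider the following illustrative snippet (Points_a here are the raw input points and Ta the transform built above):

% With s chosen as the reciprocal of the mean distance from the centroid,
% the normalized points should have ~zero mean and mean distance ~1.
norm_a = transpose(Ta * transpose([Points_a ones(size(Points_a,1),1)]));
mean(norm_a(:,1:2))                       % should be approximately [0 0]
mean(sqrt(sum(norm_a(:,1:2).^2, 2)))      % should be approximately 1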

Now that we have implemented our new estimate_fundamental_matrix() function, let us run proj3_part3.m with the Episcopal Gaudi image with 5000 iterations and an inlier threshold of 0.005. The results are as follows:

Results:

Episcopal Gaudi


Left Image


Right Image


Matches

The Episcopal Gaudi images from before the extra credit normalization, with the same number of iterations and inlier threshold, can be found below with 30 inliers shown:

Episcopal Gaudi


Left Image


Right Image


Matches

We see that our new estimate_fundamental_matrix() function with the extra credit normalization is much better. The matching lines are much straighter and the dots are now centered on the blue lines in the left and right images. Through this normalization, we have achieved much higher accuracy. We note that the Episcopal Gaudi image is now almost perfect.

Let us now look at the Notre Dame Image with 5000 iterations and an inlier threshold of 0.005:

Notre Dame


Left Image


Right Image


Matches

Here the results are stellar. The matches are very straight and map to the correct places. Also, in our left and right images, the dots are perfectly centered on the blue lines. We see that we have done a very good job with our Notre Dame image using the normalization.

Using the normalization in the extra credit has helped us greatly. We have seen improvement on the Notre Dame image and have seen a vast improvement in the Episcopal Gaudi image.

Conclusion:

Throughout the project we calculated the camera projection matrix using both a homogeneous method and a non-homogeneous method. We only had to choose one method, but I thought it would be nice to display both, so we showed the results for both methods and how similar they are. We then computed the camera center and displayed its results. Next, we estimated the fundamental matrix without normalization and showed results for a few images. After that, we estimated the fundamental matrix with RANSAC for a variety of images, some from the original dataset and some of my own. Finally, we completed the extra credit and normalized the coordinates before computing the fundamental matrix. In conclusion, we ended up getting some great matches and nice epipolar lines. In addition, I added a JavaScript function so that you can zoom in on an image by hovering and zoom out when the mouse moves away. I also added lengthy mathematical explanations for each section so that a reader of this page has full information to come up with a solution of their own. Overall, this was a great project and I enjoyed seeing the improvements from project 2 to project 3.
