Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC



Motivation:

In project 2, we used the Harris corner detector to detect interest points, a SIFT-like approach to compute feature descriptors, and the nearest neighbor distance ratio to match features. This approach worked very well for some image pairs, such as Notre Dame and Mount Rushmore, where we achieved up to 97% feature matching accuracy. However, upon matching features for Episcopal Gaudi, we were left with a poor 5% matching accuracy. How can we achieve a near perfect score for Episcopal Gaudi? We can follow a new approach: use RANSAC to find the fundamental matrix with the most inliers, then filter away the spurious matches. This gives us near perfect point-to-point matching.

Procedure:

To complete our goal we will take the following steps:

Part 1: Camera Projection Matrix

Goal: The goal here is to compute the projection matrix that maps from three dimensional world coordinates to two dimensional image coordinates. We write an expression for this using homogeneous coordinates. Recall that homogeneous coordinates are a way of representing N-dimensional coordinates with N+1 numbers; this is how we represent two dimensional image points with three numbers. Say we have a Cartesian coordinate $(x_1,y_1)$. We would denote this point as $(x_1,y_1,1)$ in homogeneous coordinates. To convert from a homogeneous coordinate $(x,y,w)$ to Cartesian coordinates, we simply divide $x$ and $y$ by $w$, giving the Cartesian coordinate $\left(\frac{x}{w},\frac{y}{w}\right)$. Let us look at an example. Suppose we have the homogeneous coordinate $(1,2,3)$; the corresponding Cartesian coordinate is $\left(\frac{1}{3},\frac{2}{3}\right)$. If instead we had the homogeneous coordinate $(2,4,6)$, we would get the Cartesian coordinate $\left(\frac{2}{6},\frac{4}{6}\right) = \left(\frac{1}{3},\frac{2}{3}\right)$. We see that both homogeneous coordinates represent the same point in Cartesian space: homogeneous coordinates are scale invariant.
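
As a quick sanity check on these conversions, here is a minimal MATLAB sketch (the anonymous function names are our own, purely for illustration):

% Convert a 2D Cartesian point to homogeneous coordinates by appending w = 1.
to_homogeneous = @(p) [p(:); 1];

% Convert a homogeneous point (x, y, w) back to Cartesian by dividing by w.
to_cartesian = @(p) p(1:2) / p(3);

% Both of these represent the same Cartesian point (1/3, 2/3):
to_cartesian([1; 2; 3])   % returns [0.3333; 0.6667]
to_cartesian([2; 4; 6])   % returns [0.3333; 0.6667]

Using homogeneous coordinates, the equation for moving from three dimensional world coordinates to two dimensional image coordinates is: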

$\begin{bmatrix}u \\v \\1\end{bmatrix} \cong \begin{bmatrix}u*s \\v*s \\s\end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\m_{21} & m_{22} & m_{23} & m_{24} \\m_{31} & m_{32} & m_{33} & m_{34}\end{bmatrix} \begin{bmatrix}X \\Y \\Z \\ 1\end{bmatrix} $

We can rewrite the above after performing matrix multiplication on the right side to get:

$u = \frac{m_{11}X+m_{12}Y+m_{13}Z+m_{14}}{m_{31}X+m_{32}Y+m_{33}Z+m_{34}}$

$\Rightarrow$

$0 = m_{11}X+m_{12}Y+m_{13}Z+m_{14}-m_{31}uX-m_{32}uY-m_{33}uZ-m_{34}u$

and

$v = \frac{m_{21}X+m_{22}Y+m_{23}Z+m_{24}}{m_{31}X+m_{32}Y+m_{33}Z+m_{34}}$

$\Rightarrow$

$ 0 = m_{21}X+m_{22}Y+m_{23}Z+m_{24}-m_{31}vX-m_{32}vY-m_{33}vZ-m_{34}v$

Once we set this up in our MATLAB code, we can fix the last element $m_{34}$ to $1$ and solve for the remaining eleven coefficients.
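
Concretely, stacking the two equations above for each of the $n$ point correspondences yields a $2n \times 11$ linear system in the eleven remaining unknowns, which we can solve in a least-squares sense (this is what MATLAB's backslash operator does in the code below):

$\begin{bmatrix} X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1X_1 & -u_1Y_1 & -u_1Z_1 \\ 0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1X_1 & -v_1Y_1 & -v_1Z_1 \\ \vdots & & & & & & & & & & \vdots \end{bmatrix} \begin{bmatrix} m_{11} \\ m_{12} \\ \vdots \\ m_{32} \\ m_{33} \end{bmatrix} = \begin{bmatrix} u_1 \\ v_1 \\ \vdots \end{bmatrix}$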

Non-homogeneous Implementation: Let us now implement the MATLAB function calculate_projection_matrix():

The code is as follows:


function M = calculate_projection_matrix( Points_2D, Points_3D )
A = [];  % 2n x 11 coefficient matrix
B = [];  % 2n x 1 vector of observed image coordinates
for i = 1:size(Points_2D,1)
    u_i = Points_2D(i,1);
    v_i = Points_2D(i,2);
    x_i = Points_3D(i,1);
    y_i = Points_3D(i,2);
    z_i = Points_3D(i,3);
    % One row for the u equation and one for the v equation per point
    A_vec_1 = [x_i y_i z_i 1 0 0 0 0 -u_i*x_i -u_i*y_i -u_i*z_i];
    A_vec_2 = [0 0 0 0 x_i y_i z_i 1 -v_i*x_i -v_i*y_i -v_i*z_i];
    A(end+1,:) = A_vec_1;
    A(end+1,:) = A_vec_2;
    B(end+1,:) = u_i;
    B(end+1,:) = v_i;
end
% Least-squares solve for the 11 unknowns, append m_34 = 1, and
% reshape the 12-vector into the 3 x 4 projection matrix
M = transpose(reshape([A\B;1],[],3));
end

Non-homogeneous Results: Let us run proj3_part1.m with some easy normalized points. The results are as follows:

The projection matrix is:

$\begin{bmatrix} 0.7679 & -0.4938 & -0.0234 & 0.0067 \\ -0.0852 & -0.0915 & -0.9065 & -0.0878 \\ 0.1827 & 0.2988 & -0.0742 & 1.0000 \end{bmatrix}$

The total residual is: 0.0445.

A plot of the visualization of the points can be found below (hover to zoom):


Visualize Points

In the graph above, we see that the actual points and the projected points align very well. The total residual is a small 0.0445, well under 1, which indicates a good fit.

Let us check our matrix. When we solve for M using the normalized points, we should get a matrix that is a scaled equivalent of the following:

$\begin{bmatrix} -0.4583 & 0.2947 & 0.0139 & -0.0040 \\ 0.0509 & 0.0546 & 0.5410 & 0.0524\\ -0.1090 & -0.1784 & 0.0443 & -0.5968 \end{bmatrix}$

If we look at our matrix we see that:

$-0.5968*\begin{bmatrix} 0.7679 & -0.4938 & -0.0234 & 0.0067 \\ -0.0852 & -0.0915 & -0.9065 & -0.0878 \\ 0.1827 & 0.2988 & -0.0742 & 1.0000 \end{bmatrix} = \begin{bmatrix} -0.4583 & 0.2947 & 0.0139 & -0.0040 \\ 0.0509 & 0.0546 & 0.5410 & 0.0524\\ -0.1090 & -0.1784 & 0.0443 & -0.5968 \end{bmatrix}$

Thus, we see that we do in fact have a scaled version of the matrix above.
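
This check is easy to script. Here is a small illustrative snippet (M_expected is the target matrix above, hard-coded by hand; M is our computed projection matrix):

% Check that our M is a scaled version of the expected matrix.
M_expected = [-0.4583  0.2947  0.0139 -0.0040;
               0.0509  0.0546  0.5410  0.0524;
              -0.1090 -0.1784  0.0443 -0.5968];
scale = M_expected(3,4) / M(3,4);      % = -0.5968, since M(3,4) = 1
max(abs(scale*M(:) - M_expected(:)))   % ~0 up to rounding error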

We also note that we could solve the system as a homogeneous linear system. Let us take a look at the code for this method:

Homogeneous Implementation: The steps are very similar to those above; however, we append $-u_i$ in the last column of the odd rows and $-v_i$ in the last column of the even rows. We can then use singular value decomposition (SVD) on matrix A, take V(:,end), and reshape it into a 3 x 4 matrix. The code is as follows:


function M = calculate_projection_matrix( Points_2D, Points_3D )
A = [];  % 2n x 12 coefficient matrix for the homogeneous system A*m = 0
for i = 1:size(Points_2D,1)
    u_i = Points_2D(i,1);
    v_i = Points_2D(i,2);
    x_i = Points_3D(i,1);
    y_i = Points_3D(i,2);
    z_i = Points_3D(i,3);
    % Same rows as before, with -u_i and -v_i appended as a 12th column
    A_vec_1 = [x_i y_i z_i 1 0 0 0 0 -u_i*x_i -u_i*y_i -u_i*z_i -u_i];
    A_vec_2 = [0 0 0 0 x_i y_i z_i 1 -v_i*x_i -v_i*y_i -v_i*z_i -v_i];
    A(end+1,:) = A_vec_1;
    A(end+1,:) = A_vec_2;
end
% The null-space solution is the right singular vector with the
% smallest singular value, i.e. the last column of V
[U,S,V] = svd(A);
M = V(:,end);
M = transpose(reshape(M,[],3));  % reshape the 12-vector into 3 x 4
end

Let us look at the results of this method:

Homogenous Results:

The projection matrix is:

$\begin{bmatrix} 0.4583 & -0.2947 & -0.0140 & 0.0040 \\ -0.0509 & -0.0546 & -0.5410 & -0.0524\\ 0.1090 & 0.1784 & -0.0443 & 0.5968 \end{bmatrix}$

The total residual is: 0.0445.

We see that this is exactly -1 times the target matrix for the normalized points, which is good: the projection matrix is only defined up to scale.

A plot of the visualization of the points can be found below (hover to zoom):


Visualize Points

We see that we get exactly the same output as with the non-homogeneous method.

Now that we have a working version of calculate_projection_matrix(), let us implement compute_camera_center().

Part 1.1: Camera Center

Goal: What we now want to do is compute the camera center in the MATLAB function compute_camera_center(). We will be estimating one particular extrinsic parameter: the camera center in world coordinates. We take our projection matrix M, let Q denote the 3x3 matrix formed by the first three columns of the 3x4 matrix, and let $m_4$ denote the fourth column. We can write

$M = (Q|m_4)$

Now, we can compute the center of the camera, denoted $C$ as:

$C=-Q^{-1}m_4$
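
To see why this formula holds, recall that the camera center is the one world point whose projection is undefined; it lies in the null space of $M$:

$M\begin{bmatrix} C \\ 1 \end{bmatrix} = QC + m_4 = \mathbf{0} \quad \Rightarrow \quad C = -Q^{-1}m_4$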

Implementation: Let us now implement the MATLAB function compute_camera_center():

The code is as follows:


function [ Center ] = compute_camera_center( M )
Q = M(:,1:3);        % left 3 x 3 block of the projection matrix
m_4 = M(:,4);        % fourth column
Center = -Q \ m_4;   % C = -Q^(-1) * m_4, solved without an explicit inverse
end

Let us run proj3_part1.m now and examine the output:

Results:

The estimated location of the camera is: <-1.5126, -2.3517, 0.2827>

The graph below visualizes the actual 3D points and the estimated 3D camera center:

Let us check our results. We used the normalized 3D points to get M, so we should arrive at a camera center near <-1.5125,-2.3515,0.2826>. We see that we are very close to this point.

Let us now move on to part 2, the fundamental matrix estimation.

Part 2: Fundamental Matrix Estimation

Goal: We now want to estimate the mapping of points in one image to lines in another by means of the fundamental matrix. The definition of the fundamental matrix is as follows:

$\begin{bmatrix} u' & v' & 1 \end{bmatrix} \begin{bmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{bmatrix} \begin{bmatrix} u \\ v\\ 1 \end{bmatrix} = 0$

We can expand this out as:

$f_{11}uu'+f_{12}vu'+f_{13}u'+f_{21}uv'+f_{22}vv'+f_{23}v'+f_{31}u+f_{32}v+f_{33}=0$

We need to follow a similar approach to part 1, fixing the scale and solving the regression (or, as in our implementation below, solving the homogeneous system with SVD). The least squares estimate of F is generally full rank, but a true fundamental matrix has rank 2, so we need to reduce the rank. To do this, we decompose F using singular value decomposition (SVD) as $U\Sigma V^T=F$. We then estimate a rank 2 matrix by setting the smallest singular value in $\Sigma$ equal to 0, producing $\Sigma_2$, and recompute the fundamental matrix as $F=U\Sigma_2 V^T$.

Implementation: We will now implement estimate_fundamental_matrix() as follows:

The code is as follows:


function [ F_matrix ] = estimate_fundamental_matrix(Points_a,Points_b)
a_rows = size(Points_a,1);
b_rows = size(Points_b,1);
vec = [1 1 1];
new_a = horzcat(Points_a, ones(a_rows,1)); % Add a column of ones
new_b = horzcat(Points_b, ones(b_rows,1)); % Add a column of ones
% Build the n x 9 system: each row is
% [u*u' v*u' u' u*v' v*v' v' u v 1] for one correspondence
hc = horzcat(new_a,new_a,new_a);
M = kron(new_b,vec).*hc;
% The null-space solution is the last column of V
[U,S,V] = svd(M);
last_col = size(V,2);
F = transpose(reshape(V(:,last_col),3,3));
% Enforce rank 2 by zeroing the smallest singular value
[F_U,F_D,F_V] = svd(F);
F_D(3,3) = 0;
F_matrix = F_U*F_D*transpose(F_V);
end

Let us now examine the results:

Results:

After running proj3_part2.m with our estimate_fundamental_matrix() function we arrive at an F_matrix of:

$$\begin{bmatrix} -0.0000 & 0.0000 & -0.0019 \\ 0.0000 & 0.0000 & 0.0172 \\ -0.0009 & -0.264 & 0.9995 \end{bmatrix}$$

The epipolar lines for the corresponding two images are as follows:


Left Image


Right Image

We see that we have very straight epipolar lines and thus a good fundamental matrix estimation. Before moving on, let us sketch how these lines are actually drawn from F.
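
For a point $(u,v)$ in one image, the constraint $\begin{bmatrix} u' & v' & 1 \end{bmatrix} F \begin{bmatrix} u & v & 1 \end{bmatrix}^T = 0$ says that the corresponding point $(u',v')$ in the other image must lie on the line with coefficients $l = F\,[u\;\, v\;\, 1]^T$. A minimal sketch of plotting one such line (the variable names and img_width are illustrative assumptions, not the project's plotting code):

% Epipolar line in the other image for a point (u, v).
% l = [a; b; c] holds the coefficients of the line a*x + b*y + c = 0.
l = F_matrix * [u; v; 1];

% Sample the line across the width of the image and plot it
% (assuming the line is not vertical, i.e. l(2) ~= 0).
x = 1:img_width;
y = -(l(1)*x + l(3)) / l(2);   % solve a*x + b*y + c = 0 for y
plot(x, y, 'b');

Let us now move on to the final part of our project.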

Part 3: Fundamental Matrix with RANSAC

Goal: For two photographs of a scene, it is unlikely that we would have perfect point correspondences with which to do the regression for the fundamental matrix. So, we will compute the fundamental matrix from unreliable point correspondences computed with SIFT. Least squares regression is not appropriate here due to the presence of many outliers. We can use RANSAC in conjunction with our fundamental matrix estimation to estimate the fundamental matrix from this noisy data.

RANSAC: Random Sample Consensus (RANSAC) is an iterative method for estimating the parameters of a mathematical model from a set of observed data that contains outliers, when the outliers are to be accorded no influence on the values of the estimates. We can think of RANSAC as an outlier detection method. Let us take a look at a simple example: fitting a line.

Let us look at fitting a line in two dimensions to a set of observation points that contains both inliers and outliers. If we use a simple least squares method for line fitting, we get a line that fits the inliers poorly, because it is optimally fitted to all points, including the outliers. If we instead fit the points with RANSAC, we get a line determined only by the inliers.

RANSAC randomly samples the observed data and uses a voting scheme to find the optimal result: elements in the data set vote for one or more candidate models. The two steps of the algorithm, repeated every iteration, are as follows: first, fit a model to a randomly selected minimal sample of the data; second, test all of the other data points against the fitted model and count those that agree with it (the inliers) as votes. After all iterations, the model with the most votes wins. A sketch of this procedure applied to line fitting is given below.
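
Here is a minimal MATLAB sketch of RANSAC line fitting (the trial count and threshold are illustrative, not tuned values):

% Minimal RANSAC line fit y = a*x + b to points in an n x 2 matrix pts.
n = size(pts, 1);
best_inliers = 0;
for i = 1:1000                          % number of trials (illustrative)
    s = pts(randsample(n, 2), :);       % step 1: fit to a minimal sample
    a = (s(2,2) - s(1,2)) / (s(2,1) - s(1,1));  % (a vertical sample would need special handling)
    b = s(1,2) - a*s(1,1);
    resid = abs(pts(:,2) - (a*pts(:,1) + b));
    inliers = sum(resid < 0.1);         % step 2: count agreeing points (votes)
    if inliers > best_inliers           % keep the model with the most votes
        best_inliers = inliers;
        best_a = a; best_b = b;
    end
end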

An animation of the process can be found below:

Now that we understand the algorithm, let us proceed to our implementation of it in the MATLAB function ransac_fundamental_matrix():

Implementation:

The code for the above is as follows:


function [ Best_Fmatrix, inliers_a, inliers_b] = ransac_fundamental_matrix(matches_a, matches_b)
iterations = 3500;
rows = size(matches_a,1);
err_threshold = 0.005;
best = 0;
for i=1:iterations
    update = 0;
    % Fit a fundamental matrix to a random minimal sample of 8 matches
    rand_index = randsample(rows,8);
    rand_a = matches_a(rand_index,:);
    rand_b = matches_b(rand_index,:);
    F_matrix = estimate_fundamental_matrix(rand_a,rand_b);
    a_inliers = [];
    b_inliers = [];
    % Count the matches that agree with this model
    for j=1:rows
        err = [matches_a(j,:) 1]*transpose(F_matrix)*transpose([matches_b(j,:),1]);
        abs_error = abs(err);
        if abs_error < err_threshold
            a_inliers(end+1,:) = matches_a(j,:);
            b_inliers(end+1,:) = matches_b(j,:);
            update = update+1;
        end
    end
    % Keep the model with the most inliers seen so far
    if update > best
        best = update;
        Best_Fmatrix = F_matrix;
        inliers_a = a_inliers;
        inliers_b = b_inliers;
    end
end
% Return at most 30 randomly sampled inlier pairs for visualization
inlier_a_rows = size(inliers_a,1);
if inlier_a_rows > 30
    rand_index = randsample(inlier_a_rows,30);
    inliers_a = inliers_a(rand_index,:);
    inliers_b = inliers_b(rand_index,:);
end
end

Let us now take a look at our results.

Results: The results are as follows:

Let us take a look at the Mount Rushmore image with 5000 iterations and an inlier threshold of 0.005:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 5000 iterations and an inlier threshold of 0.0055:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 5000 iterations and an inlier threshold of 0.01:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 500 iterations and an inlier threshold of 0.01:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 50 iterations and an inlier threshold of 0.01:

Mount Rushmore


Left Image


Right Image


Matches

Let us now take a look at the same image with 50 iterations and an inlier threshold of 0.0055:

Mount Rushmore


Left Image


Right Image


Matches

Let us now draw some conclusions about the above results:

We see that our results are better when we use more iterations and a lower (stricter) inlier threshold, which is to be expected. However, we do get fewer matches and fewer inliers the stricter we are with the threshold. I found that a threshold of 0.005 with 5000 iterations worked very well: we got very straight lines through the centers of our dots and a good number of accurate matches, as seen in the images with 30 randomly sampled pairs. The number of iterations is important, but if I had to choose between the two, I would rather give up iterations than loosen the inlier threshold: running many trials with different combinations of iterations and inlier thresholds, the inlier threshold made the most difference in accuracy.
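
The standard RANSAC sample-count analysis supports this. If a fraction $w$ of the matches are inliers and each trial samples 8 points, the number of iterations $N$ needed to draw at least one all-inlier sample with confidence $p$ is:

$N = \frac{\log(1-p)}{\log(1-w^8)}$

For example, with $w = 0.5$ and $p = 0.99$ this gives $N \approx 1180$, so 5000 iterations leaves a comfortable margin, while the inlier threshold directly controls which matches count as inliers at all.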

Let us take a look at some of the other images:

Let us now take a look at the Notre Dame image with 5000 iterations and an inlier threshold of 0.0055:

Notre Dame


Left Image


Right Image


Matches

Let us now take a look at the Notre Dame image with 5000 iterations and an inlier threshold of 0.01:

Notre Dame


Left Image


Right Image


Matches

We see that for the Notre Dame, we get nearly perfect accuracy. Our lines are very straight and from our sample of matches, we see that a majority of them are correct.

Let us now take a look at the Episcopal Gaudi image with 5000 iterations and an inlier threshold of 0.01:

Episcopal Gaudi


Left Image


Right Image


Matches

Let us now take a look at the Episcopal Gaudi image with 5000 iterations and an inlier threshold of 0.005:

Episcopal Gaudi


Left Image


Right Image


Matches

We see that the Episcopal Gaudi image does not perform as well. There are a few blue lines that do not pass through the centers of the circles and a few of the matching lines look more crooked. However, the performance is not bad and is definitely better than the performance in project 2 for the Episcopal Gaudi image.

Let us now take a look at our own image, St Basil's Cathedral:

We will use 5000 iterations and an inlier threshold of 0.005.

We selected 20 inlier points and the result is the following:

St Basil's Cathedral


Left Image


Right Image


Matches

We see that there are a few points in the left and right images that do not fall exactly on the line, and looking at the matches confirms this: a few of the lines are crooked. The matching here is not amazing.

Let us take a look at another image of a basketball sneaker:

LeBron Shoe


Left Image


Right Image


Matches

Here we see that the features are matched much better and that the inlier points lie directly on the blue lines. Our matching accuracy was very good for this image.

We note that the matching was great for Notre Dame, Mount Rushmore, and the LeBron sneaker. Let us implement some extra credit to see if we can perform better with the Episcopal Gaudi image.

Extra Credit:

Goal: Our estimate of the fundamental matrix can be improved by normalizing the coordinates before computing it. We perform the normalization through linear transformations, described below, to make the mean of the points zero and the average magnitude 1.0 or some other small number, such as the square root of two. The normalization through linear transformations is as follows:

$$\begin{bmatrix} u' \\ v' \\ 1 \end{bmatrix} = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -c_u \\ 0 & 1 & -c_v \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}$$

The transform matrix T is the product of the scale and offset matrices, where $c_u$ and $c_v$ are the mean coordinates. To compute a scale $s$, we can estimate the standard deviation after subtracting the means and set $s$ to the reciprocal of whatever estimate of the scale we use; alternatively, we could use the reciprocal of the maximum absolute value, or scale the coordinates such that the average squared distance from the origin, after centering, is two. Whichever we choose, we must use these scaling matrices to adjust the fundamental matrix so that it can operate on the original pixel coordinates:

$F_{\text{orig}} = T_b^T \, F_{\text{norm}} \, T_a$

Let us proceed to implement this in the function estimate_fundamental_matrix():

Implementation: Let us now implement the extra credit portion of the project:

The code is as follows:


function [ F_matrix ] = estimate_fundamental_matrix(Points_a,Points_b)
a_rows = size(Points_a,1);
b_rows = size(Points_b,1);

% Mean (centroid) of each point set
a_cu = mean(Points_a(:,1));
a_cv = mean(Points_a(:,2));
b_cu = mean(Points_b(:,1));
b_cv = mean(Points_b(:,2));

% Scale: reciprocal of the mean distance from the centroid
s = a_rows/sum(((Points_a(:,1)-a_cu).^2 + (Points_a(:,2)-a_cv).^2).^(1/2));
Ta_pre = [s 0 0; 0 s 0; 0 0 1];        % scale matrix
Ta_o = [1 0 -a_cu; 0 1 -a_cv; 0 0 1];  % offset matrix
Ta = Ta_pre*Ta_o;
s = b_rows/sum(((Points_b(:,1)-b_cu).^2 + (Points_b(:,2)-b_cv).^2).^(1/2));
Tb_pre = [s 0 0; 0 s 0; 0 0 1];
Tb_o = [1 0 -b_cu; 0 1 -b_cv; 0 0 1];
Tb = Tb_pre*Tb_o;
% Apply the normalizing transforms to the homogeneous points
Points_a = transpose(Ta*transpose([Points_a ones(size(Points_a,1),1)]));
Points_b = transpose(Tb*transpose([Points_b ones(size(Points_b,1),1)]));

% Build the n x 8 system with f_33 fixed to 1
mtx = [];
for i=1:a_rows
    u = Points_a(i,1);
    u_prime = Points_b(i,1);
    v = Points_a(i,2);
    v_prime = Points_b(i,2);
    mtx(end+1,:) = [u*u_prime v*u_prime u_prime u*v_prime v*v_prime v_prime u v];
end

% Solve the least-squares system, reshape into F_norm (transposed),
% denormalize via F_orig = Tb' * F_norm * Ta, and enforce rank 2
[U,S,V] = svd(transpose(transpose(Ta)*(reshape([-1*mtx\(ones(a_rows,1));1],[],3))*Tb));
S(3,3) = 0;
F_matrix = U*S*transpose(V);
end
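
As a quick sanity check on the normalization, consider the following illustrative snippet (Points_a here are the raw input points and Ta the transform built above):

% With s chosen as the reciprocal of the mean distance from the centroid,
% the normalized points should have ~zero mean and mean distance ~1.
norm_a = transpose(Ta * transpose([Points_a ones(size(Points_a,1),1)]));
mean(norm_a(:,1:2))                       % should be approximately [0 0]
mean(sqrt(sum(norm_a(:,1:2).^2, 2)))      % should be approximately 1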

Now that we have implemented our new estimate_fundamental_matrix() function, let us run proj3_part3.m with the Episcopal Gaudi image with 5000 iterations and an inlier threshold of 0.005. The results are as follows:

Results:

Episcopal Gaudi


Left Image


Right Image


Matches

The Episcopal Gaudi images from before the extra credit normalization, with the same number of iterations and inlier threshold, can be found below with 30 inliers shown:

Episcopal Gaudi


Left Image


Right Image


Matches

We see that our new estimate_fundamental_matrix() function with the extra credit normalization is much better. The matching lines are much straighter and the dots are now centered on the blue lines in the left and right images. Through this normalization, we have achieved much higher accuracy. We note that the Episcopal Gaudi image is now almost perfect.

Let us now look at the Notre Dame Image with 5000 iterations and an inlier threshold of 0.005:

Notre Dame


Left Image


Right Image


Matches

Here the results are stellar. The matches are very straight and map to the correct places. Also, in our left and right images, the dots are perfectly centered on the blue lines. We see that we have done a very good job with our Notre Dame image using the normalization.

Using the normalization in the extra credit has helped us greatly. We have seen improvement on the Notre Dame image and have seen a vast improvement in the Episcopal Gaudi image.

Conclusion:

Throughout the project we calculated the camera projection matrix using both a homogeneous method and a non-homogeneous method. We only had to choose one method, but I thought it would be nice to display both, so we showed the results for both methods and how similar they are. We then computed the camera center and displayed its results. Next, we estimated the fundamental matrix without normalization and showed results for a few images. After that, we estimated the fundamental matrix with RANSAC for a variety of images, some from the original dataset and some of my own. Finally, we completed the extra credit and normalized the coordinates before computing the fundamental matrix. In conclusion, we ended up getting some great matches and nice epipolar lines. In addition, I added a JavaScript function so that you can zoom in on an image by hovering and zoom out when the mouse moves away. I also added lengthy mathematical explanations for each section so that a reader of this page has full information to come up with a solution of their own. Overall, this was a great project and I enjoyed seeing the improvements from project 2 to project 3.
