Project 3: Camera Calibration and Fundamental Matrix Estimation with RANSAC

The goal of this project is to estimate the camera projection matrix, which maps 3D world coordinates to image coordinates, as well as the fundamental matrix, which relates corresponding points in one image to epipolar lines in the other.

Part I: Camera Projection Matrix

To solve for the projection matrix, the following equations using the corresponding 2D and 3D points are set up:

$$\huge \begin{matrix} m_{11}X + m_{12}Y + m_{13}Z + m_{14} - u(m_{31}X + m_{32}Y + m_{33}Z + m_{34}) = 0 \\ m_{21}X + m_{22}Y + m_{23}Z + m_{24} - v(m_{31}X + m_{32}Y + m_{33}Z + m_{34}) = 0 \end{matrix}$$

where (X, Y, Z) is a 3D point and (u, v) is its 2D image location. Fixing the scale by setting m_34 = 1 and stacking these equations for all correspondences, the camera projection matrix can be solved using least squares with MATLAB's '\' operator. The resulting projection matrix is:

$$\huge \left[ \begin{matrix} 0.7679 & -0.4938 & -0.0234 & 0.0067 \\ -0.0852 & -0.0915 & -0.9065 & -0.0878 \\ 0.1827 & 0.2988 & -0.0742 & 1.0000 \end{matrix} \right]$$

The total residual between the projected 2D location of each 3D point and the actual location of that point in the 2D image is 0.0445. This residual is very small, which indicates that the estimate is quite accurate.
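As a cross-check of this setup, the least-squares estimate and the residual can be sketched in Python/NumPy (the report's MATLAB version uses the '\' operator; the function names here are my own, and m_34 is fixed to 1 as in the matrix reported below):

```python
import numpy as np

def estimate_projection_matrix(pts3d, pts2d):
    """Estimate the 3x4 projection matrix M from n >= 6 correspondences.

    pts3d: (n, 3) world points; pts2d: (n, 2) image points.
    Fixes m_34 = 1 and solves the remaining 11 entries by least squares.
    """
    n = pts3d.shape[0]
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(pts3d, pts2d)):
        # One pair of linear equations per 2D-3D correspondence.
        A[2 * i]     = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]
        b[2 * i] = u
        b[2 * i + 1] = v
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(m, 1.0).reshape(3, 4)   # re-attach the fixed m_34 = 1

def total_residual(M, pts3d, pts2d):
    """Sum of distances between projected and observed 2D points."""
    n = pts3d.shape[0]
    homog = np.hstack([pts3d, np.ones((n, 1))])
    proj = (M @ homog.T).T
    proj = proj[:, :2] / proj[:, 2:3]        # divide by the homogeneous w
    return np.linalg.norm(proj - pts2d, axis=1).sum()
```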

With the projection matrix M, the camera center in world coordinates can be computed easily. Let M be made up of a 3x3 matrix we'll call Q and a 4th column we'll call m_4:

$$\huge M = (Q|m_4)$$

The center of the camera C can then be found by:

$$\huge C = -Q^{-1}m_4$$

The estimated location of the camera is:

$$\huge C_{normA} = (-1.5126, -2.3517, 0.2827)$$
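A quick NumPy sketch of this computation, using the projection matrix reported above (its entries are rounded to four decimals in this report, so the recovered center is approximate):

```python
import numpy as np

# Camera center from the projection matrix M = (Q | m_4): C = -Q^{-1} m_4.
M = np.array([[ 0.7679, -0.4938, -0.0234,  0.0067],
              [-0.0852, -0.0915, -0.9065, -0.0878],
              [ 0.1827,  0.2988, -0.0742,  1.0000]])
Q, m4 = M[:, :3], M[:, 3]
C = -np.linalg.solve(Q, m4)   # same as -inv(Q) @ m4, but more stable
# C comes out close to the (-1.5126, -2.3517, 0.2827) reported above.
print(C)
```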

The following are the result images:

Part II: Fundamental Matrix Estimation

In this part, we use the 8-point algorithm to solve for the fundamental matrix. The definition of the fundamental matrix is:

$$\huge \left( \begin{matrix} u' & v' & 1 \end{matrix} \right) \left( \begin{matrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{matrix} \right) \left( \begin{matrix} u \\ v \\ 1 \end{matrix} \right) = 0$$

which is the same as:

$$\huge f_{11}u'u + f_{12}u'v + f_{13}u' + f_{21}v'u + f_{22}v'v + f_{23}v' + f_{31}u + f_{32}v + f_{33} = 0$$

Here (u, v) and (u', v') are corresponding points in two images taken from cameras with different intrinsic and extrinsic parameters.

Stacking one such equation per correspondence gives a homogeneous system Af = 0, and the fundamental matrix can be solved in the least-squares sense using the SVD:


% The solution of Af = 0 with ||f|| = 1 is the right singular vector
% of A associated with the smallest singular value.
[U, S, V] = svd(A);
F_matrix = V(:,end);
F_matrix = reshape(F_matrix, [3 3])';

The least squares estimate of F is full rank; however, a true fundamental matrix has rank 2. To reduce the rank, we can enforce the det(F) = 0 constraint by zeroing the smallest singular value, using SVD again:


% Zero the smallest singular value to enforce rank 2 (det(F) = 0).
[U, S, V] = svd(F_matrix);
S(3,3) = 0;
F_matrix = U*S*V';
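The two MATLAB snippets above, together with the construction of A, can be combined into one routine; a Python/NumPy sketch (the function name is my own):

```python
import numpy as np

def eight_point(pts_a, pts_b):
    """8-point estimate of F so that x_b^T F x_a ~ 0 for correspondences.

    pts_a, pts_b: (n, 2) arrays of matching (u, v) and (u', v'), n >= 8.
    Each row of A holds [u'u, u'v, u', v'u, v'v, v', u, v, 1]; the f that
    minimizes ||Af|| with ||f|| = 1 is the last right singular vector of A.
    """
    u, v = pts_a[:, 0], pts_a[:, 1]
    up, vp = pts_b[:, 0], pts_b[:, 1]
    A = np.column_stack([up*u, up*v, up, vp*u, vp*v, vp,
                         u, v, np.ones(len(u))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # row-major, so no transpose needed here
    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[-1] = 0.0
    return U @ np.diag(S) @ Vt
```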

The result of the fundamental matrix is:

$$\huge \left[ \begin{matrix} -0.0000 & 0.0000 & -0.0019 \\ 0.0000 & 0.0000 & 0.0172 \\ -0.0009 & -0.0264 & 0.9995 \end{matrix} \right]$$
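Each epipolar line is obtained as l = Fx: a point in the first image maps to a line in the second. For plotting, the line a·u + b·v + c = 0 can be clipped to the image borders; a small sketch (function and parameter names are my own):

```python
import numpy as np

def epipolar_line_endpoints(F, pts_a, width):
    """For each point x in image a, the epipolar line in image b is l = F x.

    Returns, per point, two endpoints of the line a*u + b*v + c = 0,
    taken at the left and right borders of image b (v = -(a*u + c) / b).
    `width` is the width of image b in pixels.
    """
    endpoints = []
    for (u, v) in pts_a:
        a, b, c = F @ np.array([u, v, 1.0])
        v0 = -(a * 0.0 + c) / b        # intersection with the left border
        v1 = -(a * width + c) / b      # intersection with the right border
        endpoints.append(((0.0, v0), (width, v1)))
    return endpoints
```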

The following are the epipolar lines for this fundamental matrix:

Extra Credit: Normalizing the Coordinates

The result of the fundamental matrix can be improved by normalizing the coordinates before computing the fundamental matrix. In my algorithm, the points need to meet the following conditions after being normalized:

  1. The mean of the points is zero.
  2. The standard deviation of the coordinates is sqrt(2).

The transform matrix T is the product of the scale and offset matrices:

$$\huge \left( \begin{matrix} u' & v' & 1 \end{matrix} \right) = T * \left( \begin{matrix} u \\ v \\ 1 \end{matrix} \right) = \left( \begin{matrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{matrix} \right) \left( \begin{matrix} 1 & 0 & -c_u \\ 0 & 1 & -c_v \\ 0 & 0 & 1 \end{matrix} \right) \left( \begin{matrix} u \\ v \\ 1 \end{matrix} \right)$$

c_u and c_v are the mean u and v coordinates of the points, which are easy to calculate.

To compute the scale s in the matrix, I first estimate the standard deviation of the coordinates after subtracting the means. Then s is sqrt(2) divided by that standard deviation.

With c_u, c_v and s, T can be computed with the formula mentioned before.

After that, all points in the two images are transformed by their respective transform matrices. Running the 8-point algorithm on these transformed points produces a new fundamental matrix. Because the coordinates were scaled, the new fundamental matrix needs to be adjusted as follows so that it can operate on the original pixel coordinates:

$$\huge F_{orig} = T_b^T * F_{norm} * T_a$$
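A sketch of the normalization in Python/NumPy, under the assumption (as in my implementation) that a single scale s is used for both axes; the function name is my own:

```python
import numpy as np

def normalize_points(pts):
    """Build T mapping pixel coords to zero-mean, sqrt(2)-std coords.

    pts: (n, 2). Returns the (3, 3) transform T and the normalized points.
    T is the product of the scale and offset matrices shown above.
    """
    c = pts.mean(axis=0)                 # c_u, c_v
    s = np.sqrt(2) / (pts - c).std()     # one scale for both axes
    T = np.array([[s, 0, 0], [0, s, 0], [0, 0, 1]]) @ \
        np.array([[1, 0, -c[0]], [0, 1, -c[1]], [0, 0, 1]])
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    return T, (T @ homog.T).T[:, :2]

# After estimating F_norm from the normalized points of images a and b:
#   F_orig = T_b.T @ F_norm @ T_a
```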

The new result of the fundamental matrix is:

$$\huge \left[ \begin{matrix} -0.0000 & 0.0000 & -0.0004 \\ 0.0000 & -0.0000 & 0.0032 \\ -0.0000 & -0.0044 & 0.1035 \end{matrix} \right]$$

The improvement from normalizing is not obviously reflected in the result images of part 2, but it is very helpful when computing the results of some image pairs in part 3. The following are the result pictures for the improved fundamental matrix. Some noise was added to the test pictures.

Part III: Fundamental Matrix with RANSAC

In this part, the fundamental matrix is calculated from unreliable point correspondences computed with SIFT. VLFeat is used to perform SIFT matching for any given image pair.

With these putative point correspondences found by SIFT, we can use RANSAC to find the "best" fundamental matrix. The steps of the algorithm are listed as follows:

  1. Randomly choose some number (in most cases 8) of point correspondences from the given image pair.
  2. Use the function estimate_fundamental_matrix() from part 2 to compute a fundamental matrix from these points.
  3. Traverse the point correspondences. For each pair of points, if u'Fu is smaller in absolute value than a threshold, then this is a pair of inlier points.
  4. Count the number of inliers for this fundamental matrix. The best fundamental matrix is the one with the largest number of inliers.

For some "easy" pairs, there is a very high chance of finding a good fundamental matrix. Therefore, when the number of inliers equals the total number of point correspondences, we can break out of the iteration to save time.
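The steps above, including the early exit, can be sketched as follows in Python/NumPy; `estimate_fundamental_matrix` is the part 2 routine, passed in as a function, and the remaining names are my own:

```python
import numpy as np

def ransac_fundamental(pts_a, pts_b, estimate_fundamental_matrix,
                       iters=2000, threshold=0.005, sample_size=8, seed=0):
    """RANSAC over putative SIFT matches, following the steps listed above.

    `estimate_fundamental_matrix` takes (pts_a, pts_b) -> 3x3 F.
    Returns the best F and a boolean inlier mask.
    """
    n = len(pts_a)
    ha = np.hstack([pts_a, np.ones((n, 1))])   # homogeneous coords, image a
    hb = np.hstack([pts_b, np.ones((n, 1))])   # homogeneous coords, image b
    rng = np.random.default_rng(seed)
    best_F, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(iters):
        # Step 1: random minimal sample; step 2: fit F to it.
        idx = rng.choice(n, size=sample_size, replace=False)
        F = estimate_fundamental_matrix(pts_a[idx], pts_b[idx])
        # Step 3: |x_b^T F x_a| for every correspondence.
        err = np.abs(np.sum(hb * (ha @ F.T), axis=1))
        inliers = err < threshold
        # Step 4: keep the F with the most inliers.
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
            if inliers.all():   # "easy" pair: all matches agree, stop early
                break
    return best_F, best_inliers
```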

Part 3 Results

In my program, I randomly pick 8 points from the point correspondences and set the threshold to 0.005.

The following are results for the Notre Dame pair. The indices of the picked points are (701 819 226 625 377 113 464 763) and the best fundamental matrix is:

$$\huge \left[ \begin{matrix} 0.0000 & 0.0000 & -0.0024 \\ -0.0000 & 0.0000 & 0.0024 \\ 0.0019 & -0.0034 & 0.3011 \end{matrix} \right]$$

This fundamental matrix is very good. Almost all the points are matched correctly:

For the Gaudi pair, if the coordinates are not normalized before computing the fundamental matrix, many wrongly matched pairs of points are included in the results, as the following images show:

However, if the coordinates are normalized, the results are much better, as the following images show:

The results of the Mount Rushmore pair are shown as follows:

The results of the photos pair are shown as follows: