Project 3: Camera Calibration and Fundamental Matrix Estimation with RANSAC

Contents

  1. Camera Projection Matrix
  2. Fundamental Matrix Estimation
  3. Fundamental Matrix with RANSAC
  4. Extra credit/Graduate credit

1. Camera Projection Matrix

Camera projection matrix mps 2D world coordinates to image coordinates. To estimate the projection matrix, corresponding 3D and 2D points in one image need to be provided. From the homogeneous transformation equation,

we can derive a system of equations as follows, taking all the elements in the projection matrix as unknown variables.
As the projection matrix has only 11 degrees of freedom and 1 scaling variable, the last element in the homogeneous transformation is set to be 1. By using matlab A\b operation, we solve the system of linear equations in the least square sense. Reformulate the projection matrix into 3x4 matrix form. The result from the provided 3D and 2D points is,
The matrix is exactly a scaled equivalent of the correct one. Using the projection Matrix to map the 3D points back to 2D coordinate plane, we can compare the actual 2D points and projected 2D points in the following figure.
Fig. 1 Projected results
With the projection matrix, we can decompose it and find the camera center as,
We can also map the camera and all points in 3D using the provided function, along with the actual image as a side comparason.
Fig. 2 3D visulization of the points and camera
Fig. 3 Original image

2. Fundamental Matrix Estimation

The fundamental matrix relates points in one scene to epipolar lines in the other scene. To estimate the fundamental matrix, we need correspoinding 2d points across two images. The fundamental properity of the fundamental matrix is,
Expand the matrix multiplication into a scaler equation.
Similarly, we can transforma the problem into solving a system of linear equations with multiple corresponding 2D points.
With input correspoinding 2D points Points_a and Points_b, we can compose the matrix A as LinEquMatrix by the following.

LinEquMatrix = zeros(numPoints,9);
LinEquMatrix(:,1) = Points_a(:,1).*Points_b(:,1);
LinEquMatrix(:,2) = Points_a(:,2).*Points_b(:,1);
LinEquMatrix(:,3) = Points_b(:,1);
LinEquMatrix(:,4) = Points_a(:,1).*Points_b(:,2);
LinEquMatrix(:,5) = Points_a(:,2).*Points_b(:,2);
LinEquMatrix(:,6) = Points_b(:,2);
LinEquMatrix(:,7:8) = Points_a;
LinEquMatrix(:,9) = ones(numPoints,1);
Decompose the matrix A by SVD decomposition. The fundamtrix matrix F_matrix is the last column vector of the unitary matrix. As the fundamental matrix is of rank 2, we need to set the smallest singular value in its SVD decomposition to 0.

[U,S,V]= svd(LinEquMatrix);
Fvector = V(:,end);
F_matrix = reshape(Fvector,[3,3])';
[U,S,V] = svd(F_matrix);
S(3,3) = 0;
F_matrix = U*S*V';
Based on the estimated fundamental matrix, we can draw epipolar lines on the each image. The epipolar lines are going to cross the correspoinding points.

Fig. 4 Epipolar Lines on the image pair
Because there are numerical errors when solving the system of linear equations, the epipolar lines are not perfectly crossing the correspoinding points. Using normallization on all the data points (Extra credit part), we can achieve perfect alignment on the points by reducing the numerical error on the fundamental matrix, as shown in Fig. 5. The fundamental matrix is

Fig. 5 Epipolar Lines on the image pair using Normalization on points

3. Fundamental Matrix with RANSAC

Fundamental matrix can also be used for matching features from multiple scenes. Often times perfect point corresponence from differnt scenes does not exist. To leveage the unreliable point correspondences, RANSAC can be applied to estimate the fundamental matrix from noisy point pairs. SIFT feature detector is used to find interest point pairs. VLFeat toolbox is used to do SIFT matching. However VLFeat 0.9.20 binary package has compactiablity issue in Win 10 environment. VLFeat 0.9.19 is finally installed at last.

8 or more point pairs (parameter from numChosen) is iteratively randomly chosen as the input for calculating the fundamental matrix. Then inliers are found for this fundamental matrix based on the distance measurement. If a point pair satisfies XFX'< Threshold , then it is count as an inlier. Choose the fundamental matrix that provides the most inliers from a certain number of iterations. Then with experimented values of the thresold and maximum number of iterations, we can find good fundamental matrix. There are mainly 3 parameters, namely numChosen, Threshold and numItr. Different parameters have different effects on the results. As the matched features from VLFeat are not all right, randomly chosen numChosen pairs from them can easily give us bad pairs for calculating the fundamental matrix. Thus we would choose less numChosen pairs to calculate the fundamental matrix and run large number of iterations to find the best fundamental matrix calculation with maximum inliers. Thus for the following image pairs, numChosen is generally 8 and numItr is over 2000. The thresold value is chosen so that the maximum number of inliers are generally 1/4 of all the input matched features. It should not be too small or too large. However, with carefully chosen parameters, the fundamental matrix found from RANSAC does not necessarily be perfect because the points are coplanar and the cameras are not actually pinhole cameras and have lens distortions. But after testing several image pairs, it is safe to say that the fundamental matrix from RANSAC could help remove incorrect point matches.

Fig. 6a Mount Rushmore Matched features with numChosen = 8, Threshold = 6e-4; NumIteration = 2k

Fig. 6b Epipolar lines on Mount Rushmore pair from the best fundamental matrix

Fig. 6c Epipolar lines on Mount Rushmore pair from the re-estimated fundamental matrix
Fig. 7a Matched features with numChosen = 8, Threshold = 6e-4; NumIteration = 2k

Fig. 7b Epipolar lines on Notre Dame pair from the best fundamental matrix

Fig. 7c Epipolar lines on Notre Dame pair from the re-estimated fundamental matrix
Fig. 8a Gaudi Matched features with numChosen = 8, Threshold = 8e-4; NumIteration = 20k

Fig. 8b Epipolar lines on Gaudi pair from the best fundamental matrix

Fig. 8c Epipolar lines on Gaudi pair from the re-estimated fundamental matrix

4. Extra credit/Graduate credit

As images from different scenes can be shifted and scaled, coordinate normalization improve the estimation of fundamental matrix Scaling after linear transformation is implemented here. For each image, first all the points are shifted so that their mean is (0,0). After that, a scaling facter is calculated so that after scaling the average distance between each point and the center (0,0) is square root of 2.

numPoints = size(matches_a,1);
mean_a = sum(matches_a,1)/numPoints;
matches_a = matches_a - repmat(mean_a,[numPoints,1]);
scale_a = sqrt(sum(sum(matches_a.^2))/2/numPoints);
matches_a = matches_a/scale_a;
TranslationA = [1,0;0,1;0,0]';
TranslationA = [TranslationA;[-mean_a,1]]';
TransformA = diag([1/scale_a,1/scale_a,1])*TranslationA;

mean_b = sum(matches_b,1)/numPoints;
matches_b = matches_b - repmat(mean_b,[numPoints,1]);
scale_b = sqrt(sum(sum(matches_b.^2))/2/numPoints);
matches_b = matches_b/scale_b;
TranslationB = [1,0;0,1;0,0]';
TranslationB = [TranslationB;[-mean_b,1]]';
TransformB = diag([1/scale_b,1/scale_b,1])*TranslationB;
The scaling after translation defines a transformation matrix for each image, which also needs to be calculated to adjust the fundamental matrix calculated from the normalized coordinates. With this adjustment, the fundamental matrix can operate on the original coordinates.

F_matrix = transpose(TransformB)*F_matrix*TransformA;
The normalization part is implemented in normalization.m, taking original point coordinates and returning the normalized coordinates with the transformations.