Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC

This project includes three parts. In the first part, the camera projection matrix and camera location are calculated based on given feature points in the image and their corresponding 3D coordinates. In the second part, the fundamental matrix is derived based on matched points in two images. In the third part, the RANSAC algorithm is used to find the "best" fundamental matrix based on matched feature points using SIFT. With these three different but connected parts, we can obtain the knowledge in camera and scene geometry.

Camera Calibration

The projection matrix from world 3D coordinates to 2D image coordinate includes the camera intrinsic matrix and the extrinsic matrix. The intrinsic matrix takes the focal length, aspect ratio, optical center, and skew into account, and the extrinsic matrix considers the rotation and translation of the camera with respect to the world coordinate. The mapping relationship from the homogeneous world coordinates (X) to the homogeneous image coordinates (x) is given by

x=MX=K[R t]X

where M is the 3x4 projection matrix, K is the 3x3 intrinsic matrix, R is the 3x3 rotation matrix, and t is the 3x1 translation vector. With a series of known 2d image coordinates and 3d locations, M can be solved using the linear least square method. With the solved M, the camera center location (C) can be computed by:

C=-M(:,1:3)-1M(:,4)

Fundamental Matrix Estimation

The corresponding points in two images can be correlated by the means of the fundamental matrix, which is given by:

x'TFx=0

where x' and x are the homogeneous coordinates in the right and left images, respectively, and F is the 3x3 fundamental matrix. With eight or more pairs of feature points in the right and left images, F can be solved from the above equation using the least square approach. Since the least square estimate of F is full rank, the fundamental matrix should have rank of 2. Therefore, det(F) is forced to be zero by setting the smallest singular value as zero when using SVD decomposition to solve the least square solution.

In order to solve the problem of poor numerical condition, the coordinates of feature points in both images are rescaled. First, the weighted center of feature points is computed and all feature points are shifted to make the center at (0,0). Second, the coordinates of all feature points are scaled such that the mean squared distance to the center is 2 pixels. Third, Fnorm is computed by using the shifted and normalized point coordinates and det(Fnorm)=0 is applied. Finally, the original fundamental matrix (Forig=) can be computed as:

Forig=TbT*Fnorm*Ta

where Ta and Tb are the transformation matrices for the left and right images including shift and scale.

Refinement of Feature Matches with RANSAC

By using the SIFT algorithm, the matches of feature points of two related images can be potentially found. However, these matches are not reliable. Considering the epipolar constraint in two related images, these matches can be refined using the fundamental matrix. By using the RANSAC algorithm, a random set of feature matches are used to compute the fundamental matrix using the approach described in part 2. Then, all SIFT matches are checked by calculating the residual of x'TFx to count the number of inliers with a threshold. Thousands of iterations are operated until the most inliers are returned.

Results

In part 1, the estimated projection matrix is:

0.7679 -0.4938 -0.0234 0.0067
-0.0852 -0.0915 -0.9065 -0.0878
0.1827 0.2988 -0.0742 1.0000

The total residual is 0.0445 and the estimated location of the camera center is: <-1.5126, -2.3517, 0.2827>. The actual image feature points and the projected feature points from the 3d coordinates based on the projection matrix are shown in Fig. 1(a). The camera location with respect to all feature points in 3d world is shown in Fig. 2(b).

(a) (b)

Fig. 1 (a) Actual points and projected points based on the estimated projection matrix; (b) Camera location in the 3d world.

In part 2, the estimated fundamental matrix with feature point normalization is:

0.0000 0.0000 -0.0019
0.0000 0.0000 0.0172
-0.0009 -0.0264 0.9995

The estimated fundamental matrix without feature point normalization is:

0.0000 0.0000 -0.0004
0.0000 0.0000 0.0035
0.0000 -0.0048 0.1119

The epipolar lines with corresponding feature points in the left and right images with and without normalization are shown in Figs. 2(a) to (d). It shows that with coordinate normalization, the distance between each epipolar line and corresponding feature point is further decreased and almost each feature point is passed by an epipolar line.
(a) (b) (c) (d)

Fig. 2 (a) Feature points with epipolar lines on the left image without coordinate normalization; (b) Feature points with epipolar lines on the right image without coordinate normalization; (c) Feature points with epipolar lines on the left image with coordinate normalization; (d) Feature points with epipolar lines on the right image with coordinate normalization;

In part 3, several different images with the epipolar lines drawn on them and with the inlier keypoint correspondences are shown from Fig. 3 to Fig. 6. The coordinate normalization is performed in calculating the fundamental matrix during each RANSAC iteration. Note that much more iterations and a lower threshold are used in Figs. 5 and 6, because these two figures have worse feature matches computed from SIFT that Figs. 3 and 4.

(a)

(b) (c)

Fig. 3 (a) Refinement of feature point matches using RANSAC for Mount Rushmore (Threshold: 0.05, Iteration Number: 10000); (b) Epipolar lines with feature points in the left image for Mount Rushmore; (c) Epipolar lines with feature points in the right image for Mount Rushmore.

(a)

(b) (c)

Fig. 4 (a) Refinement of feature point matches using RANSAC for Notre Dame (Threshold: 0.05, Iteration Number: 10000); (b) Epipolar lines with feature points in the left image for Notre Dame; (c) Epipolar lines with feature points in the right image for Notre Dame.

(a)

(b) (c)

Fig. 5 (a) Refinement of feature point matches using RANSAC for Episcopal Gaudi (Threshold: 0.005, Iteration Number: 100000); (b) Epipolar lines with feature points in the left image for Episcopal Gaudi; (c) Epipolar lines with feature points in the right image for Episcopal Gaudi.

(a)

(b) (c)

Fig. 6 (a) Refinement of feature point matches using RANSAC for Woodruff Dorm (Threshold: 0.005, Iteration Number: 100000); (b) Epipolar lines with feature points in the left image for Woodruff Dorm; (c) Epipolar lines with feature points in the right image for Woodruff Dorm.