Project 3: Camera Calibration and Fundamental Matrix Estimation with RANSAC

Ian Buckley

October 10, 2016

1 Introduction

The objective of this project was to improve upon image matching by leveraging the epipolar geometry of a stereo image pair while applying RANSAC to reject poorly matched feature points. After a simple camera calibration, SIFT feature points detected in an image pair were used to estimate the Fundamental Matrix, and RANSAC was applied with this estimate to retain only the matches consistent with it. The following sections describe the approach and highlight the results of the project.

2 Camera Calibration

Camera calibration consisted of two main tasks: determining the projection matrix and determining the camera center. The task was formulated as a system of linear equations, and two distinct methods were applied.

Determining the Projection Matrix The projection matrix M maps homogeneous world coordinates to homogeneous image coordinates, up to a scale factor:

\[
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
= M \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\tag{1}
\]

which can be rewritten in the form

Az = 0,

where A is a matrix with elements from x and X and z is a column vector composed of the concatenated rows of M. The projection matrix M was determined by fixing the last element of z to 1, which rules out the trivial solution z = 0. Figure 1 shows the result of applying the projection matrix to the known world coordinates of the calibration data: the projected points align with the measured image coordinates, which indicates that the projection matrix was correctly determined. A residual of 0.0445 is reported.
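The formulation above can be sketched in NumPy as follows. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the use of `np.linalg.lstsq` to solve the system is an assumption, and the residual shown (square root of the summed squared reprojection differences) is only one plausible definition, since the report does not state how its residual was computed.

```python
import numpy as np


def estimate_projection_matrix(points_2d, points_3d):
    """Estimate the 3x4 projection matrix M with its last element fixed to 1.

    Each 2D-3D correspondence contributes two linear equations in the
    remaining 11 unknown entries of M.
    """
    points_2d = np.asarray(points_2d, dtype=float)
    points_3d = np.asarray(points_3d, dtype=float)
    n = points_2d.shape[0]
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i in range(n):
        u, v = points_2d[i]
        X, Y, Z = points_3d[i]
        # u * (m31 X + m32 Y + m33 Z + 1) = m11 X + m12 Y + m13 Z + m14
        A[2 * i] = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]
        # v * (m31 X + m32 Y + m33 Z + 1) = m21 X + m22 Y + m23 Z + m24
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]
        b[2 * i] = u
        b[2 * i + 1] = v
    # Assumption: a least-squares solve; the report does not name the solver.
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(z, 1.0).reshape(3, 4)


def reprojection_residual(M, points_2d, points_3d):
    """Assumed residual: sqrt of the summed squared reprojection differences."""
    points_2d = np.asarray(points_2d, dtype=float)
    points_3d = np.asarray(points_3d, dtype=float)
    homog = np.column_stack([points_3d, np.ones(len(points_3d))])
    projected = homog @ M.T
    projected = projected[:, :2] / projected[:, 2:3]  # divide out the scale
    return float(np.linalg.norm(projected - points_2d))
```

At least six non-degenerate correspondences are needed to determine the 11 unknowns; with more, the solve averages out measurement noise.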

Figure 1: The calculated projection matrix is able to correctly align image coordinates with world coordinates.

Determining the Camera Center The projection from world to image coordinates is typically written as:

\[
x = K [R \mid t] X.
\tag{2}
\]

With Q ∈ ℝ^{3×3} and m ∈ ℝ^{3×1}, decomposing the matrix M = [Q | m] allows the camera center C to be calculated according to:

\[
C = -Q^{-1} m.
\]

For the calibration scenario, the camera location was determined to be C = [-1.5126, -2.3517, 0.2827], which agrees with the ground-truth camera location. Figure 2 shows the camera location relative to the observed points.
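As a small sketch of this step, the camera center can be recovered from a 3x4 projection matrix by splitting off its last column. The function name is hypothetical, and the code uses the standard relation that the camera center is the point projected to the zero vector, which gives C = -Q^{-1} m.

```python
import numpy as np


def camera_center(M):
    """Return the camera center C of a 3x4 projection matrix M = [Q | m]."""
    M = np.asarray(M, dtype=float)
    Q = M[:, :3]  # left 3x3 block
    m = M[:, 3]   # last column
    # Solve Q C = -m rather than forming the inverse of Q explicitly.
    return -np.linalg.solve(Q, m)
```

A quick sanity check is that `M @ [C, 1]` should be numerically zero, since the camera center has no well-defined image.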

Figure 2: The calculated camera location is reasonable: a camera placed there can capture all of the points in the scene.

3 Fundamental Matrix Estimation

As with determining the projection matrix, the Fundamental Matrix was determined by formulating a system of linear equations. The Fundamental Matrix obeys the relationship:

\[
x'^{T} F x = 0.
\]

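Here x and x′ are the homogeneous coordinates of corresponding points in the two images. This can be rewritten as Az = 0, where z holds the nine entries of F. Rather than solving the linear system directly, singular value decomposition was used to determine z, which was reshaped into the Fundamental Matrix F; the singularity (rank-2) constraint was then enforced on F.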
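The estimation described above can be sketched as follows. This is an illustrative version under stated assumptions, not the report's code: the function name is hypothetical, `pts_a` holds the points x and `pts_b` the points x′, z is taken as the right singular vector for the smallest singular value of A, and the singularity constraint is enforced by zeroing the smallest singular value of F, which is the usual choice but is not spelled out in the report.

```python
import numpy as np


def estimate_fundamental_matrix(pts_a, pts_b):
    """Estimate F from N >= 8 correspondences so that x_b^T F x_a is near 0."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    xa, ya = pts_a[:, 0], pts_a[:, 1]
    xb, yb = pts_b[:, 0], pts_b[:, 1]
    # One row per correspondence; columns follow the row-major entries of F.
    A = np.column_stack([xb * xa, xb * ya, xb,
                         yb * xa, yb * ya, yb,
                         xa, ya, np.ones(len(pts_a))])
    # z is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=True)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt_f = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt_f
```

Because z is a unit vector, F is only defined up to scale, which is all the epipolar constraint requires.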

Figure 3: The Fundamental Matrix is used to draw the epipolar line in one image given a corresponding point in the other.
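A minimal sketch of how such a line can be obtained: for a point x in the first image, the epipolar line in the second image is l′ = F x, and every x′ on it satisfies l′ · x′ = 0. The function names and the choice to clip the line at the left and right image borders are illustrative assumptions; the second helper assumes the line is not vertical.

```python
import numpy as np


def epipolar_line(F, point_a):
    """Return line coefficients (a, b, c) in image B for a point in image A.

    Points (u, v) on the line satisfy a*u + b*v + c = 0.
    """
    x = np.array([point_a[0], point_a[1], 1.0])
    return np.asarray(F, dtype=float) @ x


def line_endpoints(line, width):
    """Intersect the line with the left (u=0) and right (u=width) borders.

    Assumes b != 0, i.e. the line is not vertical.
    """
    a, b, c = line
    v_left = -c / b
    v_right = -(a * width + c) / b
    return (0.0, float(v_left)), (float(width), float(v_right))
```

The two endpoints can then be passed to any plotting routine to overlay the line on the second image.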

Normalization The estimation of the Fundamental Matrix was improved by performing the estimation on normalized coordinates. A transformation matrix was determined for each image in the stereo pair: the centroid of that image's feature points was first subtracted from their coordinates, and the resulting coordinates were divided by the maximum magnitude of the coordinate pairs so that every point had magnitude at most 1. After performing the Fundamental Matrix estimation on the normalized coordinates as described above, the transformation matrices T_a and T_b of the two images were used to transform the Fundamental Matrix back into the original coordinate frame according to:

\[
F_{\text{orig}} = T_b^{T} F_{\text{norm}} T_a.
\]

All results shown were obtained with normalization, except where its absence is noted explicitly to demonstrate the improvement that normalization provides.
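The normalization can be sketched as below. The function names are hypothetical, and "maximum magnitude of the coordinate pairs" is read here as the largest Euclidean distance of any centered point from the origin, which is an assumption about the report's wording.

```python
import numpy as np


def normalization_transform(pts):
    """Build the 3x3 matrix T that centers the points and scales them.

    After applying T, the centroid is at the origin and the farthest point
    has magnitude 1.
    """
    pts = np.asarray(pts, dtype=float)
    cx, cy = pts.mean(axis=0)
    # Assumed reading: scale by the largest distance from the centroid.
    scale = 1.0 / np.linalg.norm(pts - [cx, cy], axis=1).max()
    return np.array([[scale, 0.0, -scale * cx],
                     [0.0, scale, -scale * cy],
                     [0.0, 0.0, 1.0]])


def apply_transform(T, pts):
    """Apply a 3x3 similarity transform T to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (homog @ T.T)[:, :2]


def denormalize_fundamental_matrix(F_norm, T_a, T_b):
    """Map F estimated on normalized points back to pixel coordinates."""
    return T_b.T @ F_norm @ T_a
```

The back-transformation follows from substituting x_norm = T_a x and x′_norm = T_b x′ into the epipolar constraint, so the residual of any match is the same in both coordinate frames.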

4 RANSAC

RANSAC was performed to reject incorrect matches between an image pair based on the Fundamental Matrix. In each iteration, eight potential matches were chosen at random to estimate the Fundamental Matrix. Using the relationship

\[
x'^{T} F x = r
\tag{3}
\]

for an imperfect Fundamental Matrix, all candidate matches were tested against the estimated Fundamental Matrix, and matches for which the residual r in Equation 3 was smaller in magnitude than a threshold value were added to a list of inliers. This sample-and-test procedure was repeated, and the Fundamental Matrix with the largest number of inliers was kept.
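The loop described above might look like the following sketch. Everything named here is hypothetical: the function names, the default threshold and iteration count, and the seeded random generator are illustrative choices, since the report gives none of these values. The inner fit is a plain eight-point estimate without normalization, kept short so the example is self-contained.

```python
import numpy as np


def fit_fundamental_matrix(pts_a, pts_b):
    """Plain eight-point estimate of F with the rank-2 constraint enforced."""
    xa, ya = pts_a[:, 0], pts_a[:, 1]
    xb, yb = pts_b[:, 0], pts_b[:, 1]
    A = np.column_stack([xb * xa, xb * ya, xb,
                         yb * xa, yb * ya, yb,
                         xa, ya, np.ones(len(pts_a))])
    _, _, Vt = np.linalg.svd(A, full_matrices=True)
    U, S, Vt_f = np.linalg.svd(Vt[-1].reshape(3, 3))
    S[2] = 0.0
    return U @ np.diag(S) @ Vt_f


def ransac_fundamental_matrix(pts_a, pts_b, threshold=1e-3,
                              iterations=1000, seed=0):
    """Return the F with the most inliers and a boolean inlier mask.

    threshold and iterations are illustrative defaults, not the report's.
    """
    rng = np.random.default_rng(seed)
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    n = len(pts_a)
    homog_a = np.column_stack([pts_a, np.ones(n)])
    homog_b = np.column_stack([pts_b, np.ones(n)])
    best_F = None
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iterations):
        # Draw eight distinct candidate matches and fit F to them.
        sample = rng.choice(n, size=8, replace=False)
        F = fit_fundamental_matrix(pts_a[sample], pts_b[sample])
        # Residual r = x_b^T F x_a for every candidate match.
        residuals = np.abs(np.sum(homog_b * (homog_a @ F.T), axis=1))
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```

Note that the algebraic residual depends on the scale of both the coordinates and F, so a suitable threshold is tied to how the points are normalized.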

Figure 4: Epipolar lines drawn using the estimated Fundamental Matrix for the Mount Rushmore image pair.

Figure 4 shows the epipolar lines of the Mount Rushmore image pair, and Figure 5 shows feature points correctly matched between the image pair.

Figure 5: The match generated using the estimated Fundamental Matrix and RANSAC for the Mount Rushmore image pair appears to correctly match feature points between the two images.

Figure 6: The match generated using the estimated Fundamental Matrix and RANSAC for the Notre Dame image pair appears to correctly match feature points between the two images.

Figures 6 and 7 demonstrate that the estimation of the Fundamental Matrix and feature point matching with RANSAC performed well on a number of image pairs.

Figure 7: The match generated using the estimated Fundamental Matrix and RANSAC for the Campus Building image pair appears to correctly match feature points between the two images.

Figure 8: The match generated for the Gaudi image pair using RANSAC with coordinate normalization in the Fundamental Matrix estimation appears to correctly match feature points between the two images.

Figure 9: The match generated for the Gaudi image pair using RANSAC without coordinate normalization in the Fundamental Matrix estimation demonstrates poor matching of feature points between the two images.

Figure 8 shows the match generated with normalized coordinates in the Fundamental Matrix estimation for the Gaudi image pair. To highlight the improvement resulting from coordinate normalization, Figure 9 shows the same image pair matched without normalizing the coordinates; the result is a visibly worse match between the images.

5 Conclusion

The objective of this project was to improve upon image matching by leveraging the epipolar geometry of a stereo image pair while applying RANSAC to reject poorly matched feature points. Compared to the local feature matching performed in Project 2, the results of Project 3 are considerably more satisfying, yielding obvious matches for the image pairs. For robotics applications of computer vision, it would be interesting to explore these topics in the context of SLAM, and of monocular SLAM in particular.