Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC

Projection Matrix and Camera Center

Given a set of points with known 3D world coordinates and their corresponding 2D locations in an image, least-squares regression can be used to estimate the projection matrix M, which maps 3D world coordinates to 2D image coordinates.
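As a minimal sketch of this regression, assuming NumPy and assuming the scale of M is fixed by setting its bottom-right entry m34 to 1 (consistent with the 1.0000 in the matrix reported for part 1), each 2D/3D pair contributes two linear equations in the remaining 11 unknowns. The function name `calculate_projection_matrix` and its argument layout are hypothetical, not taken from the original code.

```python
import numpy as np

def calculate_projection_matrix(points_2d, points_3d):
    """Least-squares estimate of a 3x4 projection matrix M with M[2, 3] = 1.

    points_2d: (N, 2) image coordinates (u, v).
    points_3d: (N, 3) world coordinates (X, Y, Z). Needs N >= 6.
    """
    n = points_3d.shape[0]
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i, ((u, v), (X, Y, Z)) in enumerate(zip(points_2d, points_3d)):
        # u * (m31 X + m32 Y + m33 Z + 1) = m11 X + m12 Y + m13 Z + m14
        A[2 * i] = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]
        b[2 * i] = u
        # v * (m31 X + m32 Y + m33 Z + 1) = m21 X + m22 Y + m23 Z + m24
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]
        b[2 * i + 1] = v
    # Solve the overdetermined system A m = b in the least-squares sense.
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Re-attach the fixed m34 = 1 and reshape row-major into 3x4.
    return np.append(m, 1.0).reshape(3, 4)
```

Fixing m34 = 1 turns the problem into ordinary linear least squares; the alternative is to solve the homogeneous system for a unit-norm 12-vector with SVD, which avoids assuming m34 is nonzero.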

For the pairs of ground truth points given in part 1, the projection matrix M was calculated to be:

 0.7679   -0.4938   -0.0234    0.0067
-0.0852   -0.0915   -0.9065   -0.0878
 0.1827    0.2988   -0.0742    1.0000
  
This projection matrix results in a residual error of 0.0445. Using M, the camera center was estimated to be at (-1.5126, -2.3517, 0.2827).
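The camera center follows directly from M. Writing M = [Q | m4], where Q is the left 3×3 block and m4 is the last column, the center C is the world point that M maps to zero, so C = -Q⁻¹m4. A minimal NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def calculate_camera_center(M):
    """Camera center C in world coordinates for a 3x4 projection matrix M.

    M @ [C, 1] = Q @ C + m4 = 0, hence C = -Q^{-1} m4.
    """
    Q = M[:, :3]   # left 3x3 block
    m4 = M[:, 3]   # fourth column
    # Solve Q C = -m4 instead of forming the inverse explicitly.
    return -np.linalg.solve(Q, m4)
```

Applied to the rounded M listed above, this returns approximately (-1.51, -2.35, 0.28), in agreement with the reported center.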

The image on the left shows the estimated camera center. The image on the right compares the 3D points projected onto the 2D image plane using M with the ground-truth 2D points.
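The projected points and the residual can be reproduced along these lines. This is a sketch assuming NumPy; since the text does not define the residual metric, it is assumed here to be the sum of Euclidean distances between projected and ground-truth 2D points, and both function names are hypothetical.

```python
import numpy as np

def project_points(M, points_3d):
    """Project (N, 3) world points to (N, 2) image coordinates using a 3x4 M."""
    homogeneous = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])
    projected = homogeneous @ M.T  # (N, 3) homogeneous image points
    # Perspective division by the third homogeneous coordinate.
    return projected[:, :2] / projected[:, 2:3]

def residual_error(M, points_3d, points_2d):
    """Assumed metric: sum of Euclidean distances to the ground-truth points."""
    diffs = project_points(M, points_3d) - points_2d
    return np.linalg.norm(diffs, axis=1).sum()
```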

Fundamental Matrix Estimation

The fundamental matrix F relates corresponding points between two images of the same scene. For a point x in the left image, Fx is the epipolar line in the right image on which the corresponding point x' must lie, so every correct correspondence satisfies x'ᵀFx = 0. F can be estimated by least-squares regression over a set of known corresponding points (each pair contributes one linear equation in the entries of F), followed by reducing F from rank 3 to rank 2 using singular value decomposition.
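A minimal sketch of this estimate (the eight-point algorithm), assuming NumPy. It uses the homogeneous form of the least-squares problem, taking the unit-norm vector f that minimizes ||Af|| from the SVD of A, rather than fixing one entry of F; the original implementation may differ. No coordinate normalization is applied, and the function name is hypothetical.

```python
import numpy as np

def estimate_fundamental_matrix(points_a, points_b):
    """Eight-point estimate of F with x_b^T F x_a = 0 for every match.

    points_a: (N, 2) pixel coordinates in the left image (x).
    points_b: (N, 2) matching coordinates in the right image (x'). N >= 8.
    """
    ua, va = points_a[:, 0], points_a[:, 1]
    ub, vb = points_b[:, 0], points_b[:, 1]
    # Each row expands [ub vb 1] F [ua va 1]^T = 0 into a linear equation
    # in the nine entries of F, taken in row-major order.
    A = np.column_stack([ub * ua, ub * va, ub,
                         vb * ua, vb * va, vb,
                         ua, va, np.ones_like(ua)])
    # Unit-norm least-squares solution of A f = 0: last right singular vector.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

With raw pixel coordinates this system is poorly conditioned; Hartley's normalization (translate and scale each point set before solving, then undo the transform on F) is the usual remedy.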

For the images provided in part 2, the following fundamental matrix was found:

     0.0011313  -0.026916        5.4986
    -0.023046    0.0058008      -52.565
     2.6948      76.564         -2317.3
  
The corresponding points and the epipoles can be seen below:
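The epipoles can be recovered from F alone. With x in the left image and x' in the right image, as in the definition of F above, the left epipole e is the null vector of F (Fe = 0) and the right epipole e' is the null vector of Fᵀ (Fᵀe' = 0). A hypothetical NumPy helper:

```python
import numpy as np

def epipoles(F):
    """Return (e_left, e_right) in pixel coordinates for x_right^T F x_left = 0.

    If an epipole lies at infinity (parallel image planes), its third
    homogeneous coordinate is ~0 and the division below is ill-defined.
    """
    _, _, Vt = np.linalg.svd(F)
    e_left = Vt[-1]            # null vector of F: F e_left = 0
    _, _, Vt_T = np.linalg.svd(F.T)
    e_right = Vt_T[-1]         # null vector of F^T: F^T e_right = 0
    # Dehomogenize; the arbitrary sign of each null vector cancels here.
    return e_left[:2] / e_left[2], e_right[:2] / e_right[2]
```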

RANSAC and Image Matching

The RANSAC algorithm was used to filter out bad SIFT matches: at each iteration, a fundamental matrix is estimated from a random subset of the matches, and the number of "good" matches (inliers) consistent with it is counted. A match (x, x') was counted as an inlier when the algebraic error |x'ᵀFx| fell below a threshold. The boolean threshold test is:

T(x, x') = |x'ᵀFx| < 20

since x'ᵀFx is ideally 0 for a correct correspondence.

A threshold of 20 retained more inliers than smaller thresholds while admitting only a small number of bad matches.
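The inlier test is a vectorized evaluation of |x'ᵀFx| over all matches at once. A sketch assuming NumPy, with a hypothetical function name and the threshold of 20 as its default:

```python
import numpy as np

def inlier_mask(F, points_a, points_b, threshold=20.0):
    """Boolean mask of matches with |x_b^T F x_a| < threshold.

    points_a, points_b: (N, 2) pixel coordinates; a 1 is appended to each
    point to make it homogeneous.
    """
    ones = np.ones((points_a.shape[0], 1))
    xa = np.hstack([points_a, ones])
    xb = np.hstack([points_b, ones])
    # Row n of xa @ F.T equals F @ xa[n]; the row-wise dot with xb[n]
    # then gives x_b^T F x_a for every match.
    algebraic_error = np.abs(np.sum(xb * (xa @ F.T), axis=1))
    return algebraic_error < threshold
```

Because the algebraic error is not measured in pixels, a good threshold depends on the scale of F and of the coordinates; a geometric measure such as the distance from x' to the epipolar line Fx is a common alternative.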

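The RANSAC implementation sampled 8 random correspondences at each iteration, the minimum needed by the eight-point algorithm, which keeps the chance of drawing a bad match into a sample low and so limits the noise introduced into the estimate of F. 20,000 iterations were used to give adequate confidence that at least one sample consisted entirely of correct matches.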
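Putting the pieces together, the loop below is a minimal RANSAC sketch, assuming NumPy: draw 8 random matches, fit F with an unnormalized eight-point solve plus rank-2 enforcement, count the matches with |x'ᵀFx| below the threshold, and keep the F with the most inliers. It also includes the standard iteration-count formula N = log(1 - p) / log(1 - wˢ) for confidence p, inlier ratio w and sample size s. All function names and parameters are hypothetical, not the original implementation.

```python
import numpy as np

def fit_fundamental(pa, pb):
    """Eight-point estimate of F (x_b^T F x_a = 0) with rank-2 enforcement."""
    ua, va, ub, vb = pa[:, 0], pa[:, 1], pb[:, 0], pb[:, 1]
    A = np.column_stack([ub * ua, ub * va, ub, vb * ua, vb * va, vb,
                         ua, va, np.ones_like(ua)])
    # Full SVD so Vt is 9x9 even for an 8x9 A; its last row spans the null space.
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def algebraic_error(F, pa, pb):
    """|x_b^T F x_a| for every match, after making the points homogeneous."""
    xa = np.column_stack([pa, np.ones(len(pa))])
    xb = np.column_stack([pb, np.ones(len(pb))])
    return np.abs(np.sum(xb * (xa @ F.T), axis=1))

def ransac_fundamental(pa, pb, threshold=20.0, iterations=20000, seed=None):
    """Return (best_F, inlier_mask) after `iterations` random 8-point samples."""
    rng = np.random.default_rng(seed)
    best_F, best_mask = None, np.zeros(len(pa), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(pa), size=8, replace=False)
        F = fit_fundamental(pa[sample], pb[sample])
        mask = algebraic_error(F, pa, pb) < threshold
        # Keep the hypothesis supported by the largest number of matches.
        if mask.sum() > best_mask.sum():
            best_F, best_mask = F, mask
    return best_F, best_mask

def ransac_iterations(inlier_ratio, confidence=0.99, sample_size=8):
    """Iterations needed so that P(at least one all-inlier sample) >= confidence."""
    p_clean = inlier_ratio ** sample_size
    return int(np.ceil(np.log(1 - confidence) / np.log(1 - p_clean)))
```

For example, `ransac_iterations(0.5)` gives 1177; by the same formula, 20,000 iterations reach 99% confidence as long as roughly 35% or more of the matches are inliers. A common refinement is to re-fit F on all inliers of the best hypothesis once the loop finishes.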

Shown below are the correspondences filtered with RANSAC. The Mount Rushmore image pair performed best under this RANSAC implementation, while Notre Dame and Episcopal Gaudi performed visibly worse.
