Computer Vision Project

Project 3 / Camera Calibration and Fundamental Matrix Estimation with RANSAC

Overview

This project demonstrates how to

(1) Estimate camera calibration based on the project matrix
(2) Estimate the fundamental matrix from a set of 2d correspondences.
(3) Estimate the fundamental matrix with unreliable SIFT matches using RANSAC.

We will explore each part and evaluate the performance of the algorithms used.

1. Camera projection matrix

The camera projection matrix contains the camera parameters such as the focal point or the camera center. This matrix satisfies the equation:

To solve for this, we rewrite the equation as:

Once we have the matrix A set up, we can easily solve for x using SVD (singular value decomposition).

After we've got the camera projection matrix, solving for the camera center is child's play as it can be directly computed by: Center = -inv(M(:,1:3))*M(:,4)

Performance: Fast and accurate (the results closely match the provided values)

The code was written with no loops to take advantages of MATLAB's efficiency in matrix operations

2. Fundamental matrix estimation

Suppose we're given a set of correct correspondences between two images, how can we find the fundamental matrix that transforms one point to another?

To do this, we first write down the equation:

Next, we solve for F using SVD again.

Now, because the fundamental matrix is a rank 2 matrix. We need to reduce the rank of F. To do this, we decompose F into U(Sigma)V, clear out the smallest singular value in Sigma (setting it to zero) and multiply them back together for the rank-2 matrix F.

Doing so we will find the fundamental matrix.

Performance: Fast and accurate (all the epipolar lines cross (or almost cross) through the corresponding points.

There's not much to improve here. The code was very efficient as it is right now.

3. Find fundamental matrix with unreliable matches

In part 2, we were able to find the fundamental matrix with ease because we were given an accurate set of matching points. But we soon find out that this doesn't happen often in the real world. Usually we have to find the correspondences using some kind of algorithm like SIFT, but they're not 100% accurate. However, they are good enough that we can use geometric constraints to figure out the set of correct correspondences, or what we call inliers. Before determining what is inlier or outlier, however, we must first calculate our fundamental matrix. Because there may be outliers in the mix, we use a trial-and-error process called RANSAC to find out which fundmental matrix gives us the most number of inliers.

The steps of RANSAC is straightforward:

(1) Sample n pairs from the set of matches to fit a model
(2) Calculate the model
(3) Count the number of inliers, record the best model, rinse and repeat

Note that the algorithm uses 3 important parameters to define (1) when we should stop and (2) what we considered to be inliers. These parameters are placed at the top of the code:

confidence and max_number_of_iterations: we don't want the function to run forever because it's very unlikely that we can attain 100% accuracy right away. Therefore, we force the function to stop when either we've got a good number of inliers (confidence=number_of_inliers/total_matches) or when we've looped for a certain amount of times, whichever happens first. Currently set to 75% confidence and 2000 iterations.
distance threshold: this defines how a pair of possibly matching points is considered to be inliers. Currently set to 0.1. We don't want this to be too big (it may report wrong correspondences if the points just happen to be close by) or too small (we may wrongly assume the matches to be outliers due to failing to account for errors).

Performance: Near-perfect accuracy for the first two image pairs (Mount Rushmore and Notre Dame)

The algorithm doesn't perform so well against the remaining two pairs. The fundamental matrix can be calculated incorrectly due to the following reasons: