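This project involved three major tasks: estimating a camera's projection matrix, estimating the fundamental matrix for an image pair, and combining that estimate with RANSAC to reject bad keypoint matches.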
The first part of the project required finding the projection matrix of a camera given a set of 3-D world coordinates and their corresponding homogeneous image coordinates. The entries of the matrix can be found by linear least-squares regression on the equations that result from the projection, that is, the matrix multiplication that maps a homogeneous world point to a homogeneous image point.
To find the values in the matrix M, I set up a system of equations in MATLAB that took several image-world correspondences into account:
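A sketch of the standard form of these equations, with symbol names that are assumptions rather than the original notation: for a world point $(X_i, Y_i, Z_i)$ that projects to the image point $(u_i, v_i)$, and with $m_{jk}$ denoting the entries of M, each correspondence contributes two linear equations.

```latex
\begin{aligned}
m_{11}X_i + m_{12}Y_i + m_{13}Z_i + m_{14} - u_i\,(m_{31}X_i + m_{32}Y_i + m_{33}Z_i + m_{34}) &= 0 \\
m_{21}X_i + m_{22}Y_i + m_{23}Z_i + m_{24} - v_i\,(m_{31}X_i + m_{32}Y_i + m_{33}Z_i + m_{34}) &= 0
\end{aligned}
```

Stacking these rows for all correspondences gives a linear system in the twelve entries of M, which is determined only up to scale.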
This required fixing the value of m34 at 1 and solving the regression for the remaining 11 values. After multiplying the matrix by a scalar to match the scale expected for the assignment, I got the following matrix:
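The regression with m34 fixed at 1 can be sketched as follows. This is a minimal Python/NumPy illustration, not the MATLAB code used for the project; the function names `estimate_projection_matrix` and `project` are hypothetical.

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Least-squares estimate of a 3x4 projection matrix with m34 fixed to 1."""
    points_3d = np.asarray(points_3d, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    n = points_3d.shape[0]
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i, ((X, Y, Z), (u, v)) in enumerate(zip(points_3d, points_2d)):
        # Two rows per correspondence; the m34 term moves to the right-hand side.
        A[2 * i] = [X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z]
        A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z]
        b[2 * i] = u
        b[2 * i + 1] = v
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Append the fixed m34 = 1 and reshape the 12 values into the 3x4 matrix.
    return np.append(m, 1.0).reshape(3, 4)

def project(M, points_3d):
    """Project 3-D points with M and divide by the homogeneous coordinate."""
    pts = np.asarray(points_3d, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    proj = homog @ M.T
    return proj[:, :2] / proj[:, 2:3]
```

Comparing `project(M, points_3d)` with the measured 2-D points gives a reprojection residual; at least six correspondences in general position are needed to determine the 11 unknowns.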
The total residual (error) from the difference between projected 2D locations and actual 2D locations in the image was small: 0.0445.
The camera center in world coordinates was calculated by solving the following equation:
Where Q consists of the first three columns of the matrix and m4 is the last column of the matrix. The resulting camera center was:
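With Q and m4 defined this way, the camera center C satisfies M [C; 1] = 0, which gives C = -Q^(-1) m4. A minimal Python/NumPy sketch of that computation (the function name `camera_center` is a hypothetical choice, and the project itself used MATLAB):

```python
import numpy as np

def camera_center(M):
    """Camera center in world coordinates from a 3x4 projection matrix."""
    M = np.asarray(M, dtype=float)
    Q, m4 = M[:, :3], M[:, 3]
    # Solve Q C = -m4 instead of forming the inverse of Q explicitly.
    return np.linalg.solve(Q, -m4)
```

The result does not depend on the overall scale of M, so the rescaling step described earlier leaves the camera center unchanged.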
The next part of the project involved implementing the algorithm that estimates the fundamental matrix, which relates corresponding points between two images. It was computed by solving the regression that results from the fundamental matrix definition:
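A sketch of that definition in the usual notation, where the symbols are assumptions rather than the original figure's: for a point $(u, v)$ in the first image and its match $(u', v')$ in the second, the fundamental matrix satisfies the following constraint.

```latex
\begin{pmatrix} u' & v' & 1 \end{pmatrix}
\begin{pmatrix} f_{11} & f_{12} & f_{13} \\ f_{21} & f_{22} & f_{23} \\ f_{31} & f_{32} & f_{33} \end{pmatrix}
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = 0
\quad\Longleftrightarrow\quad
f_{11}uu' + f_{12}vu' + f_{13}u' + f_{21}uv' + f_{22}vv' + f_{23}v' + f_{31}u + f_{32}v + f_{33} = 0
```

Each correspondence therefore yields one equation that is linear in the nine entries of the matrix.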
One such equation was set up for every pair of corresponding points, and the resulting system was solved for the f values. After enforcing rank 2 on the matrix, the resulting fundamental matrix for a sample pair of images was:
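A minimal Python/NumPy sketch of this estimate follows. It makes two assumptions that may differ from the original MATLAB code: the system is solved as a homogeneous least-squares problem through the SVD (rather than, say, fixing f33 to 1), and the function name `estimate_fundamental_matrix` is hypothetical. No coordinate normalization is applied.

```python
import numpy as np

def estimate_fundamental_matrix(pts_a, pts_b):
    """Unnormalized eight-point estimate of F such that x_b^T F x_a = 0."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    u, v = pts_a[:, 0], pts_a[:, 1]
    up, vp = pts_b[:, 0], pts_b[:, 1]
    # One row per correspondence, ordered as f11, f12, f13, f21, ..., f33.
    A = np.column_stack([u * up, v * up, up,
                         u * vp, v * vp, vp,
                         u, v, np.ones_like(u)])
    # The right singular vector of the smallest singular value minimizes |A f|.
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value of F.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt
```

The rank-2 step matters because a full-rank estimate has no consistent epipole, so its epipolar lines would not all pass through a single point.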
The epipolar lines in each image computed from the fundamental matrix can be visualized as follows:
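For reference, the lines being drawn can be computed as in this small Python/NumPy sketch. It assumes the convention x_b^T F x_a = 0, so a point in image A maps to the line F x_a in image B; the names `epipolar_lines` and `line_endpoints` are hypothetical.

```python
import numpy as np

def epipolar_lines(F, pts_a):
    """Lines (a, b, c) in image B, with a*x + b*y + c = 0, for points in image A."""
    pts_a = np.asarray(pts_a, dtype=float)
    homog = np.hstack([pts_a, np.ones((pts_a.shape[0], 1))])
    # Row i of the result is F @ x_i.
    return homog @ np.asarray(F, dtype=float).T

def line_endpoints(line, width):
    """Intersections of a non-vertical line (b != 0) with the left and right image borders."""
    a, b, c = line
    return (0.0, -c / b), (float(width), -(a * width + c) / b)
```

Lines in image A for points in image B follow the same way with the transpose of F in place of F.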
The last part of the project combined the algorithm implemented in part 2 with the RANSAC model-fitting algorithm to find a good fundamental matrix for an image pair while separating good keypoint matches from bad ones.
The algorithm used to find the best fundamental matrix was as follows:
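A generic RANSAC loop for this problem can be sketched as below in Python/NumPy. Several details are assumptions and not necessarily what the project did: samples of eight correspondences, the algebraic error |x_b^T F x_a| as the inlier test, a fixed iteration count, and all function names.

```python
import numpy as np

def eight_point(pts_a, pts_b):
    """Unnormalized eight-point estimate of F with rank 2 enforced."""
    u, v = pts_a[:, 0], pts_a[:, 1]
    up, vp = pts_b[:, 0], pts_b[:, 1]
    A = np.column_stack([u * up, v * up, up, u * vp, v * vp, vp,
                         u, v, np.ones_like(u)])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def algebraic_error(F, pts_a, pts_b):
    """|x_b^T F x_a| for every correspondence."""
    ones = np.ones((pts_a.shape[0], 1))
    ha = np.hstack([pts_a, ones])
    hb = np.hstack([pts_b, ones])
    return np.abs(np.sum(hb * (ha @ F.T), axis=1))

def ransac_fundamental_matrix(pts_a, pts_b, threshold, iterations=1000, seed=0):
    """Return the F with the most inliers and a boolean inlier mask."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    rng = np.random.default_rng(seed)
    best_F, best_mask, best_count = None, None, -1
    for _ in range(iterations):
        # Fit a candidate model to a random minimal sample.
        sample = rng.choice(len(pts_a), size=8, replace=False)
        F = eight_point(pts_a[sample], pts_b[sample])
        # Score the candidate by how many correspondences fall under the threshold.
        mask = algebraic_error(F, pts_a, pts_b) < threshold
        if mask.sum() > best_count:
            best_F, best_mask, best_count = F, mask, int(mask.sum())
    return best_F, best_mask
```

The `threshold` argument plays the role of the threshold discussed in the rest of this section: raising it admits more correspondences as inliers at the cost of a larger tolerated error.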
Using a low threshold had the benefit of only yielding inliers with a very low error, but this also resulted in the elimination of many pairs which in fact were still rather good keypoint matches.
To find the best threshold, I ran trials on a Mount Rushmore image pair with three thresholds: 0.002, 0.01, and 0.02. About 825 candidate keypoint matches were found. I expected a good threshold to be one whose resulting matrix classified close to half of those correspondences as inliers.
Here are the results of the experiment, given as the number of inliers found in each trial:
Threshold | 0.002 | 0.01 | 0.02 |
Trial 1 | 66 | 213 | 353 |
Trial 2 | 67 | 250 | 415 |
Trial 3 | 62 | 221 | 405 |
Trial 4 | 68 | 277 | 339 |
Trial 5 | 80 | 223 | 370 |
Trial 6 | 53 | 241 | 428 |
Trial 7 | 76 | 210 | 433 |
Trial 8 | 66 | 197 | 426 |
Trial 9 | 75 | 244 | 451 |
Trial 10 | 124 | 247 | 473 |
Average | 73.7 (about 74) | 232.3 (about 232) | 409.3 (about 409) |
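The averages in the table, and the share of the roughly 825 candidate matches they represent, can be checked with a few lines of Python. The counts are copied from the table; treating the candidate total as exactly 825 is an assumption, since the text gives it only approximately.

```python
from statistics import mean

CANDIDATE_MATCHES = 825  # "about 825" in the text; treated as exact here

# Inlier counts per trial, copied from the table above.
inliers = {
    0.002: [66, 67, 62, 68, 80, 53, 76, 66, 75, 124],
    0.01: [213, 250, 221, 277, 223, 241, 210, 197, 244, 247],
    0.02: [353, 415, 405, 339, 370, 428, 433, 426, 451, 473],
}

def summarize(counts, total=CANDIDATE_MATCHES):
    """Return the mean inlier count and its fraction of the candidate matches."""
    avg = mean(counts)
    return avg, avg / total

for threshold, counts in inliers.items():
    avg, fraction = summarize(counts)
    print(f"threshold {threshold}: mean inliers {avg:.1f} "
          f"({fraction:.1%} of {CANDIDATE_MATCHES})")
```

The three thresholds come out at roughly 9%, 28%, and 50% of the candidate matches, respectively.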
Example results for each threshold, with 30 of the inliers displayed:
[Figure: one example image for each threshold: 0.002, 0.01, and 0.02]
Another reason the fundamental matrix from a 0.002 threshold would not work is that its epipolar lines radiate from the epipoles in a way that suggests forward camera motion, which is clearly not the case here.
From the results above, a threshold of 0.02 classifies, on average, nearly half of the candidate matches for the Mount Rushmore pair as inliers (about 409 of roughly 825). I therefore used this threshold.
Here are the resulting epipolar lines and correspondences from the fundamental matrices for several image pairs. The epipolar lines were good for every pair except the Gaudi pair, because the eight-point algorithm was not normalized.
Epipolar lines | Keypoint correspondences |
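The Gaudi result above is attributed to the missing normalization. For context, the usual coordinate normalization for the eight-point algorithm (Hartley's) can be sketched as follows in Python/NumPy. This was not part of the implementation described here, and the function names are hypothetical; it assumes the convention x_b^T F x_a = 0.

```python
import numpy as np

def normalization_transform(pts):
    """3x3 transform moving the centroid to the origin with mean distance sqrt(2)."""
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    mean_dist = np.mean(np.linalg.norm(pts - centroid, axis=1))
    scale = np.sqrt(2.0) / mean_dist
    return np.array([[scale, 0.0, -scale * centroid[0]],
                     [0.0, scale, -scale * centroid[1]],
                     [0.0, 0.0, 1.0]])

def apply_transform(T, pts):
    """Apply a 3x3 similarity transform to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    return (homog @ T.T)[:, :2]

def denormalize(F_norm, T_a, T_b):
    """Map an F estimated from normalized points back to pixel coordinates."""
    return T_b.T @ F_norm @ T_a
```

In use, each image's points are normalized with their own transform, the fundamental matrix is estimated from the normalized points, and `denormalize` converts the result back to the original coordinates.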