The goal of this assignment was to gain an understanding of camera and scene geometry. In particular, I was asked to complete 3 tasks:
1. Estimate the camera projection matrix (K[R|T]) given a set of 3D to 2D point correspondances. Using this estimated matrix, I was then asked to also compute the camera center.
2. Estimate the fundamental matrix given a set of corresponding 2D points from 2 images
3. Given a set of possibly corresponding points from two images (the result of SIFT matching), implement RANSAC to help remove outlier matches. This RANSAC implementation is expected to leverage the fundamental matrix estimation code from the previous task.
My report is structured as follows: Each task will have its own dedicated sub-section in which I describe the methodology used to complete the task and show the corresponding results. Note that in the sub-section corresponding to task 2, I will describe the coordinate normalization bells & whistles work I completed. I will do a compare and contrasting of results (with and without bells & whistles) both in this sub-section, and in the sub-section corresponding to task 3 (since it relies on the code written for task 2).
Task 1: Estimate the Camera Projection Matrix (K[R|T]) and find Camera Center
Methodology
The Camera Projection Matrix (M) is the product of the intrinsic (K) matrix and the extrinsic ([R|T]) matrix. Intuitively, the matrix defines the transformation between 3D world coordinates and 2D image coordinates (where (su,sv,s) are homogenous image coordinates and (x,y,z,1) are homogenous world coordinates):
The steps to solve for M are as follows:
1. Gather at least 6 points pairs to create the following linear homogenous system (Note that we are given 20 points to use, so I include all 20 in the system):
2. Take the SVD of the leftmost matrix above (notice that the equation above takes the form of Ax=0. A solution for x can be gotten by taking the SVD of A)
3. In Matlab, SVD of A returns 3 matrices, U (Left signular vectors), S (Singular values), and V (Right singular vectors). A nx1 representation of the M matrix we are trying to solve for is found as the right-most column vector of the V matrix. So this step amounts to segmenting out that column, and reshaping it into the expected 3x4 size.
4. Now that M is solved for, the last thing to do is to solve for the camera center. If we view M as a 3x3 matrix Q appended to a 3x1 vector m4, we can easily get the camera center in one step:
Results
Using the pts2d-norm-pic_a.txt and pts3d-norm.txt, my camera projection matrix (M) is as follows:
Using the same datasets, my resultant camera center is as follows:
Using the estimated camera projection matrix to project the 3d points, I also compared my projection with the ground truth 2d image points. As the plot below suggests, my estimated camera projection matrix is quite accurate in this case:
Task 2: Estimate the fundamental matrix given a set of corresponding 2D points from 2 images
Epipolar geometry is the intrinsic geometry between views. The fundamental matrix (F) encapsulates this geometry. In particular, given two matching points (x and x') from the two views, the following property is satisfied:Additionally, F can be used to find the epipolar lines corresponding to each of the points x' and x in either image:
,   correspond to the epipolar lines associated with x' and x respectively.Methodology
One method to solve for F (the normalized 8-point method/the method I used), given 2 sets of corresponding points (P' and P), can be described as follows:
1. Gather at least 8 correspondances that will be needed to solve for F (Note that we are given 20 correspondances to use, so I use all 20)
2. Bells & Whistles Work - Once we have the two sets of points, we then 'normalize' both sets. This normalization consists of centering (de-meaning) each point at (0,0), and then rescaling the points such that the average squared distance from the origin is 2. Note that normalization is done separately on each set of points. This step is done to ensure good numerical conditioning. Essentially, we are going to end up taking the SVD of a matrix that contains these coordinate values, and we don't want some values in that matrix to be huge, and others to be very small (which might occur with raw coordinates).
3. Set up the following system of homogenous linear equations for F (where u',v',1 correspond to P' and u,v,1 correspond to P):
4. Similar to task 1, we again have a linear equation of the form Af = 0. So this step consists of solving for F using an SVD of A (the left-most matrix in the eq. above)
5. Enforce the singularity constraint on the F matrix. The Fundamental Matrix we currently have an estimate for is of full rank, but it is defined as a rank 2 matrix. To reduce its rank, we take the SVD of F, set it smallest singular value (in the S matrix returned from Matlab) to zero, and then reconstruct F
6. Bells & Whistles Work - Note that the Fundamental Matrix that we have estimated at this point corresponds to our 'normalized' coordinates. For this Fundamental Matrix to apply to original coordinates, I need to adjust it based on the transformations I used. The adjustment is defined as follows (where T_a corresponds to the transformation for the first set of points, and T_b corresponds to that of the second set of points):
Results
Fundamental Matrix Estimates For Base Image Pair
Without Normalization | With Normalization | Comments |
---|---|---|
Admittedly, not much can be concluded from just these numerical estimates alone. I just included them here for reference. |
Fundamental Matrix Epipolar Line Evaluation
Recall from earlier that the Fundamental matrix can be used to find the epipolar
linear corresponding to x' and x. We can plot these corresponding lines against the
points they are supposed to contain to see how well the Fundamental Matrix was
estimated. This is what we do below to compare not normalizing v. normalizing:
Normalization? | View 1 | View 2 | Comments |
---|---|---|---|
No |
The black circles are around instances where the epipolar lines noticeably miss the corresponding point. In general the lines are fairly close, but in some cases (particularly the 'top' black circle in both views), the line is very off. |
||
Yes |
I've drawn black circles around the same points in this pair of images as in the previous row's pair. Clearly, the epipolar lines are better aligned to the corresponding points after including normalization (i.e. the lines go straight through the circled points instead of slightly above or slightly below). |
Task 3: Fundamental Matrix With RANSAC
For this task we are asked to use RANSAC in conjunction with Fundamental Matrix estimation from task 2 to help remove outliers from a set of possible correspondances (identified by the SIFT pipeline). The overarching idea here is that if we can calculate a geometric relationship between the two views, we can then identify and 'toss out' potential matches that don't align with this relationship.
Methodology
In particular, the steps to complete this task (and implicitly the steps that outline the RANSAC methodology) can be described as follows:
1. Randomly sample 8 distinct correspondances from the list of possible matches (recall that the minimum number of points needed for our Fundamental Matrix estimation implementation was 8)
2. Estimate a Fundamental Matrix between the two views using these 8 correspondances
3. Recall that a property of the Fundamental Matrix is that given 2 matching points x' and x:
Thus, this step amounts to calculating/storing the above quantity for every correspondance we have using the Fundamental Matrix that was just estimated.
4. Look through the stored quantities in the previous step and segment out those that fall below a certain threshold (recall that the equation above suggests that this quantity should be 0 for a match, so I chose my threshold to be something very small (0.001) in my final implementation that used normalization). The correspondances that fall below the threshold are denoted as 'inliers'.
5. Count the number of inliers that exist. If the number of inliers on this iteration is greater than on any previous iteration, save the inlier correspondances and the Fundamental Matrix that produced the outcome.
6. Repeat the above 5 steps N number of times (RANSAC is a hypothesize and test algorithm, i.e. you hypothesize and test N times and choose the best outcome from those N tries). Note that the best iteration for RANSAC would be one in which an iteration's random sample of 8 correspondances is free of any outliers. Theory suggests that the number of samples, N, needed to guarantee that this happens at least once can be described by the following equation:
'p' in the above equation is the probability that you sample at least one sample that is free of outliers, 's' is the number of points you sample each time, and 'e' is the proportion of outliers in your initial correspondance set. So given a 'p' of 99% (I'd like a 99% chance that I find at least one outlier-free sample), an 's' of 8 (this is just how we implement Fundamental Matrix Estimation), and an e of 50% (just a conservative estimate), theory suggests that I need to run the above loop 1177 times. In my actual code, I run the RANSAC hypothesize and test loop around this number of times (I run it 1000 times - I'm a little more optimistic than believing 50% of SIFT outputs are outliers).
Results
Index | Label | Image1 | Image2 | Comments |
---|---|---|---|---|
1A | Mount Rushmore Without Normalization - 225 Inliers |
First thing to note here is that I needed to ease my threshold for this image pair when I did not use normalization. My final implementation that includes normalization uses a thershold of 0.001 across all image sets, but to get a similar number of 'inliers' for this image set without normalization, I changed the threshold to 0.012. Having said that, the result looks pretty good - My fundamental matrix estimation results in epipolar lines that line up fairly closely to the corresponding points. Additionally, it looks like the match quality is very high (just trying to compare groups of points between the two views). It looks like for this image set, not normalizing did not have a meaningful negative impact on the result. |
||
1B | Mount Rushmore With Normalization - 240 Inliers |
Using normalization on this test set (now with threshold of 0.001) produces a good result that's similar in quality to the non-normalization case above. It looks like the epipolar lines go through the corresponding points pretty accurately. Additionally, the matches across both images look almost all correct. |
||
2A | Notre Dame Without Normalization - 163 Inliers |
Like image 1A, again I needed to change the threshold for the non-normalized case of the Notre Dame test set in order to get a similar number of inliers as the normalized case. More specifically I changed the threshold to 0.008 to get the result shown to the left. Note here that not normalizing seems to matter - I get a worse Fundamental Matrix estimation that results in 'mis-aligned' epipolar lines. This is shown in both images by the black circles that denote an epipolar line totally missing the corresponding point. As a result of this poor fundamental matrix estimation, you now start to see wrongful matches - note the mismatched points at the top of either tower, and some of the mismatched points on the street/behind the tree. Having said that, many of the points are still correct. |
||
2B | Notre Dame With Normalization - 145 Inliers | Including normalization seems to really fix many of the problems observed in 2A. In particular, it definitely results in better fundamental matrix estimation. This can be observed just by noticing that the epipolar lines in both views seem to really intersect the center of each point. This better estimation translates to better matches; in the two views to the left you see almost no mismatches. The only one I could find is circled in black in the left image. |
||
3A | Episcopal Gaudi Without Normalization - 120 Inliers | Like the other two non-normalization results, I had to switch the threshold here as well to 0.003 to get a similar number of inliers to the normalized case. The negative effects of not normalizing are most apparent in this image test set. I've circled instances where the epipolar lines are just nowhere close to the corresponding points. This implies a poor fundamental matrix estimation and translates to many mismatches. Note that many points on the roof of the building and in the sky are matched incorrectly. In fact, many points on the left side of the building are also incorrectly matched. |
||
3B | Episcopal Gaudi With Normalization - 81 Inliers | The advantage of including normalization is most apparent here. Unlike 3A, in this test the epipolar lines go perfectly through every point in both views. This implies a good fundamental matrix estimation and results in much fewer incorrect matches. In fact, I was only able to find two incorrect matches which have been circled in black in both views. Looking at these two incorrect matches, it's clear why they occured; the found match was on the same epipolar line and had a similar keypoint description. |