Computer Vision Project

In this project, we look at the problem of camera calibration through the following three steps:

Projection Matrix & Camera Center

In this part of the project, I calculate the projection matrix by building the following homogeneous linear system:

In this equation, X, Y, Z are the coordinates of a 3D point and u, v are the coordinates of its 2D projection. M is the projection matrix. I solve for M using SVD, taking the right singular vector associated with the smallest singular value as the solution. After solving for M, I get the following projection matrix:


0.4583   -0.2947   -0.0140    0.0040
-0.0509   -0.0546   -0.5411   -0.0524
0.1090    0.1783   -0.0443    0.5968

This is the result for the normalized points, and it differs slightly from the reference answer. I'm not sure why; it may come from differences in floating point handling, though I wasn't able to pinpoint where exactly that occurred. My total residual is 0.0445, which is fairly low.

I solve for the camera center using the equation from the project page, Camera Center = -Q^-1 * m4, where Q is the first three columns of M and m4 is the last column of M. After solving for the camera center, I get:


<-1.5127, -2.3517, 0.2826>

This is the camera center for the normalized points, and it also differs slightly from the reference answer. The small errors probably carry over from the slightly different values in the projection matrix.
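As a quick check of the camera center formula, here is a minimal NumPy sketch that applies Camera Center = -Q^-1 * m4 to the projection matrix reported above. The function name `camera_center` is my own, and the matrix is only the four-decimal rounding printed in this report, so the result matches the reported center only up to rounding.

```python
import numpy as np

def camera_center(M):
    """Return C = -Q^-1 * m4 for a 3x4 projection matrix M."""
    M = np.asarray(M, dtype=float)
    Q = M[:, :3]   # first three columns of M
    m4 = M[:, 3]   # last column of M
    # Solving Q C = -m4 gives the same C without forming the inverse explicitly.
    return -np.linalg.solve(Q, m4)

# Projection matrix from this report, rounded to four decimals.
M = np.array([
    [ 0.4583, -0.2947, -0.0140,  0.0040],
    [-0.0509, -0.0546, -0.5411, -0.0524],
    [ 0.1090,  0.1783, -0.0443,  0.5968],
])

C = camera_center(M)
print(C)  # close to <-1.5127, -2.3517, 0.2826>, up to the rounding of M
```

By construction the center projects to the zero vector, so `M @ [C, 1]` being numerically zero is a cheap sanity check.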

These results can be seen in the following images, where only about 5-6 projected points aren't squarely on the actual points.
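The estimation steps in this part (build the homogeneous system, solve it with SVD, then reproject to measure the residual) can be sketched in NumPy as follows. This is an illustration under assumptions, not the code used for the numbers above: the function names are hypothetical, and the residual is assumed to be the sum of Euclidean distances between projected and observed points, which may not match the project's definition.

```python
import numpy as np

def estimate_projection_matrix(pts2d, pts3d):
    """Estimate the 3x4 matrix M from n >= 6 correspondences (u, v) <-> (X, Y, Z).

    Each correspondence contributes two rows to A so that A m = 0, where m
    holds the 12 entries of M in row-major order.
    """
    pts2d = np.asarray(pts2d, dtype=float)
    pts3d = np.asarray(pts3d, dtype=float)
    n = pts3d.shape[0]
    Xh = np.hstack([pts3d, np.ones((n, 1))])   # homogeneous 3D points, n x 4
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = Xh
    A[0::2, 8:12] = -pts2d[:, [0]] * Xh        # -u * [X Y Z 1]
    A[1::2, 4:8] = Xh
    A[1::2, 8:12] = -pts2d[:, [1]] * Xh        # -v * [X Y Z 1]
    # The right singular vector with the smallest singular value
    # minimizes ||A m|| subject to ||m|| = 1.
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 4)

def project(M, pts3d):
    """Project 3D points with M and divide by the homogeneous coordinate."""
    pts3d = np.asarray(pts3d, dtype=float)
    Xh = np.hstack([pts3d, np.ones((pts3d.shape[0], 1))])
    proj = Xh @ np.asarray(M, dtype=float).T
    return proj[:, :2] / proj[:, [2]]

def total_residual(M, pts2d, pts3d):
    """Sum of distances between projected and observed 2D points (assumed definition)."""
    diff = project(M, pts3d) - np.asarray(pts2d, dtype=float)
    return float(np.sum(np.linalg.norm(diff, axis=1)))
```

Note that M is only recovered up to scale (and sign), so two correct solutions can differ by a constant factor.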

Fundamental Matrix

Then, I calculate the fundamental matrix with the following equation:

where (x', y') are points in one image and (x, y) are the corresponding points in the other. Again, I set up the system of equations as a matrix A and solve it using SVD. From the output of SVD, I obtain the fundamental matrix:


-6.60698417012863e-07	8.82396296136248e-06	-0.000907382302152503
7.91031620841439e-06	1.21382933020618e-06	-0.0264234649901806
-0.00188600197690852	0.0172332901072652	0.999500091906723

However, this least squares estimate is full rank, whereas a fundamental matrix must have rank 2. So, I use SVD again to decompose it into U, S, V and set the smallest singular value in S to 0. I recalculate the fundamental matrix as the product of U, the new S, and the transpose of V, which gives the following final matrix:


-5.36264198382353e-07	8.83539184115726e-06	-0.000907382264407744
7.90364770858056e-06	1.21321685010730e-06	-0.0264234649922034
-0.00188600204023565	0.0172332901014488	0.999500091906703
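The two SVD steps described above (least squares estimate, then rank-2 enforcement) can be sketched in NumPy as follows. The function name and the convention x_b^T F x_a = 0 are assumptions made for this sketch; the original equation may order the two point sets differently.

```python
import numpy as np

def estimate_fundamental_matrix(pts_a, pts_b):
    """Eight-point estimate of F such that x_b^T F x_a = 0 (assumed convention)."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    x, y = pts_a[:, 0], pts_a[:, 1]
    xp, yp = pts_b[:, 0], pts_b[:, 1]
    ones = np.ones_like(x)
    # One row per correspondence; the unknown f is F flattened in row-major order.
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, ones])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)          # least squares solution with ||f|| = 1

    # Enforce rank 2 by zeroing the smallest singular value.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    # np.linalg.svd already returns V transposed, so no extra transpose here.
    return U @ np.diag(S) @ Vt
```

In MATLAB, `svd` returns V rather than its transpose, so the reconstruction there would be `U * S * V'`.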

This matrix is visually verified as being correct by plotting the epipolar lines, which you can see below:

In the pictures, the epipolar lines look plausible. However, the majority of them don't pass through a point in the image, which I can't explain, since all this part of the project requires is setting up the matrix A and then the 6 lines of code from the class lectures.
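One way to check numerically whether the lines pass through the points is to measure the distance from each point to the epipolar line of its match. The sketch below assumes the convention x_b^T F x_a = 0, in which the line in image B for a point x_a is F x_a; the function name is hypothetical. Under that convention, using the transpose of F for the same image draws lines that generally miss the points, so comparing both variants is a cheap diagnostic. This is offered as something to test, not as a diagnosis of the plots above.

```python
import numpy as np

def epipolar_line_distances(F, pts_a, pts_b):
    """Distance from each point in image B to the epipolar line of its match in image A."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    xa = np.column_stack([pts_a, np.ones(len(pts_a))])
    xb = np.column_stack([pts_b, np.ones(len(pts_b))])
    # Row i holds the line l_i = F @ xa_i = (a, b, c), meaning a*x + b*y + c = 0.
    lines_b = xa @ np.asarray(F, dtype=float).T
    numerator = np.abs(np.sum(lines_b * xb, axis=1))
    denominator = np.hypot(lines_b[:, 0], lines_b[:, 1])
    return numerator / denominator
```

For a correct F and noise-free matches these distances are zero; for real matches they should be small for inliers.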

Fundamental Matrix & RANSAC

First, the VLFeat implementation of SIFT is used to produce descriptors and possible matches.

Image	SIFT Descriptors in Image A	SIFT Descriptors in Image B	Possible Feature Matches	Inlier Feature Matches
Rushmore	5581	5246	825	87
Notre Dame	3396	3025	851	96
Gaudi	12683	8766	1062	124

Then, in each RANSAC iteration, I generate 8 random values, which are indices into the list of candidate matches. The two sets of 8 corresponding points are what I use to estimate the fundamental matrix.
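The sampling loop can be sketched in NumPy as follows. Everything here is illustrative: the function names, the iteration count, the threshold value, and the use of the algebraic error |x_b^T F x_a| as the inlier test are assumptions, not the settings that produced the table above.

```python
import numpy as np

def fit_fundamental(pts_a, pts_b):
    """Eight-point fit with rank-2 enforcement, for x_b^T F x_a = 0."""
    x, y = pts_a[:, 0], pts_a[:, 1]
    xp, yp = pts_b[:, 0], pts_b[:, 1]
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(A)
    U, S, Vt2 = np.linalg.svd(Vt[-1].reshape(3, 3))
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2

def algebraic_error(F, pts_a, pts_b):
    """|x_b^T F x_a| for every correspondence."""
    xa = np.column_stack([pts_a, np.ones(len(pts_a))])
    xb = np.column_stack([pts_b, np.ones(len(pts_b))])
    return np.abs(np.einsum('ij,jk,ik->i', xb, F, xa))

def ransac_fundamental(pts_a, pts_b, n_iters=2000, threshold=1e-3, seed=0):
    """Return the F with the most inliers and a boolean inlier mask."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(pts_a)
    best_F, best_inliers = None, np.zeros(n, dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(n, size=8, replace=False)   # 8 random match indices
        F = fit_fundamental(pts_a[idx], pts_b[idx])
        inliers = algebraic_error(F, pts_a, pts_b) < threshold
        if inliers.sum() > best_inliers.sum():
            best_F, best_inliers = F, inliers
    return best_F, best_inliers
```

The threshold is tied to the scale of the coordinates, so a value tuned on raw pixel coordinates would need to change if the points are normalized first.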

Results

I'm still not sure what's going on with my epipolar lines. Unless my understanding is flawed, each feature point in the photo should have a line going through it, and that's not what's happening in the above photos. The discrepancy between the epipoles in the Gaudi pair is also huge, though the matches in Gaudi look acceptable for the most part.

As you can see, there's only one major outlier in the Rushmore pair, 4 major outliers in the Notre Dame pair, and around a dozen in the Gaudi pair (a major outlier being one I could spot easily within the first few minutes). I played around with the threshold for what's considered an inlier, and these were the best results I could produce: with a threshold that was too large I'd get too many matches, and with one that was too small I'd get far too few.

I could have improved on this by sorting the matches by the distance heuristic (x1 * F * x2') and keeping only a fixed number of the best ones, but I figured that these results were good enough for a first run.
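The sorting idea mentioned above can be sketched in a few lines of NumPy. The function name and the parameter `k` are hypothetical, and the error is taken as |x_b^T F x_a|, which assumes that ordering of the two point sets.

```python
import numpy as np

def keep_best_matches(F, pts_a, pts_b, k):
    """Return the indices of the k matches with the smallest |x_b^T F x_a|, and their errors."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    xa = np.column_stack([pts_a, np.ones(len(pts_a))])
    xb = np.column_stack([pts_b, np.ones(len(pts_b))])
    err = np.abs(np.einsum('ij,jk,ik->i', xb, np.asarray(F, dtype=float), xa))
    order = np.argsort(err)[:k]      # ascending error, best matches first
    return order, err[order]

# Toy example: for this F the constraint reduces to y_a = y_b.
F = np.array([[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]])
pts_a = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
pts_b = np.array([[5.0, 0.5], [6.0, 1.0], [7.0, 2.1], [8.0, 5.0]])
idx, err = keep_best_matches(F, pts_a, pts_b, k=2)
print(idx)  # [1 2]
```

A fixed `k` trades one tuning parameter (the threshold) for another, so it complements the inlier threshold rather than replacing it.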

Graduate Portion

In this portion of the assignment, I improve the above results with normalization. As mentioned in the lecture, the numbers can easily blow up, with some entries of the matrix holding values much larger than others. To correct for this, I scale the coordinates by a factor computed from the mean coordinates. I tried two methods of scaling. The first uses the maximum value of the points as the scale. The resulting normalized fundamental matrix differs from the original one, and the entry-by-entry difference for the Rushmore image pair is:


-0.0000    0.0000   -0.0000
 0.0000    0.0000   -0.0009
-0.0003   -0.0007    0.5667

Those are pretty minute differences for the most part, except at entry (3,3). So, I tried using the standard deviation as the scale to see if I could get the original and normalized fundamental matrices to agree:


-0.0000    0.0000   -0.0081
-0.0000   -0.0000    0.0150
 0.0047   -0.0196    1.6878

That ended up doing even worse, so I stuck with the maximum value method. After applying this to the estimation of the fundamental matrix implemented in part 2, I get the following results:

Results with Normalization

Oddly enough, normalization has the opposite effect from what I expected. It should keep the inlier features from clustering in one region (which we see in the Gaudi picture on the left and in both Rushmore pictures), but it actually makes my results worse. Maybe other schemes for computing the scale would do better, but even changing my parameters for inliers and outliers made no difference.
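For reference, a common form of the normalization discussed in this section can be sketched in NumPy as follows. This is an assumption-laden illustration rather than the code behind the results above: it centers each point set on its mean, offers the two scale choices tried here (maximum value and standard deviation), and maps the estimate back with F = T_b^T F_norm T_a under the convention x_b^T F x_a = 0. All function names are hypothetical.

```python
import numpy as np

def normalization_transform(pts, method="std"):
    """3x3 matrix T that centers pts on their mean and rescales them."""
    pts = np.asarray(pts, dtype=float)
    mean = pts.mean(axis=0)
    centered = pts - mean
    if method == "std":
        scale = 1.0 / centered.std()            # std over all centered coordinates
    elif method == "max":
        scale = 1.0 / np.abs(centered).max()    # largest centered coordinate
    else:
        raise ValueError("method must be 'std' or 'max'")
    return np.array([[scale, 0.0, -scale * mean[0]],
                     [0.0, scale, -scale * mean[1]],
                     [0.0, 0.0, 1.0]])

def eight_point(pts_a, pts_b):
    """Eight-point fit with rank-2 enforcement, for x_b^T F x_a = 0."""
    x, y = pts_a[:, 0], pts_a[:, 1]
    xp, yp = pts_b[:, 0], pts_b[:, 1]
    A = np.column_stack([xp * x, xp * y, xp, yp * x, yp * y, yp,
                         x, y, np.ones_like(x)])
    _, _, Vt = np.linalg.svd(A)
    U, S, Vt2 = np.linalg.svd(Vt[-1].reshape(3, 3))
    S[2] = 0.0
    return U @ np.diag(S) @ Vt2

def estimate_fundamental_normalized(pts_a, pts_b, method="std"):
    """Normalize both point sets, fit F, then undo the normalization."""
    pts_a = np.asarray(pts_a, dtype=float)
    pts_b = np.asarray(pts_b, dtype=float)
    Ta = normalization_transform(pts_a, method)
    Tb = normalization_transform(pts_b, method)
    xa = np.column_stack([pts_a, np.ones(len(pts_a))]) @ Ta.T
    xb = np.column_stack([pts_b, np.ones(len(pts_b))]) @ Tb.T
    F_norm = eight_point(xa[:, :2], xb[:, :2])
    # x_b^T (Tb^T F_norm Ta) x_a = (Tb x_b)^T F_norm (Ta x_a)
    return Tb.T @ F_norm @ Ta
```

Because a fundamental matrix is only defined up to scale (and sign), an entry-by-entry comparison between two estimates is only meaningful after both are rescaled the same way, for example to unit Frobenius norm with matching sign.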