Example of epipolar lines.
The goal of this project is to introduce you to camera and scene geometry. Specifically we will estimate the camera projection matrix, which maps 3D world coordinates to image coordinates, as well as the fundamental matrix, which relates points in one scene to epipolar lines in another. The camera projection matrix and the fundamental matrix can each be estimated using point correspondences. To estimate the projection matrix (camera calibration), the input is corresponding 3d and 2d points. To estimate the fundamental matrix the input is corresponding 2d points across two images. The author will describe the implementations and their results in the following order:
The goal of this part is to estimate the projection matrix that converges a world cordinate to a camera cordinate as follows:
The author uses a homogeneous linear system (Ax = 0) to obtain the projection matrix. The x vector in this part consists of the 12 entries of the projection matrix as follows:
The world coordinates (Points_3D) and the camera coordinates (Points_2D) are given in this problem; thus, the above A matrix can be solved in two steps. First, build the A matrix above using the given n pairs of the world and camera coordinates. Second, solve the homogeneous linear system using the singular value decomposition. The codes are shown below:
% solve Ax = 0
% build A, see 13.pdf page 36
n = size(Points_3D, 1);
if(n ~= size(Points_2D, 1))
disp('Sizes of Points 2D and 3D do not match');
end
A = zeros(2 * n, 12);
for row = 1:2 * n
world_pos = Points_3D(ceil(row / 2), :);
camera_pos = Points_2D(ceil(row / 2), :);
u = camera_pos(1, 1);
v = camera_pos(1, 2);
if(mod(row, 2)) % odd row
A(row, :) = [world_pos 1 0 0 0 0 -u * world_pos -u];
else % even row
A(row, :) = [0 0 0 0 world_pos 1 -v * world_pos -v];
end
end
% see 13.pdf page 36
[U, S, V] = svd(A);
M = V(:, end);
M = reshape(M, [], 3)';
The computed projection matrix with the above method was
Note that the projection matrix is ambiguous up to a constant scale; thus, the projection matrix can not uniquely derived. However, it does not affect the derivation of the camera center. In order to evaluate the accuracy of the projection matrix, the projection pre-multiplies itself with the given world coordinates and compute the expected camera coordinates (u' and v'). The performance criterion, residual, is computed as follows:
The total residual with the given coordinates was 0.0445, and we can tell that the estimated projection matrix is good enough. Once the projection matrix is computed, computing the camera location is straightforward. Look at the slide 13.pdf page 45. The camera center can be computed as shown in the code below:
% Computes a camera center from a projection matrix
function [Center] = compute_camera_center(M)
% see 13.pdf page 45
Q = M(1:3, 1:3);
m4 = M(:, 4);
Center = -inv(Q) * m4;
end
The computed camera center was <-1.5127, -2.3517, 0.2826>. Figures below summarizes the results of the part 1. The figure on the left hand side shows the projected 3D coordinates as well as the given 2D coordinates. Since all the projected points are located near the 2D points, we can tell the estimated projection matrix is computed correctly. The figure on the right hand side shows the relationship between the 2D and 3D coordinates and the camera location.
camera center | -1.5127 | -2.3517 | 0.2826 |
residual | 0.0445 |
TPart 2 works on the estimation of a fundamental matrich, which maps a point in one image to the line in the other image. The fundamental matrix is a 3 by 3 matrix; thus, we estimate 9 entries of the matrix. Since the fundamental matrix is not full ranked, we only need 8 pairs of the points. This means that we estimate the 9 parameters with 8 inputs and 1 constraint. The author uses a similar method to the one used in the Part 1 to estimate the entries in the target matrix. Two main differences to the Part 1 is
% normalization
use_normalize = 1;
if(use_normalize)
% naive implementation of the eight point algorithm has a poor numerical
% condition. Use the normalized eight point algorithm. See the 14.pdf page
% 40.
% Center the image data at the origin, and scale it so the mean squared
% distance between the origin and the data points is 2 pixels. -> which
% means the points are restricted to -sqrt(2) ~ +sqrt(2)
% The points are translated so that their centeroid is at the origin.
% Piazza, instructor's response
c = [mean(Points_a(:, 1)) mean(Points_a(:, 2))];
c_prime = [mean(Points_b(:, 1)), mean(Points_b(:, 2))];
Points_a_shift = [Points_a(:, 1) - c(1), Points_a(:, 2) - c(2)];
Points_b_shift = [Points_b(:, 1) - c_prime(1), Points_b(:, 2) - c_prime(2)];
m_shift = [1, 0, -c(1);
0, 1, -c(2);
0, 0, 1];
m_shift_prime = [1, 0, -c_prime(1);
0, 1, -c_prime(2);
0, 0, 1];
% compute mean squared distances
msq = mean(Points_a_shift(:, 1) .^ 2 + Points_a_shift(:, 2) .^ 2);
msq_prime = mean(Points_b_shift(:, 1) .^ 2 + Points_b_shift(:, 2) .^ 2);
% compute scale s,
ssq = 2/msq;
ssq_prime = 2/msq_prime;
s = sqrt(ssq);
s_prime = sqrt(ssq_prime);
m_scale = [s, 0, 0;
0, s, 0;
0, 0, 1];
m_scale_prime = [s_prime, 0, 0;
0, s_prime, 0;
0, 0, 1];
%[u', v', 1]^T = [s 0 0; 0 s 0; 0 0 1][1 0 -c_u; 0 1 -c_v; 0 0 1][u v 1]^T
for i = 1:n
tmp(i, :) = (m_scale * m_shift * [Points_a(i, :) 1]')';%'
tmp_prime(i, :) = (m_scale_prime * m_shift_prime * [Points_b(i, :) 1]')';%'
end
Points_a = tmp(:, 1:2);
Points_b = tmp_prime(:, 1:2);
end
You can enable/disable the normalization with the switch 'use_normalize'. These normalized points are used to build a A matrix, and the fundamental matrix is a solution of Ax = 0. The singularity constraint is achieved by letting the smallest eigenvalues to zero. Finally, the fundamental matrix is reconfigured to the original mean and scale by premultiplying and postmultiplying the shiting and scaling matrix. The following code does these operations:
% rename to the notations used in the slides.
u = Points_b(:, 1);
v = Points_b(:, 2);
u_prime = Points_a(:, 1);
v_prime = Points_a(:, 2);
% build an A matrix. See the slide 14.pdf page 33
A = [u .* u_prime, u .* v_prime, u, v .* u_prime, v .* v_prime, v,...
u_prime, v_prime, ones(n, 1)];
% solve Af = 0, See the slide 14.pdf page 34
[U, S, V] = svd(A);
f = V(:, end);
F = reshape(f, [3 3])';%'
% use the singularity constraint, see the slide 14.pdf page 36
[U, S, V] = svd(F);
S(3, 3) = 0;
F = U * S * V';
if(use_normalize)
F = (m_scale_prime * m_shift_prime)' * F * (m_scale * m_shift);
end
F_matrix = F;
The below tables and figures summarizes the results of the estimation with and without the nomalization. As we can see from the figures, the normalization improved the accuracy. (The epipolar lines got closer to the interest points with normalization.)
Fundamental matrix without normalization | |||
F= | -5.3626e-07 | 7.9036e-06 | -1.8860e-03 |
8.8354e-06 | 1.2132e-06 | 1.7233e-02 | |
-9.0738e-04 | -2.6423e-02 | 9.9950e-01 |
Fundamental matrix with normalization | |||
F= | -1.1725e-07 | 1.6082e-06 | -4.0198e-04 |
1.1121e-06 | -2.7344e-07 | 3.2332e-03 | |
-2.3640e-05 | -4.4440e-03 | 1.0346e-01 |
Epipolar lines without normalization | |
Epipolar lines with normalization | |
Part 3 derives the fundamental matrix in more realistic sense than Part 2. In Part 2, the corresponding pairs are given, and the fundamental matrix was derived based on these correspondence. However, the correspondence problem is one of the biggest challenges to find a fundamental matrix. Part 3 consists of solving the correspondence problem. Here, the author uses the vl_feat library (http://www.vlfeat.org/matlab/matlab.html) to extract interest points, compute SIFT features, and solve the correspondence problem. This is a fast library, but the accuracy of the matching is not very great. Therefore, the key part of Part 3 is to improve the matching quality using the RANSAC algorithm.
RANSAC (RANdom SAmple Consensus) consists of the three steps below:
For the derivation of the fundamental matrix, this can be more concretely explained as follows:
max_iteration = 30000;
s = 8; % min # needed for model estimation, 8 point estimation
samp = 8;
if(samp < s)
disp('SAMP TOO LOW');
end
delta = 0.03;
n = size(matches_a, 1);
if(size(matches_b, 1) ~= n)
disp('matches a and b need to have the same size');
end
max_score = -1;
for i = 1:max_iteration
% index = randi(n, s, 1); % randi (max_int, row, col)
index = randperm(n, samp)';%'
x_prime = matches_a(index, :);
x = matches_b(index, :);
F = estimate_fundamental_matrix(x_prime, x);
% converts to the homogeneous coordinate
x_n = [matches_a, ones(size(matches_a, 1), 1)];
x_prime_n = [matches_b, ones(size(matches_b, 1), 1)];
error_fundamental = zeros(size(x_n, 1), 1);
for j = 1:size(error_fundamental, 1)
% note the 14.pdf 29, and everywhere on the slide uses x^TFx_prime
% = 0. This is because their expression uses a column vector, but
% we store data as a row vector. The transpose locations,
% therefore, are different
% this should be zero if a perfect fundamental matrix xFx = 0
error_fundamental(j) = abs(x_n(j, :) * F * x_prime_n(j, :)'); %'
end
ok_index = [];
score = 0;
for j = 1:size(error_fundamental, 1)
if(error_fundamental(j, 1) < delta)
ok_index(end+1) = j;
score = score + 1;
end
end
fprintf('[@%d] mean_error = %f\n', i, mean(error_fundamental));
if(score > max_score)
max_score = score;
Best_Fmatrix = F;
inliers_a = matches_a(ok_index, :);
inliers_b = matches_b(ok_index, :);
end
end
fprintf('Best score %d out of %d\n', max_score, size(matches_a, 1));
The results of Part 3 are summarized in the tables and figures below:
Mount Rushmore | |||
F= | 1.7323e-08 | 2.0813e-06 | -3.4155e-03 |
-2.1899e-06 | -4.2035e-08 | 2.1237e-03 | |
3.4402e-03 | -1.9433e-03 | -5.4517e-02 | |
inliers | 560/825 | ||
threshold | 0.03 | ||
iterations | 10000 |