Camera projection matrix mps 2D world coordinates to image coordinates. To estimate the projection matrix, corresponding 3D and 2D points in one image need to be provided. From the homogeneous transformation equation,
A\b
operation, we solve the system of linear equations in the least square sense.
Reformulate the projection matrix into 3x4 matrix form. The result from the provided 3D and 2D points is,
Points_a
and Points_b
, we can compose the matrix A
as
LinEquMatrix
by the following.
LinEquMatrix = zeros(numPoints,9);
LinEquMatrix(:,1) = Points_a(:,1).*Points_b(:,1);
LinEquMatrix(:,2) = Points_a(:,2).*Points_b(:,1);
LinEquMatrix(:,3) = Points_b(:,1);
LinEquMatrix(:,4) = Points_a(:,1).*Points_b(:,2);
LinEquMatrix(:,5) = Points_a(:,2).*Points_b(:,2);
LinEquMatrix(:,6) = Points_b(:,2);
LinEquMatrix(:,7:8) = Points_a;
LinEquMatrix(:,9) = ones(numPoints,1);
Decompose the matrix A by SVD decomposition. The fundamtrix matrix F_matrix
is the last column vector of the unitary matrix.
As the fundamental matrix is of rank 2, we need to set the smallest singular value in its SVD decomposition to 0.
[U,S,V]= svd(LinEquMatrix);
Fvector = V(:,end);
F_matrix = reshape(Fvector,[3,3])';
[U,S,V] = svd(F_matrix);
S(3,3) = 0;
F_matrix = U*S*V';
Based on the estimated fundamental matrix, we can draw epipolar lines on the each image. The epipolar lines are going to cross
the correspoinding points.
Fundamental matrix can also be used for matching features from multiple scenes. Often times perfect point corresponence from differnt scenes does not exist. To leveage the unreliable point correspondences, RANSAC can be applied to estimate the fundamental matrix from noisy point pairs. SIFT feature detector is used to find interest point pairs. VLFeat toolbox is used to do SIFT matching. However VLFeat 0.9.20 binary package has compactiablity issue in Win 10 environment. VLFeat 0.9.19 is finally installed at last.
8 or more point pairs (parameter from numChosen
) is iteratively randomly chosen as the input for calculating the fundamental matrix. Then inliers are found for this
fundamental matrix based on the distance measurement. If a point pair satisfies XFX'< Threshold
,
then it is count as an inlier. Choose the fundamental matrix that provides the most inliers from a certain number of iterations.
Then with experimented values of the thresold and maximum number of iterations, we can find good fundamental matrix.
There are mainly 3 parameters, namely numChosen
, Threshold
and numItr
. Different parameters
have different effects on the results. As the matched features from VLFeat are not all right, randomly chosen numChosen
pairs
from them can easily give us bad pairs for calculating the fundamental matrix. Thus we would choose less numChosen
pairs to calculate
the fundamental matrix and run large number of iterations to find the best fundamental matrix calculation with maximum inliers. Thus for the following
image pairs, numChosen
is generally 8 and numItr
is over 2000. The thresold value is chosen so that the maximum number of
inliers are generally 1/4 of all the input matched features. It should not be too small or too large. However, with carefully chosen
parameters, the fundamental matrix found from RANSAC does not necessarily be perfect because the points are coplanar and the
cameras are not actually pinhole cameras and have lens distortions. But after testing several image pairs, it is safe to say that the
fundamental matrix from RANSAC could help remove incorrect point matches.
numPoints = size(matches_a,1);
mean_a = sum(matches_a,1)/numPoints;
matches_a = matches_a - repmat(mean_a,[numPoints,1]);
scale_a = sqrt(sum(sum(matches_a.^2))/2/numPoints);
matches_a = matches_a/scale_a;
TranslationA = [1,0;0,1;0,0]';
TranslationA = [TranslationA;[-mean_a,1]]';
TransformA = diag([1/scale_a,1/scale_a,1])*TranslationA;
mean_b = sum(matches_b,1)/numPoints;
matches_b = matches_b - repmat(mean_b,[numPoints,1]);
scale_b = sqrt(sum(sum(matches_b.^2))/2/numPoints);
matches_b = matches_b/scale_b;
TranslationB = [1,0;0,1;0,0]';
TranslationB = [TranslationB;[-mean_b,1]]';
TransformB = diag([1/scale_b,1/scale_b,1])*TranslationB;
The scaling after translation defines a transformation matrix for each image, which also needs to be calculated to adjust the
fundamental matrix calculated from the normalized coordinates. With this adjustment, the fundamental matrix can operate on the
original coordinates.
F_matrix = transpose(TransformB)*F_matrix*TransformA;
The normalization part is implemented in normalization.m
, taking original point coordinates and
returning the normalized coordinates with the transformations.