CS7322: Computer Vision II: Final Project Report



 

Video Mosaicing Using Manifold Projection

Huong Quynh Dinh



 

Overview of Project

The goal of video mosaicing is to produce a panoramic view from a sequence of images. The motivation is that an individual picture or video frame has a very limited field of view, whereas human vision covers a much larger one. Hence, it seems natural to paste together a series of limited-view images to create one image with a field of view more similar to our own.

The system that has been implemented consists of two parts. The first part of the system is general video mosaicing which allows a user to input a series of image sequences to produce one panoramic image. Parameters which a user may tweak include the size of the template for correlation, the type of matching that is performed, and the type of compositing that is performed. The second part of the system attempts to remove dynamic objects from the image sequence in order to recover the static background. The input parameter is a difference tolerance which specifies how different a pixel must be from one image to the next in order to be considered dynamic. Details about correlation, compositing, and dynamic element removal are described in the corresponding sections of this report.

The system performs fairly quickly, generating the panoramic image in real time as the video file is read.
 

Correlation:

The first step in creating a video mosaic is to correlate pairs of consecutive images. As described in the proposal and progress report, translation-only correlation is first performed by using a portion of the second image in a pair as a template. The template is then slid over the first image. At each position, the match score is computed by accumulating the squared difference between the image and the template and averaging over the size of the template; the position with the lowest score is taken as the match. Accumulating the product of the image and template values instead was also tried and did not give better correlation. The size of the template, and whether correlation uses luminance only or the red, green, and blue pixel intensities, are left as parameters to be chosen by the user. This allows a user to try different template sizes, since a larger template may result in a better mosaic for some sequences while a smaller template is sufficient for others.
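The matching step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes luminance-only images stored as lists of rows, and the function names, the `template_x` parameter, and the toy data are all hypothetical.

```python
def mean_squared_difference(image, template, x_offset):
    """Average squared difference between the template and the image patch at x_offset."""
    total = 0.0
    count = 0
    for img_row, tpl_row in zip(image, template):
        for j, t in enumerate(tpl_row):
            d = img_row[x_offset + j] - t
            total += d * d
            count += 1
    return total / count


def best_horizontal_offset(first, second, template_x, template_width):
    """Slide a template cut from `second` across `first`; return (offset, score)."""
    template = [row[template_x:template_x + template_width] for row in second]
    width = len(first[0])
    best = None
    for x in range(width - template_width + 1):
        score = mean_squared_difference(first, template, x)
        if best is None or score < best[1]:  # lower score means a better match
            best = (x, score)
    return best


# Toy data (assumed): `second` views the same scene as `first`, panned 3 columns.
scene = [[(7 * c + 3 * r) % 23 for c in range(16)] for r in range(4)]
first = [row[0:10] for row in scene]
second = [row[3:13] for row in scene]

offset, score = best_horizontal_offset(first, second, template_x=0, template_width=5)
shift = offset - 0  # offset in `first` minus the template's position in `second`
```

Here the template is the leftmost five columns of the second image, and the recovered shift is the number of columns by which the second image is displaced relative to the first.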

Correlation is separated into two tasks: horizontal correlation using the template and vertical correlation using a single column. To improve performance, horizontal correlation includes a heuristic on the shift from one image to the next: each image is assumed to be shifted by no more than a quarter of the image width from the previous image. This heuristic did not result in errors because the input is always a video stream, which has a great deal of coherence from one image to the next.
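The effect of the quarter-width heuristic on the search can be illustrated with a short sketch. The helper name `candidate_offsets`, the 320-pixel image width, and the 32-pixel template width are assumptions chosen for illustration; the report does not state the actual sizes. The sketch also allows shifts in either direction, which is an assumption.

```python
def candidate_offsets(image_width, template_x, template_width, max_fraction=0.25):
    """Template positions to test in the first image when the shift between
    consecutive frames is assumed to be at most `max_fraction` of the width."""
    max_shift = int(image_width * max_fraction)
    lo = max(0, template_x - max_shift)
    hi = min(image_width - template_width, template_x + max_shift)
    return range(lo, hi + 1)


# Exhaustive search over every valid template position.
full = range(0, 320 - 32 + 1)

# Search limited by the heuristic, with the template taken at column 0.
limited = candidate_offsets(320, template_x=0, template_width=32)
```

With these illustrative sizes the number of positions to score drops from 289 to 81, which is where the performance gain comes from.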

The mosaicing algorithm is based on the paper "Panoramic Mosaics by Manifold Projection" by Peleg and Herman. Peleg and Herman performed vertical correlation on a column-by-column basis. However, it was not clear in the paper whether correlation was performed for each of the columns in the image. Separating an image into columns is a more intensive approach, and it seemed unnecessary because the columns within a single image are already vertically aligned with one another. Hence, vertical column correlation is performed only at the seam between consecutive images. Experimentation revealed that vertical correlation was not necessary for the most part, since there is little vertical deviation when sweeping the camera in a circular motion. Because vertical correlation adds computation, it was left as an option which the user may or may not use. An example of vertical alignment is shown below.
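Vertical correlation at the seam can be sketched in the same spirit as the horizontal case. The sketch below assumes that one luminance column has already been extracted from each image at the seam, and that the same averaged squared difference is used as the score; the function name, the `max_shift` parameter, and the sample columns are hypothetical.

```python
def best_vertical_shift(col_a, col_b, max_shift):
    """Return (dy, score) minimizing the mean squared difference between
    col_a[i + dy] and col_b[i] over the rows where both are defined.
    Assumes max_shift is smaller than the column length."""
    n = len(col_a)
    best_dy, best_score = 0, None
    for dy in range(-max_shift, max_shift + 1):
        total, count = 0.0, 0
        for i in range(n):
            k = i + dy
            if 0 <= k < n:
                d = col_a[k] - col_b[i]
                total += d * d
                count += 1
        score = total / count
        if best_score is None or score < best_score:
            best_dy, best_score = dy, score
    return best_dy, best_score


# Toy seam columns (assumed): col_b is col_a moved up by two rows.
col_a = [0, 0, 5, 9, 4, 1, 0, 0, 2, 7]
col_b = col_a[2:] + [0, 0]

dy, score = best_vertical_shift(col_a, col_b, max_shift=2)
```

Because only one column per image pair is compared, the extra cost is small relative to the template search, though it is still additional work on every pair.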



 

Compositing:

Correlation is used to determine how much the second image has shifted relative to the first in a pair of consecutive images. The pair of images can then be composited according to the shift. Three types of compositing have been implemented in the system:

Simple overlay pastes the second image on top of the first image, shifted by the calculated value.

Seam calculation takes into consideration the fact that alignment is usually better, and distortion smaller, at the center of an image than at its edges. The seam is calculated as the column that is equidistant from the centers of the pair of images being composited, so that pixels on each side of the seam come from the image whose center is nearer.

Seam calculation with averaging additionally averages the region around the seam. The number of columns to average around the seam is a parameter which can be specified by the user; it is in effect the size of the averaging kernel. The default is no averaging.

A comparison of the three types of compositing is shown below in the following order: Simple Overlay, Seam Calculation, Seam Calculation with Averaging. Notice how the seams between consecutive images disappear as more sophisticated compositing schemes are used.
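The three compositing schemes can be sketched for a pair of equally sized images. This is an illustrative sketch under several assumptions: the second image is displaced to the right by `shift` columns with 0 <= shift <= width, "averaging" is taken to mean an equal-weight average of the two images inside a band of `blend_width` columns centered on the seam, and all function and parameter names are hypothetical.

```python
def overlay(first, second, shift):
    """Simple overlay: paste `second` on top of `first`, `shift` columns to the right."""
    return [row_a[:shift] + row_b for row_a, row_b in zip(first, second)]


def seam_composite(first, second, shift, blend_width=0):
    """Seam compositing; blend_width=0 gives a hard seam with no averaging."""
    width = len(first[0])
    # Image centers sit at width/2 and shift + width/2; the seam is their midpoint.
    seam = (width + shift) // 2
    half = blend_width // 2
    mosaic = []
    for row_a, row_b in zip(first, second):
        out = []
        for x in range(width + shift):
            in_band = seam - half <= x < seam + half
            in_overlap = shift <= x < width
            if in_band and in_overlap:
                out.append((row_a[x] + row_b[x - shift]) / 2)  # averaged band
            elif x < seam:
                out.append(row_a[x])           # left of the seam: first image
            else:
                out.append(row_b[x - shift])   # right of the seam: second image
        mosaic.append(out)
    return mosaic


# Toy one-row images with different brightness so the seam position is visible.
first = [[1] * 8]
second = [[3] * 8]

simple = overlay(first, second, shift=4)
hard_seam = seam_composite(first, second, shift=4)
blended = seam_composite(first, second, shift=4, blend_width=2)
```

In the toy data the transition moves from column 4 (the edge of the second image) under simple overlay to column 6 (midway between the image centers) under seam calculation, and the averaged version replaces the abrupt step with intermediate values.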





 

Dynamic Element Removal

All the above examples show a static scene. What happens when elements in the video are moving, such as people and cars? This question was the basis of the second part of the system. Simply mosaicing a video that includes moving elements would result in an image that has copies of the dynamic element. An example of such an image is shown:


In order to create a correct panorama of the static background, the dynamic objects in the video must be removed. This is accomplished by taking the difference between pairs of consecutive images. The magnitude of this difference is large at all pixel positions where there was movement. A tolerance level on the pixel difference can be specified by the user. This tolerance level is the difference threshold above which pixels are considered to be in motion. Once the pixel positions of the moving elements are known, these pixels can be removed entirely from the image to be composited. The parts of the background that were occluded by the moving object in one image may be revealed in another image in the video sequence. Hence, by having a sequence of images, the static background can be mostly recovered. Note that the compositing step is done only after the dynamic elements have been removed. The following images are examples of recovered background. The speckles of black or colored pixels, which resemble a motion blur, belong to the moving object. They mark the parts of the static background which could not be recovered.
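The idea of thresholding frame differences and compositing only the static pixels can be sketched as follows. To keep the example short it assumes the frames are already aligned to a common coordinate system (no shift), uses luminance values, and starts the mosaic from the first frame; the function name and the toy frames are hypothetical and the real system combines this step with the correlation and compositing described earlier.

```python
def recover_background(frames, tolerance):
    """Composite aligned frames while skipping pixels that changed by more
    than `tolerance` between consecutive frames (treated as in motion)."""
    mosaic = [list(row) for row in frames[0]]
    for prev, curr in zip(frames, frames[1:]):
        for y, (row_p, row_c) in enumerate(zip(prev, curr)):
            for x, (p, c) in enumerate(zip(row_p, row_c)):
                if abs(c - p) <= tolerance:
                    mosaic[y][x] = c  # static pixel: safe to composite
                # otherwise the pixel is dropped and the mosaic keeps its old value
    return mosaic


# Toy sequence: a bright object (200) moves across a uniform background (10).
frames = [
    [[10, 200, 10, 10, 10, 10]],
    [[10, 10, 10, 200, 10, 10]],
    [[10, 10, 10, 10, 10, 200]],
]

background = recover_background(frames, tolerance=20)
```

The object's original position is only corrected once a later pair of frames shows that location unchanged, which illustrates why a background region that is never seen static in any pair remains unrecovered.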





 

More Results





 

References:

Burt, P. and E. Adelson, A Multiresolution Spline with Application to Image Mosaics, ACM Trans. on Graphics, 2(4):217-236, October 1983.

Peleg, Shmuel and Joshua Herman, Panoramic Mosaics by Manifold Projection, 1997.

Szeliski, Richard, Video Mosaics for Virtual Environments, IEEE Computer Graphics and Applications, March 1996.