CS7322: Computer Vision II: Final Project Progress Report



 

Video Mosaicing Using Manifold Projection

Huong Quynh Dinh



 

Time-line to Current:

Week 1: 4/27-5/2
Project Proposal.
Capture test data.

Week 2: 5/3-5/9
Implement simple translation-only alignment.
Implement pasting of image columns.

Week 3: 5/10-5/16
Debug.
Test for real-time use.

Week 4: 5/17-5/23
Progress Report.

 

Test Set:

Three test sets were captured. They include panoramic sequences captured both with and without a tripod. Two outdoor scenes and one indoor scene were captured so that the data would cover different lighting conditions and different structural elements. The indoor scene included bookshelves, books, papers, and desks, all of which have straight edges that are mostly horizontal or vertical. In contrast, the outdoor scenes included trees, grass, and leaves, whose contours are not straight. Of the two outdoor scenes, one had only static elements, while the second included a moving car. This second outdoor test set will be used later this week to test removal and mosaicing of dynamic elements in a video sequence.
 

Correlation:

The first step in creating a video mosaic is to correlate consecutive pairs of images. As described in the proposal, a simple translation-only correlation is conducted first. This is done by using a portion of the second image in the pair as a template, which is then slid over the first image. Two correlation techniques were attempted: 1) accumulating the squared difference between the image and the template and averaging over the size of the template, and 2) accumulating the product of the image and template values. With the first method, the best match occurs where the averaged difference is smallest. With the second method, the best match occurs where the accumulated value is largest. Neither method was markedly better than the other. Two differencing methods were also used: differencing on the overall luminance value, and differencing on red, green, and blue separately.

Several parameters can be tuned, and the final implementation will most likely allow the user to select them. The parameters are the template size, the correlation technique, and the shift heuristics.

The template may be small or may be as large as the second image. The choice of template size affects both the results and the real-time capability of the system; the real-time possibilities of the program are discussed in the section on Results So Far. The correlation techniques were described above. The shift heuristics let the user indicate whether the video sequence was a slow or a fast pan by entering the most likely range of pixel offsets by which the second image is shifted relative to the first image in a pair of consecutive images. If the sequence was a fast pan, the shift is most likely large; if the pan was slow, the shift is most likely small.
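The search described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: the names estimate_shift and match_score, the choice of the left-most columns of the second image as the template, and the default values are all hypothetical. Images are assumed to be NumPy arrays of equal size, either luminance (rows x columns) or RGB (rows x columns x 3).

```python
import numpy as np

def match_score(window, template, method):
    """Score one candidate placement of the template over the first image."""
    window = window.astype(np.float64)
    template = template.astype(np.float64)
    if method == "ssd":
        # Squared difference averaged over the template (lower is better).
        return np.mean((window - template) ** 2)
    if method == "product":
        # Accumulated product of image and template (higher is better).
        return np.sum(window * template)
    raise ValueError("method must be 'ssd' or 'product'")

def estimate_shift(first, second, template_width=50, method="ssd",
                   shift_range=(0, 60)):
    """Return the horizontal offset at which the template best matches `first`.

    Assumption: the template is the left-most `template_width` columns of
    `second`, and `shift_range` is the user's guess at the likely offsets
    (a small range for a slow pan, a large one for a fast pan).
    """
    width = first.shape[1]
    template = second[:, :template_width]
    lo, hi = shift_range
    hi = min(hi, width - template_width)  # keep the window inside `first`
    best_shift, best_score = None, None
    for s in range(lo, hi + 1):
        score = match_score(first[:, s:s + template_width], template, method)
        if best_score is None:
            better = True
        elif method == "ssd":
            better = score < best_score
        else:
            better = score > best_score
        if better:
            best_shift, best_score = s, score
    return best_shift
```

Narrowing shift_range reduces the number of candidate positions, and narrowing template_width reduces the work per position, which is why both parameters matter for interactive use.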

It is important to note that correlation was performed at one-half the original height of the image. This is necessary because the video is interlaced. Correlation on the full-height image generated very poor results because the interlaced rows, which belong to two fields captured at different times, produced spurious differences. These interlaced rows must be removed prior to correlation.
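One simple way to obtain a half-height image is to keep only one field, that is, every other row. The report does not say exactly how the rows were removed, so the sketch below is an assumption, and the function name keep_one_field is hypothetical.

```python
import numpy as np

def keep_one_field(frame, field=0):
    """Keep every other row of an interlaced frame.

    field=0 keeps rows 0, 2, 4, ...; field=1 keeps rows 1, 3, 5, ...
    Works for luminance (rows x columns) and RGB (rows x columns x 3) arrays.
    """
    return frame[field::2]
```

For a 360x244 frame this gives a 360x122 image, which matches the half-height images used for correlation.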

The next step in correlation is to perform column-by-column vertical correlation. The correlation implemented thus far operates on multiple columns of the image at once and includes only horizontal shifting. This initial correlation gives a rough estimate of where the images lie relative to each other. Next, each image will be divided into its separate columns, and vertical correlation between corresponding columns will be performed.
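Since this step is not yet implemented, the following is only a sketch of one way it might work. It assumes luminance images of equal size, a known non-negative horizontal shift from the first correlation pass, and a small hypothetical search limit max_dy; the function name column_vertical_shifts is likewise hypothetical.

```python
import numpy as np

def column_vertical_shifts(first, second, shift, max_dy=5):
    """For each overlapping column, find the vertical offset with the
    smallest mean squared difference.

    Column x of `second` is compared with column x + shift of `first`.
    A result of dy means second[r] lines up with first[r + dy].
    """
    height, width = first.shape
    overlap = width - shift  # columns of `second` that also appear in `first`
    shifts = np.zeros(overlap, dtype=int)
    for x in range(overlap):
        a = first[:, x + shift].astype(np.float64)
        b = second[:, x].astype(np.float64)
        best_score = None
        for dy in range(-max_dy, max_dy + 1):
            # Compare only the rows that exist in both columns at this offset.
            if dy >= 0:
                diff = a[dy:] - b[:height - dy]
            else:
                diff = a[:height + dy] - b[-dy:]
            score = np.mean(diff ** 2)
            if best_score is None or score < best_score:
                best_score = score
                shifts[x] = dy
    return shifts
```

A single column carries little information, so in practice the per-column estimates may be noisy; smoothing them across neighboring columns is one possible refinement.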
 

Compositing:

Correlation determines how far the second image has shifted relative to the first in a pair of consecutive images. The pair can then be composited according to that shift. Currently, a simple composite is performed that lays the second image on top of the first, offset by the calculated shift. Next, a more sophisticated technique will be implemented in which only the columns closest to the center of each image are displayed.
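The simple composite can be sketched as below. This is a minimal illustration, assuming a non-negative horizontal shift, images of equal height, and a hypothetical function name composite_pair; it is not the project's actual code.

```python
import numpy as np

def composite_pair(first, second, shift):
    """Lay `second` on top of `first`, offset `shift` columns to the right.

    Assumes shift >= 0 and equal image heights. Works for luminance
    (rows x columns) and RGB (rows x columns x 3) arrays.
    """
    height, width = first.shape[:2]
    out_width = max(width, shift + second.shape[1])
    canvas = np.zeros((height, out_width) + first.shape[2:], dtype=first.dtype)
    canvas[:, :width] = first
    # The second image overwrites the first wherever the two overlap.
    canvas[:, shift:shift + second.shape[1]] = second
    return canvas
```

Because the second image simply overwrites the overlap, any misalignment shows up as a visible seam at the left edge of the pasted image.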
 

Results So Far:

The following are panoramic composites of the indoor and outdoor scenes. These scenes include only static elements.



The images show fairly distinct seams between the composited frames. Vertical column correlation and a better compositing technique should reduce these seams. The video sequence used for the top image was captured using a tripod, while the sequence used for the bottom image was captured free-hand. As a result, the top image has much better vertical alignment than the bottom.

The above images were mosaiced using a template 50 pixels wide and 122 pixels high. The original images are 360x244, so the template covers the full height of the half-height images used for correlation. With a 50x122 template the program was highly interactive, and the video mosaic was produced within seconds. Even a 180x122 template was fairly interactive.
 

References:

Burt, P. and E. Adelson, A Multiresolution Spline with Application to Image Mosaics,
ACM Trans. on Graphics, 2(4):217-236, October 1983.


Peleg, Shmuel and Joshua Herman, Panoramic Mosaics by Manifold Projection, 1997


Szeliski, Richard, Video Mosaics for Virtual Environments,
IEEE Computer Graphics and Applications, March 1996