CS7322: Computer Vision II: Final Project Proposal



 

Video Mosaicing Using Manifold Projection

Huong Quynh Dinh



 

The Idea:

The technique of mosaicing involves piecing together smaller components in order to generate one large, seamless unit. In the case of image and video mosaicing, the image frames are the small components, and the result of piecing them together is a panoramic view. The motivation is that an individual picture, or a single frame in a video sequence, has a very limited field of view, whereas human vision covers a much larger one. Hence, it seems natural to paste together a series of limited-view images to create one image with a field of view more similar to our own. In practice, however, this is not easily done because each image undergoes a perspective transform due to the rendering or image capturing process.

Once mosaicing has been achieved, it is possible to experiment with dynamic objects in the input image frames. In this project, we will attempt to remove dynamic objects in a video sequence while still maintaining a visually correct mosaic.
 

The Domain and the Scope:

Mosaicing can be performed on images or on a video sequence. This project will concentrate on video mosaicing. The primary difference between image and video mosaicing is that consecutive frames in a video sequence overlap heavily and retain a great deal of coherence, while still images may overlap only at the edges. Hence, a less rigorous and less computationally intensive approach may be used to align video frames. Such an approach could make real-time mosaicing of a video sequence possible.

As mentioned above, a second goal of this project is to remove dynamic objects from a video sequence as the mosaic of the video stream is generated. The test data we will use is a video sequence of the exterior of the College of Computing, taken from one fixed viewpoint while rotating the camera. Our initial video will not include any moving (dynamic) objects in the scene. Our final test data will be the same scene with a single person or car moving through it.
 

Approach:

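This project is based upon the techniques outlined in the paper Panoramic Mosaics by Manifold Projection [Peleg, Herman 97]. The process involves four steps, described in turn below: image alignment, cutting and pasting of image strips, blending of seams, and removal of dynamic objects.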

Image alignment is accomplished in two passes. First, translation-only alignment is obtained using image correlation. Second, more accurate alignment involving both image translation and rotation is obtained by solving for motion parameters. Alignment can be improved using multi-resolution techniques in which coarse correlation is used to guide fine correlation of images. Alignment is also improved by processing images on a per-column basis. The columns of an image may be placed independently of each other, which allows finer accuracy in alignment. Using only the first pass may allow real-time mosaicing of video sequences.
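
The first, translation-only pass can be sketched as an exhaustive search over horizontal shifts that keeps the shift with the highest normalized correlation over the overlapping region. This is a minimal sketch under stated assumptions, not the method from the paper: the frame representation (lists of rows of gray values), the function names `correlation_at_shift` and `estimate_shift`, and the `max_shift` search bound are all hypothetical, and only horizontal translation is searched.

```python
import math


def correlation_at_shift(frame_a, frame_b, dx):
    """Normalized correlation of two frames over their overlap.

    Assumed convention: column x of frame_b shows the same scene
    content as column x + dx of frame_a.
    """
    height, width = len(frame_a), len(frame_a[0])
    # Columns of frame_a that also fall inside frame_b for this shift.
    columns = range(max(0, dx), min(width, width + dx))
    a = [frame_a[y][x] for y in range(height) for x in columns]
    b = [frame_b[y][x - dx] for y in range(height) for x in columns]
    if not a:
        return 0.0  # no overlap at all
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    den = math.sqrt(sum((p - mean_a) ** 2 for p in a) *
                    sum((q - mean_b) ** 2 for q in b))
    return num / den if den > 0 else 0.0


def estimate_shift(frame_a, frame_b, max_shift):
    """Return the shift in [-max_shift, max_shift] with the best score."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda dx: correlation_at_shift(frame_a, frame_b, dx))
```

Because consecutive video frames overlap heavily, `max_shift` can probably be kept small, which is what would make this pass a candidate for real-time use.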

In order to piece together consecutive images in a sequence, each image is cut into column strips. These strips may then be pasted independently of one another according to the alignment described above. The pasting of image sequences is accomplished by overlapping aligned images such that the middle of any image is given the greatest weight or influence, while the edges contribute less. This is done because distortion is minimal at the center of images. There are two methods of pasting consecutive images. Either overlapping pixels are averaged, with more weight given to those pixels closest to their image's center, or the single pixel that is closest to its image's center is selected over all the other overlapping pixels. In the latter case, blurring and deterioration of image quality are minimized, but the seams in the resulting image may be more apparent.
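
The two pasting methods can be illustrated with a small sketch that builds a mosaic column by column. It assumes frames of equal size stored as lists of rows, purely horizontal, non-negative column offsets (as a translation-only alignment would produce), and a linear fall-off for the weight. The names `center_weight` and `paste_columns` and the exact weight function are hypothetical choices for illustration.

```python
def center_weight(x, width):
    """Weight that peaks at the middle column and falls off toward the edges."""
    center = (width - 1) / 2.0
    return 1.0 - abs(x - center) / (center + 1.0)


def paste_columns(frames, offsets, mode="average"):
    """Build a mosaic from frames placed at non-negative column offsets.

    mode="average": weighted average of all overlapping pixels.
    mode="select":  take the pixel whose column is closest to its
                    own frame's center.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mosaic_width = max(offsets) + width
    mosaic = [[0.0] * mosaic_width for _ in range(height)]
    for col in range(mosaic_width):
        # (frame index, source column, weight) for every frame covering col
        sources = [(i, col - off, center_weight(col - off, width))
                   for i, off in enumerate(offsets)
                   if 0 <= col - off < width]
        if not sources:
            continue  # gap between frames; left at 0.0
        if mode == "select":
            i, x, _ = max(sources, key=lambda s: s[2])
            for y in range(height):
                mosaic[y][col] = frames[i][y][x]
        else:
            total = sum(w for _, _, w in sources)
            for y in range(height):
                mosaic[y][col] = sum(w * frames[i][y][x]
                                     for i, x, w in sources) / total
    return mosaic
```

With two constant frames of values 10 and 20 that overlap by two columns, the "select" mode produces a hard step at the seam, while the "average" mode produces intermediate values across the overlap, which mirrors the trade-off between sharpness and visible seams described above.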

Seams often result from differences in color and contrast between images. To reduce them, a multi-resolution spline technique [Burt, Adelson 83] can be used, which composites the images separately at each band-pass pyramid level.
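
The idea of compositing band by band can be sketched on a single image row. Note the simplifications: this sketch uses a band-pass stack (repeated box blurs at full resolution) instead of the subsampled pyramid of the original technique, and it works in one dimension. The names `box_blur`, `band_stack`, and `blend_rows`, the box filter, and the default of three levels are all assumptions made for illustration.

```python
def box_blur(signal, radius):
    """Box filter with edge clamping."""
    n = len(signal)
    out = []
    for i in range(n):
        window = [signal[min(max(j, 0), n - 1)]
                  for j in range(i - radius, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def band_stack(signal, levels):
    """Split a row into band-pass layers plus a low-pass residual."""
    bands = []
    current = [float(v) for v in signal]
    for k in range(levels):
        smoother = box_blur(current, 2 ** k)
        bands.append([c - s for c, s in zip(current, smoother)])
        current = smoother
    return bands, current


def blend_rows(row_a, row_b, mask, levels=3):
    """Blend two aligned rows; mask is 1.0 where row_a should dominate."""
    bands_a, low_a = band_stack(row_a, levels)
    bands_b, low_b = band_stack(row_b, levels)
    out = [0.0] * len(row_a)
    soft = [float(m) for m in mask]
    for k in range(levels):
        # Fine bands use a sharp mask, coarser bands a smoother one.
        for i in range(len(out)):
            out[i] += soft[i] * bands_a[k][i] + (1.0 - soft[i]) * bands_b[k][i]
        soft = box_blur(soft, 2 ** k)
    for i in range(len(out)):
        out[i] += soft[i] * low_a[i] + (1.0 - soft[i]) * low_b[i]
    return out
```

The design point is that low-frequency differences (overall brightness) are mixed over a wide transition zone, while fine detail is switched over a narrow one, so the seam is hidden without blurring detail.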

As mentioned in the project domain, the final test data is a scene with one moving person or car. The dynamic object in each frame will be removed prior to alignment and pasting. Correlation between consecutive images should still be possible because all pixels associated with the dynamic object are replaced by a constant color, such as black or white. Identification of the dynamic pixels to be removed is accomplished using flow field vector calculation (code supplied by CPL).
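
One way the flow field could be turned into a mask of dynamic pixels is sketched below. The rule used here is an assumption, not something specified by the supplied flow code: the median flow vector is taken as the camera-induced motion, and any pixel whose flow deviates from it by more than a threshold is treated as dynamic. The flow representation (rows of `(u, v)` tuples), the names `dynamic_mask` and `blank_dynamic_pixels`, and the default threshold are hypothetical.

```python
import math
from statistics import median


def dynamic_mask(flow, threshold=1.0):
    """Mark pixels whose flow differs from the dominant (median) motion."""
    us = [u for row in flow for u, _ in row]
    vs = [v for row in flow for _, v in row]
    cam_u, cam_v = median(us), median(vs)  # assumed camera motion
    return [[math.hypot(u - cam_u, v - cam_v) > threshold for u, v in row]
            for row in flow]


def blank_dynamic_pixels(frame, mask, fill=0):
    """Replace masked pixels by a constant color (0 = black)."""
    return [[fill if moving else pixel
             for pixel, moving in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]
```

The median is used because a single moving person or car should cover only a minority of the frame, so the dominant motion is expected to come from the static background.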
 

Proposed Time-line:

Week 1: 4/27-5/2
Project Proposal.
Capture test data.

Week 2: 5/3-5/9
Implement simple translation-only alignment.
Implement pasting of image columns.

Week 3: 5/10-5/16
Debug.
Test for real-time use.

Week 4: 5/17-5/23
Progress Report.
Add code to remove dynamic objects in scene.
Debug and test on final test data.

Week 5: 5/24-5/30
Add code to use multi-resolution pyramids in alignment.
Implement off-line, higher-accuracy alignment using both rotation and translation of input images.

Week 6: 5/31-6/6
Final Project Report.
Demo project.


 

References:

Burt, P. and E. Adelson, A Multiresolution Spline with Application to Image Mosaics,
ACM Trans. on Graphics, 2(4):217-236, October 1983.

Peleg, Shmuel and Joshua Herman, Panoramic Mosaics by Manifold Projection, 1997.

Szeliski, Richard, Video Mosaics for Virtual Environments,
IEEE Computer Graphics and Applications, March 1996.