CS 7321 Winter 1998, Problem Set No. 3
By Alan Daniels

The Problem

The problem that we were supposed to solve was to implement the algorithm from "A Multiresolution Spline With Application to Image Mosaics" (by Burt and Adelson), and use it to join pictures together. The intent was to use Gaussian and Laplacian pyramids, along with an image mask, to combine two images without leaving any visible "seams" in the resulting image.
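For reference, the pyramid blend can be summarized in three equations. This is a restatement of the standard formulation, not a transcription of the submitted code. The symbols are assumptions chosen for illustration: G_k is level k of a Gaussian pyramid with N+1 levels, LA and LB are the Laplacian pyramids of the two images, GR is the Gaussian pyramid of the mask (1 where the first image should appear, 0 where the second should), and LS is the blended pyramid.

```latex
% Laplacian pyramid: each level is the detail lost by one REDUCE step
L_k = G_k - \mathrm{EXPAND}(G_{k+1}), \quad 0 \le k < N, \qquad L_N = G_N

% Blend each level, weighting by the smoothed mask at that level
LS_k = GR_k \, LA_k + (1 - GR_k) \, LB_k

% Reconstruct from the coarsest level back up to full resolution
S_N = LS_N, \qquad S_k = LS_k + \mathrm{EXPAND}(S_{k+1}), \quad \text{result} = S_0
```

Because the mask is blurred more heavily at coarser levels, low-frequency content is mixed over a wide transition zone and fine detail over a narrow one, which is what hides the seam.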

The Solution

The solution to this was actually fairly straightforward, since the algorithm to be used was explained pretty clearly as part of the paper. Just about all of the difficulty for this assignment was in the actual implementation of the algorithm. Most of the trouble that I ran into concerned getting the dimensions of the various levels of the Gaussian and Laplacian pyramids to match up, especially when doing a reduce immediately followed by an expand. As you know, MATLAB will only do matrix additions and subtractions when the sizes of the two matrices are identical.
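The size mismatch is easy to reproduce on a single row of odd length. The sketch below is in Python rather than MATLAB and is not the submitted code. The decimation and sample-duplication steps are deliberately crude stand-ins for the real reduce and expand filters, and the helper names (reduced_length, fit_length, subtract) are hypothetical. It shows only the bookkeeping: reduce a row of 7 samples to 4, expand back to 8, then crop to 7 before subtracting.

```python
def reduced_length(n):
    """Length after keeping every other sample, starting with the first."""
    return (n + 1) // 2


def fit_length(row, n):
    """Crop, or pad by repeating the last sample, so a non-empty row has length n."""
    if len(row) >= n:
        return row[:n]
    return row + [row[-1]] * (n - len(row))


def subtract(a, b):
    """Element-wise difference that, like MATLAB, refuses mismatched sizes."""
    if len(a) != len(b):
        raise ValueError("size mismatch: %d vs %d" % (len(a), len(b)))
    return [x - y for x, y in zip(a, b)]


fine = [3.0, 5.0, 8.0, 9.0, 7.0, 4.0, 2.0]         # odd length: 7 samples
coarse = fine[::2]                                 # stand-in reduce: 4 samples
expanded = [v for v in coarse for _ in range(2)]   # stand-in expand: 8 samples

# subtract(fine, expanded) would raise here, because 7 != 8.
detail = subtract(fine, fit_length(expanded, len(fine)))
```

Passing the target size into the expand step, instead of always doubling, is another way to avoid the problem.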

Assumptions and Weaknesses

Assumptions

The biggest assumption that I made when writing the code was the requirement for full color. Originally, I tried mixing different image types, such as full color for the images and a bitmap for the mask, but this became overly complicated after a while, and so I decided to use full color for all of the Gaussian and Laplacian math. Also, due to the use of full color, my MATLAB code is not as clean as it could be. There are many places where I convert between MATLAB cells and standard full color images, and the code could have been simpler if I had found a more consistent approach.

Weaknesses

The biggest weakness that I had in implementing this problem set was not with the actual algorithm or coding, but in finding decent sample images and in creating masks on the SGI machines. Ideally, I would have liked to create more custom masks in order to get more interesting image blending effects, but due to the lack of image processing software on the SGIs, I had to make a few prefab masks on another machine, and copy them over for use by MATLAB.

Possible Improvements

The best immediate improvement for the program would be for it to automatically adjust the contrast and color tone of each image so that the final merged picture has a smoother overall appearance. For example, if two landscapes or two faces are being merged, minor color differences between grassy areas or skin tones would be adjusted for, so that the final image has fewer differences between sides.

Another great improvement for the program would be the ability to do cropping and alignment automatically, so that the final mosaic image has better coherence. For example, if two faces were being merged, it would automatically scale one image or the other so that the top and bottom of each face were vertically aligned before the merge process was attempted. Of course, implementing an improvement like this would be complicated, and is probably only necessary for a commercial-level product.

What I Learned

This problem set was interesting, and was especially helpful in teaching me how Gaussian pyramids work, not only for creating image mosaics, but also for creating a half-sized version of an image that still retains most of the original's content. As a side note, it was also educational in teaching me how to make the MATLAB language do what I need it to.

The Results

The City Meets The Country

The two images here are both landscapes of sorts. The first is of Tucson, Arizona in 1970, and the second is of Jackson Hole, Wyoming in 1948. For the mask, I used a simple black and white split down the middle, but the images lined up well, and I think the results are interesting.

Mountains and Cathedrals

The first image is of St. Peter's Cathedral in Rome, and the second is of a mountain range visible from Milford Sound, in New Zealand. I thought it would be kind of cool to see mountains growing out of a piazza.

Mulder and Scully

I figured that I'd try out an image mosaic to see what David Duchovny and Gillian Anderson, split down the middle, would look like. Unfortunately, I could not find two portraits that had identical alignments and such, so I had to crop the images by hand. As a result, the final image is less than ideal, but still looks neat anyway.

The Code

ps3.m This is the "main" for the program. It sets up the names of the images to be processed, calls the engine that does all the pyramid math, and shows the results.
engine.m This handles the opening of the image files, creating the related pyramids, and doing the matrix math on the individual levels, in order to create the final pyramid that gets reconstructed.
read_color_image.m This is for taking a filename and reading as a color image with the three separate color planes: red, green and blue. If a picture is originally a grayscale image, then it is copied over to each of the color planes. This is done because most of the code assumes, for simplicity, that it has full color images to work with.
make_gaussian.m Given a single grayscale image, this creates the corresponding multilevel Gaussian pyramid. Note that this is called three times for each image, once each for the red, green and blue color planes.
make_laplacian.m Given a single grayscale image, this creates the Laplacian pyramid for it. This is similar to the Gaussian pyramid logic, and like the Gaussian pyramid code, it's called separately for the red, green and blue color planes.
reduce.m This is an individual piece in the creation of the Gaussian pyramid. The code is very simple. Given a single grayscale image, it adds a two-pixel border to each edge, and calculates the border values. Once this is completed, it performs low-pass filtering using a 1 by 5 Gaussian kernel, and then copies the image into a half-sized version of itself, using the "every other pixel" logic as described in the paper.
expand.m This is used as part of the creation of the Laplacian pyramid, and as part of the reconstruction process (where the final pyramid is rebuilt into the final image). Like the reduce function, it takes one grayscale image, and returns a grayscale image as a result. It expands the image, using the "every other pixel" method, and then rebuilds the values by using a low-pass filter, once again a 1 by 5 Gaussian kernel, but multiplied by four to keep the scaling correct.
show_image.m Many times during the debugging process, I needed to be able to see what an individual plane of a Gaussian or Laplacian pyramid looked like. I wrote this custom function to do this, since the pyramid data is not in a format that is friendly to MATLAB's standard imshow function. Note that this function is also used to show the final image, before that image is written out to a file.
reconstruct.m After the engine function has woven together the three pyramids (two Laplacian, one for each image, and one Gaussian for the mask), this function takes the resulting pyramid and uses it to reconstruct the final image.
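The roles of reduce, expand, the two pyramid builders, and reconstruct can be sketched on a single row of pixels. This is a minimal illustration in Python, not the MATLAB code listed above, and it rests on several assumptions: the kernel weights are the a = 0.4 generating kernel from the paper, borders are handled by repeating the edge sample, expand is told its target length so odd sizes line up, and the function names are hypothetical. In one dimension the expand gain is 2; applying the same filter along rows and then columns of an image gives the factor of four mentioned above.

```python
KERNEL = [0.05, 0.25, 0.4, 0.25, 0.05]  # assumed 5-tap weights (a = 0.4)


def smooth(row, gain=1.0):
    """Low-pass filter a row, repeating the edge samples as the border."""
    n = len(row)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)  # clamp index into the row
            acc += w * row[j]
        out.append(gain * acc)
    return out


def reduce_row(row):
    """Blur, then keep every other sample."""
    return smooth(row)[::2]


def expand_row(row, n):
    """Place samples at even positions of a length-n row of zeros, then blur.

    n must be 2 * len(row) or 2 * len(row) - 1.
    """
    up = [0.0] * n
    for i, v in enumerate(row):
        up[2 * i] = v
    return smooth(up, gain=2.0)


def gaussian_pyramid(row, levels):
    pyr = [[float(v) for v in row]]
    for _ in range(levels - 1):
        pyr.append(reduce_row(pyr[-1]))
    return pyr


def laplacian_pyramid(row, levels):
    g = gaussian_pyramid(row, levels)
    lap = []
    for fine, coarse in zip(g, g[1:]):
        up = expand_row(coarse, len(fine))
        lap.append([f - u for f, u in zip(fine, up)])
    lap.append(g[-1])  # the coarsest level is stored as-is
    return lap


def reconstruct(lap):
    """Expand from the coarsest level, adding back each band of detail."""
    row = lap[-1]
    for band in reversed(lap[:-1]):
        up = expand_row(row, len(band))
        row = [b + u for b, u in zip(band, up)]
    return row


def blend(a, b, mask, levels=3):
    """Blend rows a and b; mask is 1.0 where a should show, 0.0 where b should."""
    la = laplacian_pyramid(a, levels)
    lb = laplacian_pyramid(b, levels)
    gm = gaussian_pyramid(mask, levels)
    merged = [[m * x + (1.0 - m) * y for m, x, y in zip(ml, al, bl)]
              for ml, al, bl in zip(gm, la, lb)]
    return reconstruct(merged)
```

For a color image, the same functions would be applied once per color plane, with rows and then columns filtered in turn. Since each Laplacian level stores exactly what the matching expand step fails to recover, reconstruct returns the input row up to floating-point rounding, whatever border rule is used.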

This document was written by Alan Daniels, on Feb 22nd, 1998.