CS 7321 Winter 1998.
Problem Set #1 Solution.
By Alan Daniels.


Index


The Problem

For this problem set, we were to write a MATLAB program that used binary image processing methods to find instances of lower-case vowels in three supplied images. I've done this for each of the five standard vowels ("a", "e", "i", "o", and "u"), for each of the three images, for a grand total of fifteen result images.In each of the result images, I've shown the found vowels I found by using the color red.


The Solution.

Getting Started...

The first thing I did was to create a bunch of sample images to use as "templates". I took sample image no. 2 (since it was the largest, and thus had the most letters to choose from), and turned it into a grayscale. After looking at its histogram, and trying various thresholds, I decided to make it into a binary picture by using a threshold of 0.8. With that binary image, I made five samples for each vowel, for a grand total of twenty five. I've included those samples at the bottom of the page.

As part of this, I wrote routines to:

The First Run...

Since I had five samples per vowel, the first run at the logic I did was this. For simplicity, I'm skipping all the image inversions from white-black to black-white, and vice versa, to get the MATLAB routines to work right:

  1. Load the image to be tested.
  2. Load the five sample vowels.
  3. Threshold the image into a binary image.
  4. Take each of the five samples and "thin" them.
  5. Erode each sample across the image.
  6. Tally the results of each erosion. If three or more of the samples matched, we have a hit.
  7. If needed, repeat this logic every 15 degrees, going from -90 to 90.
  8. Using a colormap, draw the matches in red.

The flaws in the approach showed up pretty quickly. By thinning the samples, I guaranteed a match inside of a genuine match, but it also matched even when the sample just happened to fit inside an arbitrary letter. For example, I got lots of matches of "i" inside of an "l", an "h", etc. All it had to do was fit, which didn't take the background into account at all.

The Second Attempt...

I knew that I had to take the background pixels into account. The next thing that I tried to do was write my own eroder, using this logic for the samples, which I were using as Structuring Elements ("SE"s):

  1. Take the SE, and copy it into two verions: One that has been "thinned", and one that hasn't.
  2. Take the thin SE, and see how many places its foreground pixels match to the source image.
  3. Take the fat SE, and see how many places its background pixels match to the source image.
  4. If we get 90% success in both cases, consider it a successful match, and mark it as such.

This gave better results, but it was SLOW, as in five minutes for each individual erosion, which obviously wasn't acceptable. Also, I noticed that the background logic still didn't work very well. If I was sloppy with how the sample was done (such as an "e" with lots of space around it), the number of background pixels was too high. This convinced me to write two more routines:

The Third Attempt

I had established two major things: One, I needed to match all of the foreground pixels using a normal erosion, otherwise the processing would take forever. Second, the background matching should only use the "halo" around a sample, to show what the natural border of the SE was. In other words, only the pixels immediately surrounding a sample, since every pixel further than that is just extraneous information. So, I made another attempt at the eroding logic:

  1. Take the SE and sqeeze out all the unneeded background,
  2. Add one pixel-width worth of border back onto the SE.
  3. Copy the SE into two verions: Thin and Fat.
  4. Take the thin SE, and use it for a direct erosion. This gives us all the possible success points.
  5. Take the fat SE, and get its "halo".
  6. At each point where we got an erosion match, check the halo:
    1. See how many "halo" pixels match background pixels on the original image.
    2. If this is below 50% of the number of halo pixels, we know we've gotten a false hit, so remove the point.

I got the halo of an SE by using the following logic: Wherever there is a black pixel on the SE that has one or two white neighbors, then this is should be considered a point on the halo. Although this approach isn't ideal, the practical results work pretty well. A look at the halo image gives you a good idea of the SE that it originally contained.

The halo approach worked much better than the first two runs did. For example, even if I rotated an "i" by 45 degrees, I didn't have to worry about the large square's worth of background that would result from the rotation of such a thin SE. Only the edges of the "i" would matter.


Assumptions and Weaknesses.

Assumptions...

Weaknesses...


Possible Improvements.

To improve on the design I have now, I would add "priority" levels to the pixels in the SE. Therefore, not only could you take every pixel on the SE and label it "black", "white" or "doesn't matter", but you could also give each pixel a floating point rating from zero to one. That way, you could define the letter I so that the gap between the body of the I and the dot absolutely had to be there. For example, one possible way to accomplish this would be to make a custom eroder with five pixel types:

Then, when the custom erosion is done, the "don't cares" are ignored, all of the "must" matches are required, and 70 percent or more of the "should" matches must occur. Only then is a match considered sucessful. This is the theoretical solution, but in practice, I wonder how feasible it would be to write a custom eroder in MATLAB that had any reasonable sort of processing time.


The Results.

The Samples That I Used For The Eroding.

The First Image.

Original:

With "A":

With "E":

With "I":

With "O":

With "U":

The Second Image.

Original:

With "A":

With "E":

With "I":

With "O":

With "U":

The Third Image.

Original:

With "A":

With "E":

With "I":

With "O":

With "U":


The MATLAB Code.

This is a list of links to the actual MATLAB source code that I used to get the final image files.

model.m

This is the main script, mostly just a driver for calling model_create.m.

model_create.m

This is the script used to generate each individual image.

model_eroder.m

This is the custom eroder function. Given a binary source image, and a structuring element, it returns a resulting binary image, using the logic described in Third Attempt.

model_halo.m

Given a small SE, this returns the "halo" of the SE in white. All the white pixels equal pixels just outside of the shape made by the original SE.

model_match.m

This is the main engine for the program. It does the tallying of the runs of the five samples, including for each rotation if needed.

model_readgray.m

A no brainer. This reads in a standard indexed TIFF file, turns it into a grayscaled image, and returns it.

model_squeeze.m

Given an ST, this trims away background pixels at the border until only the foreground image is left.

model_threshold.m

Another no brainer. Given a grayscale image and a threshold value, this turns it into a binary image (all pixels are ones or zeros).

model_unsqueeze.m

This takes an ST and adds back one pixel's worth of background border. Written mainly for model_halo, which needs that border to work correctly.