Ugh. That is what I will say. I solved this problem pretty much manually.
First, I found an acceptable value to threshold the image to a binary format.
This meant A LOT of trial and error because if the letters were too filled-in
then it meant that the vowels probably would fit every character (like
the 'o') and visa versa if the threshold value was too low. Next,
I went pixel by pixel in Paint (Win95) and found the pixel locations of
almost all the vowels. For some reason I chose a 10 x 9 matrix to
represents the vowels. I made templates of all the vowels and put them
in a file. I then AND'ed all the samples of the letters together
to get an acceptable pattern to match all the vowels. This would seem simple
enough, but it was time consuming.
So, once I got a basic pattern for all the vowels, I found the region of
the image where there was 'something' and basically started through that
region pixel by pixel testing to see if it matched. I defined areas
to search for by going line by line (top to bottom) and checking if it
was an empty line. I made sure that there was at least 'half the size of
a vowel in pixels' length before I marked it as an area I could ignore.
Next, and here is the kicker - I could not only just use the foreground
pattern for the vowels, I also needed to match the background pattern as
well (by inverting the image). Once again, If I did not do this then
I would have vowels all over the place. So, I basically had two representations
for each vowel: the foreground and background pattern. Matching the vowel
meant ANDing each 10x9 sample of the image to both patterns for each vowel
and comparing foreground and background sums to the patterns. If I found
a match, then I wrote the topmost x and rightmost y location to a table
along with a number from 1 to 5 (representing which vowel had been found).
Then I moved over 9 pixels in the image and started searching again.
Once the entire 'searchable" area had been exhausted, I called a paint
program which took the table listing the x, y, and vowel type and basically
painted the vowels on top of the image. I got all the vowels to match 'perfectly'
on the first and second image.. The third image was well...... a nightmare.
As I had mentioned, I made my matrix 10x9 (a mistake). I did not realize
how important the matrix size was until I had gotten all the characters
recognized in the first two images and decided to try the rotate thingy.
The problem was that rotating the matrix (by 6 degrees) created matrix
dimension problems. It would have been easier if I had chosen 10x10. However,
since I chose this 'manual method', it was much too late to go back and
change it all. So, in the end, I wrote a routine that basically looks like
the main routine, but takes into account the dimension size and rotates
each vowel by 6 degrees (that is 60 times!) if a vowel had not been found
previously, of course. I tried this ONCE. Why? Well, it took over 40 minutes
to run with the rotate stuff in there. I would rather lose points on not
getting that to work than to muddle through it again (especially
since it is 4 am and I have a midterm tomorrow - er, today).
So, why did I not try some other method? Well, I did. I pretty much implemented
most of the algorithms in Chapter 1 and 2 of the book whenever I had a
'brain-storm' on how to solve the problem. I would get into it and then
say: "now what?". I tried the "Connected Components" algorithm and got
it to work, but this did not help me any as I still had to compare pixels.
I then tried the erode and dilate. I guess I am a perfectionist, because
I could not get the letters eroded and dilated enough so that each letter
was separate from the other characters and still recognizable. In either
case, I still had to go pixel by pixel again. Then I thought about
the moments stuff. Ok, so I found the area and the center of mass, I still
needed to calculate the same moments for each sample image matrix. All
the different attempts would still have me taking sample matrices of the
image and processing them to find whether they matched or not. I decided
that it would be "quicker' to only pattern match.
Okay, so I did not completely think things through. I should have stayed
with the moments notion so that I could at least identify the rotated characters
by the second moment equation.
So, there. That is how I solved it. The brute force method which took me
a week to implement (staying up till mid-morning, believe it or not).
I just wish that I hadn't lost my time trying so many different ways of
solving this just to make things "more efficient". <sigh>
I also must mention the colormap. What I did was basically take the original
gray value image and multiplied it by 0.88 so that each pixel value would
be remapped and I would have the upper 230-255 region open to make my own
colors. It turned out quite well.
One last note: I am running this at home using Matlab 5.1 on the win95
<eyuck> platform and so I would expect this to run the same on the sgi's
and suns, but who knows?
Assumptions
and Weaknesses
I made the following assumptions
I decided that I would have control of the threshold value (which
I chose between .73 and .8). If the threshold is changed, the algorithm
starts to lose some letters here and there. I also assumed that this program
was only to run on these three images. I doubt seriously that it would
run successfully on another image, even if the characters were around the
same size. I would expect it to find at least half of the characters, but
not all of them.
the major weakness of my solutions are:
Obviously, the major weakness in my program is the fact that it is not
general enough to apply to other images. The pattern matching is an average
of the first and second image. Next, it is time consuming because it is
going pixel by pixel through the image. Well, not the entire image, but
enough of it to make this program unbearable if it were an entire 8x11''
size image! However, the run-time is nothing compared to the time
it took to manually extract each letter from the images and average them
for a pattern.
I also had some problems with the colormap. Apparently, if the colormap
already existed, it would spit some error back about it being a script.
My workaround to this was to clear the colormap (my_map) and then run it
and change the colormap. If you run the paint_vowels function and get the
error, no biggie. Just run the paint_vowels again. It does not take more
then 5-10 seconds.
Oh yeah, another
weakness. I never had time to clean up my code and make it more modular.
I basically had a long file which did everything and so there were a lot
of cut and pasted sections that should have been made into functions. This
is not the way I usually program. I just did not have time to clean it
up. There are also problems with the patterns as I had to keep tweaking
them by adding 0 or 1, so the way I created each pattern has redundant
stuff like placing a 1 in a certain location then later filling it in with
a 0 and so on.
Improvements and
Possible Future Work
I think that this can be improved by doing
the following
Um, basically, I would redefine the pattern
to a more uniform matrix and clean them up. I then would use some of the
other notions like moments in conjunction with my patterns. I definitely
need to do something with the rotate stuff. I would not suggest running
it unless you have days to kill. =)
Results
I know we were to find all the e's then find
all the other vowels, but I decided to do it all at once in one program.
Therefore, I am only going to show each of the images after my program
ran through them.
Source
Code
data_vowels.m
: this is the data set for the vowel patterns (both foreground and
background)
find_vowels.m:
this is the main program which finds all the vowels and places the x,y,
and vowel type in a lookup table
paint_vowels.m:
this is the function which takes the lookup table and then paints the letters
on the image.
try_rotate.m:
this one is garbage. This is the 'add-on' to take care of the rotated letters.
It just takes TOO long.