CS 7321 Winter 1998
PS#1 Solutions by Bob Nayar
Character Recognition Problem
How I solved(?) it
-
First, I converted the grayscale image into a binary image and defined a
colormap.
-
This required writing my own thresholding function, because the im2bw function
stores white pixels as "1" and black pixels as "0". In order to do rotation,
thinning, thickening and erosion using the standard MATLAB functions,
I needed the opposite: a binary image in which the black letter pixels are "1"
and the white background pixels are "0".
-
I had to define a colormap that had white, black and the vowel colors (in
that order). I also had to write a trivial function, BINDIS, which increments
the binary matrix by 1 for experimental display purposes, since the colormap
is 1-based.
-
Then I sampled each letter and "prepared" it
-
This required "thinning" the vowel by applying the 'skel' option of BWMORPH
an infinite number of times, i.e. until the skeleton stops changing.
-
I then used the ERODE command on the binary image, with the "thinned" vowel
as the erosion element.
-
Then I found I had to do this
-
In order to get more hits, I had to thicken the main binary image at least
once.
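The steps above can be sketched roughly as follows. This is a small pure-Python illustration, not the MATLAB code from vowel3.m, bnthresh.m or bindis.m. The function names, the threshold level of 128, the test patch and the 3-pixel "skeleton" are all made up for the example. Plain 3x3 dilation stands in for thickening, and the erosion origin is assumed to sit at the center of the element.

```python
def bnthresh_sketch(gray, level):
    """Binary image with 1 for dark (ink) pixels and 0 for light (paper) pixels."""
    return [[1 if px < level else 0 for px in row] for row in gray]


def bindis_sketch(binary):
    """Add 1 to every element so the values index a 1-based colormap."""
    return [[px + 1 for px in row] for row in binary]


def thicken_sketch(image):
    """Plain 3x3 dilation, used here as a simple stand-in for thickening."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for rr in range(max(r - 1, 0), min(r + 2, rows)):
                for cc in range(max(c - 1, 0), min(c + 2, cols)):
                    if image[rr][cc]:
                        out[r][c] = 1
    return out


def erode_sketch(image, element):
    """Binary erosion; the element's origin is assumed to be its center."""
    rows, cols = len(image), len(image[0])
    erows, ecols = len(element), len(element[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            fits = True
            for i in range(erows):
                for j in range(ecols):
                    rr, cc = r + i - erows // 2, c + j - ecols // 2
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    # Every set element pixel must land on an ink pixel.
                    if element[i][j] and not (inside and image[rr][cc]):
                        fits = False
            out[r][c] = int(fits)
    return out


# Hypothetical 5x5 grayscale patch (0 = black ink, 255 = white paper)
# holding a slightly slanted stroke.
W, K = 255, 0
gray = [
    [W, W, W, W, W],
    [W, W, K, W, W],
    [W, W, K, W, W],
    [W, K, W, W, W],
    [W, W, W, W, W],
]
skeleton = [[1], [1], [1]]  # made-up thinned sample: a 3-pixel vertical line

binary = bnthresh_sketch(gray, 128)
indexed = bindis_sketch(binary)  # 1 = white, 2 = black in a 1-based colormap
hits_before = sum(map(sum, erode_sketch(binary, skeleton)))
hits_after = sum(map(sum, erode_sketch(thicken_sketch(binary), skeleton)))
```

In this toy patch the straight skeleton does not fit the slanted stroke at all until the image is thickened, which is the "more hits" effect described above.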
Assumptions and Weaknesses
I made the following assumptions
-
The origin of the erosion element used by MATLAB's ERODE command is in
the center of the erosion element. Because of this, every time I found
a set pixel in the eroded matrix, I scanned a region with the dimensions
of the sampled vowel, centered on that point. I don't know if this was
a correct assumption or not.
-
By "thinning" the vowel and thickening the image, I did not have to take
different samples of the vowels to get an optimal average skeleton. This
was a very stupid assumption.
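The first assumption amounts to a small piece of index arithmetic, sketched below in Python. The function name and the example numbers are hypothetical, and flooring the half-size is itself an assumption that only matters when the element has an even number of rows or columns.

```python
def window_around_hit(hit_row, hit_col, elem_rows, elem_cols):
    """Inclusive bounds (top, left, bottom, right) of the erosion element
    when its assumed center origin is placed on the hit pixel."""
    top = hit_row - elem_rows // 2
    left = hit_col - elem_cols // 2
    return top, left, top + elem_rows - 1, left + elem_cols - 1


# A hit at row 10, column 20 for a sampled vowel 5 rows tall and 3 columns wide.
bounds = window_around_hit(10, 20, 5, 3)
```

If the real origin were a corner rather than the center, every recovered window would be shifted by about half the vowel's size, so this assumption is worth checking against one known letter position.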
I think the major weaknesses of my solution are:
-
By "thinning" the vowel and thickening, I introduced ambiguity up the wazoo.
For example, an "o" can easily fit into the circular part of a "d" and
an "i" can easily fit into an "l". This proved to be so disastrous that
I believe my program identified thickened "a"s as "e"s, because the skeleton
fits into both.
-
Being an EE, my code optimization skills are pathetic. Exercise 3 took about
half an hour to run on the SUN.
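The ambiguity can be reproduced with toy shapes. The Python sketch below treats each shape as a set of (row, column) pixels and asks whether some translation of a skeleton lies entirely inside a letter, which is the condition under which erosion reports a hit. The shapes are invented stand-ins, not samples from the actual images.

```python
def fits_somewhere(skeleton, shape):
    """True if some translation of `skeleton` lies entirely inside `shape`."""
    anchor_r, anchor_c = min(skeleton)
    # The anchor pixel must land on some shape pixel, so trying each is exhaustive.
    for pr, pc in shape:
        dr, dc = pr - anchor_r, pc - anchor_c
        if all((r + dr, c + dc) in shape for r, c in skeleton):
            return True
    return False


# Made-up shapes: a short stroke ("i"), a tall stroke ("l"),
# a 3x3 ring ("o"), and the same ring with a tall stem ("d").
i_skeleton = {(r, 0) for r in range(3)}
l_shape = {(r, 0) for r in range(6)}
o_skeleton = {(r, c) for r in range(3) for c in range(3)} - {(1, 1)}
d_shape = {(r + 3, c) for r, c in o_skeleton} | {(r, 2) for r in range(6)}

i_in_l = fits_somewhere(i_skeleton, l_shape)
o_in_d = fits_somewhere(o_skeleton, d_shape)
l_in_i = fits_somewhere(l_shape, i_skeleton)
```

The test is one-directional: the small skeleton fits inside the larger letter but not the other way around, which is why a fit on its own cannot tell an "i" from an "l".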
Improvements and Possible Future Work
I think that this can be improved by doing the following
-
using other binary parameters such as Euler numbers and perimeter measurements
to widen my erosion criteria.
-
averaging at least 5 different samples of one vowel, before "skeletonizing".
-
changing the colormap scheme before writing into an image file, so that
the background is white and the original letters are black.
-
optimizing my code to reduce the run-time, which would have left me more
time to experiment.
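As an illustration of the first suggestion, the Euler number (number of objects minus number of holes) can be computed from counts of 2x2 pixel patterns. The Python sketch below uses the standard bit-quad formula for 8-connected foreground. The function name and the toy letters are hypothetical. In MATLAB the Image Processing Toolbox function bweuler serves the same purpose.

```python
def euler_number_8(image):
    """Objects minus holes for 8-connected foreground, from 2x2 bit-quad counts."""
    rows, cols = len(image), len(image[0])
    # Pad with a zero border so every pixel is covered by four 2x2 windows.
    padded = [[0] * (cols + 2)]
    padded += [[0] + list(row) + [0] for row in image]
    padded += [[0] * (cols + 2)]
    q1 = q3 = qd = 0
    for r in range(rows + 1):
        for c in range(cols + 1):
            a, b = padded[r][c], padded[r][c + 1]
            d, e = padded[r + 1][c], padded[r + 1][c + 1]
            total = a + b + d + e
            if total == 1:
                q1 += 1          # exactly one pixel set
            elif total == 3:
                q3 += 1          # exactly three pixels set
            elif total == 2 and a == e:
                qd += 1          # two pixels set on a diagonal
    return (q1 - q3 - 2 * qd) // 4


# Toy letters: a ring ("o"), a plain stroke ("l"), a stroke with a dot ("i").
ring = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
stroke = [[1], [1], [1]]
dotted = [[1], [0], [1], [1]]
euler_values = [euler_number_8(ring), euler_number_8(stroke), euler_number_8(dotted)]
```

With these toy shapes a dotted "i" has two objects and no holes while an "l" has one object, so requiring the Euler number of the candidate region to match that of the vowel would reject some of the false hits that erosion alone lets through.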
Results
Figure 1: As seen in this picture and the other two, I made the mistake
of not changing the colormap to switch black and white.
This picture lends some support to my statement about letters such as "l"
and "d" being falsely recognized as "i"s and "o"s.
Figure 2: A few other observations can be made from this picture. For
instance, it seems that the "o" is the easiest shape to recognize. Unfortunately,
it is also obvious that by thickening the "e" and certain other letters,
they are falsely identified as "o"s. Another observation is that the "game
is won" by the letter that is searched for last, which is why a lot of the
letters are pink; i.e., they have been falsely identified as "u"s.
Figure 3: To reiterate the last point, this is definitely "won"
by the letter that goes last. Hence the skeleton of the "u", which was
searched for last, was able to fit into almost all the thickened letters
in this image, especially because all angles between 0 and 360 were tried
in 5 degree increments.
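The "last letter wins" effect follows from painting the results one vowel at a time. The Python sketch below uses made-up hit positions and a hypothetical function name to show that, when two vowels claim the same pixel, the final color depends only on the search order.

```python
def paint_in_order(hits_by_vowel, order):
    """Label pixels vowel by vowel; a later vowel overwrites an earlier one."""
    labels = {}
    for vowel in order:
        for pixel in hits_by_vowel[vowel]:
            labels[pixel] = vowel
    return labels


# Made-up hits: both vowels claim pixel (0, 1).
hits = {"o": {(0, 0), (0, 1)}, "u": {(0, 1), (0, 2)}}
u_last = paint_in_order(hits, ["o", "u"])
o_last = paint_in_order(hits, ["u", "o"])
```

Swapping the order flips the label of the contested pixel, so a skeleton that fits almost everywhere, like the "u" here, dominates the picture whenever it is searched for last.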
Source Code
-
vowel3.m: This is the main script and contains
the vowel extraction code as well. This is written specifically for exercise
3, and hence has an angle rotation loop.
-
bnthresh.m: Bob Nayar's thresholding function
which converts grayscale into binary and assigns 0 to white pixels and
1 to black.
-
bindis.m: This function was only used for
experimental display. It adds 1 to each element in a binary image,
so that I could use my 1-based colormap.