CS 7321 Winter 1998
(Please debit my days lates account)
Character Recognition Problem
Index
How I solved it
-
First I was confused. With all the talk of eroding to
get the character, this didn't make sense to me.
Why?
Well what if there is noise and then the erode template doesn't
match ALL the points. Of course you could apply some type
of generating function to remove that noise. But I thought
that would remove the information that I needed. So.
First I examined the character and made a template of them
-
Which required making templates for all the characters: a, e, i, o, and u.
Which wasn't hard because the first image ps1-ex1.tiff had little
noise and i could figure out the original character pretty well.
-
I hand crafted the template very thin. I removed the thickness of the
template characters. Hoping that the character would match
in a several places. So if noise prevented a match in one
configuration, hopefully another closeby one would work also.
-
I added another feature to handle noise:
Its like eroding but
instead of matching every erodee with my template.
I made the match conditional on if a certain number
of matches were made, only then would "erosion" occur, that is
when a match would occure.
-
So I tried out my routines.
-
It got as many wrong answers as one would think.
M's think they are u's. g's think they are e's.
-
Indeed this type of pattern matching is tough. There
is no constaints on position of the characters, or that i assumed.
Well I wanted it to handle the general case of rotated characters.
So I couldn't use a projection initially to determine where a
line of characters begins and thus constrain my search to a couple of lines.
-
Also since I didn't have all the character in the font. I couldn't
make say a list of every character, and see which match is the best
of all of them. That is rank all potential conclusions that
a character might be. Say, a 'e' might be more likely than
a 'a' even though they both match; so I would say 'e' because it got
the higher matching score.
-
Otherwise I have to figure out by hand which
characters are most like to be confused with others.
And thus do the least likely confused first. Thus this
is one way to separate the likely to be confused from the
other characters.
-
But character matching for a _known_ font like this
should not involve such deliberation. What if the template
could be replaced with a percentage template telling us what
the percentage that a certain pixel should be on. A score
could be generated this way and then compared to others
to figure the best character. (Sorry i'm repeating myself)
This is one plus for not removing the gray from a scanned
font image. At the very least the binary image should
be used as a mask to get a gray scale representation of
the scanned characters.
-
Complexity
-
In addition, besides the conditional template I spoke
of previously. I also included flags in the templates
to indicate to the program specific unique features.
There is a flag used for indicating when certain pixels
_must_ be present. And another flag indication when certain
pixels should not be present.
-
Least I forget how I thresholded
-
To accomplish the transition from the gray scale image to
binary image, I applied some matlab algebra. And it seems to
work pretty well. The image appears well filtered in affect.
I apply a histogram on the data and then curve fit a third order
polynomial to it. It seems a good way to find the center of those
two peaks.
-
From there I threshold. Though there are parts where
the thresholding isn't that great. Specifically, where there
should be a gap of white space between characters, there isn't.
to TOP
Assumptions and Weaknesses
Hopefully I explained these above.
But specifically one weakness is the lack of constraints
on the data. I wish I could define where a line starts.
Also the lack of the full font character set is a
weakness. Also my program is slow as hell.
I should of used bwmorph but I need to learn more about
matlab so I can conquer it, not it me.
One major assumption that I made was that the background
was going to be white and the character black. Otherwise
my algebraic solution to the thresholding problem will be
wrong: since I alway pick the greater of the two roots of
the derivative of the histogram polynomial.
As for strength: there is the freedom implicit in
scanning everything. Though a lot of times its wrong.
Especially when the characters to be match might be
turned upside and every other way.
Improvements and Possible Future Work
I think that this can be improved by doing the following
Since the character font used is constant, more analysis
should be done of all the character in the fonts to find
relations between them. Because what I'm doing at present
seems more relavent to the general OCR problem not this
specific font one.
Results
Figure 1: the original
Figure 2: looking for: a
Figure 3: looking for: e
Figure 4: looking for: i
Figure 5: looking for: o
Figure 6: looking for: u (Hey, nothing there!)
Now for the second lot:
Results
Figure 7: the original
Figure 8: looking for: a
Figure 9: looking for: e
Figure 10: looking for: i
Figure 11: looking for: o
Figure 12: looking for: u (Note the error in mountain)
Now for the last image. Note I'm only putting up the
one for looking for e's since my program takes so (so) long.
For instance I left at 8pm and the program was only 1/4th done
at 11pm.
Figure 13: The original
Figure 14: looking for: e
Source Code
-
I just typed vowel.m at the prompt and then typed
image(k) after it was done. Note, I have to modify the
program to access a different source image. Though the program
does prompt you for the appropriate letter to recognize.
-
vowel.m: the function vowel.m used to extract
vowels
-
vowelr.m: modified vowel.m to do rotations.
I had to do it seperately because of the way imrotate works and
because of the flag values that I picked for the templates.