CS7321 Winter 1998
PS1
Character Recognition
Presented by Jim Rowan
The overall approach:
I approached each part of this project in two phases.
The first phase was the character recognition phase.
This phase produced a two dimensional matrix for each of the vowels
that contained the row and column coordinates of all vowels of that
type found. The second phase read these matrices and using the coordinates
in the matrix retrieved the 11 x 11 patch from the original thresholded image.
This patch was logical ANDed with another patch that had a rectangular
grouping of 1 bits roughly positioned where the vowel would be.
This resulted a group of 1 bits that were then changed to either 2, 3, 4,
5 or 6 depending on which of the vowels were being colored. By attaching
a color map to this file and writing it out as a jpeg a rough coloring of
the vowel in question was achieved in that some of the bits that were actually
part of the vowel in question were left as 1 bits and therefore displayed
as black.
The sample text:
The text sample was scanned at 72 dpi and at that resolution there was
very little data to work with. This lack of resolution makes the task of
distinguishing an "o" from a "d" from a "p" very difficult. Having said this
I must now admit that I own OCR software that runs just fine on a 10 MHz machine.
As a result of working on this project I have developed a very healthy respect for
that software. In fact I am amazed at how fast the purchased software runs and
how miserably slow (in comparison) my approach to the character recognition
problem runs on a 180MHz machine.
Thresholding:
To get a binary image the sample had to be thresholded and when an
image is thresholded there is a decision to make. Do you threshold at,
say, 50% and live with several "e"s that have gaps or do you threshold
at, say, 75% and live with "m" and "n that close at the bottom and look
like "o"? This is a particularly vexing problem since this first decision
affects all processing steps from the thresholding on. I made the decision
to threshold at 75% and deal with letters that close where there should be
a space in some other manner. It seemed intuitively better to deal with too
much data (open letters that closed) than too little data (closed letters that
opened).
With the text thresholded I now had a binary image to work with and
the question became was there any way to make certain characteristics
of the vowels stand out and therefore make them more easily recognized?
I worked with thinning, dilating and morphing but had little improvement
in one area that wasn"t accompanied by degradation in another area.
My approach was to use a model of the vowel:
I decided to work with the idea that I could model each letter in
a way that distinquished it from all other letters that might be
encountered. I built 11 x 11 pixel models of each vowel and included only
what I considered to be essential pixels in that model. The decision of what
to include in the models of the vowels was made by inspecting the range of
variation seen in each vowel and picking pixels that appeared in those examples.
The trick here was to include only enough of the pixels that were seen
in the examples but leave out those pixels that could distinguish the vowel
in question from other letters with similar constructs.
The models of the vowels:
Model used for the letter a
Model used for the letter e
Model used for the letter i
Model used for the letter o
Model used for the letter u
Model correspondence:
Once the vowel models were built I could slide these models across the thresholded
binary image. This sliding was performed by taking an 11 x 11 pixel patch from the
binary image and then performing a logical AND on the model and the patch. The result of
this AND was itself an 11 x 11 patch. Summing all the 1 bits in this resultant patch was the
measure that I used to determine whether or not the patch being inspected contained a vowel
of the type represented by the model. For this correspondence to be accepted I
required that the result of the logical AND have the same number of 1 bits as
the vowel model used in the comparison.
Fine tuning the result:
It is, of course, possible that the patch being inspected might have all
1 bits as would be the case with a solid black space. Since this did not
occur in the samples being processed, this was not initially seen as a
concern. The assumption was that if there was a 1 bit then it was part of
some character. This did not absolve this approach from all difficulties since,
for instance, a very dark "d" might have 1 bits where the model of the "o" has 1
bits and therefore be initially recognized as an "o". In the case of the "e" and
the "u" however this approach was able to find most of the vowels in the text and no
further refinements were made to the model.
These problems were handled by improving the vowel model. In the case of the "a"
the "i" and the "o" if the AND resulted in an initial correspondence, additional
vowel characteristics were searched investigated. The patch thought to contain an
"i" was searched to see if there was indeed a space between the body and the dot.
This space could be a block of four 1 bits, a block of two 1 bits or one of the two
diagonals that could be made in a block of four 1bits. The patch though to contain
an "a" was checked to make certain that the lower part of the letter was connected
back to the upright part of the letter closing the lower part of the "a". The patch
thought to contain an "o" was checked to make certain that the opening was wide
enough to be an "o" and not a closed "m" or a closed "n". Additionally the "o"
was checked to make certain that it was not attached to an upward stem (making
it a "d" and not an "o") or a downward stem (making it a "p" and not an "o").
The Matlab programs used in the first two segments of scanned text:
The vowel search routine
The text coloring routine
The Results:
The Exercise 1 text before:
The Exercise 1 text after:
The Exercise 2 text before:
The Exercise 2 text after:
How to approach the scrambled text:
I approached the scrambled text by extending the model approach. By inspection
it was determined that none of the letters in the text were rotated into a
position greater than 90 degrees off from their upright normal position.
Instead of having just one model and making comparisons based on that one model,
I built a vector of 36 models by rotating the original model through 180 degrees
in five degree increments. I expected considerable problems with this approach and
my expectations were not dissappointed. When rotating a model in five degree increments
there are many different interpolations that have to be made to arrive back at a 11 x 11
matrix. This turned out to be the downfall of my approach in that it introduced so much noise
that the recognition process that had worked fairly well when the text was upright and
square no longer performed well at all.
What should be done to improve this approach:
I believe that the vowels can be characterized in a way that makes this
recognition routine work fairly well even on the rotated text. The models
that I used were very simple in that they only checked for the occurrence of
a 1 bit. The model should be extended to not only check for the occurrence of
a 1 bit but to also check those places where there should be a 0 bit. I began
to do that when I was checking for spaces
between the dot and the body of an "I" and also when I checked for "stems" on the "o".
What should be done to improve the speed:
This approach was dreadfully slow mainly because it searched each
and every possible 11 x 11 patch of text for possible vowels.
Although this is a methodical and complete approach it is not
a particularly smart approach. Histograms of the columns and rows
can be used to spot possible character positions and thus
avoid needless searching of areas that are too sparsely populated to contain a vowel.
The Matlab programs used in Exercise 3:
The rotated vowel search routine used for Exercise 3
The rotated vowel coloring routine used for Exercise 3
The Results:
The Exercise 3 text before:
The Exercise 3 text after:
Clearly this approach as broken down in a significant way. To better see what is going on
and to separate the multiple recognitions on one vowel I ran the a and the e routines
separately.
Exercise 3 searching for only the letter a alone
Exercise 3 searching for the letter e alone
A final word:
No character recognition routine will be able to do as well
as a human can do when reading text until that recognition
routine is imbued with knowledge of language and knowledge of
the context in which the language is being used. Knowing the language
and the context restricts the possibilities and therefore makes the
recognition much less troublesome. We, as humans, can distinguish between
an "o" and a "u" that has been closed in at the top by the context in which
that letter is used. For example: He took a tomble down the slope. Clearly,
from the context of its use and the knowledge of the language the "o"
in tomble is not an "o" but is instead a "u" that has been closed in.
|
|
| Piedmont College
| Georgia Tech
| ISyE
| CHMSR
|
|
|
Jim Rowan is a Ph.D. student in Georgia Tech"s ISyE Center for Human Machine Systems Research
To contact Jim Rowan by email:
jrowan@cc.gatech.edu