CS7321 Winter 1998

PS1
Character Recognition

Presented by Jim Rowan


The overall approach:
I approached each part of this project in two phases. The first phase was the character recognition phase. This phase produced a two dimensional matrix for each of the vowels that contained the row and column coordinates of all vowels of that type found. The second phase read these matrices and using the coordinates in the matrix retrieved the 11 x 11 patch from the original thresholded image. This patch was logical ANDed with another patch that had a rectangular grouping of 1 bits roughly positioned where the vowel would be. This resulted a group of 1 bits that were then changed to either 2, 3, 4, 5 or 6 depending on which of the vowels were being colored. By attaching a color map to this file and writing it out as a jpeg a rough coloring of the vowel in question was achieved in that some of the bits that were actually part of the vowel in question were left as 1 bits and therefore displayed as black.
The sample text:
The text sample was scanned at 72 dpi and at that resolution there was very little data to work with. This lack of resolution makes the task of distinguishing an "o" from a "d" from a "p" very difficult. Having said this I must now admit that I own OCR software that runs just fine on a 10 MHz machine. As a result of working on this project I have developed a very healthy respect for that software. In fact I am amazed at how fast the purchased software runs and how miserably slow (in comparison) my approach to the character recognition problem runs on a 180MHz machine.

Thresholding:
To get a binary image the sample had to be thresholded and when an image is thresholded there is a decision to make. Do you threshold at, say, 50% and live with several "e"s that have gaps or do you threshold at, say, 75% and live with "m" and "n that close at the bottom and look like "o"? This is a particularly vexing problem since this first decision affects all processing steps from the thresholding on. I made the decision to threshold at 75% and deal with letters that close where there should be a space in some other manner. It seemed intuitively better to deal with too much data (open letters that closed) than too little data (closed letters that opened).
With the text thresholded I now had a binary image to work with and the question became was there any way to make certain characteristics of the vowels stand out and therefore make them more easily recognized? I worked with thinning, dilating and morphing but had little improvement in one area that wasn"t accompanied by degradation in another area.
My approach was to use a model of the vowel:
I decided to work with the idea that I could model each letter in a way that distinquished it from all other letters that might be encountered. I built 11 x 11 pixel models of each vowel and included only what I considered to be essential pixels in that model. The decision of what to include in the models of the vowels was made by inspecting the range of variation seen in each vowel and picking pixels that appeared in those examples. The trick here was to include only enough of the pixels that were seen in the examples but leave out those pixels that could distinguish the vowel in question from other letters with similar constructs.
The models of the vowels:
Model used for the letter a
Model used for the letter e
Model used for the letter i
Model used for the letter o
Model used for the letter u
Model correspondence:
Once the vowel models were built I could slide these models across the thresholded binary image. This sliding was performed by taking an 11 x 11 pixel patch from the binary image and then performing a logical AND on the model and the patch. The result of this AND was itself an 11 x 11 patch. Summing all the 1 bits in this resultant patch was the measure that I used to determine whether or not the patch being inspected contained a vowel of the type represented by the model. For this correspondence to be accepted I required that the result of the logical AND have the same number of 1 bits as the vowel model used in the comparison.
Fine tuning the result:
It is, of course, possible that the patch being inspected might have all 1 bits as would be the case with a solid black space. Since this did not occur in the samples being processed, this was not initially seen as a concern. The assumption was that if there was a 1 bit then it was part of some character. This did not absolve this approach from all difficulties since, for instance, a very dark "d" might have 1 bits where the model of the "o" has 1 bits and therefore be initially recognized as an "o". In the case of the "e" and the "u" however this approach was able to find most of the vowels in the text and no further refinements were made to the model.
These problems were handled by improving the vowel model. In the case of the "a" the "i" and the "o" if the AND resulted in an initial correspondence, additional vowel characteristics were searched investigated. The patch thought to contain an "i" was searched to see if there was indeed a space between the body and the dot. This space could be a block of four 1 bits, a block of two 1 bits or one of the two diagonals that could be made in a block of four 1bits. The patch though to contain an "a" was checked to make certain that the lower part of the letter was connected back to the upright part of the letter closing the lower part of the "a". The patch thought to contain an "o" was checked to make certain that the opening was wide enough to be an "o" and not a closed "m" or a closed "n". Additionally the "o" was checked to make certain that it was not attached to an upward stem (making it a "d" and not an "o") or a downward stem (making it a "p" and not an "o").
The Matlab programs used in the first two segments of scanned text:
The vowel search routine
The text coloring routine
The Results:
The Exercise 1 text before:
The Exercise 1 text after:
The Exercise 2 text before:
The Exercise 2 text after:
How to approach the scrambled text:
I approached the scrambled text by extending the model approach. By inspection it was determined that none of the letters in the text were rotated into a position greater than 90 degrees off from their upright normal position. Instead of having just one model and making comparisons based on that one model, I built a vector of 36 models by rotating the original model through 180 degrees in five degree increments. I expected considerable problems with this approach and my expectations were not dissappointed. When rotating a model in five degree increments there are many different interpolations that have to be made to arrive back at a 11 x 11 matrix. This turned out to be the downfall of my approach in that it introduced so much noise that the recognition process that had worked fairly well when the text was upright and square no longer performed well at all.
What should be done to improve this approach:
I believe that the vowels can be characterized in a way that makes this recognition routine work fairly well even on the rotated text. The models that I used were very simple in that they only checked for the occurrence of a 1 bit. The model should be extended to not only check for the occurrence of a 1 bit but to also check those places where there should be a 0 bit. I began to do that when I was checking for spaces between the dot and the body of an "I" and also when I checked for "stems" on the "o".
What should be done to improve the speed:
This approach was dreadfully slow mainly because it searched each and every possible 11 x 11 patch of text for possible vowels. Although this is a methodical and complete approach it is not a particularly smart approach. Histograms of the columns and rows can be used to spot possible character positions and thus avoid needless searching of areas that are too sparsely populated to contain a vowel.
The Matlab programs used in Exercise 3:
The rotated vowel search routine used for Exercise 3
The rotated vowel coloring routine used for Exercise 3
The Results:
The Exercise 3 text before:
The Exercise 3 text after:
Clearly this approach as broken down in a significant way. To better see what is going on and to separate the multiple recognitions on one vowel I ran the a and the e routines separately.
Exercise 3 searching for only the letter a alone
Exercise 3 searching for the letter e alone
A final word:
No character recognition routine will be able to do as well as a human can do when reading text until that recognition routine is imbued with knowledge of language and knowledge of the context in which the language is being used. Knowing the language and the context restricts the possibilities and therefore makes the recognition much less troublesome. We, as humans, can distinguish between an "o" and a "u" that has been closed in at the top by the context in which that letter is used. For example: He took a tomble down the slope. Clearly, from the context of its use and the knowledge of the language the "o" in tomble is not an "o" but is instead a "u" that has been closed in.
| | | Piedmont College | Georgia Tech | ISyE | CHMSR | | |

Jim Rowan is a Ph.D. student in Georgia Tech"s ISyE Center for Human Machine Systems Research
To contact Jim Rowan by email: jrowan@cc.gatech.edu