CS 7321 Winter 1998

PS#1 Solutions by James Mossman

Character Recognition Problem


Index


How I solved it

Initial analysis of the problem showed that all characters that were to be matched came from the same font style and size. In addition, all the example images where of high quality (good contrast, low noise, uniform lighting, etc). Tradition 'erode' operations used an "all or nothing" approach; one pixel missing from the image could result in a match being missed. Although thinning the vowel kernels would have minimized such misses, these thinned kernels would have had a greatly increased chance of matching non-vowel characters.

Rather than thin the vowel kernels, which would have removed potentially useful information about a vowel's shape, I decided to create a custom eroder that would allow a "theshold" rather than an "all or nothing" approach for finding characters. To keep matrix algebra simple, "percentage overlap" was used as the threshold criteria.

Unexpectedly, the "percentage overlap" solution ran into significant problems identifying the 'e' character. In order to minimize the chance of missing non-'e' characters, the overlap threshold had to be kept fairly low (70%). Unfortunately, such a low threshold resulted in a great deal of false matches. In an attempt to correct the problem, a second criterion was added to determine if a successful match had occured: "openness".

Certain characters, such a 'e' and 'u' have "open" centers: the background pixels can be seen along a line-of-sight which originates from the center of the character. In the case of the 'e', the line-of-sight is a horizontal line moving to the right. The 'u' has a LOS which is vertical. Other characters such as 'o' and 'a' have "closed" centers: the center of the character is completely cutoff from the background pixels.

Adding the "openness" test dramatically cut down on the number of false matches on the letter 'e'. However, when 'e' was rotated prior to eroding, the low 70% threshold once again resulted in an increased number of false matches (although still fewer than those encountered without an "openness" test). Still, the low threshold combined with the right-sided openness description causing the 'c' to often be mistaken for an 'e'.

Significant problems arose when trying to detect the 'i' characters. Because of the characters relatively small size, the 'i' often would "fit" inside of many other letters of the alphabet, such as 'l', 'f', or 'M'. A simple solution was not obvious at first, so the number of false 'i' detections was left alone until the rest of the system had been proven to work. Time constraints have prevented implementation of solutions the 'i' dilemma, but a brief description a two possible solutions can be found in the improvements section.

Detecting vowels in sample image #3 simply required rotating the template image (and corresponding line-of-sight vector) prior to starting the erode operation. However, determining the angles through which to rotate the templates proved to be extremely challenging. This was primarily due to the amount of CPU time required to run the custom 'eroder' code. To further complicate matters, percent overlap thresholds used for horizontally oriented text often required modification depending upon the angle of rotation.

The solution that had hoped to capitalize on the constant font style and size had deteriorated into a horrible kludge of human optimized rotation and threshold parameters. The thresholding simplification had worked fairly well for horizontally oriented text (minus the large number of false 'i' detections), but required what seemed to be an exponentially increasing about of tweaking once characters were rotated.

It is worth noting that the system performed quite well once optimized threshold and rotation parameters were used. However, such a system is completely impractical because very specific knowledge about the text was used when determining optimum parameters.

In summary, this project has demonstrated reasonable accuracy when examining text oriented in a uniform direction. However, once characters are rotated, the simplifications and assumptions made during the design of this system start to fall apart. Although relatively high accuracy can be maintained with rotated characters, the amount of human interaction required for optimization renders this system utterly impractical. to TOP


Assumptions and Weaknesses

I made the following assumptions
  1. identical font style in all images
  2. identical font size in all images
  3. all images have uniform lighting
  4. all images have good contrast
  5. all images have very little noise
  6. all images contain "clean", unbroken characters
  7. all images contain characters rotated by multiples of 5 degrees
I think the major weakness of my solutions are:
  1. Reliance on various threshold parameters determined by several manual interations (including manually specified angles of rotation for characters).
  2. Reliance on a single font style & font size.
  3. Reliance on a very clean/high quality image for good recognition performance.
  4. By coding a custom 'eroder', MATLAB simulation time became prohibitively excessive. This made fine tuning threshold parameters and general experimentation very costly.

to TOP


Improvements and Possible Future Work

I think that this can be improved by doing the following

to TOP


Results

Figure 1: This figure illustrates the accuracy of vowel recognition based on the thresholds used in ex1.m.

Figure 2: This figure illustrates the accuracy of vowel recognition based on the thresholds used in ex2.m.

Upon close examination of Figure 2, overlapping vowel detection can be seen. However, this problem does not exist in Figure 3. This is because of a slightly different method that was used to generate Figure 3. Specifically, Figure 3 generation "removed" vowels for the initial image once a match was made. With characters rotated in various directions, overlapping detection became a very serious issue for Figure 3. However, overlap in Figure 2 was fairly minimal so time was not spent regenerating the figure.

Figure 3: This illustrates that reasonable accuracy can be obtained when additional effort is spent optimizing rotation & thresholding parameters. ex3.m contains the parameters that were used.

to TOP


Source Code