CS 7321 Winter 1998

PS#1 Solutions by Gabriel J. Brostow

Character Recognition Problem



How I solved it

I started by displaying the first image and copying out of it a subimage that contained the letter 'e'. Using that subimage as the structuring element, I eroded the entire image, hoping this would leave a single marked pixel wherever the original image had an 'e'. Since this matched only a few of the e's, I AND'ed two 'e' subimages together and "thinned" the resulting structuring element (using bwmorph).
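As a rough illustration of why AND'ing two samples helps, here is a minimal pure-Python sketch (not the original MATLAB script). Binary images are modeled as sets of (row, col) "on" pixels, the tiny glyphs and the names to_pixels and erode are made up for the example, the origin of the structuring element is taken to be its top-left corner, and the bwmorph thinning step is left out.

```python
def to_pixels(rows):
    """Convert a list of strings ('#' = on) into a set of (row, col) pixels."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}

def erode(image, element):
    """Binary erosion: keep each origin where every element pixel lands on an 'on' pixel."""
    if not element:
        return set()
    anchor_r, anchor_c = min(element)
    # Only origins that put the anchor pixel on an 'on' pixel can possibly match.
    candidates = {(r - anchor_r, c - anchor_c) for r, c in image}
    return {(r, c) for r, c in candidates
            if all((r + dr, c + dc) in image for dr, dc in element)}

# Two hand-cut samples of the same (hypothetical) glyph, each with one pixel lost.
sample_a = to_pixels(["##.",
                      "#.#",
                      "###"])
sample_b = to_pixels(["###",
                      "#.#",
                      "##."])

# A page containing sample_a at (1, 1) and sample_b at (1, 5).
page = to_pixels(["..........",
                  ".##..###..",
                  ".#.#.#.#..",
                  ".###.##...",
                  ".........."])

hits_a = erode(page, sample_a)                 # finds only the first glyph
hits_both = erode(page, sample_a & sample_b)   # the AND'ed template finds both
print(sorted(hits_a), sorted(hits_both))
```

Either sample alone matches only its own copy, because erosion fails as soon as a single template pixel is missing from the page. The intersection keeps only the pixels both samples agree on, so it is a looser template that matches both copies.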

Now, whenever I eroded with my e-structuring element, I was left (usually accurately) with a dot (pixel = on) marking each occurrence of 'e'. By dilating this image using one of the original 'e' subimages, I generated an image which looked like the original, except that the areas not covered by e's were zeroed out. Finally, I superimposed this on the original thresholded image using a colorful entry in the colortable, making the e's stand out.
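The dilate-and-superimpose step can be sketched in the same spirit. This is an illustrative pure-Python version under assumptions of my own: images are sets of (row, col) pixels, the marker positions are given as if they came from the erosion, the template origin is its top-left corner, and a character ('e') stands in for the colortable entry. The helper names are hypothetical.

```python
def to_pixels(rows):
    """Convert a list of strings ('#' = on) into a set of (row, col) pixels."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}

def dilate(markers, element):
    """Binary dilation: stamp a copy of the element at every marker pixel."""
    return {(r + dr, c + dc) for r, c in markers for dr, dc in element}

def overlay(rows, highlight):
    """Render the page with highlighted pixels as 'e', other text as '#'."""
    page = to_pixels(rows)
    return ["".join("e" if (r, c) in highlight
                    else "#" if (r, c) in page
                    else "."
                    for c in range(len(line)))
            for r, line in enumerate(rows)]

# Hypothetical page: one ring-shaped glyph and one unrelated vertical stroke.
page_rows = ["###..#.",
             "#.#..#.",
             "###..#."]
template = to_pixels(["###",
                      "#.#",
                      "###"])
markers = {(0, 0)}   # assumed output of eroding the page with the template

mask = dilate(markers, template)   # looks like the page, but only where glyphs were found
for line in overlay(page_rows, mask):
    print(line)
```

Dilating the single marker pixel with the template redraws the glyph at the marked position, and the overlay then labels those pixels differently from the rest of the text.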

I then applied this same methodology to the other vowels. I succeeded (for the most part), but only after some significant enhancements. The most substantial improvement came when I had the script locate and zero out all occurrences of the letters [g, l, m, f]. Without these changes, the script would identify g's as two e's. It would also "find" i's in the l's and f's (and d's), while the m's just caused general confusion, depending on which letters they neighbored. A further improvement in the vowel detector's accuracy came when I zeroed out all known letters as they were identified. This way, each section of the script left the image in question with fewer active pixels to identify (or confuse).
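The effect of zeroing out identified letters can be shown on a toy case. The sketch below is a pure-Python illustration, not the original script: images are sets of (row, col) pixels, a 4-pixel bar stands in for 'l' and a 2-pixel bar for the stem of 'i', and find_and_remove is a hypothetical helper name.

```python
def to_pixels(rows):
    """Convert a list of strings ('#' = on) into a set of (row, col) pixels."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}

def erode(image, element):
    """Binary erosion with the element origin at its top-left corner."""
    if not element:
        return set()
    anchor_r, anchor_c = min(element)
    candidates = {(r - anchor_r, c - anchor_c) for r, c in image}
    return {(r, c) for r, c in candidates
            if all((r + dr, c + dc) in image for dr, dc in element)}

def dilate(markers, element):
    """Binary dilation: stamp a copy of the element at every marker pixel."""
    return {(r + dr, c + dc) for r, c in markers for dr, dc in element}

def find_and_remove(image, element):
    """Locate a letter, then zero out the pixels it covers."""
    hits = erode(image, element)
    return hits, image - dilate(hits, element)

tall = to_pixels(["#", "#", "#", "#"])   # stand-in for 'l'
short = to_pixels(["#", "#"])            # stand-in for the stem of 'i'

page = to_pixels([".#....",
                  ".#....",
                  ".#..#.",
                  ".#..#."])

naive_i_hits = erode(page, short)              # also fires inside the tall stroke
l_hits, remaining = find_and_remove(page, tall)
i_hits = erode(remaining, short)               # only the real short stroke is left
print(sorted(naive_i_hits), sorted(l_hits), sorted(i_hits))
```

Searching for the short stroke directly produces three false hits inside the tall one. Removing the tall stroke first leaves a single, correct hit.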

The last sections of the program deal with cases where the letters are rotated. I look for vowels here by rotating the structuring element by various degree-increments (determined experimentally as optimal), and otherwise processing the images and subimages in the same manner as before.
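A simplified sketch of searching with a rotated structuring element follows. The increments actually used are not stated above, so this illustration is restricted to quarter turns, which are exact on the pixel grid; arbitrary angles would need interpolation. Images are sets of (row, col) pixels, and the L-shaped glyph and helper names are hypothetical.

```python
def to_pixels(rows):
    """Convert a list of strings ('#' = on) into a set of (row, col) pixels."""
    return {(r, c) for r, line in enumerate(rows)
            for c, ch in enumerate(line) if ch == "#"}

def erode(image, element):
    """Binary erosion with the element origin at its bounding-box top-left corner."""
    if not element:
        return set()
    anchor_r, anchor_c = min(element)
    candidates = {(r - anchor_r, c - anchor_c) for r, c in image}
    return {(r, c) for r, c in candidates
            if all((r + dr, c + dc) in image for dr, dc in element)}

def rotate_quarter(element):
    """Rotate a pixel set 90 degrees clockwise and shift it back to the origin."""
    turned = {(c, -r) for r, c in element}
    min_r = min(r for r, _ in turned)
    min_c = min(c for _, c in turned)
    return {(r - min_r, c - min_c) for r, c in turned}

def search_rotations(image, element):
    """Erode with the element at 0, 90, 180 and 270 degrees; return hits per angle."""
    hits = {}
    for angle in (0, 90, 180, 270):
        hits[angle] = erode(image, element)
        element = rotate_quarter(element)
    return hits

glyph = to_pixels(["#.",
                   "#.",
                   "##"])

# The page contains the glyph turned a quarter turn clockwise.
page = to_pixels([".....",
                  ".###.",
                  ".#...",
                  "....."])

hits = search_rotations(page, glyph)
print(hits)
```

Only the 90-degree template matches here. On real text, a rotated letter can also resemble a different upright letter, so the order in which templates are tried affects the result.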


Assumptions and Weaknesses

I made the following assumptions:
  1. I am dealing with only these three images.
  2. I am allowed to use sample data - in the form of human-segmented vowels which serve as templates for finding the rest.
I think the major weaknesses of my solution are:
  1. Non-scalable: I don't have any mechanisms to deal with other images - especially if they don't have exactly the same font as these three.
  2. No mechanism to intelligently extract vowels automatically - I have to pick out two or three samples manually, and they must be representative.
  3. All-or-nothing: Grayscale information is ignored, and when one pixel doesn't match - no statistical comparison is made.
  4. No robustness for rotated letters.


Improvements and Possible Future Work

I think that this can be improved by doing the following


Results

Image 1

Figure 1: Simplest Case: Perfect results for all vowels. Each vowel marked with different color.

Image 2

Figure 2: Same algorithm, but some errors occurred due to variations in the letters' appearance caused by thresholding of aliasing and by the quality of the scanned original. The "tr" combination was mistakenly recognized as 'u', and some smudged m's were partially tagged as o's.

Image 3

Figure 3: Rotated vowels were recognized - but very inaccurately. Order of processing (for each vowel) is significant here because rotated letters look like other non-rotated letters. "i" is particularly unlocatable.


Source Code