CS 7321 Winter 1998

PS#1 Solutions by George Magiros

(Please debit my days lates account)

Character Recognition Problem

Index


How I solved it

  1. First I was confused. With all the talk of eroding to get the character, this didn't make sense to me.

    Why? Well what if there is noise and then the erode template doesn't match ALL the points. Of course you could apply some type of generating function to remove that noise. But I thought that would remove the information that I needed. So. First I examined the character and made a template of them

    1. Which required making templates for all the characters: a, e, i, o, and u.

      Which wasn't hard because the first image ps1-ex1.tiff had little noise and i could figure out the original character pretty well.

    2. I hand crafted the template very thin. I removed the thickness of the template characters. Hoping that the character would match in a several places. So if noise prevented a match in one configuration, hopefully another closeby one would work also.

    3. I added another feature to handle noise: Its like eroding but instead of matching every erodee with my template. I made the match conditional on if a certain number of matches were made, only then would "erosion" occur, that is when a match would occure.

  2. So I tried out my routines.

    1. It got as many wrong answers as one would think. M's think they are u's. g's think they are e's.

    2. Indeed this type of pattern matching is tough. There is no constaints on position of the characters, or that i assumed. Well I wanted it to handle the general case of rotated characters. So I couldn't use a projection initially to determine where a line of characters begins and thus constrain my search to a couple of lines.

    3. Also since I didn't have all the character in the font. I couldn't make say a list of every character, and see which match is the best of all of them. That is rank all potential conclusions that a character might be. Say, a 'e' might be more likely than a 'a' even though they both match; so I would say 'e' because it got the higher matching score.

    4. Otherwise I have to figure out by hand which characters are most like to be confused with others. And thus do the least likely confused first. Thus this is one way to separate the likely to be confused from the other characters.

    5. But character matching for a _known_ font like this should not involve such deliberation. What if the template could be replaced with a percentage template telling us what the percentage that a certain pixel should be on. A score could be generated this way and then compared to others to figure the best character. (Sorry i'm repeating myself) This is one plus for not removing the gray from a scanned font image. At the very least the binary image should be used as a mask to get a gray scale representation of the scanned characters.

  3. Complexity

    1. In addition, besides the conditional template I spoke of previously. I also included flags in the templates to indicate to the program specific unique features. There is a flag used for indicating when certain pixels _must_ be present. And another flag indication when certain pixels should not be present.

  4. Least I forget how I thresholded

    1. To accomplish the transition from the gray scale image to binary image, I applied some matlab algebra. And it seems to work pretty well. The image appears well filtered in affect. I apply a histogram on the data and then curve fit a third order polynomial to it. It seems a good way to find the center of those two peaks.

    2. From there I threshold. Though there are parts where the thresholding isn't that great. Specifically, where there should be a gap of white space between characters, there isn't.

to TOP


Assumptions and Weaknesses

Hopefully I explained these above.

But specifically one weakness is the lack of constraints on the data. I wish I could define where a line starts. Also the lack of the full font character set is a weakness. Also my program is slow as hell. I should of used bwmorph but I need to learn more about matlab so I can conquer it, not it me.

One major assumption that I made was that the background was going to be white and the character black. Otherwise my algebraic solution to the thresholding problem will be wrong: since I alway pick the greater of the two roots of the derivative of the histogram polynomial.

As for strength: there is the freedom implicit in scanning everything. Though a lot of times its wrong. Especially when the characters to be match might be turned upside and every other way.

to TOP


Improvements and Possible Future Work

I think that this can be improved by doing the following

Since the character font used is constant, more analysis should be done of all the character in the fonts to find relations between them. Because what I'm doing at present seems more relavent to the general OCR problem not this specific font one.

to TOP


Results

Figure 1: the original

Figure 2: looking for: a

Figure 3: looking for: e

Figure 4: looking for: i

Figure 5: looking for: o

Figure 6: looking for: u (Hey, nothing there!)


Now for the second lot:


Results

Figure 7: the original

Figure 8: looking for: a

Figure 9: looking for: e

Figure 10: looking for: i

Figure 11: looking for: o

Figure 12: looking for: u (Note the error in mountain)


Now for the last image. Note I'm only putting up the one for looking for e's since my program takes so (so) long. For instance I left at 8pm and the program was only 1/4th done at 11pm.

Image 3

Figure 13: The original

Figure 14: looking for: e

to TOP


Source Code