CS 7321 Winter 1998 PS #1 Solutions by Joe Bayes

Character Recognition Problem


How I solved it

  1. Convert the grayscale TIFF files to MATLAB 1-bit arrays
    1. Crop out a sample "e".
    2. Read in the example text, along with the sample "e", using imread.
    3. Convert both images to black and white. I first tried im2bw alone, then realized that the images needed to be inverted, so I used ~im2bw instead. I then experimented with the threshold values until I got a clean copy of the e that was not too distorted.
    4. Thin the e using the "thin" option under bwmorph.
  2. Make an image showing only the e's in the original image, colored in red.
    1. Get a map of where the e's are by eroding the test image with the "thin e" made above.
    2. Set the colormap to [1,1,1;0,0,0;1,0,0] (white, black, red).
    3. Make the image showing only the e's by dilating the map obtained in step 1 with a bitmap of an "e".
    4. Color it red by multiplying the values of the image by 2, so that matched pixels have the value 2.
  3. Make an image, called "output", by taking the elementwise maximum of the original image and the "e-map" obtained in the previous step.
  4. Generalize the program to process both the first and the second example image.
  5. For the third image, rotate the vowel in 5-degree increments, putting a properly rotated copy of the vowel in the output each time a match occurs.
    1. The first thing to do is to learn how for loops work.
    2. I keep a "sum" variable, which starts out pure black.
    3. Each time I scan for vowels at a particular angle, I set sum=max(sum, emap), where emap holds the vowels I found rotated to that angle.
    4. Once done with the for loop, I continue as normal.
  6. Crop out samples of all the other vowels, and generalize the program. This produces hideous numbers of false positives.
  7. Go through each case, and tweak each one to get the best recognition. This was a pain in the butt...there's got to be a better way.
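
The steps above (threshold and invert, erode with a thinned template, dilate with the full template, then take the maximum with the original) can be sketched roughly as follows. This is a pure-Python stand-in, not the MATLAB in ps1.m. The toy page, the blocky "C" templates, the threshold of 128, and every function name here are assumptions made for illustration, and matches are anchored at the template's top-left corner to keep the code short.

```python
# Binary images are lists of rows: 1 = ink, 0 = background.

def threshold_inverted(gray, level):
    """Return 1 where a pixel is at or below level (dark ink), else 0."""
    return [[1 if px <= level else 0 for px in row] for row in gray]


def erode(image, probe):
    """Mark every offset where all ink pixels of probe land on ink in image.

    Hits are recorded at the probe's top-left corner (a simplification).
    """
    h, w = len(image), len(image[0])
    ph, pw = len(probe), len(probe[0])
    ink = [(r, c) for r in range(ph) for c in range(pw) if probe[r][c]]
    hits = [[0] * w for _ in range(h)]
    for r in range(h - ph + 1):
        for c in range(w - pw + 1):
            if all(image[r + dr][c + dc] for dr, dc in ink):
                hits[r][c] = 1
    return hits


def dilate(hits, stamp):
    """Paste stamp (anchored at its top-left corner) at every hit."""
    h, w = len(hits), len(hits[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not hits[r][c]:
                continue
            for dr, row in enumerate(stamp):
                for dc, on in enumerate(row):
                    if on and r + dr < h and c + dc < w:
                        out[r + dr][c + dc] = 1
    return out


def overlay(original, emap):
    """Elementwise max of the page and 2 * emap: 0 white, 1 black, 2 red."""
    return [[max(o, 2 * e) for o, e in zip(orow, erow)]
            for orow, erow in zip(original, emap)]


# Toy page: a blocky "C" next to a vertical bar ('#' = dark, '.' = light).
art = [
    "........",
    ".###.#..",
    ".#...#..",
    ".###.#..",
    "........",
]
gray = [[30 if ch == "#" else 220 for ch in row] for row in art]

full_c = [[1, 1, 1],
          [1, 0, 0],
          [1, 1, 1]]
thin_c = [[1, 1, 0],   # hypothetical "thinned" probe: a subset of full_c
          [1, 0, 0],
          [1, 1, 0]]

bw = threshold_inverted(gray, 128)   # dark text becomes 1
hits = erode(bw, thin_c)             # where the thin probe fits
emap = dilate(hits, full_c)          # redraw the full letter at each hit
output = overlay(bw, emap)           # matched letters become 2 (red)

for row in output:
    print("".join(".#R"[v] for v in row))
```

On this toy page the "C" pixels end up with the value 2 (red under the colormap above), while the unmatched bar keeps the value 1 (black).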


Assumptions and Weaknesses

I made the following assumption:

  1. That the images were clean enough so that I could easily tell one letter from another.

I think the major weaknesses of my solution are:

  1. Since I only erode using a template of the letter I'm looking for, I get matches on many letters that I shouldn't, because those letters are a "superset", if you will, of the letters I'm looking for.
  2. In order to get the solution to work as well as it does, I had to do trial and error to figure out what thresholds to use to get each letter to match properly, with not too many false positives, and not too many false negatives.
  3. Changing texts would seriously hose this program, since the templates and thresholds are hand-tuned to these images.
  4. Rotating the letters only increases the likelihood that I'll get a false match.
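
To make weaknesses 1 and 4 concrete, here is a small pure-Python sketch (again a stand-in, not the MATLAB in ps1.m). The toy page, the three-pixel "bar" template, and the function names are assumptions for illustration, and a 90-degree turn stands in for the 5-degree steps used on the third image. Erosion marks any place the template fits, so a bar matches inside a larger letter that contains it, and accumulating matches over rotations with max only adds more such hits.

```python
def erode(image, probe):
    """Mark every offset (top-left anchored) where probe's ink fits on ink."""
    h, w = len(image), len(image[0])
    ph, pw = len(probe), len(probe[0])
    ink = [(r, c) for r in range(ph) for c in range(pw) if probe[r][c]]
    hits = [[0] * w for _ in range(h)]
    for r in range(h - ph + 1):
        for c in range(w - pw + 1):
            if all(image[r + dr][c + dc] for dr, dc in ink):
                hits[r][c] = 1
    return hits


def rotate90(template):
    """Rotate a template a quarter turn clockwise."""
    return [list(row) for row in zip(*template[::-1])]


def count(hits):
    """Number of marked positions in a hit map."""
    return sum(map(sum, hits))


# Toy page: a blocky "C" (columns 1-3) and one real bar (column 5).
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
bar = [[1], [1], [1]]  # hypothetical "i"-like template: a vertical stroke

# Weakness 1: the bar also fits inside the left stroke of the "C".
upright = erode(page, bar)

# Weakness 4: accumulate over rotations, as in sum = max(sum, emap).
total = [row[:] for row in upright]
rotated = erode(page, rotate90(bar))
total = [[max(a, b) for a, b in zip(trow, rrow)]
         for trow, rrow in zip(total, rotated)]

print("upright hits:", count(upright))
print("hits after adding the rotated template:", count(total))
```

Only the hit at row 1, column 5 corresponds to the real bar. The others come from strokes of the "C", which is the "superset" effect described above.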


Improvements and Possible Future Work

I think that this can be improved by doing the following


Results

Image 1 Here is the first example, showing only the vowels that I feel "work well": a, e, and u. (i and o have too much in common with other characters, and result in too many false positives to be pretty.)

Image 2 Next, for completeness, here's what happens when I try to recognize all the vowels.

Image 3 Next, the second image. Once again, here are the vowels which sorta "work". I included o here, since all the other vowels perform about as well as o does on this image.

Image 4 And here's the whole shebang on the second image.

Image 5 The letter that gave me the most trouble in the third example was i, since it's basically just a straight line, and lots of letters have straight lines in them at various angles. Even curvy lines look like straight lines when you thin them down to one pixel. Therefore, here's my best shot at example 3, without the i's.

Image 6 And here it is with the i's. In order to recognize the i's, I had to crank it up so it got a huge number of false positives. I could get rid of the false positives, but that would get rid of the matches too, and then it would look like the picture above. Anyway, it's pretty ugly, but here it is.

Sorry about that nasty formatting up there, but Netscape wouldn't have it any other way.


Source Code

ps1.m: This processes and displays all of the images.


Joe Bayes, jbayes@cc.gatech.edu