CS 7321 Winter 1998 PS #1 Solutions by
Joe Bayes
Character Recognition Problem
- Convert the grayscale TIFF files to MATLAB 1-bit arrays
- Crop out a sample "e".
- Read in the example text, along with the sample "e",
using imread.
- Convert both images to black and white. First I tried
just using im2bw. I then realized that I needed to
invert the images, so I used ~im2bw. I then did quite
a bit of playing around with the threshold values, in
order to get a clean copy of the e which was not too
distorted.
- Thin the e using the "thin" option under bwmorph.
- Make an image showing only the e's in the original image,
colored in red.
- Get a map of where the e's are by eroding the
test image, using the "thin e" that I made,
above.
- Set the colormap to [1,1,1;0,0,0;1,0,0] (white, black, red).
- Make the image, showing only the e's, by dilating
the map obtained in step 1 with a bitmap of an
"e".
- Color it red by multiplying the values of the
image by 2.
- Make an image, called "output", which is the array
obtained by taking the maximum of the original image,
and the "e-map" obtained in the previous step.
- Generalize the program to process both the first and the
second example image.
- For the third image, rotate the vowel at 5-degree
increments, putting a properly rotated copy of the
vowel in the output each time a match occurs.
- The first thing to do is to learn how for loops
work.
- I keep a "sum" variable, which starts out pure
black.
- Each time I scan for vowels at a particular
angle, I set sum=max(sum, emap). (emap is the
vowels I found rotated to that angle).
- Once done with the for loop, I continue as
normal.
- Crop out samples of all the other vowels, and generalize
the program. This produces hideous numbers of false
positives.
- Go through each case, and tweak each one to get the best
recognition. This was a pain in the butt...there's got
to be a better way.
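The threshold, erode, dilate, color, and max steps above can be sketched roughly as follows. This is a pure-Python illustration and not the ps1.m source (which is MATLAB). The function names binarize, erode, dilate, and overlay, the top-left anchoring of the template, and the tiny plus-shaped test pattern are all assumptions made for the example. The thinning step is left out.

```python
# Binary images are lists of rows; 1 = ink, 0 = background.

def binarize(gray, level):
    """Threshold and invert (like ~im2bw): pixels at or below level become 1."""
    return [[0 if px > level else 1 for px in row] for row in gray]

def erode(image, template):
    """Mark each position where every ink pixel of the template lies on ink."""
    h, w = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    ink = [(r, c) for r in range(th) for c in range(tw) if template[r][c]]
    hits = [[0] * w for _ in range(h)]
    for r in range(h - th + 1):
        for c in range(w - tw + 1):
            if all(image[r + dr][c + dc] for dr, dc in ink):
                hits[r][c] = 1  # anchored at the template's top-left corner
    return hits

def dilate(hits, shape):
    """Stamp a copy of shape at every marked position (same anchor as erode)."""
    h, w = len(hits), len(hits[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if hits[r][c]:
                for dr, row in enumerate(shape):
                    for dc, px in enumerate(row):
                        if px and r + dr < h and c + dc < w:
                            out[r + dr][c + dc] = 1
    return out

def overlay(text, letter_map):
    """Pixelwise max of the text (0/1) and the doubled letter map (0/2)."""
    return [[max(t, 2 * m) for t, m in zip(trow, mrow)]
            for trow, mrow in zip(text, letter_map)]

# Toy data: a plus sign stands in for the "e", a short bar for other text.
text = [
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
]
plus = [
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]
hits = erode(text, plus)                    # one hit, at row 0, column 0
output = overlay(text, dilate(hits, plus))
# output == [[0, 2, 0, 0, 0], [2, 2, 2, 0, 1], [0, 2, 0, 0, 1]]
```

With the colormap from the steps above, 0 displays as white, 1 as black, and 2 as red, so the matched shape comes out red and the remaining text stays black. Passing a thinned template to erode and the full bitmap to dilate corresponds to the "thin e" and "e" bitmap in the steps.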
I made the following assumptions:
- That the images were clean enough so that I could easily
tell one letter from another.
I think the major weaknesses of my solution are:
- Since I only erode using a template of the letter I'm
looking for, I get matches on many letters that I shouldn't,
because those letters are a "superset", if you will, of the
letters I'm looking for.
- In order to get the solution to work as well as it does,
I had to do trial and error to figure out what thresholds to
use to get each letter to match properly, with not too many
false positives, and not too many false negatives.
- Changing texts would seriously hose this program.
- Rotating the letters only increases the likelihood that
I'll get a false match.
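For reference, the rotation search that the last point refers to can be sketched roughly like this. It is a pure-Python illustration, not the MATLAB in ps1.m. The names rotate, find, and sweep, the nearest-neighbour rotation, and the 0 to 360 degree range are assumptions; the writeup only specifies 5-degree increments and the running max.

```python
import math

def rotate(image, degrees):
    """Nearest-neighbour rotation about the image centre; output keeps the size."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    a = math.radians(degrees)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Map each output pixel back to the source pixel it came from.
            y, x = r - cy, c - cx
            sr = round(cy + y * cos_a - x * sin_a)
            sc = round(cx + y * sin_a + x * cos_a)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = image[sr][sc]
    return out

def find(text, template):
    """Hypothetical matcher: stamp the template wherever all of its ink lies on ink."""
    h, w = len(text), len(text[0])
    th, tw = len(template), len(template[0])
    ink = [(r, c) for r in range(th) for c in range(tw) if template[r][c]]
    out = [[0] * w for _ in range(h)]
    for r in range(h - th + 1):
        for c in range(w - tw + 1):
            if ink and all(text[r + dr][c + dc] for dr, dc in ink):
                for dr, dc in ink:
                    out[r + dr][c + dc] = 1
    return out

def sweep(text, template, step=5):
    """Accumulate matches over all angles with a running pixelwise max."""
    total = [[0] * len(text[0]) for _ in text]  # the "sum" variable, all zeros
    for angle in range(0, 360, step):
        emap = find(text, rotate(template, angle))
        total = [[max(t, e) for t, e in zip(trow, erow)]
                 for trow, erow in zip(total, emap)]
    return total

# Toy data: one horizontal and one vertical bar; the template is horizontal.
text = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
bar = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
both = sweep(text, bar, step=90)        # finds both bars
upright_only = sweep(text, bar, step=360)  # angle 0 only: horizontal bar only
```

Every extra angle is another template that can match by accident, which is why the false matches pile up. Because rotate keeps the canvas size, a real template would need enough padding that its corners are not clipped at intermediate angles.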
I think that this can be improved by doing the following:
- Many of my problems came from some letters being
"supersets" of others (for example, a "d" is essentially an
"o", with an "i" on the side). It would improve my solution
tremendously if I could somehow detect that there are light
pixels where there shouldn't be, and eliminate that
letter. One way to do this might be to invert the image and
invert the template, erode, and then invert the result, then
do an "and" on each pixel of this result and each pixel of the
result when I do it the above way. I get the feeling that this
is sort of a shot in the dark, though, so maybe I'll try it at
the end, if I haven't found a better solution by then, and if
I have time.
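A close relative of this idea is the standard hit-or-miss transform: erode the image with the template's ink, erode the inverted image with the template's background, and AND the two results. Note that this variant skips the final inversion proposed above, so it is a swapped-in technique and not exactly what I described. The sketch below is pure Python rather than MATLAB; the names erode and hit_or_miss and the toy "o" and "d" bitmaps are assumptions for illustration.

```python
def erode(image, offsets, size):
    """Window positions (r, c) where every offset in offsets lands on a 1 pixel."""
    th, tw = size
    h, w = len(image), len(image[0])
    return {(r, c)
            for r in range(h - th + 1)
            for c in range(w - tw + 1)
            if all(image[r + dr][c + dc] for dr, dc in offsets)}

def hit_or_miss(image, template):
    """Positions where ink matches the template's ink AND blank matches its blank."""
    size = (len(template), len(template[0]))
    cells = [(r, c) for r in range(size[0]) for c in range(size[1])]
    fg = [(r, c) for r, c in cells if template[r][c]]      # must be ink
    bg = [(r, c) for r, c in cells if not template[r][c]]  # must be blank
    inverted = [[1 - px for px in row] for row in image]
    return erode(image, fg, size) & erode(inverted, bg, size)

# Toy "o" template with one blank row above it.
o = [
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
# An "o" at column 0 and a "d" (an "o" with a stem reaching up) at column 4.
image = [
    [0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
]
ink_only = erode(image, [(r, c) for r in range(4) for c in range(3) if o[r][c]], (4, 3))
strict = hit_or_miss(image, o)
# ink_only == {(0, 0), (0, 4)}: plain erosion also fires on the "d"
# strict == {(0, 0)}: the stem of the "d" violates the blank row, so it is rejected
```

The catch is that the background test is only as good as the blank margin included in the template, and it is stricter about noise, so thresholds would probably still need tuning.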
Here is the first example, recognizing only the vowels that I
feel "work well": a, e, and u. (i and o have too much in
common with other characters, and result in too many false positives
to be pretty.)
Next, for completeness, here's what happens when I try to recognize
all the vowels.
Next, the second image. Once again, here are the vowels which sorta
"work". I included o here, since all the other vowels perform about as
well as o does on this image.
And here's the whole shebang on the second image.
The letter that gave me the most trouble in the third example was i,
since it's basically just a straight line, and lots of letters have
straight lines in them at various angles. Even curvy lines look like
straight lines, when you narrow them down to one pixel. Therefore,
here's my best shot at example 3, without the i's.
And here it is with the i's. In order to recognize the i's, I had to
crank it up so it got a huge number of false positives. I could get
rid of the false positives, but that would get rid of the matches too,
and then it would look like the picture above. Anyway, it's pretty
ugly, but here it is.
Sorry about that nasty formatting up there, but Netscape wouldn't
have it any other way.
Source Code
ps1.m: This processes and displays all
of the images.
Joe Bayes,
jbayes@cc.gatech.edu