CS 7321 Winter 1998

PS#1 Solutions by Ivan Brusic

Character Recognition Problem




How I solved it

    Finding the vowels in the text consisted of using the morphological operators of erosion and dilation to extract them.  I used the vowels themselves as structuring elements.  The process of finding the vowels was actually very simple.  The text image was eroded with a thinned version of the vowel.  This found all the positions where the letter could theoretically be found.  I then dilated the eroded image with the original "thick" vowel to recover all the desired letters from the text image.
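    The erode-then-dilate idea described above can be sketched as follows.  This is a minimal illustration in plain Python rather than the Matlab used for the assignment.  The function names, the top-left origin of the structuring element, and the tiny test image are all assumptions made for the example, not part of the original solution.

```python
def erode(image, element):
    """Binary erosion: mark each position where every 1 of `element`
    lies on a 1 of `image`.

    The element's top-left corner is used as its origin (an assumption
    made for simplicity; toolboxes usually use the element's centre).
    """
    rows, cols = len(image), len(image[0])
    er, ec = len(element), len(element[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows - er + 1):
        for c in range(cols - ec + 1):
            if all(image[r + i][c + j]
                   for i in range(er) for j in range(ec) if element[i][j]):
                out[r][c] = 1
    return out


def dilate(hits, element):
    """Binary dilation: stamp `element` at every 1 in `hits` (same origin)."""
    rows, cols = len(hits), len(hits[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if hits[r][c]:
                for i, row in enumerate(element):
                    for j, value in enumerate(row):
                        if value and r + i < rows and c + j < cols:
                            out[r + i][c + j] = 1
    return out


def find_letter(text, thin, thick):
    """Erode with the thinned letter, then dilate the hits with the thick one."""
    return dilate(erode(text, thin), thick)


# Hypothetical 5x7 image: a small "o" on the left and a vertical bar on the right.
text = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
thick_o = [[1, 1, 1], [1, 0, 1], [1, 1, 1]]
thin_o = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # a sparser stand-in for thinning

found = find_letter(text, thin_o, thick_o)  # keeps the "o", drops the bar
```

    In this toy case the erosion leaves a single hit at the top-left corner of the "o", and the dilation redraws the thick letter there.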

    The main problem is that the morphological operators are performed on binary images, while the text image is gray scale.  The logical first step was therefore to threshold the image.  Finding the ideal value required numerous experiments.  The problem was compounded by the fact that the structuring elements must also be thresholded.  The vowels, or structuring elements, all performed well with a threshold of 0.4.  The repeated tests were very tedious, and could be continued further to find better values.

    After the thresholding operations, all images were inverted to fit the logic of the program.  Erode and dilate work on pixels with a value of 1.  A pixel with a value of 1, however, is white.  Since our "objects" are black, we must invert the images so the letters are white on a black background.  This step was also necessary to perform the "thin" operation on the structuring elements.
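    Thresholding followed by inversion can be sketched in a few lines.  This is a plain Python illustration, assuming gray levels scaled to the range 0 to 1 and a strict "greater than" comparison; both are assumptions.  The 0.4 default is the value reported above for the structuring elements, since the value used for the text image itself is not stated.

```python
def threshold(gray, level):
    """Return a binary image: 1 where the gray level exceeds `level`, else 0."""
    return [[1 if value > level else 0 for value in row] for row in gray]


def invert(binary):
    """Swap 0 and 1 so that dark letters become the 1-valued foreground."""
    return [[1 - value for value in row] for row in binary]


def letters_as_foreground(gray, level=0.4):
    """Threshold a gray-scale image, then invert it for erode and dilate."""
    return invert(threshold(gray, level))


# Hypothetical 1x3 strip: bright paper, a dark ink pixel, bright paper.
strip = letters_as_foreground([[0.9, 0.1, 0.8]])  # the ink pixel becomes 1
```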

    Performing these simple operations achieved good results for the letters a, e, o, and u.  The letter i was very difficult to detect.  Since the shape of the letter i fits inside other letters, multiple wrong hits occurred when searching for it.  These letters included l, L, p, M, d and more.  A very simple technique discussed in class was used to find the dimension of the letter i and to exclude anything from the image that is longer than it.  By testing different samples of the letter, I determined that the average length of the base was 9 pixels.  I then eliminated from the image all vertical lines with a length of 12 or more pixels.  This drastically improved the result.  It eliminated all conflicts with the "long" letters, but it did not fix others such as the "fr" combination.
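    Removing vertical lines of 12 or more pixels can be sketched as a column-by-column run-length scan.  This is a plain Python illustration under the assumption that the filter is applied to a binary image with 1-valued letters; the function name is hypothetical, and the original Matlab code may have done this differently.

```python
def remove_long_vertical_runs(image, max_len=12):
    """Clear every vertical run of 1s whose length is `max_len` or more."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for c in range(cols):
        r = 0
        while r < rows:
            if image[r][c]:
                start = r
                # Walk down to the end of this run of 1s.
                while r < rows and image[r][c]:
                    r += 1
                if r - start >= max_len:
                    for k in range(start, r):
                        out[k][c] = 0
            else:
                r += 1
    return out
```

    With the default of 12, a 9-pixel stroke such as the one measured for the letter i is kept, while a 12-pixel stroke is erased.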

    The last step was to find the vowels in an image where the letters are at an angle other than 0 degrees from the baseline.  This was accomplished by rotating the structuring element in 5 degree intervals between 0 and 360 degrees.  The result for each rotation of the vowel was OR'ed with the result of the previous iteration.
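    The rotate-and-OR loop can be sketched as below.  This is a plain Python illustration only: the nearest-neighbour rotation is an assumed stand-in for the Matlab rotation, the rotation direction is arbitrary, and `detect` is a hypothetical callable (for example an erode-then-dilate step) that returns a binary mask the same size as the text image.

```python
import math


def rotate_nearest(mask, degrees):
    """Rotate a binary mask about its centre with nearest-neighbour sampling.

    The output keeps the input size, so anything rotated outside the frame
    is lost; a tightly cropped element would need padding first.
    """
    rows, cols = len(mask), len(mask[0])
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Inverse mapping: look up the source pixel that lands on (r, c).
            y, x = r - cy, c - cx
            src_r = int(round(cy + x * sin_a + y * cos_a))
            src_c = int(round(cx + x * cos_a - y * sin_a))
            if 0 <= src_r < rows and 0 <= src_c < cols:
                out[r][c] = mask[src_r][src_c]
    return out


def or_masks(a, b):
    """Pixel-wise OR of two binary masks of equal size."""
    return [[x | y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def detect_all_angles(text, element, detect, step=5):
    """OR together detect(text, rotated element) for each angle in [0, 360)."""
    result = [[0] * len(text[0]) for _ in text]
    for degrees in range(0, 360, step):
        rotated = rotate_nearest(element, degrees)
        result = or_masks(result, detect(text, rotated))
    return result
```

    With a step of 5 degrees this runs the detector 72 times per vowel, which is consistent with the solution being slow.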

    As explained above, the process was simple, but finding the correct values and cropping the vowels made the problem quite long.



Assumptions and Weaknesses

I made the following assumptions:
  1. I assumed that when eroding with a structuring element, anything that does not match the structuring element exactly will be deleted, whether the mismatched pixels are 1 or 0.  I learned after several tests and rereading the text that this is not the case.  Only 1 pixels are matched.  Aagghh!
  2. My main idea for eliminating wrong "hits" was to match each hit region against the structuring element.  This consisted of eroding the thinned region of the hit image with the thick structuring element.  I hoped this would expose the hits that were wrong.  My assumption was that one can extract a matrix and compare it with the structuring element.  I could not find one case where this worked.  What a major disappointment.
  3. Another assumption was that Matlab would rotate images perfectly.  I initially did not visually examine what the rotated structuring elements looked like.  I discovered that they were severely distorted.  I partially overcame this obstacle by using a different interpolation method.
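The difference behind the first assumption can be shown on a tiny case.  This is a plain Python sketch with hypothetical helper names: `fits` is the test erosion applies at one position, and `matches_exactly` is the stricter window comparison that was originally expected.

```python
def fits(image, element, r, c):
    """Erosion test at (r, c): every 1 of `element` must cover a 1 in `image`.

    Zeros in `element` are not compared against anything.
    """
    return all(image[r + i][c + j]
               for i, row in enumerate(element)
               for j, value in enumerate(row) if value)


def matches_exactly(image, element, r, c):
    """Strict test at (r, c): the window must equal `element`, zeros included."""
    return all(image[r + i][c + j] == value
               for i, row in enumerate(element)
               for j, value in enumerate(row))


bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]    # a thin vertical stroke
block = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # a solid blob, clearly not a stroke

erosion_hit = fits(block, bar, 0, 0)            # True: the stroke fits inside
exact_hit = matches_exactly(block, bar, 0, 0)   # False: the zeros do not match
```

Because only the 1 pixels are checked, a thin stroke is reported wherever it fits inside a thicker shape, which is consistent with the false hits described for the letter i.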
I think the major weaknesses of my solution are:
  1. My solution only applies to the images we have available to us.  While the procedure would probably be the same, the hard-coded threshold values are a definite weakness.  Using the exact properties of the image, I was able to find the height of the typical letter i.  This made it possible to eliminate all occurrences of pixel sequences that were longer than it (see How I solved it above).
  2. Speed and time are taken for granted.  The solution to this problem is definitely slow.  This process would not be effective for a continuous stream of images, or for images of a larger size.



Improvements and Possible Future Work

There is definitely much room for improvement.  The major improvements must come in reducing the number of wrong hits and in speed.  This can be done by:



Results

The color code:
A's are red  (255,0,0)
E's are green (0,255,0)
I's are blue (0,0,255)
O's are violet (255,0,255)
U's are cyan (0,255,255)
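
The color code above can be applied to per-vowel binary masks as in this plain Python sketch.  The white background and the rule that later vowels overwrite earlier ones where masks overlap are assumptions; how overlaps were actually drawn in the result images is not stated.

```python
COLORS = {
    "a": (255, 0, 0),
    "e": (0, 255, 0),
    "i": (0, 0, 255),
    "o": (255, 0, 255),
    "u": (0, 255, 255),
}


def colorize(masks, background=(255, 255, 255)):
    """Paint each vowel mask in its color on a plain background.

    `masks` maps a vowel to a binary image; all masks share one size.
    Where masks overlap, the vowel listed later in COLORS wins.
    """
    first = next(iter(masks.values()))
    rows, cols = len(first), len(first[0])
    out = [[background] * cols for _ in range(rows)]
    for letter, color in COLORS.items():
        mask = masks.get(letter)
        if mask is None:
            continue
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    out[r][c] = color
    return out
```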
Image 1

Figure 1: This is the result of finding the vowels in the first image.  The result is perfect: there are no missing vowels or additional hits.  Every letter is shown in its respective color.

Image 2

Figure 2:  In this image, problems begin to occur.  The number of missed vowels is low: only 3.  There is one a, one e, and one i missing, all on the first line.  The main problem is the number of extraneous letters found to be vowels.  Some common errors are that "fr" looks like an i and that the letter m is similar to the letter o.  Overlapping errors have also appeared: some letters are marked as both E's and O's.

Image 3

Figure 3: Here is where most things break down.  Due to the difference in algorithm, the number of overlapping vowels has greatly increased.  The vowel misses only occur for the letter e.



Source Code