CS 7321 Winter 1998
PS#1 Solutions by Ivan Brusic
Character Recognition Problem
How I solved it
Finding the vowels in the text consisted of using the
morphological operators of erosion and dilation to extract the vowels.
I used the vowels themselves as structuring elements. The process
of finding the vowels was actually very simple. The text image was
eroded with a thinned version of the vowel. This found all
the positions where, in theory, the letter is to be found. I then
dilated the eroded image with the original "thick" vowel to get all the
desired letters from the text image.
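The erode-then-dilate idea can be sketched on a toy image. This is a minimal pure-Python illustration, not the Matlab code listed under Source Code; the function names, the top-left anchoring of the structuring element, and the toy shapes are all assumptions made for the example.

```python
def erode(image, se):
    """Binary erosion. Output is 1 at (r, c) when every 1-pixel of `se`,
    placed with its top-left corner at (r, c), lands on a 1-pixel of `image`."""
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h - sh + 1):
        for c in range(w - sw + 1):
            if all(image[r + i][c + j]
                   for i in range(sh) for j in range(sw) if se[i][j]):
                out[r][c] = 1
    return out


def dilate(image, se):
    """Binary dilation. Stamps a copy of `se` (top-left anchored) at every
    1-pixel of `image`, clipping at the image border."""
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if image[r][c]:
                for i in range(sh):
                    for j in range(sw):
                        if se[i][j] and r + i < h and c + j < w:
                            out[r + i][c + j] = 1
    return out


def find_letter(image, thin_se, thick_se):
    """Erode with the thinned shape to locate hits, then dilate the hits
    with the thick shape to redraw the full letter at each hit."""
    return dilate(erode(image, thin_se), thick_se)


# Toy example: an "L" shape and a plain vertical bar (white = 1).
thick = [[1, 0, 0],
         [1, 0, 0],
         [1, 1, 1]]
thin = [[1, 0, 0],
        [1, 0, 0],
        [1, 1, 0]]
image = [[0, 0, 0, 0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0, 1, 0],
         [0, 1, 0, 0, 0, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 0, 0, 0]]
found = find_letter(image, thin, thick)  # keeps the "L", drops the bar
```

Because both structuring elements share the same size and anchor, the dilation redraws each letter exactly where the erosion found it.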
The main problem is that the morphological operators
are performed on binary images, while the text image is gray scale.
The logical first step was therefore to threshold the image. Finding
the ideal value required numerous experiments. The problem was compounded
by the fact that we must also threshold the structuring elements.
The vowels, or structuring elements, all performed well with a threshold
of 0.4. The repeated tests were very tedious and could be continued
further to find better values.
After the thresholding operations, all images were
inverted to fit the logic of the program. Erode and dilate work on
pixels with a value of 1. A pixel with a value of 1, however, is displayed
as white. Since our "objects" are black, we must invert the images
so the letters are white on a black background. This step was also
necessary to perform the "thin" operation on the structuring elements.
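The threshold-then-invert preparation can be sketched as follows. This is an illustrative Python version under stated assumptions: intensities are taken to lie in [0, 1], a pixel becomes 1 when it is strictly greater than the level, and the function names are made up for the example.

```python
def threshold(gray, level):
    """Turn a gray-scale image (intensities in [0, 1]) into a binary one:
    1 (white) where the intensity exceeds `level`, 0 (black) elsewhere."""
    return [[1 if p > level else 0 for p in row] for row in gray]


def invert(binary):
    """Swap 0 and 1 so that dark letters become the 1-valued objects."""
    return [[1 - p for p in row] for row in binary]


# Dark ink (low values) on light paper (high values).
gray = [[0.9, 0.1, 0.5],
        [0.3, 0.95, 0.41]]
letters = invert(threshold(gray, 0.4))  # ink pixels are now 1
```

With a level of 0.4, only the pixels at or below 0.4 (here 0.1 and 0.3) end up as 1 after the inversion, so they are the ones erosion and dilation act on.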
Performing these simple operations achieved good
results for the letters a, e, o, and u. The letter i was very difficult
to detect. Since the shape of the letter i fits inside other
letters, multiple wrong hits occurred when searching for the letter i.
These letters included l, L, p, M, d and more. A very simple technique
discussed in class was used: find the dimensions of the letter i and
exclude anything from the image that is longer than it. By testing
different samples of the letter, I determined that the average length of
the base was 9 pixels. I then eliminated from the image all vertical
lines with a length of 12 pixels or more. This drastically improved the
result. It eliminated all conflicts with the "long" letters, but it
did not fix others such as the "fr" combination.
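Removing the long vertical strokes can be sketched as a column-by-column scan for runs of 1-pixels. This Python version is only an illustration of the idea; it is not the remove_long function listed under Source Code, and the function name and default limit of 12 are assumptions taken from the description above.

```python
def remove_long_vertical_runs(binary, max_len=12):
    """Clear every vertical run of 1-pixels whose length is `max_len` or more.
    Shorter runs (such as the stem of an i) are left untouched."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for c in range(w):
        r = 0
        while r < h:
            if binary[r][c]:
                start = r
                # Walk to the end of this run of 1-pixels.
                while r < h and binary[r][c]:
                    r += 1
                if r - start >= max_len:
                    for k in range(start, r):
                        out[k][c] = 0
            else:
                r += 1
    return out


# Column 0 holds a 9-pixel stem, column 1 a 13-pixel stroke.
strokes = [[1 if row < 9 else 0, 1 if 1 <= row <= 13 else 0]
           for row in range(15)]
cleaned = remove_long_vertical_runs(strokes)
```

The scan only looks at single columns, which is why it removes tall stems but cannot help with confusions such as "fr" that do not involve a long vertical line.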
The last step was to find the vowels in an image
with the letters at an angle other than 0 degrees from the base. This
was accomplished by rotating the structuring element in 5 degree intervals
between 0 and 360 degrees. The result for each rotation of
the vowel was OR'ed with the result of the previous iteration.
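The rotate-and-OR loop can be sketched as below. This is a pure-Python illustration under several assumptions: the rotation uses nearest-neighbour sampling about the centre of the structuring element and keeps its size, the sign convention of the angle is arbitrary, only the erosion hits are accumulated, and all names are hypothetical. It is not the find_rotate.m or vwrotate.m code.

```python
import math


def rotate_nearest(se, degrees):
    """Rotate a binary structuring element about its centre using
    nearest-neighbour sampling; the output has the same size as the input."""
    h, w = len(se), len(se[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Inverse mapping: look up the source pixel for each output pixel.
            y, x = r - cy, c - cx
            sr = round(cy + cos_t * y - sin_t * x)
            sc = round(cx + sin_t * y + cos_t * x)
            if 0 <= sr < h and 0 <= sc < w:
                out[r][c] = se[sr][sc]
    return out


def erode(image, se):
    """Binary erosion with the structuring element anchored at its top-left."""
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h - sh + 1):
        for c in range(w - sw + 1):
            if all(image[r + i][c + j]
                   for i in range(sh) for j in range(sw) if se[i][j]):
                out[r][c] = 1
    return out


def find_rotated(image, se, step=5):
    """Erode with every rotation of `se` in `step`-degree increments and
    OR each result into the running hit map."""
    hits = [[0] * len(image[0]) for _ in image]
    for angle in range(0, 360, step):
        result = erode(image, rotate_nearest(se, angle))
        hits = [[a | b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(hits, result)]
    return hits


# A horizontal bar as structuring element, a vertical bar in the image.
bar = [[1, 1, 1],
       [0, 0, 0],
       [0, 0, 0]]
upright = [[0, 0, 1],
           [0, 0, 1],
           [0, 0, 1]]
hits = find_rotated(upright, bar)
```

Nearest-neighbour sampling on such small shapes distorts them at most angles, which matches the distortion noted under Assumptions and Weaknesses.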
As explained above, the process was simple, but
finding the correct threshold values and cropping the vowels made the problem quite long.
Assumptions and Weaknesses
I made the following assumptions:
-
I assumed that when eroding with a structuring element, anything that does
not exactly match the structuring element would be deleted.
This would include pixels that are 1 or 0. I learned after several tests
and rereading the text that this does not occur. Only the 1 pixels are
matched. Aagghh!
-
My main idea to eliminate wrong "hits" was to match the element hit with
the structuring element. This consisted of eroding the thinned region
of the hit image with the thick structuring element. I hoped to find
the hits that were wrong. My assumption was that one can extract a matrix
and compare it with the structuring element. I could not find one
case where this worked. What a major disappointment.
-
Another assumption was that Matlab would rotate images perfectly.
I initially did not visually examine what the rotated structuring elements
looked like. I discovered that they were severely distorted.
I partially overcame this obstacle by using a different interpolation method.
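The first assumption in the list above can be checked on a toy image. The Python sketch below contrasts plain erosion, which only tests the 1-pixels of the structuring element, with an exact window comparison that also tests the 0-pixels (usually called a hit-or-miss transform). The hit-or-miss function is shown only for contrast and was not part of the solution described here; all names and shapes are made up for the example.

```python
def erode(image, se):
    """Binary erosion: only the 1-pixels of `se` are compared with the image."""
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h - sh + 1):
        for c in range(w - sw + 1):
            if all(image[r + i][c + j]
                   for i in range(sh) for j in range(sw) if se[i][j]):
                out[r][c] = 1
    return out


def hit_or_miss(image, se):
    """Exact match: the window must equal `se` on both its 1s and its 0s."""
    h, w = len(image), len(image[0])
    sh, sw = len(se), len(se[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h - sh + 1):
        for c in range(w - sw + 1):
            if all(image[r + i][c + j] == se[i][j]
                   for i in range(sh) for j in range(sw)):
                out[r][c] = 1
    return out


# A short stem surrounded by background, standing in for the letter i.
short_stem = [[0, 0, 0],
              [0, 1, 0],
              [0, 1, 0],
              [0, 0, 0]]
# The image holds one short stem (column 1) and one long stem (column 5).
page = [[0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0]]
erosion_hits = erode(page, short_stem)       # fires inside the long stem too
exact_hits = hit_or_miss(page, short_stem)   # fires only on the short stem
```

Erosion reports the short stem once and the long stem three times, while the exact comparison reports only the short stem, which is the behaviour the assumption expected from erosion.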
I think the major weaknesses of my solution are:
-
My solution only applies to the images we have available to us. While
the procedure would probably be the same, the hard-coded thresholding values
are a definite weakness. Using the exact properties of the image,
I was able to find the height of the typical letter i. This made
it possible to eliminate all occurrences of pixel sequences that were longer
than it (see above in How I solved it).
-
Speed and time are taken for granted. The solution to this problem
is definitely slow. This process would not be effective for a continuous
stream of images, or for images of a larger size.
Improvements and Possible Future Work
There is definitely much room for improvement. The major areas of
improvement must come in reducing the number of wrong hits and in speed.
This can be done by:
-
Determining which elements cause problems and eliminating them. I did
this partially by removing all vertical lines with a thickness of 1 pixel and
a height of 12 pixels or more. Another instance is the similarity between e and
o. We can search first for one, then eliminate its hits before searching
for the second.
-
Using different structuring elements for the same vowel. We can eliminate
wrong hits if they occur only with a minority of the structuring elements
used.
-
More analysis can be done on what thresholding values are ideal.
As one can see by my values, different letters perform better with different
thresholds. For example, the letter u works well with a threshold
of 0.4, while a performs better with double that value, 0.8.
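The second suggestion above, using several structuring elements for the same vowel, could be combined with a simple vote. The Python sketch below keeps a pixel only when more than half of the hit maps agree on it. The strict-majority rule and the function name are assumptions, since the text only says that hits occurring with a minority of the structuring elements can be eliminated.

```python
def majority_vote(hit_maps):
    """Combine binary hit maps of equal size: a pixel stays 1 only when
    more than half of the maps mark it as a hit."""
    n = len(hit_maps)
    h, w = len(hit_maps[0]), len(hit_maps[0][0])
    return [[1 if 2 * sum(m[r][c] for m in hit_maps) > n else 0
             for c in range(w)]
            for r in range(h)]


# Hit maps from three hypothetical structuring elements for one vowel.
maps = [
    [[1, 0, 1],
     [0, 0, 0]],
    [[1, 0, 0],
     [0, 1, 0]],
    [[1, 0, 0],
     [0, 1, 0]],
]
voted = majority_vote(maps)
```

In this example the hit at the top right is reported by only one of the three maps and is dropped, while the two hits confirmed by at least two maps survive.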
Results
The color code:
A's are red (255,0,0)
E's are green (0,255,0)
I's are blue (0,0,255)
O's are violet (255,0,255)
U's are cyan (0,255,255)
Figure 1: This is the result of finding the vowels on the first image.
The result is perfect: there are no missing vowels or additional hits.
Every letter is shown in its respective color.
Figure 2: In this image, problems begin to occur. The number
of missing vowels is low: only 3. There is one a, one e, and one i missing,
all on the first line. The main problem is the number of extraneous
letters found to be vowels. Some common errors are that "fr" looks
like an i and that the letter m is similar to the letter o. The error
of overlapping has also appeared. Some letters appear to be both
E's and O's.
Figure 3: Here is where most things break down. Due to the difference
in algorithm, the number of overlapping vowels has greatly increased.
The vowel misses only occur for the letter e.
Source Code
-
script.m : this is the script showing how
the problem solution was arrived at
-
find_vowel.m: the function vowel.m
used to extract vowels
-
find_rotate.m: a more efficient
find for the rotated vowels
-
vwrotate.m: the rotating function
-
remove_long: the function used to remove the long vertical
line segments