OCR Assignment
(Due Date EXTENDED TO Tuesday, April 21, Noon)

Write a program that performs optical character recognition (OCR) of printed text on scanned pages of the textbook (the pages have been chosen to be "simple").
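The assignment does not prescribe a recognition method. One common first step in printed-text OCR is to isolate candidate characters as connected regions of black pixels. The following is a minimal sketch of 8-connected component labeling with an iterative flood fill. The function name label_components is hypothetical, and the sketch assumes the page has already been decoded into one byte per pixel (1 = black, 0 = white).

```c
#include <stdlib.h>

/* Hypothetical helper: label 8-connected black regions of a binary page.
 * pix[y * w + x] is 1 for black (ink) and 0 for white.
 * labels[] receives 0 for background and 1..n for each component.
 * Returns the number of components n, or -1 if memory runs out. */
int label_components(const unsigned char *pix, int w, int h, int *labels)
{
    size_t npix = (size_t)w * (size_t)h;
    /* Explicit stack instead of recursion, so large glyphs cannot
     * overflow the call stack. Each pixel is pushed at most once. */
    int *stack = malloc(npix * sizeof *stack);
    if (stack == NULL)
        return -1;
    for (size_t i = 0; i < npix; i++)
        labels[i] = 0;

    int n = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int start = y * w + x;
            if (!pix[start] || labels[start])
                continue;            /* white, or already in a component */
            n++;                     /* new component found */
            int top = 0;
            labels[start] = n;
            stack[top++] = start;
            while (top > 0) {        /* flood fill from the seed pixel */
                int p = stack[--top];
                int px = p % w, py = p / w;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        int nx = px + dx, ny = py + dy;
                        if (nx < 0 || ny < 0 || nx >= w || ny >= h)
                            continue;
                        int q = ny * w + nx;
                        if (pix[q] && !labels[q]) {
                            labels[q] = n;   /* mark before pushing */
                            stack[top++] = q;
                        }
                    }
                }
            }
        }
    }
    free(stack);
    return n;
}
```

Components are numbered in raster order of their first pixel (top to bottom, then left to right). Some characters, such as "i", "j", ":" and ";", consist of more than one component, and touching characters merge into one, so components usually need to be grouped or split before they are classified.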

Extra Credit: Write a program that can also handle more complex pages, as well as different fonts and character sizes.
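For the extra-credit case of different character sizes, one simple approach (an illustration, not a required method) is to rescale each character's bounding box to a fixed grid before comparing it with stored templates, so that the same letter at two sizes produces nearly the same grid. In this sketch, GRID (16), normalize_glyph, glyph_distance and classify are hypothetical names. Nearest-neighbour resampling and Hamming distance are simple stand-ins for more robust resampling and matching schemes.

```c
#include <limits.h>
#include <stddef.h>

#define GRID 16   /* hypothetical side length of the normalized glyph grid */

/* Resample the glyph in the inclusive bounding box (x0,y0)-(x1,y1) of a
 * w-pixel-wide bitmap (1 = black) onto a GRID x GRID grid, sampling the
 * source pixel under the centre of each grid cell (nearest neighbour). */
void normalize_glyph(const unsigned char *pix, int w,
                     int x0, int y0, int x1, int y1, unsigned char *out)
{
    int bw = x1 - x0 + 1, bh = y1 - y0 + 1;
    for (int gy = 0; gy < GRID; gy++)
        for (int gx = 0; gx < GRID; gx++) {
            int sx = x0 + (2 * gx + 1) * bw / (2 * GRID);
            int sy = y0 + (2 * gy + 1) * bh / (2 * GRID);
            out[gy * GRID + gx] = pix[sy * w + sx];
        }
}

/* Hamming distance: number of grid cells in which two glyphs differ. */
int glyph_distance(const unsigned char *a, const unsigned char *b)
{
    int d = 0;
    for (int i = 0; i < GRID * GRID; i++)
        d += (a[i] != b[i]);
    return d;
}

/* Return the index of the stored template nearest to glyph, or -1 if
 * ntemplates is 0. templates holds ntemplates grids stored back to back. */
int classify(const unsigned char *glyph, const unsigned char *templates,
             int ntemplates)
{
    int best = -1, best_d = INT_MAX;
    for (int t = 0; t < ntemplates; t++) {
        int d = glyph_distance(glyph, templates + (size_t)t * GRID * GRID);
        if (d < best_d) {
            best_d = d;
            best = t;
        }
    }
    return best;
}
```

Normalizing away size also discards size and position cues, so pairs such as "o"/"O" or ","/"'" may need extra features, for example the box's height relative to the text line or its vertical position within it. Different fonts generally need their own templates or more font-independent features.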

The input to your program will be a binary (black-and-white) image file in PBM format (see man pbm or the documentation on the web). If it turns out that other image formats are more convenient, we will support those as well; please feel free to suggest formats.
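For reference, a PBM file starts with the magic number "P1" (plain: ASCII "0"/"1" digits) or "P4" (raw: packed bits), followed by the width and height in decimal; "#" comments may appear in the header. In the raw format each row is packed eight pixels per byte, most significant bit first, and padded to a whole byte; in both formats 1 means black. The sketch below decodes either variant from a memory buffer and leaves reading the file into memory (for example with fread) to the caller. The names Bitmap and parse_pbm are hypothetical, and less common cases such as several images in one file are ignored.

```c
#include <ctype.h>
#include <stdlib.h>

/* A decoded page: pix[y * width + x] is 1 for black, 0 for white. */
typedef struct {
    int width, height;
    unsigned char *pix;
} Bitmap;

/* Skip whitespace and '#' comments in the header; return the new position. */
static size_t skip_ws(const unsigned char *buf, size_t len, size_t pos)
{
    while (pos < len) {
        if (buf[pos] == '#') {
            while (pos < len && buf[pos] != '\n' && buf[pos] != '\r')
                pos++;
        } else if (isspace(buf[pos])) {
            pos++;
        } else {
            break;
        }
    }
    return pos;
}

/* Read a positive decimal integer below 1000000; return -1 on error. */
static int read_int(const unsigned char *buf, size_t len, size_t *pos)
{
    *pos = skip_ws(buf, len, *pos);
    size_t start = *pos;
    long v = 0;
    while (*pos < len && isdigit(buf[*pos]) && v < 1000000) {
        v = v * 10 + (buf[*pos] - '0');
        (*pos)++;
    }
    return (*pos == start || v <= 0 || v >= 1000000) ? -1 : (int)v;
}

/* Parse a P1 (plain) or P4 (raw) PBM image held in memory.
 * Returns 0 on success (caller frees bm->pix), -1 on bad input. */
int parse_pbm(const unsigned char *buf, size_t len, Bitmap *bm)
{
    if (len < 2 || buf[0] != 'P' || (buf[1] != '1' && buf[1] != '4'))
        return -1;
    int raw = (buf[1] == '4');
    size_t pos = 2;
    int w = read_int(buf, len, &pos);
    int h = read_int(buf, len, &pos);
    if (w < 0 || h < 0)
        return -1;
    unsigned char *pix = malloc((size_t)w * h);
    if (pix == NULL)
        return -1;

    if (raw) {
        pos++;                                  /* one whitespace byte before raster */
        size_t row_bytes = ((size_t)w + 7) / 8; /* rows padded to whole bytes */
        if (pos > len || len - pos < row_bytes * h) {
            free(pix);
            return -1;
        }
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                unsigned char byte = buf[pos + y * row_bytes + x / 8];
                pix[(size_t)y * w + x] = (byte >> (7 - x % 8)) & 1; /* MSB = leftmost */
            }
    } else {
        for (size_t i = 0; i < (size_t)w * h; i++) {
            while (pos < len && isspace(buf[pos]))
                pos++;                          /* whitespace between digits is optional */
            if (pos >= len || (buf[pos] != '0' && buf[pos] != '1')) {
                free(pix);
                return -1;
            }
            pix[i] = buf[pos++] - '0';
        }
    }
    bm->width = w;
    bm->height = h;
    bm->pix = pix;
    return 0;
}
```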

The output should be the ASCII text of the page, including its punctuation. We need to be able to run your program on "test" pages that we will provide during grading.
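Producing the text in reading order requires knowing where each line of text lies on the page. A common technique is a horizontal projection profile: count the black pixels in each row and treat each run of rows containing ink as one text line. This is a minimal sketch that assumes an unskewed, noise-free page stored one byte per pixel (1 = black); find_text_lines is a hypothetical name.

```c
/* Hypothetical helper: find text lines with a horizontal projection profile.
 * A row belongs to a text line if it contains at least one black pixel.
 * Writes up to max_lines inclusive [top, bottom] row ranges into top[] and
 * bottom[], in top-to-bottom order, and returns the number of lines found. */
int find_text_lines(const unsigned char *pix, int w, int h,
                    int *top, int *bottom, int max_lines)
{
    int n = 0, in_line = 0;
    for (int y = 0; y < h; y++) {
        int ink = 0;
        for (int x = 0; x < w; x++)
            ink += pix[y * w + x];          /* black pixel count for this row */
        if (ink > 0 && !in_line) {          /* blank -> ink: a line starts */
            if (n == max_lines)
                break;                      /* no room for more lines */
            top[n] = y;
            in_line = 1;
        } else if (ink == 0 && in_line) {   /* ink -> blank: the line ends */
            bottom[n++] = y - 1;
            in_line = 0;
        }
    }
    if (in_line)                            /* last line touches the bottom edge */
        bottom[n++] = h - 1;
    return n;
}
```

On real scans, a threshold above zero helps ignore specks of noise, and a skewed page smears the profile so that adjacent lines can merge. Within each line, characters can then be ordered left to right, with unusually wide gaps written out as spaces.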

A writeup describing how you approached this problem and how your program works is a critical part of what you turn in.

Example text (I have decided to scan at high resolution, 600 dpi):
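At 600 dpi the page images are large. As a rough guide, assuming an 8.5 × 11 inch scan area (US letter; the actual textbook scans may be smaller) and, for the character size, 10-point type with 1 pt = 1/72 in:

```latex
\begin{aligned}
\text{width}  &= 8.5\,\text{in} \times 600\,\text{dpi} = 5100 \text{ pixels}\\
\text{height} &= 11\,\text{in} \times 600\,\text{dpi} = 6600 \text{ pixels}\\
\text{total}  &= 5100 \times 6600 = 33{,}660{,}000 \text{ pixels}\\
\text{raw (P4) raster} &= \lceil 5100/8 \rceil \times 6600 = 638 \times 6600 = 4{,}210{,}800 \text{ bytes}\\
\text{10-pt body height} &= \tfrac{10}{72}\,\text{in} \times 600\,\text{dpi} \approx 83 \text{ pixels}
\end{aligned}
```

Stored at one byte per pixel, such a page takes about 33.7 MB of memory. The 10-point figure is the font's nominal body height, so individual letters are shorter than this.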

Some more pages to work on:

See the example program for more helpful hints. It runs on UNIX systems with the X Window System. To compile it, use this Makefile, which runs the following command:
cc -O -I/usr/local/X11/include/ example.c -o example -lX11 -lm

Relevant web pages:

I encourage you to work together on this assignment and to look for publicly available code on the net. If you submit such code or base your solution on others' code, clearly indicate which parts of the code are not original and give their sources. There is no limit on the number of people who can work together as a group, but it is important that each member of the group participate fully.

Instructions on how to turn this in will be announced.