Artificial Intelligence - Lecture from Nov. 13, 2002 on Vision

Scribe: Logan Hauenstein


Levels of Image Processing

Viewed as a series of processing levels, the problem of computer vision might look like this:
[Figure: levels of image processing]

In step 1, the low-level image starts off as a raw collection of pixels, which in our examples is expressed as an array of numbers, each giving the brightness (intensity) of one pixel in a grayscale image. The second step detects edges in the raw image. Intermediate-level vision then identifies surfaces and labels lines. By step 3, we have a 2½-D image (the depth and orientation of the visible surfaces as seen from the viewer's position), which can be stored in memory. High-level vision is where the actual object recognition occurs: a template for the object is loaded from long-term memory into working memory and can be used to help interpret the surfaces from step 3 (and even the edges from step 2).

A simple image may be represented like this:
Sample Image Pixel Values:
6 7 2 2
7 6 2 1
6 6 3 1
7 6 1 3
[Figure: graph of the intensity of each column of the sample image]
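The sample image can be written down directly as a 2-D array. The Python sketch below assumes that the column-intensity graph plots each column's average intensity (one plausible reading of it) and reports the adjacent pair of columns with the largest change; column numbers are 1-based to match the text.

```python
# Sample image from the notes: each entry is a gray-level intensity.
IMAGE = [
    [6, 7, 2, 2],
    [7, 6, 2, 1],
    [6, 6, 3, 1],
    [7, 6, 1, 3],
]

def column_means(image):
    """Average intensity of each column (assumed to be what the graph shows)."""
    rows = len(image)
    return [sum(row[c] for row in image) / rows for c in range(len(image[0]))]

def largest_change(means):
    """1-based pair of adjacent columns whose average intensities differ most."""
    diffs = [abs(means[c + 1] - means[c]) for c in range(len(means) - 1)]
    c = max(range(len(diffs)), key=lambda k: diffs[k])
    return (c + 1, c + 2)

means = column_means(IMAGE)
print(means)                  # [6.5, 6.25, 2.0, 1.75]
print(largest_change(means))  # (2, 3)
```

The jump between the two bright columns and the two dark ones is the edge between columns 2 and 3 discussed below.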


Noise

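Noise makes images harder to analyze. In the example above, noise could make it hard to tell that there is an edge between columns 2 and 3. To deal with noise, we apply a series of convolution operations: first we smooth the intensities by averaging each pixel with its neighbors, then we take differences of the smoothed values to find where the intensity changes.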
Average Intensity:
$$A_i = \frac{I_{i-1} + I_i + I_{i+1}}{3}$$
First Difference:
$$F_i = \frac{A_{i+1} - A_{i-1}}{2}$$
Second Difference:
$$S_i = \frac{F_{i+1} - F_{i-1}}{2}$$
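The three equations can be sketched in Python as below. Each step is a three-point operation, and this sketch simply skips the border pixels (an assumption; the notes do not say how borders are handled), so every step shortens the signal by two. The input row `I` is hypothetical: a noisy step from bright to dark. The edge shows up as the largest magnitude of the first difference and also as a sign change (zero crossing) in the second difference.

```python
def average(signal):
    """A_i = (I_{i-1} + I_i + I_{i+1}) / 3, computed at interior points only."""
    return [(signal[i - 1] + signal[i] + signal[i + 1]) / 3
            for i in range(1, len(signal) - 1)]

def difference(signal):
    """Central difference (X_{i+1} - X_{i-1}) / 2, at interior points only."""
    return [(signal[i + 1] - signal[i - 1]) / 2
            for i in range(1, len(signal) - 1)]

# Hypothetical noisy row: bright on the left, dark on the right,
# with the step between positions 4 and 5 (0-based).
I = [6, 7, 6, 7, 6, 2, 1, 3, 1, 2]

A = average(I)     # A[k] is centred on I[k + 1]
F = difference(A)  # F[k] is centred on I[k + 2]
S = difference(F)  # S[k] is centred on I[k + 3]

# The steepest change is where |F| is largest ...
peak = max(range(len(F)), key=lambda k: abs(F[k])) + 2
# ... and S changes sign between the same pair of pixels.
crossings = [(k + 3, k + 4) for k in range(len(S) - 1) if S[k] * S[k + 1] < 0]

print(peak)       # 4
print(crossings)  # [(4, 5)]
```

Smoothing first keeps a single noisy pixel from producing a large difference on its own, which is why the averaging step comes before the differences.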

Applied to a sample input, the equations produce graphs like these:
[Figure: graphs produced by the equations above for a sample input]

Intermediate Level Processing

Assuming that you have already picked out the edges in an image, you can search memory for template objects that match the observed object's edge map. If you find a template that is close enough to the observed object, you can infer additional information about that object. For example, a crude edge map of a house can be matched against a template house from memory. The agent can then predict what the observed house will look like from the side, since it knows what the template house looks like from the side.
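A minimal Python sketch of edge-map template matching follows. Everything in it is hypothetical: the tiny 4x4 binary edge maps, the "side view" notes stored with each template, the similarity measure (fraction of pixels where the maps agree), and the 0.8 threshold for "close enough". A real system would need to handle differences in position, scale, and viewpoint, which this sketch ignores.

```python
# Hypothetical 4x4 binary edge maps (1 = edge pixel). Each template also
# stores knowledge the observed image lacks, here a made-up side-view note.
TEMPLATES = {
    "house": {
        "edges": [[0, 1, 1, 0],
                  [1, 0, 0, 1],
                  [1, 0, 0, 1],
                  [1, 1, 1, 1]],
        "side_view": "rectangular wall under a sloped roof",
    },
    "tree": {
        "edges": [[0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0]],
        "side_view": "same outline from every side",
    },
}

# Crude observed house: one spurious edge at (2, 2), one missing edge at (3, 2).
OBSERVED = [[0, 1, 1, 0],
            [1, 0, 0, 1],
            [1, 0, 1, 1],
            [1, 1, 0, 1]]

def agreement(a, b):
    """Fraction of pixels where two equally sized binary edge maps agree."""
    total = sum(len(row) for row in a)
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total

def best_match(observed, templates, threshold=0.8):
    """(name, score) of the closest template, or None if none is close enough."""
    name = max(templates, key=lambda n: agreement(observed, templates[n]["edges"]))
    score = agreement(observed, templates[name]["edges"])
    return (name, score) if score >= threshold else None

match = best_match(OBSERVED, TEMPLATES)
if match:
    name, score = match
    print(name, score)                   # house 0.875
    print(TEMPLATES[name]["side_view"])  # knowledge supplied by the template
```

Note that every template is compared pixel by pixel, which is exactly the cost the next paragraph tries to avoid.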


Searching memory for a template match using only edge data could be very time-consuming. Another approach is to translate the edge map into a set of surfaces and search memory for objects with similarly laid-out surfaces. For example, the door in the crude edge map could be translated into a surface, along with the roof. We observe that the door surface is below the roof surface, so we can search memory for an object that has a door-like surface directly below a roof-like surface. Comparing a handful of surfaces and their spatial relations, instead of individual edges, can reduce the computation required to recognize objects.
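A rough Python sketch of surface-based lookup follows. The surface labels, the relation names such as "below", and the template contents are all hypothetical. Each template is stored as a set of (surface, relation, surface) facts, and the facts are indexed once (an inverted index, a technique not named in the lecture), so a query only touches templates that share the observed relations instead of comparing edge maps against every template.

```python
from collections import defaultdict

# Hypothetical long-term memory: each template is described by
# (surface, relation, surface) facts rather than raw edges.
MEMORY = {
    "house": {("door", "below", "roof"), ("window", "beside", "door")},
    "barn": {("door", "below", "roof"), ("loft_door", "above", "door")},
    "car": {("window", "above", "wheel")},
}

# Build the index once: fact -> names of templates containing it.
INDEX = defaultdict(set)
for name, facts in MEMORY.items():
    for fact in facts:
        INDEX[fact].add(name)

def candidates(observed_facts):
    """Templates that contain every observed surface relation."""
    sets = [INDEX.get(fact, set()) for fact in observed_facts]
    return set.intersection(*sets) if sets else set(MEMORY)

print(sorted(candidates({("door", "below", "roof")})))
# ['barn', 'house']
print(sorted(candidates({("door", "below", "roof"), ("window", "beside", "door")})))
# ['house']
```

Each extra observed relation narrows the candidate set, so only the few surviving templates need a detailed comparison.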

Using the results of object recognition, we can go back to the earlier representations of the image (such as the edge and surface maps), remove spurious edges, and resolve some of the ambiguities.
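One way to picture this feedback step is sketched below in Python. It assumes, beyond what the notes say, that the bottom-up stage produces an edge strength between 0 and 1 for each pixel rather than a yes/no edge; the thresholds `lo` and `hi` and the example values are hypothetical. Confident pixels are kept as observed, and only ambiguous ones are decided by the recognized template, so a weak edge the template does not predict is dropped and a weak gap the template does predict is filled.

```python
def refine(strength, template, lo=0.3, hi=0.7):
    """Top-down cleanup of an edge map using a recognized template.

    strength: per-pixel edge strengths in [0, 1] from the bottom-up stage.
    template: binary edge map (1 = edge) of the recognized object.
    Clear edges (>= hi) and clear non-edges (<= lo) are kept as observed;
    ambiguous pixels in between take the template's value.
    """
    refined = []
    for s_row, t_row in zip(strength, template):
        refined.append([1 if s >= hi else 0 if s <= lo else t
                        for s, t in zip(s_row, t_row)])
    return refined

# Hypothetical house template and observed edge strengths.
HOUSE = [[0, 1, 1, 0],
         [1, 0, 0, 1],
         [1, 0, 0, 1],
         [1, 1, 1, 1]]
STRENGTH = [[0.1, 0.9, 0.8, 0.0],
            [0.9, 0.2, 0.1, 0.8],
            [0.8, 0.1, 0.5, 0.9],  # 0.5 at (2, 2): weak, not in template -> dropped
            [0.9, 0.9, 0.4, 0.9]]  # 0.4 at (3, 2): weak, in template -> filled

print(refine(STRENGTH, HOUSE) == HOUSE)  # True
```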

In what ways is language a harder problem?