CS 6660 – Intelligent Agents

Class Notes – 9/24/2001

Taken by Joey Bokor

 

Labeling Problem

Raw Image  →  Primal Sketch  →  2½-D  →  3-D

Transition involves aggregation and abstraction.  It is very numerical, top-down and parallel.

 

Transition involves labeling surfaces.

Why do we label?  To begin deriving meaning and because labeling enables inferences.

 

What is meaning? Understanding?

Three possibilities:

1.      Discrimination between sensory stimuli.

Consider a reactive robot whose task is to separate red objects from blue objects.  As objects are presented, it separates them into piles of blue and red objects.  This robot is discriminating.

2.      Relationships between stimuli and stored memory.

Consider a word-association game: someone says a word, and you say the first thing that comes to mind.  The stimulus evokes other memories.

3.      The ability to draw correct inferences.

All three of these meanings relate to each other, and a scenario cast into one definition can usually be interpreted as either of the other two.  However, different areas of AI tend to use different meanings of meaning and understanding, with robotics using the first, cognitive science using the second and classic AI using the third.

 

Vision example of labeling

 


Consider the following image:

[Figure: line drawing of a cube with vertices labeled a through g.]

Why do we think it is a cube?  Because our perceptual apparatus allows us to label the three surfaces.

 

In order to label we need categories.  But what categories do we need and how do we assign the categories?

In addition to categories, we need knowledge and some process for using the knowledge.

 

As we look at the image, we recognize that edge ab appears to connect two visible surfaces, whereas edge cd borders one visible surface and one ‘non-visible’ surface.

From this we get our categories:

·        Folds – connect two surfaces, both visible.  Examples – ab, bd.

·        Blades – connect two surfaces, one visible.  Examples – cd, ac.

 

We also have a set of junctions (knowledge) that we can use in a process of ‘constraint satisfaction’ to attempt labeling of the edges.

 

Junctions and their possible combinations with blades and folds:

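[Figure: junction types, each drawn with its permitted blade (b) and fold (f) edge labels.  Label groups recovered from the figure: (b, f), (f, b), (b, b); (b, f, b), (f, f, f); (b, b, f), (b, b, b), (f, f, f), (b, b, f).]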

If we assume that all edges against the background are blades, we can start at a vertex and, using our knowledge of junctions and the process of constraint satisfaction, begin to label the edges of the image.  Suppose that we start with vertex a.  Immediately we notice that it resembles the ‘W’ junction, which implies that ab must be a fold (since we have already assumed that ae and ac are blades).  Continuing around the vertices, we could label all of the edges, and would end up labeling ae, ef, fg, ac, cd and dg as blades and ab, bd and bf as folds.  If we had not assumed that the edges against the background were blades, we could still have completed the labeling by performing an exhaustive search starting at any of the vertices.
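The labeling walk-through above can be sketched in code.  What follows is a minimal Python sketch, not the procedure given in class: it enumerates every blade/fold assignment and keeps those in which every junction matches a catalog entry.  The junction catalog (CATALOG), the cube topology (JUNCTIONS, with b taken as the interior vertex) and the function name consistent_labelings are assumptions made for illustration; in particular, the catalog is a guess and may not match the one drawn in class.

```python
from itertools import product

BLADE, FOLD = "b", "f"

# Hypothetical junction catalog (an assumption, not the lecture's figure).
# Tuples follow the edge order given for each junction below.
CATALOG = {
    "L": {(BLADE, BLADE), (BLADE, FOLD), (FOLD, BLADE)},
    # W junctions are listed as (wing, shaft, wing).
    "W": {(BLADE, FOLD, BLADE), (FOLD, FOLD, FOLD)},
    "Y": {(FOLD, FOLD, FOLD), (BLADE, BLADE, FOLD),
          (BLADE, FOLD, BLADE), (FOLD, BLADE, BLADE)},
}

# Assumed cube topology: vertex -> (junction type, edges in catalog order).
JUNCTIONS = {
    "a": ("W", ("ae", "ab", "ac")),
    "d": ("W", ("cd", "bd", "dg")),
    "f": ("W", ("ef", "bf", "fg")),
    "b": ("Y", ("ab", "bd", "bf")),
    "c": ("L", ("ac", "cd")),
    "e": ("L", ("ae", "ef")),
    "g": ("L", ("fg", "dg")),
}


def consistent_labelings(junctions, catalog, fixed=None):
    """Return every edge labeling that satisfies all junction constraints.

    `fixed` maps edges to labels that are assumed in advance, for example
    background edges assumed to be blades.  Unfixed edges are enumerated
    exhaustively, so this is brute-force search, not constraint propagation.
    """
    fixed = dict(fixed or {})
    edges = sorted({e for _, es in junctions.values() for e in es})
    free = [e for e in edges if e not in fixed]
    solutions = []
    for labels in product((BLADE, FOLD), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, labels))}
        if all(tuple(assignment[e] for e in es) in catalog[kind]
               for kind, es in junctions.values()):
            solutions.append(assignment)
    return solutions


# Assume every edge against the background is a blade, as in the notes.
background = ("ae", "ef", "fg", "ac", "cd", "dg")
seeded = consistent_labelings(JUNCTIONS, CATALOG,
                              {e: BLADE for e in background})
```

Under these assumptions, the seeded search leaves a single labeling, in which ab, bd and bf are folds.  Calling consistent_labelings with no fixed edges corresponds to the exhaustive search, and with this catalog it may return more than one consistent labeling.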

 

This labeling process also assumes that we can easily recognize edges and discern the background.  In reality, we probably have spurious edges and noise in the image. 

Some techniques for tackling these problems:

·        Use context to create a hypothesis and attempt to remove spurious edges based on the hypothesis.

·        Average close edges to form one edge.

·        Use the salience of edges to remove spurious ones.
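Two of the techniques above, averaging close edges and filtering by salience, can be sketched as follows.  This is an illustrative sketch only: the segment representation (pairs of (x, y) endpoints), the numeric salience scores, the thresholds and the function names are all assumptions, not part of the lecture.

```python
from math import hypot


def drop_low_salience(scored_segments, threshold):
    """Keep segments whose (assumed) salience score meets the threshold."""
    return [seg for seg, salience in scored_segments if salience >= threshold]


def _normalize(segment):
    """Order the two endpoints so that equal segments compare alike."""
    p, q = segment
    return (p, q) if p <= q else (q, p)


def _close(s, t, tol):
    """True if corresponding endpoints of s and t are within tol."""
    return all(hypot(p[0] - q[0], p[1] - q[1]) <= tol for p, q in zip(s, t))


def merge_close_edges(segments, tol):
    """Greedily group near-duplicate segments and average each group."""
    clusters = []
    for seg in map(_normalize, segments):
        for cluster in clusters:
            # Compare against the first member of each existing group.
            if _close(cluster[0], seg, tol):
                cluster.append(seg)
                break
        else:
            clusters.append([seg])
    merged = []
    for cluster in clusters:
        n = len(cluster)
        merged.append(tuple(
            (sum(s[i][0] for s in cluster) / n,
             sum(s[i][1] for s in cluster) / n)
            for i in (0, 1)))
    return merged
```

The greedy grouping depends on the order in which segments arrive; a real system would more likely group by orientation and offset, but the idea of replacing several nearby edges with one averaged edge is the same.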

 

Model-based vision (also called knowledge-based vision) is involved in the transition from 2½-D to 3-D.

 

What is a model? 

Example – F = ma or a scale model of a building.

 

A model has parameters, and these may be only a subset of the parameters present in reality.  For example, F = ma has no parameter for friction, although friction is present in reality.
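The point about missing parameters can be made concrete with a small sketch.  The first function below is the model F = ma solved for acceleration; the second stands in for a richer description that adds a friction parameter.  The friction formula (kinetic friction on a horizontal surface, with an object that is already sliding), the coefficient mu and the function names are assumptions chosen for illustration.

```python
def model_acceleration(force, mass):
    """The model: a = F / m, with no friction parameter."""
    return force / mass


def acceleration_with_friction(force, mass, mu, g=9.81):
    """A richer description with one extra parameter, mu.

    Assumes a horizontal surface and an object that is already sliding,
    so kinetic friction mu * m * g opposes the applied force.
    """
    return (force - mu * mass * g) / mass


predicted = model_acceleration(10.0, 2.0)
with_friction = acceleration_with_friction(10.0, 2.0, mu=0.1)
```

With mu set to zero the two functions agree; for any positive mu the model overestimates the acceleration, which is the cost of leaving the parameter out.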

 

A representation stands for something; however, a representation need not be a model.

.  .  .