CS 8803B - Artificial Intelligence
Lecture 3, 8/23/2002
Josh Jones
Neural Networks
Recall the stimulus-response (purely reactive) agents described in lecture #2. These agents use sensors to develop an immediate internal representation of the state of the world at a given instant, and compare this state to a table of rules that allows rapid selection of an action.
One question is how to develop these rules, beyond the obvious strategy of hard-coding them.
Neural networks provide a way to develop such a set of rules. In principle, they might be used to develop action selection mechanisms for more complex agents as well, but at this point the technology is not sufficiently advanced for more complex applications.
Induction
Induction can be described as a process of generalizing from specific cases. Provided with a sampling of specific cases, an inductive process can produce a rule that fits the specific cases, and can be applied to cases that have not yet been encountered. Note that the inductive process can sometimes generate incorrect generalizations. In general, the quality of the generalization is related to the completeness and representativeness of the sample used to form the generalization.
Example
Experiment on a circuit by applying a set of voltages and measuring the resulting current. Based on this set of sample readings, generalize to statements about the current that should result from applying any voltage, e.g. that current varies linearly with voltage.
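Induction in this example amounts to fitting a rule to the samples and then applying it to inputs that were never measured. Here is a minimal Python sketch, assuming a straight-line model fitted by least squares. The readings are made up for illustration, since the notes give no data.

```python
# Hypothetical sample readings: (voltage in volts, current in amperes).
readings = [(1.0, 0.1), (2.0, 0.2), (3.0, 0.3), (5.0, 0.5)]


def fit_line(points):
    """Least-squares fit of current = slope * voltage + intercept."""
    n = len(points)
    mean_v = sum(v for v, _ in points) / n
    mean_i = sum(i for _, i in points) / n
    cov = sum((v - mean_v) * (i - mean_i) for v, i in points)
    var = sum((v - mean_v) ** 2 for v, _ in points)
    slope = cov / var
    intercept = mean_i - slope * mean_v
    return slope, intercept


slope, intercept = fit_line(readings)

# Generalize to a voltage that was never measured (4 V).
predicted = slope * 4.0 + intercept
print(f"slope = {slope:.3f} A/V, predicted current at 4 V = {predicted:.3f} A")
```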
Neural networks allow agents to learn in an inductive way --- the system is provided with a set of inputs, and is corrected through the use of feedback. Eventually, the system should develop general rules that will allow the consistent production of correct output.
Perceptron
Invented approximately 40 years ago, perceptrons are the initial incarnation of neural networks.
Imagine an agent that moves forward whenever the numeral 3 is displayed on a 5x5 display, and halts whenever any other image is displayed.
How could one design such an agent? Assign 25 input units, one to each pixel on the screen. Apply a digital input (0/1) to each input unit based on the coloration of the relevant pixel (e.g. apply a 1 if the pixel is lit, or a 0 if it is unlit). Then, let an output unit be connected to a motor that controls the agent's motion. The motor runs if the output is 1, and stops if the output is 0. Connect the input units to the output unit, applying an independently adjustable weight to each input's signal. The output unit sums its weighted inputs and outputs 1 if the sum exceeds a threshold, and 0 otherwise.
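A minimal Python sketch of this agent's output computation is shown below. It assumes a threshold of 0 and uses a hypothetical bitmap for the numeral 3, since the notes do not give one. The names `THREE`, `to_inputs`, and `output` are illustrative only.

```python
import random

# Hypothetical 5x5 bitmap of the numeral 3 (1 = lit pixel, 0 = unlit).
THREE = [
    "11110",
    "00001",
    "01110",
    "00001",
    "11110",
]


def to_inputs(rows):
    """Flatten a 5x5 bitmap into 25 digital inputs, one per pixel."""
    return [int(ch) for row in rows for ch in row]


def output(weights, inputs, threshold=0.0):
    """Output unit: 1 (motor runs) if the weighted sum exceeds the threshold, else 0 (motor stops)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# An untrained agent: one independently adjustable weight per input unit.
weights = [random.uniform(-1.0, 1.0) for _ in range(25)]
print("motor runs" if output(weights, to_inputs(THREE)) == 1 else "motor stops")
```

Until the weights are trained, whether the motor runs for a 3 is a matter of chance.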

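Initially, the weights on input signals are chosen randomly between -1 and 1. The perceptron is trained by providing a set of inputs and appropriate feedback (i.e. the correct answer). The weights are changed according to this feedback, based on the following rule:

$$ w_i \leftarrow w_i + K\,(T - P)\,x_i $$

where w_i is the weight on input i and x_i is the value (0 or 1) applied to
input i, K is a constant, the 'learning momentum' (more commonly called the
learning rate), T is the feedback, which consists of the correct answer for a
given trial, and P is the system's output for that trial. When the output is
correct (T = P), no weight changes; otherwise only the weights on active inputs
(x_i = 1) are adjusted, upward if the output was too low and downward if it was
too high.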
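Assuming the standard perceptron form of the rule, w_i ← w_i + K(T − P)x_i, a single training step can be sketched in Python as follows. Here `k` plays the role of the learning momentum K. The three-input trial is a hypothetical example rather than part of the 25-pixel agent.

```python
def update(weights, inputs, target, output, k=0.1):
    """Perceptron rule: w_i <- w_i + K * (T - P) * x_i for every input i."""
    error = target - output          # T - P: +1, 0, or -1 for 0/1 outputs
    return [w + k * error * x for w, x in zip(weights, inputs)]


# One hypothetical trial with three inputs: the unit output 0 but should have output 1.
weights = [0.5, -0.2, 0.1]
inputs = [1, 0, 1]
new_weights = update(weights, inputs, target=1, output=0)
print(new_weights)   # weights on the two active inputs each rise by K
```

Because T − P is 0 on a correct trial, calling `update` with `target == output` returns the weights unchanged. The weight on the unlit input (x_i = 0) never moves, whatever the error.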
The learning momentum is typically chosen to be a small value (less than 1). It
is important not to make the learning momentum too large, as this can result in
repeatedly overshooting the desired state of the system. Think of the
system as navigating an energy landscape, trying to settle into a state of
minimum energy, where the energy level represents the sum of the squared errors
made by the system, and the horizontal position represents the current setting
of the weights (with a single weight, its distance along the x axis). A
learning momentum that is too large can cause the system to make large leaps
back and forth across the state of minimum energy, thus preventing effective
learning.
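To make the overshooting picture concrete, consider a hypothetical one-weight system whose energy is E(w) = (w − 2)^2. The weight is moved downhill by `step` × dE/dw on each iteration. This is plain gradient descent on a squared error, used here only as an analogy to the perceptron rule. `step` stands in for the learning momentum, and the minimum at 2 and both step sizes are arbitrary choices.

```python
def descend(step, w=0.0, target=2.0, iterations=20):
    """Gradient descent on the hypothetical energy E(w) = (w - target)**2.

    Returns the whole trajectory of w so convergence or oscillation is visible.
    """
    path = [w]
    for _ in range(iterations):
        gradient = 2.0 * (w - target)   # dE/dw
        w = w - step * gradient
        path.append(w)
    return path


small = descend(step=0.1)   # settles toward the minimum at w = 2
large = descend(step=1.1)   # leaps across the minimum, farther each time
print(f"small step ends at {small[-1]:.3f}, large step ends at {large[-1]:.1f}")
```

Each update multiplies the distance from the minimum by (1 − 2·step). With step = 0.1 that factor is 0.8, so the weight closes in steadily. With step = 1.1 it is −1.2, so the weight lands on the opposite side of the minimum each time, 1.2 times farther away.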
Additional Issues
The example we have examined is a fairly limited application. Can this
technique be applied to more complex problems?
The usefulness of perceptrons is limited, as we will see in a moment, but
some additional utility can be added by introducing more than one output.
This could, for instance, allow for the recognition of multiple patterns.
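One way to sketch multiple outputs is to treat each output unit as an independent perceptron with its own weights, all trained with the same rule. This sketch also assumes a bias weight on a constant input of 1, which acts as an adjustable threshold; the notes do not specify this detail. The three-pixel patterns are hypothetical stand-ins for full 5x5 images.

```python
def fire(weights, inputs):
    """One output unit: 1 if the weighted sum (bias weight last) is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs + [1]))
    return 1 if total > 0 else 0


def train_multi(examples, n_outputs, k=0.1, max_epochs=100):
    """Train one independent output unit per pattern with w_i <- w_i + K (T - P) x_i."""
    n_inputs = len(examples[0][0])
    units = [[0.0] * (n_inputs + 1) for _ in range(n_outputs)]  # +1 for the bias weight
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, targets in examples:
            for unit, target in zip(units, targets):
                error = target - fire(unit, inputs)
                if error:
                    mistakes += 1
                    for i, x in enumerate(inputs + [1]):
                        unit[i] += k * error * x   # updates this unit's weights in place
        if mistakes == 0:
            break
    return units


# Hypothetical 3-pixel patterns; each target list says which output units should fire.
examples = [
    ([1, 0, 1], [1, 0]),   # pattern A -> output unit 0
    ([0, 1, 1], [0, 1]),   # pattern B -> output unit 1
    ([1, 1, 0], [0, 0]),   # any other pattern -> neither unit
]
units = train_multi(examples, n_outputs=2)
print([[fire(u, x) for u in units] for x, _ in examples])
```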
Selecting a large enough and representative enough training sample can be
difficult in some cases as well. Rather than directly engineering the agent's
rules, some engineering effort must go into selecting appropriate training
cases to cause the desired generalizations to be made. In particular, it is
important that the training set include both positive and negative examples.
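The need for negative examples can be demonstrated with a small sketch. This sketch uses the perceptron rule with an assumed bias weight, together with hypothetical 5x5 bitmaps for the numerals 3 and 8. A perceptron trained only on the 3 also fires on the 8. One trained on both examples tells them apart.

```python
def predict(weights, inputs):
    """Weighted sum plus a bias weight (last entry); fire if the total is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + weights[-1]
    return 1 if total > 0 else 0


def train(examples, k=0.1, epochs=20):
    """Perceptron rule w_i <- w_i + K (T - P) x_i, with the bias input fixed at 1."""
    weights = [0.0] * (len(examples[0][0]) + 1)
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, inputs)
            if error:
                for i, x in enumerate(list(inputs) + [1]):
                    weights[i] += k * error * x
    return weights


# Hypothetical bitmaps, row by row (1 = lit pixel).
three = [1, 1, 1, 1, 0,  0, 0, 0, 0, 1,  0, 1, 1, 1, 0,  0, 0, 0, 0, 1,  1, 1, 1, 1, 0]
eight = [0, 1, 1, 1, 0,  1, 0, 0, 0, 1,  0, 1, 1, 1, 0,  1, 0, 0, 0, 1,  0, 1, 1, 1, 0]

positives_only = train([(three, 1)])
both = train([(three, 1), (eight, 0)])
print(predict(positives_only, eight), predict(both, eight))
```

With positives alone, the first correction raises the weight on every lit pixel of the 3. After that, no trial ever produces an error, so nothing teaches the perceptron to stay off for the 8, which shares 11 of those pixels.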
This type of system requires supervised learning --- i.e. a teacher must be
available to provide consistently correct feedback. This may not be applicable
to all problems. Sometimes this requirement can be mitigated to some extent.
For instance, an automobile-driving neural net has been trained by recording
data during sessions with human drivers. The percept/action pairs generated
from these sessions were then used to correct the agent system when it
attempted to drive. However, note that in this case the human-generated
recorded data is acting as a teacher: the requirement for feedback from
an existing intelligent agent (the human) is not removed; only the need for
the immediate attention of the human is relaxed.
Another problem is that training these systems can be quite slow. For instance,
even a simple function such as a binary 'and' typically takes over 4000 trials
to converge.
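The following sketch trains a perceptron on binary AND. It counts the trials, meaning single presentations of one input case, until a full pass over all four cases produces no errors. It assumes a bias weight on a constant input of 1, weights drawn uniformly from [-1, 1] by a seeded generator, and K = 0.1. The trial count depends on all of these choices and on presentation order, so the sketch is not meant to reproduce any particular figure.

```python
import random

AND_CASES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, inputs):
    """Weighted sum of the two inputs plus the bias weight (last entry)."""
    total = sum(w * x for w, x in zip(weights, inputs)) + weights[-1]
    return 1 if total > 0 else 0


def train_and(k=0.1, seed=0, max_trials=100_000):
    """Train on AND; return the final weights and the number of trials used."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]   # two inputs + bias
    trials = 0
    while trials < max_trials:
        errors = 0
        for inputs, target in AND_CASES:
            trials += 1
            error = target - predict(weights, inputs)
            if error:
                errors += 1
                for i, x in enumerate(inputs + (1,)):
                    weights[i] += k * error * x
        if errors == 0:          # a full error-free pass: training has converged
            break
    return weights, trials


weights, trials = train_and()
print(f"converged after {trials} trials")
```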
Linear Separability
Perhaps the most significant drawback to the simple, perceptron-style neural
networks we have examined so far is their inability to cope with problems
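that are not linearly separable. A two-input problem is linearly separable if a
single straight line can split a diagram of the solution space into two parts,
one of which contains only false results and the other only true results, as
illustrated below. For example, binary AND is linearly separable (a line can
cut off the single true case, (1,1), from the other three), whereas XOR is not:
its true cases, (0,1) and (1,0), lie on opposite corners of the unit square,
and no single line separates them from (0,0) and (1,1).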
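A quick experiment illustrates the limitation. Perceptron training reaches an error-free pass on AND. On XOR it keeps making mistakes however long it runs, because no setting of two weights and a bias places a single line between XOR's true and false cases. For linearly separable data, the perceptron convergence theorem guarantees that training stops after finitely many corrections. The epoch limit and K = 0.1 are arbitrary choices, and the bias weight on a constant input of 1 is an assumption.

```python
def predict(weights, inputs):
    """Two-input perceptron; weights[2] is the bias weight."""
    total = weights[0] * inputs[0] + weights[1] * inputs[1] + weights[2]
    return 1 if total > 0 else 0


def learns(cases, k=0.1, max_epochs=1000):
    """Return True if perceptron training reaches an error-free pass within max_epochs."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), target in cases:
            error = target - predict(weights, (x1, x2))
            if error:
                errors += 1
                weights[0] += k * error * x1
                weights[1] += k * error * x2
                weights[2] += k * error          # bias input is always 1
        if errors == 0:
            return True
    return False


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print("AND learned:", learns(AND), "| XOR learned:", learns(XOR))
```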

Note that for problems with more dimensions (inputs) the divider becomes
higher-dimensional as well: a plane for three inputs, and a hyperplane in
general. The discovery that simple perceptron-style neural networks are limited
in this way killed research in the area for some time. Later on, in the 1980s,
psychologists discovered that multilayer networks, combined with a new learning
rule, could be more powerful. Find out how in the next lecture...