Two Ways To Think About Intelligence
- Knowledge Intensive: Intelligence is all about knowledge. We have to find out
how to acquire it, store it, and so on.
- Data Intensive: There is a lot of ambiguity in the world; the world is its
own representation. We have to find the knowledge in the world.
Example: Robot Control
Comments
- This is the simplest network possible (there are several optimisations
possible if you change the representation).
- Units of the network are in competition with each other.
- This network can learn every linearly separable function.
Problems
- Learning is very slow.
- Requires a teacher.
- Gets stuck in local minima (just like reactive agents).
- It has little retention.
- It works only for linearly separable functions.
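The comments and problems above can be made concrete with a single threshold unit. The robot-control network itself is not described in these notes, so this is only a stand-in sketch: it assumes two binary inputs, a step activation, zero initial weights and the classic perceptron learning rule with learning rate 1. The teacher supplies the target for every example; the unit learns AND (linearly separable) but can never get all four XOR cases right.

```python
def step(net):
    """Threshold unit: fires (1) when the weighted input reaches 0."""
    return 1 if net >= 0 else 0

def train_perceptron(samples, epochs=20):
    """Perceptron learning rule; the teacher supplies the target for every sample."""
    w1, w2, b = 0, 0, 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + b)  # -1, 0 or +1
            w1 += error * x1
            w2 += error * x2
            b += error
    return w1, w2, b

def accuracy(weights, samples):
    w1, w2, b = weights
    hits = sum(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in samples)
    return hits / len(samples)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(accuracy(train_perceptron(AND), AND))  # 1.0: AND is linearly separable
print(accuracy(train_perceptron(XOR), XOR))  # below 1.0: no line separates XOR
```

The integer weights keep the arithmetic exact; with zero initial weights the learning rate only scales the weights and does not change which examples are misclassified.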
Ways To Get Out Of Local Minima
Genetic Algorithm
Instead of a single network, use several networks with different initial
values and find out which one ends up closest to the global minimum.
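Starting several networks from different initial values and keeping the best one is, in its simplest form, random-restart (multi-start) descent; the genetic algorithm adds selection and recombination on top of it. This sketch assumes a hypothetical one-dimensional error function x^2 + 10 sin(x) in place of a real network's error surface. Plain gradient descent started at x = 5 gets stuck in the local minimum near 3.84, while ten random starts are very likely to include one in the basin of the global minimum near -1.31.

```python
import math
import random

def error(x):
    # Hypothetical error surface: local minimum near x = 3.84, global near x = -1.31.
    return x * x + 10 * math.sin(x)

def error_gradient(x):
    return 2 * x + 10 * math.cos(x)

def gradient_descent(x, lr=0.01, steps=2000):
    """Plain descent: ends in whichever minimum's basin it starts in."""
    for _ in range(steps):
        x -= lr * error_gradient(x)
    return x

def multi_start(n_starts=10, seed=0):
    """Run descent from several random initial values and keep the best result."""
    rng = random.Random(seed)
    finals = [gradient_descent(rng.uniform(-10, 10)) for _ in range(n_starts)]
    return min(finals, key=error)

stuck = gradient_descent(5.0)  # a single start inside the local basin
best = multi_start()
print(round(stuck, 2), round(best, 2))
```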
Critical assumptions
- It is possible to define a fitness function.
- It is possible to measure the fitness for every individual of the
population.
Basic ideas
- Include the fittest in the next generation.
- Combine some of the fitter ones to make the next generation better.
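The two basic ideas (keep the fittest, recombine the fitter ones) can be sketched as a small real-valued genetic algorithm. Everything concrete here is an assumption for illustration: each individual is a single number, fitness is the negated value of a hypothetical error function x^2 + 10 sin(x), two elites are copied unchanged, crossover blends two parents from the fitter half, and mutation adds Gaussian noise.

```python
import math
import random

def fitness(x):
    # Hypothetical problem: minimise x^2 + 10*sin(x), so higher fitness = lower error.
    return -(x * x + 10 * math.sin(x))

def evolve(pop_size=30, generations=100, n_elite=2, seed=1):
    rng = random.Random(seed)
    population = [rng.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)  # fittest first
        next_gen = population[:n_elite]             # include the fittest unchanged
        parents = population[: pop_size // 2]       # only the fitter half reproduces
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            mix = rng.random()
            child = mix * a + (1 - mix) * b         # crossover: blend two parents
            child += rng.gauss(0.0, 0.3)            # mutation: small random change
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

best = evolve()
print(round(best, 2), round(-fitness(best), 2))
```

Because the elites are copied unchanged, the best fitness in the population can never get worse from one generation to the next.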
Questions
- If you have a fitness function, why not just use it to calculate the
optimum? - A fitness function only rates a given candidate; it does not tell
you how to construct one. Example: lay out 25 things in a room. We know what
the optimum is (all 25 things laid out), but we do not know the layout that
achieves it. So we try several layouts and measure how good they are.
Problems
- Which encoding?
- How do we calculate the fitness function?
- Efficiency?
- You may not find the global minimum if the population is small and time
is limited.
Simulated Annealing
Basic idea
- Introduce some noise into the system (i.e., shake it from time to time).
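Shaking the system can be sketched as classic simulated annealing on a hypothetical one-dimensional error function x^2 + 10 sin(x), whose local minimum near x = 3.84 traps plain descent. The search starts inside that local minimum; each step proposes a random kick, always accepts downhill moves and accepts uphill moves with probability exp(-delta/T). The time dependence of the noise ("kick more gently as time goes by") is modelled here by a temperature T that decreases geometrically; the schedule, kick size and seed are assumptions, and remembering the best point seen is a common addition rather than part of the basic idea.

```python
import math
import random

def energy(x):
    # Hypothetical error surface: local minimum near x = 3.84, global near x = -1.31.
    return x * x + 10 * math.sin(x)

def simulated_annealing(x=3.8, t_start=10.0, t_end=0.01, steps=5000, seed=2):
    rng = random.Random(seed)
    best = x
    for k in range(steps):
        # Geometric cooling: uphill kicks are accepted less and less often over time.
        t = t_start * (t_end / t_start) ** (k / (steps - 1))
        candidate = x + rng.gauss(0.0, 1.0)  # shake the system
        delta = energy(candidate) - energy(x)
        # Downhill moves are always taken; uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = candidate
        if energy(x) < energy(best):
            best = x
    return best

best = simulated_annealing()
print(round(best, 2))
```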
Example: Reactive Control With Simulated Annealing
- Goals: the robot picks up a remote control and avoids obstacles.
- Sensors: laser (to find the remote control), sonar (to find obstacles).
- If the robot stops (e.g., the remote control lies on an obstacle), "kick it"
and begin again from another starting point.
Questions
- Is simulated annealing slower than a genetic algorithm? - Every method has
many parameters, so the two are very hard to compare.
- How do we know when we have reached the global minimum? (Is it possible
that we will be "kicked out" of the global minimum?) - The noise function
depends on time: "kick more gently as time goes by."
- Why do we need all this? Teacher-based neural networks simulate the way
humans learn. - We do not know whether this is the way humans learn. Teachers
also differ: if you add 8 and 3 and get 5, a teacher could tell you only that
you are wrong (8+3=5: failure), or could also tell you the right result
(8+3=5: failure, the answer is 11).
Multilayered Neural Networks
Questions
- Why do we need multiple layers? - In CS we use modularity, data
encapsulation and abstraction to handle difficult problems. We do exactly
the same here: if the problem is too complex, decompose it!
- Why do we not just add tons of layers? How do we know when to stop? - If we
decompose too much, the connections become more complex than the problem.
- Sounds nice, but what do the hidden layers mean? - We do not always know
what the layers mean (see also the Tic-Tac-Toe example).
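Decomposition shows up even in the smallest case: XOR, which a single threshold unit cannot represent, becomes easy with one hidden layer. In this sketch the weights are set by hand (not learned) so that one hidden unit computes OR and the other AND, and the output unit combines them; in a trained network the hidden units need not have such a readable meaning.

```python
def step(net):
    """Threshold unit: fires (1) when the weighted input reaches 0."""
    return 1 if net >= 0 else 0

def xor_network(x1, x2):
    # Hidden layer: two simpler, linearly separable sub-problems.
    h_or = step(x1 + x2 - 0.5)   # "at least one input is on"
    h_and = step(x1 + x2 - 1.5)  # "both inputs are on"
    # Output layer: combine the sub-results ("OR but not AND").
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```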
Example: Tic-Tac-Toe Network
Subproblems in a conventional (CS) Tic-Tac-Toe program
- Rows
- Columns
- Game Over
- Empty Spaces
- Singlet
- Doublet
- ...
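For contrast, a conventional program has to spell these concepts out explicitly. This sketch assumes a board stored as a flat list of nine cells (" ", "X", "O"), and it takes a singlet to be a line with one of the player's marks and two empty cells and a doublet to be a line with two of the player's marks and one empty cell; these definitions and all function names are illustrative assumptions.

```python
# The eight lines of a 3x3 board, as index triples into a flat list of 9 cells.
ROWS = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
COLUMNS = [(0, 3, 6), (1, 4, 7), (2, 5, 8)]
DIAGONALS = [(0, 4, 8), (2, 4, 6)]
LINES = ROWS + COLUMNS + DIAGONALS

def empty_spaces(board):
    return [i for i, cell in enumerate(board) if cell == " "]

def count_line_types(board, player):
    """Count singlets (1 own mark, 2 empty) and doublets (2 own marks, 1 empty)."""
    singlets = doublets = 0
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(player) == 1 and cells.count(" ") == 2:
            singlets += 1
        elif cells.count(player) == 2 and cells.count(" ") == 1:
            doublets += 1
    return singlets, doublets

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def game_over(board):
    return winner(board) is not None or not empty_spaces(board)

board = ["X", "X", " ",
         " ", "O", " ",
         " ", " ", "O"]
print(count_line_types(board, "X"), count_line_types(board, "O"), game_over(board))
# (1, 1) (4, 0) False
```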
Meaning Of The Hidden Layer
This network is guaranteed to play Tic-Tac-Toe perfectly (after training).
Every unit of the hidden layer represents a concept (row, column, ...), but you
can remove a unit and the network will still play Tic-Tac-Toe perfectly. Once
a unit is removed, it is no longer possible to specify the meaning of the other
units. This phenomenon is called "distributed representation of concepts".
Questions
- Why do we not use more layers (i.e., put the groups of five pieces in
eight layers)? - Layers depend on each other; in this model the groups do
not depend on each other.
ALVINN (Automatic Steering Of An Automobile)
Basic ideas
A low-resolution camera (32x30 pixels) is mounted at the front of the car. A
driver teaches the network while driving (i.e., the network tries to predict
the driver's actions and receives feedback from the driver). A constant speed
is assumed (cruise control).
The network uses sigmoidal instead of binary functions for its input and
output (camera pixels and steering wheel).
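The setup can be sketched as a small network trained online while the driver steers. The notes give only the input size, so the rest is assumed: four sigmoid hidden units, a single sigmoid output in [0, 1] standing for the steering direction (the original system's output encoding is not described here), pixel intensities scaled to [0, 1], and one backpropagation update per camera frame with the driver's actual steering as the target. SteeringNet and its parameters are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class SteeringNet:
    """Hypothetical net: 32x30 pixels -> a few sigmoid hidden units -> one steering output."""

    def __init__(self, n_in=32 * 30, n_hidden=4, seed=0):
        rng = random.Random(seed)
        # The last weight in each list is a bias (its input is fixed at 1.0).
        self.w_hid = [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)]
                      for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)]

    def forward(self, pixels):
        x = list(pixels) + [1.0]
        h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in self.w_hid] + [1.0]
        out = sigmoid(sum(w * hi for w, hi in zip(self.w_out, h)))
        return x, h, out

    def train_step(self, pixels, driver_steering, lr=0.2):
        """One online backpropagation update; the driver's steering is the teacher signal."""
        x, h, out = self.forward(pixels)
        delta_out = (driver_steering - out) * out * (1 - out)
        # Hidden weights first, so they use the output weights from before this update.
        for j, ws in enumerate(self.w_hid):
            delta_h = delta_out * self.w_out[j] * h[j] * (1 - h[j])
            for i in range(len(ws)):
                ws[i] += lr * delta_h * x[i]
        for j in range(len(self.w_out)):
            self.w_out[j] += lr * delta_out * h[j]
        return out

rng = random.Random(1)
frame = [rng.random() for _ in range(32 * 30)]  # stand-in for one camera image
net = SteeringNet()
before = net.forward(frame)[2]
for _ in range(100):
    net.train_step(frame, driver_steering=0.8)
after = net.forward(frame)[2]
print(round(before, 3), round(after, 3))
```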
Homework
Think about how this can work, given that the network has little retention
(i.e., it might forget how to drive straight while being trained on a curve).
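The retention problem itself can be reproduced with something much smaller than the driving network. This sketch assumes a single linear unit trained with the delta rule on two hypothetical three-pixel "frames" that share their first input: a straight-road frame with target steering 0.5 and a curve frame with target 0.9. After training only on the straight frame and then only on the curve, the unit fits the curve, but its answer for the straight road has drifted because the shared weight was changed.

```python
def output(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train(w, x, target, steps=200, lr=0.1):
    """Delta rule on a single example: w <- w + lr * (target - w.x) * x."""
    for _ in range(steps):
        error = target - output(w, x)
        w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w

STRAIGHT = [1.0, 1.0, 0.0]  # hypothetical frames; the first input is shared
CURVE = [1.0, 0.0, 1.0]

w = train([0.0, 0.0, 0.0], STRAIGHT, target=0.5)
print(round(output(w, STRAIGHT), 3))  # 0.5: straight driving learned
w = train(w, CURVE, target=0.9)
print(round(output(w, CURVE), 3))     # 0.9: curve learned
print(round(output(w, STRAIGHT), 3))  # 0.825: straight driving partly forgotten
```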