Reinforcement Learning: Game Playing
Sponsor: Prof. Chris Atkeson (cga@cc.gatech.edu)
Area: Intelligent Systems
Problem
A long-standing dream of Artificial Intelligence is to have a naive
computer learn a task by practicing it on its own. In game playing, this
translates into telling a computer the rules of a game
and then letting the computer play against itself. After sufficient practice,
the computer becomes an expert at that game.
This dream has been achieved for backgammon,
using reinforcement learning
as the learning paradigm and a neural network as the representation
(Tesauro, 1992).
In this project you will program a computer to learn from self-play.
This will involve the following steps (please talk to Prof. Atkeson, who
can provide more information on each of these steps):
- Read Tesauro's paper.
- Choose a game.
Ideally, it should be a game that can be simplified or whose positions can be
fully enumerated.
For example, a simple form of backgammon has only one piece per player.
- Choose a representation.
Tesauro used a neural network to represent which positions were good and which
were bad.
You could also use a lookup table, or any other representation you are
familiar with.
- Choose a learning paradigm.
Tesauro used temporal difference learning, and ignored the fact that he
had a perfect model of dice rolls. You could use other forms of reinforcement
learning, and use more a priori knowledge.
- Write a training program and an evaluation program.
You will need to write programs (in the language of your choice) that allow
two players to play each other and that update the representation based on
the results.
This is a substantial amount of work, so it is important to start with simple
versions of each of these choices, and to add complexity only after
some success is achieved. There are many games (chess and go, for example)
for which this approach has not yet worked.
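As a starting point, the steps above can be sketched in a few dozen lines. The
following is only a minimal illustration under stated assumptions, not
Tesauro's method: the game is a hypothetical Nim-style game (10 stones, take 1
to 3 per move, taking the last stone wins) chosen because all of its positions
can be enumerated; the representation is a lookup table rather than a neural
network; and the learning rule is one-step TD(0) rather than the TD(lambda)
that Tesauro used. All names and parameter values (PILE, MAX_TAKE, train,
evaluate, alpha, epsilon) are made up for this example.

```python
import random

PILE = 10      # starting stones (hypothetical toy game, not backgammon)
MAX_TAKE = 3   # a move removes 1..MAX_TAKE stones; taking the last stone wins


def legal_moves(n):
    """Numbers of stones that may be taken when n stones remain."""
    return range(1, min(MAX_TAKE, n) + 1)


def greedy_move(V, n):
    """Pick the move whose resulting position looks worst for the opponent.

    V[m] estimates the win probability of the player to move with m stones
    left, so leaving the opponent m stones is worth 1 - V[m] to the mover.
    """
    return max(legal_moves(n), key=lambda k: 1.0 - V[n - k])


def train(games=20000, alpha=0.1, epsilon=0.1, seed=0):
    """Training program: self-play with a tabular TD(0) update."""
    rng = random.Random(seed)
    V = [0.5] * (PILE + 1)   # lookup table, one entry per position
    V[0] = 0.0               # no stones left: the player to move has lost
    for _ in range(games):
        n = rng.randint(1, PILE)   # random start so every position is visited
        while n > 0:
            if rng.random() < epsilon:
                k = rng.choice(list(legal_moves(n)))   # exploratory move
            else:
                k = greedy_move(V, n)
            # TD(0): move V[n] toward the value of the position just reached,
            # seen from the mover's side (1 - opponent's win probability).
            V[n] += alpha * ((1.0 - V[n - k]) - V[n])
            n -= k
    return V


def evaluate(V, games=1000, seed=1):
    """Evaluation program: learned greedy player (moving first) vs. random."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        n, learner_to_move = PILE, True
        while n > 0:
            if learner_to_move:
                k = greedy_move(V, n)
            else:
                k = rng.choice(list(legal_moves(n)))
            n -= k
            if n == 0 and learner_to_move:
                wins += 1
            learner_to_move = not learner_to_move
    return wins / games


if __name__ == "__main__":
    V = train()
    print("learned values:", [round(v, 2) for v in V])
    print("win rate vs. random player:", evaluate(V))
```

Both players share one table, which is what makes this self-play: every move
updates the same value estimates regardless of whose turn it is. Because the
game is small, the learned values can be checked by hand (positions with a
multiple of 4 stones are lost for the player to move), which is the kind of
sanity check that becomes impossible once the game or the representation gets
more complicated.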
Background
Tesauro, G. (1992). Practical issues in temporal difference learning.
Machine Learning, 8(3/4), 257-277.