Reinforcement Learning: Game Playing


Sponsor: Prof. Chris Atkeson (cga@cc.gatech.edu)
Area: Intelligent Systems

Problem
A long-standing dream of Artificial Intelligence is to have a naive computer learn a task by practicing it on its own. In game playing, this means telling a computer the rules of a game and then letting it play against itself until, after sufficient practice, it becomes an expert at that game. This dream has been achieved for backgammon, using reinforcement learning as the learning paradigm and a neural network as the representation (Tesauro, 1992).

In this project you will program a computer to learn from self-play. This involves the following steps (please talk to Prof. Atkeson, who can provide more information on each of them):

  1. Read Tesauro's paper.
  2. Choose a game. Ideally, it should be a game that can be simplified or that allows you to fully enumerate all positions. For example, a simple form of backgammon has only one piece per player.
  3. Choose a representation. Tesauro used a neural network to represent which positions were good and which were bad. You could also use a lookup table, or any other representation you are familiar with.
  4. Choose a learning paradigm. Tesauro used temporal difference learning, and ignored the fact that he had a perfect model of dice rolls. You could use other forms of reinforcement learning, and use more a priori knowledge.
  5. Write a training program and an evaluation program. You will need to write programs (in the language of your choice) that let two players play each other and that update the representation based on the results.

This is a substantial amount of work, so it is important to start with simple versions of each of these choices, and to add complexity only after some success is achieved. There are many games (chess and Go, for example) for which this approach has not yet worked.
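
As an illustration of what a "simple version" of steps 2 through 5 might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from Tesauro's paper: the game is a toy take-away game (one pile, each player removes 1, 2, or 3 stones, and whoever takes the last stone wins), the representation is a lookup table, and the learning rule is a one-step temporal difference update applied after greedy moves only. The function names and parameter values (train, evaluate, alpha, epsilon, and so on) are hypothetical.

```python
import random

def legal_moves(stones):
    """Moves in the toy game: take 1, 2, or 3 stones (never more than remain)."""
    return [take for take in (1, 2, 3) if take <= stones]

def greedy_move(values, stones):
    """Pick the move that leaves the opponent in the lowest-valued position."""
    return min(legal_moves(stones), key=lambda take: values[stones - take])

def train(max_stones=10, episodes=20000, alpha=0.1, epsilon=0.2, seed=0):
    """Self-play training of a lookup table with one-step TD updates."""
    rng = random.Random(seed)
    # values[s]: estimated chance that the player to move wins with s stones left.
    values = [0.5] * (max_stones + 1)
    values[0] = 0.0  # no stones left: the player to move has already lost
    for _ in range(episodes):
        stones = rng.randint(1, max_stones)  # random start so all positions get visited
        while stones > 0:
            explore = rng.random() < epsilon
            if explore:
                take = rng.choice(legal_moves(stones))
            else:
                take = greedy_move(values, stones)
                # The opponent moves next, so our value is 1 minus theirs.
                target = 1.0 - values[stones - take]
                values[stones] += alpha * (target - values[stones])
            stones -= take  # the same table now plays the other side
    return values

def evaluate(values, max_stones=10, games=1000, seed=1):
    """Fraction of games won by the greedy learner, moving first, against a random player."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        stones = rng.randint(1, max_stones)
        learner_to_move = True
        while stones > 0:
            if learner_to_move:
                take = greedy_move(values, stones)
            else:
                take = rng.choice(legal_moves(stones))
            stones -= take
            if stones == 0 and learner_to_move:
                wins += 1  # the learner took the last stone
            learner_to_move = not learner_to_move
    return wins / games

if __name__ == "__main__":
    table = train()
    print("win rate vs. random player:", evaluate(table))
    print("learned values:", [round(v, 2) for v in table])
```

Because this toy game is small enough to enumerate, the learned table can be checked by hand: positions where the pile size is a multiple of 4 are lost for the player to move, and their values should drift toward 0 while the others drift toward 1. A neural network representation, or a learning rule that also updates after exploratory moves, would be natural next steps once this simple version works.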

Background
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3/4), 257-277.