Learning from Observation Using Primitives

Darrin Bentivegna

Georgia Institute of Technology, College of Computing, Atlanta, GA

Advanced Telecommunications Research Institute International (ATR-I), Human Information Science Laboratories, CyberHuman Project, Kyoto, Japan

 

---

Introduction

Observing a task being performed or attempted by someone else often accelerates human learning. If robots can be programmed to use such observations to accelerate their own learning, their usability and functionality will increase and programming and learning time will decrease. This research explores the use of task primitives in robot learning from observation. A framework has been developed in which observed data is used to learn a task initially, and the agent then increases its performance through repeated task execution (learning from practice). Data collected while a human performs a task is parsed into small parts of the task called primitives. Modules are created for each primitive type that encode the movements required during the performance of the primitive, and when and where the primitives are performed. The feasibility of this method is currently being tested with agents that learn to play a virtual and an actual air hockey game. The terms robot and agent are used interchangeably to refer to an algorithm that senses its environment and has the ability to control objects in either a hardware or software domain.

 

---

Observing the Task

The task to be performed must first be observed. For a human learner this mostly involves vision. For a robot to learn from observing a task being performed, it must have some way to sense what is occurring in the environment. This research does not seek ways to use the robot's existing sensors to observe performance; the agents are given whatever equipment is necessary to observe the performance, or are given information that represents the performance. The equipment may include a camera or some type of motion capture device. Research is also being performed in virtual environments, where the state of objects is directly available from the simulation algorithm.

 

---

Primitives

Robots typically must generate commands to all their actuators at regular intervals. The analog controllers for our 30-degree-of-freedom humanoid robot are given desired torques for each joint at 420 Hz. Thus, a task with a one-second duration is parameterized with 30 × 420 = 12,600 parameters. Learning in this high-dimensional space can be quite slow or can fail totally. Random search in such a space is hopeless. In addition, since robot movements take place in real time, learning approaches that require more than hundreds of practice movements are often not feasible. Special-purpose techniques have been developed to deal with this problem, such as trajectory learning and learning from observation.

It is our hope that primitives can be used to reduce the dimensionality of the learning problem. Primitives are solutions to small parts of a task that can be combined to complete the task. A solution to a task may be made up of many primitives. In the air hockey environment, for example, there may be primitives for hitting the puck, capturing the puck, and defending the goal.

The figure above shows our view of a primitive. Currently, a human, using domain knowledge, designs the candidate primitive types to be used. The primitive recognition module segments the observed behavior into the chosen primitives. This segmented data is then used to provide the encoding for the primitive selection, sub-goal generation, and action generation modules. The primitive selection module provides the agent with the primitive type to use for the observed state of the environment. After a primitive type has been chosen, the sub-goal generation module specifies the desired outcome, or goal, of that primitive. Lastly, the actuators must be moved to obtain the desired outcome; the action generation module provides the actuator commands needed to execute the chosen primitive type with the current goal.
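As an illustration, a minimal sketch of how these modules might fit together at execution time is given below. The module interfaces, names, and the Primitive record are assumptions made for illustration, not the actual implementation.

```python
# Minimal sketch of a primitive-based execution loop (illustrative only).
# The module interfaces and names below are assumptions, not the actual system.

from dataclasses import dataclass

@dataclass
class Primitive:
    ptype: str        # e.g. "straight_shot", "left_bank_shot", "block"
    subgoal: tuple    # desired outcome, e.g. a target puck position and velocity

class Agent:
    def __init__(self, select_primitive, generate_subgoal, generate_action):
        # Each argument is a learned module trained on segmented observation data.
        self.select_primitive = select_primitive   # state -> primitive type
        self.generate_subgoal = generate_subgoal   # (state, type) -> sub-goal
        self.generate_action = generate_action     # (state, type, sub-goal) -> command

    def step(self, state):
        ptype = self.select_primitive(state)                    # which primitive to perform
        subgoal = self.generate_subgoal(state, ptype)           # desired outcome of the primitive
        command = self.generate_action(state, ptype, subgoal)   # actuator command to issue
        return Primitive(ptype, subgoal), command
```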

After the agent has obtained initial training from observing human performance, it should then increase its skill at the task through practice. Up to this point the agent's only high-level goal is to perform like the teacher; the goal of the entire task is encoded only implicitly in the primitives performed. The learning from execution module contains the information needed to evaluate the performance of each of the modules against a high-level task objective. This information is then used to update the modules and improve their performance, possibly beyond that of the teacher.

---

Domains Being Explored

Learning from observation research is currently being performed in a variety of domains. A grid-world maze, an air-hockey game, and a marble maze are being explored in virtual environments, and a hardware version of the marble maze is also being explored. These domains were chosen because they are easy to simulate in virtual environments and provide a starting point for obtaining more information on learning from demonstration. They can easily be created on a computer and played using a mouse, and they are small enough to be operated in a laboratory. Since the basic movements in these domains are only in two dimensions, motion capture and object manipulation are simplified. A camera-based motion capture system can easily be used to collect data in a hardware implementation of air hockey or the marble maze, and a stationary arm or some other similar robotic device can be programmed to play air hockey on an actual table.

 

Grid-world Maze

The grid-world maze consists of a virtual robot in a maze. The robot is put in a starting position and must find its way through the maze to the goal position. Reinforcement learning is used in this domain.  The software was created with MVC++ and uses the Tcl/Tk library. 

This environment uses a very straightforward Q-learning algorithm. We are using it to explore techniques for incorporating observed data into the algorithm to decrease learning time. The robot decides on the action to perform by looking at the values of the possible actions that can be taken from the current state. The value of a state/action pair, Q(s,a), is the discounted future reward that the agent can expect to receive by taking action a from state s. Examples of state/action pairs are ((1,1), down) and ((1,3), up). The goal of the agent is to reach the goal state in the fewest number of steps. The agent receives a reward of -1 for each step taken, and the value of the goal state is 0. The values are updated each time a move is made using the following rule:

Q(s,a) ← Q(s,a) + α [ r + γ max_a' Q(s',a') − Q(s,a) ]

where α is the learning rate, γ is the discount rate, r is the reward received, and s' is the resulting state.
The learning rate α controls how much the state/action value changes at each step. The discount rate γ reflects that rewards that can be received soon are more valuable than equivalent ones that can be received much later in the process.
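For illustration, a minimal sketch of this tabular Q-learning update for the grid world is shown below. The state encoding, action set, parameter values, and exploration scheme are assumptions made here and are not taken from the actual grid-world software.

```python
# Illustrative tabular Q-learning update for the grid-world maze.
# State encoding, action set, and parameter values are assumptions for illustration.
import random
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]
ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount rate

Q = defaultdict(float)  # Q[(state, action)]; states are (row, col) grid cells

def choose_action(state, epsilon=0.1):
    # Mostly greedy: pick the action with the highest Q value, explore occasionally.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, done):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# Example: one step from cell (1, 1) moving down into cell (2, 1), reward -1 per step.
update((1, 1), "down", -1.0, (2, 1), done=False)
```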

 

  Air-hockey, AVI movie (1.2MB)

A cyber air hockey game was created that can be played on any computer that supports OpenInventor and Tcl/Tk. The game consists of two paddles, a puck, and a board to play on. A human player controls one paddle using a mouse; at the other end is a cyber-human.

The following primitives are currently being explored:

- Left Bank Shot – the player hits the puck, the puck hits the left wall once and then travels toward the goal.

- Straight Shot – the player hits the puck, the puck travels straight toward the goal without hitting a wall.

- Right Bank Shot – the player hits the puck, the puck hits the right wall once and then travels toward the goal.

- Block – the player does not make a shot but attempts to block the puck from entering the player's goal area.

- Setup – the player is positioning their paddle in preparation to make a shot.

- Multi-shot – the player has blocked or made a shot and the puck does not have enough velocity to return to the other side of the board. Therefore the player has the opportunity to make another shot.
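As an illustration of how recorded shots might be labeled with these types, the sketch below classifies a shot by counting side-wall contacts in the puck trajectory after the hit. The board coordinates, wall positions, trajectory format, and tolerance are assumptions for illustration, not the recognition method actually used.

```python
# Illustrative shot classifier: label a puck trajectory after the player's hit.
# Board coordinates, wall positions, and the trajectory format are assumed here.

LEFT_WALL_X = 0.0
RIGHT_WALL_X = 1.0
CONTACT_TOL = 0.01  # how close to a side wall counts as a contact

def classify_shot(trajectory):
    """trajectory: list of (x, y) puck positions from the hit until the puck
    reaches the opponent's end of the board."""
    left_hits = sum(1 for x, _ in trajectory if x <= LEFT_WALL_X + CONTACT_TOL)
    right_hits = sum(1 for x, _ in trajectory if x >= RIGHT_WALL_X - CONTACT_TOL)
    if left_hits == 0 and right_hits == 0:
        return "Straight Shot"
    if left_hits >= 1 and right_hits == 0:
        return "Left Bank Shot"
    if right_hits >= 1 and left_hits == 0:
        return "Right Bank Shot"
    return "Other (not one of the single-bank shot primitives)"
```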

 

 

This research is also being conducted in a hardware version.

The humanoid robot DB. MPEG of hockey playing. (14MB)

The onboard cameras and a vision system that locates colored objects in the image are used to observe the state of the environment.  This image shows the four corners of the board and the puck as seen by the vision system.
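A minimal sketch of locating a colored object by thresholding in HSV space is shown below. The actual PC-based vision system is described in the IROS 2002 paper, so the OpenCV calls, color range, and blob-centroid approach here are assumptions for illustration only.

```python
# Illustrative color-blob locator using OpenCV (not the actual vision system).
# The HSV range below is a placeholder for whatever color marks the puck.
import cv2
import numpy as np

PUCK_HSV_LOW = np.array([0, 120, 120])    # assumed lower HSV bound for the puck color
PUCK_HSV_HIGH = np.array([10, 255, 255])  # assumed upper HSV bound

def find_puck(frame_bgr):
    """Return the (x, y) pixel centroid of the puck-colored blob, or None."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, PUCK_HSV_LOW, PUCK_HSV_HIGH)
    moments = cv2.moments(mask)
    if moments["m00"] == 0:
        return None
    return (moments["m10"] / moments["m00"], moments["m01"] / moments["m00"])
```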

 

 

 Labyrinth (Marble Maze)

 

Maze outfitted with motors, encoders, sensors, a camera and vision processor. Computer playing MPEG (3.6MB).

Software Labyrinth game. Human Play AVI (3.8MB)

 

Learning using primitives is also being explored in the Labyrinth environment in software and on hardware.  As a human plays the game the board and ball positions are recorded.   Primitives are extracted from this data. The following primitives are currently being explored:

- Wall Roll Stop – The ball rolls along a wall and stops when it is in a corner.

- Roll Off Wall – The ball rolls along a wall and then rolls off the end of the wall.

- Roll From Wall – The ball is on a wall and then is maneuvered off it.

- No Wall – The ball is guided from one location to another without touching a wall.

- Corner – The ball is in a corner and the board is positioned in preparation to move the ball from the corner.
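A minimal sketch of how the recorded board and ball data might be segmented into these primitives is shown below. The sample format, wall-contact and corner tests, and the speed threshold are assumptions made for illustration, not the segmentation actually used.

```python
# Illustrative segmentation of a recorded ball trajectory into maze primitives.
# The sample format, wall/corner flags, and thresholds are assumptions.

STOP_SPEED = 0.005  # below this speed the ball is considered stopped

def label_step(on_wall, in_corner, next_on_wall, speed):
    """Assign a primitive label to one step of the recorded data."""
    if in_corner and speed < STOP_SPEED:
        return "Wall Roll Stop"
    if in_corner:
        return "Corner"
    if on_wall and not next_on_wall:
        # Crude distinction: still rolling when it leaves the wall vs. maneuvered off.
        return "Roll Off Wall" if speed >= STOP_SPEED else "Roll From Wall"
    if on_wall:
        return "Wall Roll"  # rolling along a wall, before one of the end events above
    return "No Wall"

def segment(samples):
    """samples: list of dicts with keys 'on_wall', 'in_corner', 'speed'.
    Returns a list of (label, start_index, end_index) segments."""
    segments = []
    for i, s in enumerate(samples):
        next_on_wall = samples[i + 1]["on_wall"] if i + 1 < len(samples) else s["on_wall"]
        label = label_step(s["on_wall"], s["in_corner"], next_on_wall, s["speed"])
        if segments and segments[-1][0] == label:
            segments[-1] = (label, segments[-1][1], i)   # extend the current segment
        else:
            segments.append((label, i, i))               # start a new segment
    return segments
```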

 

Figures: examples of the Wall Roll Stop, Roll Off Wall, Roll From Wall, No Wall, and Corner primitives.

 

 

---

Publications

Humanoid Robot Learning and Game Playing Using PC-Based Vision, Darrin C. Bentivegna, Ales Ude, Christopher G. Atkeson, and Gordon Cheng.  Presented at IROS 2002, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland, October, 2002.

Learning How to Behave from Observing Others, Darrin C. Bentivegna and Christopher G. Atkeson.  Presented at the SAB'02-Workshop on Motor Control in Humans and Robots: on the interplay of real brains and artificial devices, Edinburgh, UK, August, 2002.

A Framework for Learning From Observation Using Primitives, Darrin C. Bentivegna and Christopher G. Atkeson.  Presented at the Symposium of Robocup 2002, Fukuoka, Japan, June, 2002.

Learning From Observation Using Primitives, Darrin C. Bentivegna and Christopher G. Atkeson.  Presented at ICRA 2001 in Seoul, Korea, May 2001.

Using Primitives in Learning From Observation, Darrin C. Bentivegna and Christopher G. Atkeson.  Presented at Humanoids 2000 in Boston, Mass. September 2000.

Testbeds Used for Exploring Learning from Observation, Darrin C. Bentivegna and Christopher G. Atkeson.  Published in the proceedings of the workshop for the AAAI2000 Robot Competition and Exhibition.

Using Primitives in Learning from Observation: A Preliminary Report, Darrin C. Bentivegna and Christopher G. Atkeson.  Published in the proceedings of the workshop of the Eighth AAAI Mobile Robot Competition and Exhibition held at AAAI99.

---

Learning and Robot Links

Local Learning from Chris Atkeson  

Robot Books

Rich Sutton's Home Page

Robotics FAQ Table of Contents

Robotics Internet Resources Page

Robotics Research in Japan

Honda Humanoid

 

---

Contact Information

Darrin Bentivegna

dbent@cc.gatech.edu

Home Page

Georgia Institute of Technology

Atlanta, GA 30080