Rewards often act as the sole feedback in Reinforcement Learning (RL) problems. This signal is surprisingly powerful: it can motivate agents to solve tasks without any further guidance on how to accomplish them. Nevertheless, rewards do not come for free; they are typically hand-engineered for each problem. Furthermore, rewards are often defined as a function of an agent's state variables. These components have traditionally been tuned to the domain and include information such as the location of the agent or of other objects in the world. The reward function is therefore inherently based on domain-specific representations. While such reward specifications can be sufficient to produce optimal behavior, more complex tasks may be difficult to express in this manner. Suppose a robot is tasked with building origami figures. The environment would need to provide a reward each time the robot made a correct figure, requiring the program designer to define a notion of correctness for each desired configuration. Constructing a reward function for each model might become tedious, even intractable: what should the inputs be?
Humans regularly exploit learning materials outside of the physical realm of a task, be it through diagrams, videos, text, or speech. For example, we might look at an image of a completed origami figure to determine whether our own model is correct. This document describes similar approaches for presenting tasks to agents. In particular, I aim to develop methods for specifying perceptual goals both within and outside of the agent's environment, and Perceptual Reward Functions (PRFs) that are derived from these goals. This will allow us to represent goals in settings where we can more easily find or construct solutions, without requiring us to modify the reward function when the task changes.
My thesis aims to show that rewards derived from perceptual goal specifications: are easier to specify than task-specific reward functions; generalize more easily across tasks; and equally enable task completion. You can view my proposal here.
Transferring Agent Behaviors from Videos via Motion GANs
A major bottleneck for developing general reinforcement learning agents is determining rewards that will yield desirable behaviors under various circumstances. We introduce a general mechanism for automatically specifying meaningful behaviors from raw pixels. In particular, we train a generative adversarial network to produce short sub-goals represented through motion templates. We demonstrate that this approach generates visually meaningful behaviors in unknown environments with novel agents and describe how these motions can be used to train reinforcement learning agents.
A paper about this work was accepted into the Deep Reinforcement Learning Symposium at NIPS. You can view the paper here.
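The motion templates at the heart of this approach can be pictured as motion-history images, in which recent movement appears bright and older movement fades toward zero. A minimal sketch of that representation follows; the GAN that generates sub-goal templates is omitted, and the decay and threshold values are illustrative assumptions, not values from the paper.

```python
import numpy as np

def motion_template(frames, decay=0.9, threshold=30):
    """Build a motion-history image from a sequence of grayscale frames.

    Pixels that just moved are stamped at full intensity; pixels that
    moved earlier decay each step. `decay` and `threshold` are
    illustrative parameters, not values from the paper.
    """
    history = np.zeros_like(frames[0], dtype=np.float64)
    for prev, curr in zip(frames, frames[1:]):
        moved = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > threshold
        history *= decay        # fade old motion
        history[moved] = 255.0  # stamp in new motion at full intensity
    return history

# Toy example: a bright square sweeping left to right across three frames.
frames = []
for x in (0, 4, 8):
    f = np.zeros((16, 16), dtype=np.uint8)
    f[6:10, x:x + 4] = 255
    frames.append(f)

template = motion_template(frames)
```

The resulting template encodes both where and in what order motion occurred, which is what makes it a compact visual description of a behavior.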
Cross-Domain Perceptual Reward Functions
In reinforcement learning, we often define goals by specifying rewards within desirable states. One problem with this approach is that we typically need to redefine the rewards each time the goal changes, which often requires some understanding of the solution in the agent's environment. When humans are learning to complete tasks, we regularly utilize alternative sources that guide our understanding of the problem. Such task representations allow one to specify goals on one's own terms, thus providing specifications that can be appropriately interpreted across various environments. This motivates our own work, in which we represent goals in environments that are different from the agent's. We introduce Cross-Domain Perceptual Reward (CDPR) functions, learned rewards that represent the visual similarity between an agent's state and a cross-domain goal image. We report results for learning CDPRs with a deep neural network and using them to solve two tasks with deep reinforcement learning.
A paper about this work was accepted at RLDM 2017.
You can view an extended version of the paper here.
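At a high level, a CDPR scores how visually similar the agent's current frame is to a goal image drawn from a different domain. A minimal sketch follows, with a flatten-and-normalize stand-in where the paper learns an embedding with a deep network; the `embed` function here is our assumption, purely for illustration.

```python
import numpy as np

def embed(image):
    """Stand-in feature extractor (hypothetical). The actual approach
    learns this mapping with a deep network; here we simply flatten
    and normalize the image."""
    v = image.astype(np.float64).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

def cdpr(agent_frame, goal_image):
    """Reward = cosine similarity between the embedded agent state
    and the embedded cross-domain goal image."""
    return float(np.dot(embed(agent_frame), embed(goal_image)))

# Toy check: a frame matching the goal scores high; a disjoint one low.
goal = np.zeros((8, 8)); goal[2:6, 2:6] = 1.0
mismatch = np.zeros((8, 8)); mismatch[0:2, 0:2] = 1.0
```

Because the reward depends only on images, the goal can be specified in whatever domain is easiest to draw or photograph, and the same reward function applies across environments.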
Perceptual Reward Functions
Reinforcement learning problems are often described through rewards that indicate if an agent has completed some task. This specification can yield desirable behavior; however, many problems are difficult to specify in this manner, as one often needs to know the proper configuration for the agent. When humans are learning to solve tasks, we often learn from visual instructions composed of images or videos. Such representations motivate our development of Perceptual Reward Functions, which provide a mechanism for creating visual task descriptions. We show that this approach allows an agent to learn from rewards that are based on raw pixels rather than internal parameters.
A paper about this work was accepted at a 2016 IJCAI workshop, Deep Reinforcement Learning: Frontiers and Challenges.
You can view the paper here.
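One way to picture a reward defined on raw pixels rather than internal parameters is as a distance to a goal image. A minimal sketch under that assumption follows; mean absolute pixel distance is our illustrative choice, not the paper's formulation.

```python
import numpy as np

def perceptual_reward(frame, goal_frame):
    """Negative mean absolute pixel distance to a goal image, so the
    reward is 0 when the frame matches the goal and grows more
    negative as it differs. Illustrative only."""
    diff = frame.astype(np.float64) - goal_frame.astype(np.float64)
    return -float(np.mean(np.abs(diff)))

# Toy check: the goal itself gets reward 0; a uniformly brighter frame
# gets a proportionally negative reward.
goal = np.full((4, 4), 100.0)
r_match = perceptual_reward(goal, goal)
r_off = perceptual_reward(goal + 10.0, goal)
```

The point of such a reward is that its inputs are just images, so no hand-engineered state variables are needed to describe the task.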
Robust Markov Decision Processes
Reward engineering is the problem of expressing a target task for an agent in the form of rewards for a Markov decision process. To be useful for learning, it is important that these encodings be robust to structural changes in the underlying domain; that is, that the specification remain unchanged for any domain in some target class. We identify problems that are difficult to express robustly via the standard model of discounted rewards. In response, we examine the idea of decomposing a reward function into separate components, each with its own discount factor. We describe a method for finding robust parameters through the concept of task engineering, which additionally modifies the discount factors. We present a method for optimizing behavior in this setting and show that it can provide a more robust language than standard approaches.
An extended abstract about this work, Expressing Tasks Robustly via Multiple Discount Factors, was accepted at RLDM 2015. You can view the paper here.
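The decomposition above can be written as a return with one discount factor per reward component, G = Σ_i Σ_t γ_i^t r_i(t). A minimal sketch of evaluating such a return on a fixed trajectory follows; the particular component split and discount values are illustrative assumptions, not values from the abstract.

```python
def multi_discount_return(reward_components, gammas):
    """Return of a trajectory when the reward is decomposed into
    components, each discounted by its own factor:
        G = sum_i sum_t gammas[i]**t * r_i(t)
    `reward_components[i]` is the sequence of component-i rewards
    over time. (Parameter names are ours, for illustration.)"""
    total = 0.0
    for r_seq, gamma in zip(reward_components, gammas):
        total += sum(gamma ** t * r for t, r in enumerate(r_seq))
    return total

# Two components over three steps: a step cost discounted heavily and
# a goal bonus discounted lightly, so the bonus retains its weight.
step_cost = [-1.0, -1.0, -1.0]   # gamma = 0.5
goal_bonus = [0.0, 0.0, 10.0]    # gamma = 0.99
G = multi_discount_return([step_cost, goal_bonus], gammas=[0.5, 0.99])
```

Separating the discount factors lets a long-horizon goal stay influential while short-horizon costs fade, which a single shared discount cannot express.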