Modular Reinforcement Learning (MRL)

Real-world agents (and agents in interesting artificial worlds) must pursue multiple goals in parallel nearly all of the time. To make real-world partial programming feasible, we must therefore be able to represent the multiple goals of realistic agents and have a learning system that handles them acceptably well in terms of computation time, optimality, and expressiveness. We are developing a theoretically grounded, practical algorithm that encodes an agent's multiple goals in a way that facilitates true modularity and thereby supports a genuine discipline of agent software engineering.

Typical multiple-goal agent formulations decompose an agent into sub-agents with possibly different state spaces (to represent their different concerns) but a shared action space (to represent that they are part of a single agent executing single actions). Previous work in multiple-goal RL has taken the approach of arbitrating the preferences of sub-agents and selecting one of the sub-agents' preferred actions as the "winner" (Sprague and Ballard, 2003). While such an approach is intuitively appealing, we have shown that ideal arbitration satisfying a few reasonable requirements (universality, unanimity, independence of irrelevant alternatives, scale invariance, and non-dictatorship) is impossible in general (Bhat et al., 2006). We are currently developing a meta-learning algorithm that performs arbitration of sub-agent action preferences. Our work focuses on relaxing the non-dictatorship property of ideal arbitrators and providing the arbitrator itself with a reward signal. In addition to making arbitration possible, the arbitrator function and its reward signal could encode agent-level preferences, leaving sub-agents to be coded "selfishly," with their own local reward signals and state abstractions, ignoring any other subgoals the agent might have. This subgoal independence would facilitate transfer and modularity, allowing a subgoal coded for one agent to be reused in another agent.
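The decomposition described above can be illustrated with a minimal Python sketch. It is not the algorithm under development. It assumes tabular Q-learning for both the sub-agents and the arbitrator, greedy selection with no exploration, and hypothetical names throughout (ACTIONS, SubAgent, Arbitrator, abstract). Each sub-agent sees only its own abstraction of the world state and learns from its own local reward, while the arbitrator learns, from a separate reward signal, which sub-agent's preferred action to execute.

```python
from collections import defaultdict

# Shared action space for all sub-agents (hypothetical example actions).
ACTIONS = ("left", "right", "stay")


class SubAgent:
    """A "selfish" module with its own state abstraction, reward, and Q-table."""

    def __init__(self, name, abstract, alpha=0.5, gamma=0.9):
        self.name = name
        self.abstract = abstract      # maps the world state to a local state
        self.alpha = alpha            # learning rate (assumed value)
        self.gamma = gamma            # discount factor (assumed value)
        self.q = defaultdict(float)   # (local_state, action) -> value

    def preferred_action(self, world_state):
        s = self.abstract(world_state)
        # Greedy choice; ties are broken by the order of ACTIONS.
        return max(ACTIONS, key=lambda a: self.q[(s, a)])

    def update(self, world_state, action, local_reward, next_world_state):
        # Off-policy Q-learning update on whichever action was executed,
        # even if another sub-agent chose it.
        s = self.abstract(world_state)
        s2 = self.abstract(next_world_state)
        best_next = max(self.q[(s2, a)] for a in ACTIONS)
        target = local_reward + self.gamma * best_next
        self.q[(s, action)] += self.alpha * (target - self.q[(s, action)])


class Arbitrator:
    """Learns which sub-agent to obey, driven by its own reward signal."""

    def __init__(self, subagents, alpha=0.5, gamma=0.9):
        self.subagents = list(subagents)
        self.alpha = alpha
        self.gamma = gamma
        self.q = defaultdict(float)   # (world_state, subagent_index) -> value

    def choose(self, world_state):
        indices = range(len(self.subagents))
        # Greedy choice of the "winner"; ties go to the lowest index.
        winner = max(indices, key=lambda i: self.q[(world_state, i)])
        return winner, self.subagents[winner].preferred_action(world_state)

    def update(self, world_state, winner, arb_reward, next_world_state):
        indices = range(len(self.subagents))
        best_next = max(self.q[(next_world_state, i)] for i in indices)
        target = arb_reward + self.gamma * best_next
        key = (world_state, winner)
        self.q[key] += self.alpha * (target - self.q[key])
```

In a control loop under these assumptions, each step would call arbitrator.choose(state), execute the returned action, update every sub-agent with its own local reward, and update the arbitrator with the arbitrator-level reward. Because a sub-agent depends only on its abstract function and its local reward, the same SubAgent could in principle be dropped into a different agent with a different arbitrator.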