PhD Qualifier

Chris Simpkins

December 3, 2008

A2BL: Advanced Agent Programming for All

Motivation

Adaptive Software, Norvig and Cohn, 1998

Some problems not well suited to traditional programming. Another way to think of programming: adaptive agents.

Adaptive Software via Partial Programming

Partial Programming via Reinforcement Learning

From ABL to A2BL

Why a new language?

Why a new language, rather than a library?

An API that achieved our goals would end up looking an awful lot like a language anyway.

How Does A2BL Improve on ABL

Towards Adaptive Programming: Integrating Reinforcement Learning into a Programming Language. Christopher Simpkins, Sooraj Bhat, Charles Isbell, Jr., and Michael Mateas. OOPSLA '08: ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, Onward! Track, Nashville, TN USA, October 2008

Example: The Furry Creature World

A Furry Creature Agent in ABL

A Furry Creature Agent in ABL

A Furry Creature Agent in ABL

A Furry Creature Agent in ABL

Solution: Adaptive Programming

Reinforcement Learning

The Furry Creature World as an RL Problem

A Furry Creature in A2BL

A Furry Creature in A2BL

A Furry Creature in A2BL

A Furry Creature in A2BL

A Furry Creature in A2BL

How Our A2BL Agent Would Behave

Advantages of A2BL's Built-in Adaptivity

Summary of A2BL as a PL/SE Issue

What have I done?

Why is this research?

A2BL Future Directions and Research Issues

ICML 200?: Hydra: Arbitration for Modular Reinforcement Learning

Other Stuff: Text Mining for Scientometrics

Using Content Analysis to Investigate The Research Paths Chosen by Scientists Over Time. Chiara Franzoni, Christopher Simpkins, Li Baoli and Ashwin Ram. 11th International Conference of the International Society for Scientometrics and Informetrics (ISSI 2007)

Other Stuff: MIMIC for Antenna Design

Thanks!

It's been an honor. Thank you for your time and support.

OOP in A2BL

ICML 200?: Hydra: Arbitration for Modular Reinforcement Learning

Multiple-Goal Reinforcement Learning (1 of 2)

Multiple-Goal Reinforcement Learning (2 of 2)

Modular Reinforcement Learning, Bhat, et. al., 2006

CHARLES

Hydra-Q: Learning to Combine Sub-Agent Preferences

Research Philosophy

Career Plan/Grand Vision

Written Exam, Machine Learning #2

RL Decomposition

Written Exam, Machine Learning #2

Function Approximation

Written Exam, Machine Learning #2

Can function approximation work well with problem decomposition?

Written Exam, Planning and Search #1

Compare Classical Planning with Markov Decision Process (MDP) Planning

  1. Is there any advantage to Classical Planning (logic-based) as compared to MDPs? List any advantages you find.

    • Classical planners can exploit domain knowledge better, becuase it can be encoded in forst-order represenataions, for example, that constrain the state transitions that need to be considered.
  2. Describe a planning domain where the advantage of classical planning is apparent.

    • Route planning in a public transit system. Here, you have deterministic state transitions, and a small number of simple actions that have straightforward preconditions and effects.

Written Exam, Planning and Search #1

  1. Show this advantage by representing your domain as both a logical domain and an MDP.

    • Classical planner uses a simple determnitic domain theory. MDP only has states, and transitions between states labelled with probabilities. MDP starts with assumption that any state can result in any other state for some action with some probability. MDP would end up looking like
  2. Prove that the classical planner has better performance.

Written Exam, Planning and Search #1

  1. When do MDPs have an advantage (in general, not just this domain)? List at least two advantages and explain.

    • MDPs have an advantage in stochastic environments.

    • MDPs handle exogenous events, that is, state transitions not under the control of the agent.

  2. Can we get the best of both worlds? How?

Written Exam, Planning and Search #3

  1. Describe the circumstances and domains under which partial-order planners will dominate other approaches.

    • Graphplan and SAT approaches use conjuctions of boolean literals representing the entire problem state at every step. This limits the domains they can express and can require massive amounts of storage when there are several such literals. For such domains, like logistics problems with dozens of locations and objects, partial-order planners using first-order representations do much better.

Written Exam, Planning and Search #3

  1. Some believe that traditional metrics such as speed, ratio of problems solved, and plan length ignore critical properties of planners. What other criteria could be considered? What features of partial-order planning algorithms, and partial-order plan representations might be considered advantageous over other approaches when considering these new criteria?

    • Decomposability and paralellizability. Partial-order plan representations exploit the inherent parallelism in problems.