We present a new approach for transferring dynamic robot control policies, such as biped locomotion, from simulation to real hardware. Key to our approach is to perform system identification of the model parameters μ of the hardware (e.g., friction, center of mass) in two distinct stages: before policy learning (pre-sysID) and after policy learning (post-sysID). Pre-sysID begins by collecting trajectories from the physical hardware based on a set of generic motion sequences. Because these trajectories may not be related to the task of interest, pre-sysID does not attempt to accurately identify the true value of μ, but only to approximate the range of μ to guide policy learning. Next, a Projected Universal Policy (PUP) is created by simultaneously training a network that projects μ to a low-dimensional latent variable η and a family of policies that are conditioned on η. The second round of system identification (post-sysID) is then carried out by deploying the PUP on the robot hardware using task-relevant trajectories. We use Bayesian Optimization to determine the values of η that optimize the performance of the PUP on the real hardware. We have used this approach to create three successful biped locomotion controllers (walk forward, walk backward, walk sideways) on the Darwin OP2 robot.
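The post-sysID step described above, searching over the latent variable η for the value that performs best on hardware, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the surrogate model, the acquisition function, the dimension of η, or its bounds. The sketch assumes a Gaussian-process surrogate with an RBF kernel, an upper-confidence-bound acquisition, a 2-D η in [-1, 1], and a hypothetical `hardware_return` function standing in for a rollout of the η-conditioned policy on the robot.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of row vectors."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Gaussian-process posterior mean and standard deviation at x_query."""
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_qo = rbf_kernel(x_query, x_obs)
    y_mean = y_obs.mean()  # constant prior mean set to the observed average
    mean = y_mean + k_qo @ np.linalg.solve(k_oo, y_obs - y_mean)
    v = np.linalg.solve(k_oo, k_qo.T)
    var = 1.0 - np.sum(k_qo * v.T, axis=1)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def hardware_return(eta):
    """HYPOTHETICAL stand-in for one rollout of the eta-conditioned policy.

    On a real robot this would run the policy and measure task performance;
    here it is a toy quadratic with an arbitrary optimum.
    """
    target = np.array([0.3, -0.5])  # assumed value, for illustration only
    return -float(np.sum((eta - target) ** 2))

def optimize_eta(evaluate, dim=2, n_init=5, n_iter=25, beta=2.0, seed=0):
    """Bayesian optimization of eta using a GP surrogate and a UCB acquisition."""
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(-1.0, 1.0, size=(n_init, dim))  # initial random design
    y_obs = np.array([evaluate(x) for x in x_obs])
    for _ in range(n_iter):
        # Score random candidates with the surrogate; evaluate the most promising.
        candidates = rng.uniform(-1.0, 1.0, size=(500, dim))
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        next_eta = candidates[np.argmax(mean + beta * std)]
        x_obs = np.vstack([x_obs, next_eta])
        y_obs = np.append(y_obs, evaluate(next_eta))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]

best_eta, best_return = optimize_eta(hardware_return)
print("best eta:", best_eta, "return:", best_return)
```

Because every call to `evaluate` would correspond to a physical trial, the evaluation budget (`n_init + n_iter`) is kept small in this sketch; searching over a low-dimensional η rather than the full parameter vector μ is what makes such a small budget plausible.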
Illustration of locomotion policies deployed on the Darwin OP2 robot. Top: walk forward. Middle: walk backward. Bottom: walk sideways.
This material is based in part upon work supported by the National Science Foundation under grant IIS-1514258. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.