Splinter 1

Splinter 1 was the internal name for the first project I worked on in Mike's lab. The lab had a ninja-turtles theme, and after too much work, I came up with SPLinTER as short for "Simultaneous Planning and Learning in a Tabletop Exploring Robot". It was a manipulation project on our Schunk arm, and had the rather lofty goal of solving generic manipulation tasks using the magic of reinforcement learning. In particular, I had recently read about TD-Gammon, and was excited about the prospect of finding meaningful abstractions in the nodes of a neural network. As a publishable project it was largely a dud, but since it was a lot of fun, and since it led to Splinter 2, I have no regrets.

Here's a blurb I found on my computer which pretty well summarizes what I was thinking at the time:

I chose to focus my research on the domain of object manipulation as a way of striking a balance between simplicity, applicability, and (theoretical) depth. As a research problem, this topic offers a set of challenges which are at the same time clear enough to imagine writing computer programs to solve, relevant enough to generate excitement (and funding!), and deep enough to use as models for exploring fundamental issues in representation and reasoning. For example, I'd like to see my robot discover for itself the concepts required to open a door, and later transfer this understanding to related tasks such as opening cabinets, chests, or books. The perceptual and kinematic concepts involved in these tasks are well understood, at least compared to higher cognitive phenomena like social behavior, yet the behaviors remain an elusive goal for roboticists. Thus, my goal (in a nutshell) is to create learning algorithms that are capable of generating structured knowledge representations over sensorimotor primitives through unsupervised exploration. I am currently exploring this possibility with a combination of techniques in reinforcement learning and function approximation, including TD learning and a variant of the venerable artificial neural network.

These issues are still very important to me (enough to switch to a machine learning lab), but I've also come a long way since. My oversight in Splinter 1 was failing to see that the gap between low-level control and a high-level task space was too wide to be bridged by any known method. I'd initially hoped that the RL model, combined with a neural-network function approximator, would magically bridge this gap, and leave me with a policy for pushing blocks that could be tailored to any reward function I came up with.

In fact, what I found was that the approximator tended to diverge or fluctuate in all but the simplest of tasks (see below for one working example). I have since come to understand this phenomenon thanks to the work of one of my lab mates, Peng Zang, whose dissertation was about scaling RL with function approximation. In short, the problem is that function approximators can behave erratically in the iterative settings we often have in RL. Even if the target value function is in the hypothesis space of the approximator, the intermediate value functions needed to get there from a random starting point may not be. For further discussion, check out his thesis.
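
To make the divergence concrete, here is a minimal sketch of a well-known textbook toy case, not the Splinter 1 setup and not necessarily the mechanism analyzed in the thesis. It assumes a linear approximator with a single weight w, one transition from a state with feature 1 to a state with feature 2, and zero reward, so the true value function (w = 0) is exactly representable. The function name and all parameter values are hypothetical.

```python
def semi_gradient_td_w2w(w0=1.0, alpha=0.1, gamma=0.9, steps=50):
    """Repeatedly apply semi-gradient TD(0) to a single transition.

    Assumed toy setup: the current state has feature 1 (estimate w),
    the next state has feature 2 (estimate 2w), and the reward is 0.
    Returns the list of weights, starting with w0.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        # TD error: reward + discounted next-state estimate - current estimate
        td_error = 0.0 + gamma * (2.0 * w) - w
        # Semi-gradient step; the gradient of the estimate w.r.t. w is the feature, 1
        w += alpha * td_error * 1.0
        history.append(w)
    return history


if __name__ == "__main__":
    diverging = semi_gradient_td_w2w(gamma=0.9)
    shrinking = semi_gradient_td_w2w(gamma=0.4)
    print("gamma=0.9 final |w|:", abs(diverging[-1]))
    print("gamma=0.4 final |w|:", abs(shrinking[-1]))
```

Each update multiplies w by 1 + alpha * (2 * gamma - 1), so in this toy case the weight grows without bound whenever gamma is above 0.5, even though the correct answer is representable.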


Plotting the ANN value function online during exploration

What you see here are plots of the value landscape at regular intervals as the robot (the simulated blue ball) tries interactions with the block and receives different rewards. In this case the reward for each action is the change in table cost as evaluated by jvfeatures (see below).
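
As a rough sketch of that reward signal: the sign convention below (an action that lowers the table cost earns a positive reward) is my assumption, and the function names are hypothetical rather than taken from the original code.

```python
def cost_change_reward(cost_before, cost_after):
    """Reward for one action: positive when the action lowered the table cost.

    The sign convention is an assumption, not taken from the original code.
    """
    return cost_before - cost_after


def rollout_rewards(costs):
    """Per-action rewards for a sequence of table costs.

    costs[i] is the cost before action i and costs[i + 1] the cost after it.
    """
    return [cost_change_reward(b, a) for b, a in zip(costs, costs[1:])]


if __name__ == "__main__":
    print(rollout_rewards([5.0, 3.0, 4.0]))
```

Under this convention the rewards telescope: their undiscounted sum over a run equals the first cost minus the last cost, so the learner is effectively rewarded for the overall reduction in table cost.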

JVfeatures breakdown

What you see here is a series of image decompositions into features, on which I then evaluate different cost functions. The bar graph in the lower right shows the feature value for each of: entropy (the overall image entropy), scatter cost (scores parallelness/orthogonality of lines in the image), ortho cost (scores how orthogonal all lines are to the major axes), area (area of the convex polygon surrounding all objects), and finally the total cost (a weighted sum of the previous). The overall intended effect of these terms was to create a cost function that would apply to arbitrary content you might find on a table top.
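
A minimal sketch of how such a total cost could be assembled, under stated assumptions: I take "image entropy" to mean the Shannon entropy of the pixel-intensity histogram, and the feature values and weights in the usage example are made up for illustration. None of the names below come from jvfeatures itself.

```python
import math
from collections import Counter


def image_entropy(pixels):
    """Shannon entropy (in bits) of the intensity histogram of a flat pixel sequence.

    Assumed interpretation of "overall image entropy"; the original may differ.
    """
    total = len(pixels)
    counts = Counter(pixels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def total_cost(features, weights):
    """Weighted sum of named feature values (the weights here are hypothetical)."""
    return sum(weights[name] * value for name, value in features.items())


if __name__ == "__main__":
    # Made-up feature values for a single table image
    features = {
        "entropy": image_entropy([0, 0, 255, 255]),
        "scatter": 0.5,
        "ortho": 0.25,
        "area": 2.0,
    }
    # Made-up weights; the original weighting is not given in the text
    weights = {"entropy": 1.0, "scatter": 2.0, "ortho": 2.0, "area": 0.5}
    print("total cost:", total_cost(features, weights))
```

Keeping the features in a named mapping makes it easy to plot each term next to the total, as in the bar graph described above.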

