Like a parent teaching their child the basics of how to swing a tennis racket, Georgia Tech robotics student Rohan Paleja stands behind the robotic Ping Pong paddle-wielding arm and deliberately moves it forward at the small white ball being fired across the table.
In a less-than-perfect demonstration, Paleja helps the robot connect with the ball and send it across the net to the other side. A few more demonstrations yield results of varying quality – a few fly off the table and a couple land with just a hint of backspin.
It’s the best Paleja can do. For one, he’s no Fan Zhendong, gold medalist at the 2020 International Table Tennis Federation World Cup. And even if he were, his ability to swing the robot’s arm in a proper motion is nothing next to his ability to swing his own.
And yet moments later, with no further demonstration, the robotic arm on its own swings the paddle forward in a perfect cutting motion, placing enough backspin on the ball that it shoots across the net, hits once, and checks up in the other direction – a perfect shot.
It’s a fun bit of research – teaching a robot to automatically improve its table tennis performance from a suboptimal human demonstration – but it could pave the way for vital advancements to the future of robotics, from health care and elder care to home assistants, defense, space exploration, and more.
“We want to be able to put robots in the hands of end users who might not have extensive computer science training but want to be able to teach it to help perform novel skills in their everyday lives,” said Matthew Gombolay, an assistant professor in the School of Interactive Computing and faculty lead on the research. “This could be something as simple as helping someone fold their clothes or perform other tasks in the home all the way up to robot-assisted surgery.”
The challenge is that humans aren’t experts at everything they do, but they’d still like the robot to be able to perform better than the demonstrations they can provide. The team of researchers, which includes Paleja, Gombolay, and Ph.D. student Zac Chen, used a specific brand of machine learning called “adversarial inverse reinforcement learning.” It’s a bit of a mouthful, but the approach is fairly straightforward.
First, you have a person show a robot how to perform a task. In this case, it was Paleja swinging the arm forward to hit a Ping Pong ball across a net. The robot will take one or more of those demonstrations and just try to do what is called “behavior cloning” – repeating the action the human just demonstrated.
Next, Chen introduces an inhibitor to the motion – a jerk that degrades the robot’s ability to perform the task. The researchers add more and more of this “noise” to the robot’s motion, making it increasingly unstable.
“We assume there is some relationship between the quality of the robot’s ability to swing and how much noise we’re adding to the robot’s motion,” Chen said. “The more noise you add, the worse the robot should perform.”
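The noise-injection step can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the team's code: the one-dimensional task, the `cloned_policy` controller, and the reward are all hypothetical stand-ins, and the noise is modeled as the probability of swapping the controller's action for a random one.

```python
import random

def cloned_policy(position):
    """Hypothetical stand-in for a behavior-cloned controller: steer toward 0."""
    return -position

def rollout(noise, steps=20, seed=0):
    """Run one episode; with probability `noise`, swap in a random action."""
    rng = random.Random(seed)
    position = 5.0
    total_reward = 0.0
    for _ in range(steps):
        action = cloned_policy(position)
        if rng.random() < noise:
            action = rng.uniform(-5.0, 5.0)  # injected disturbance
        position += 0.5 * action
        total_reward -= abs(position)  # staying near 0 earns more reward
    return total_reward

def average_return(noise, trials=200):
    """Average the episode return over several seeded trials."""
    return sum(rollout(noise, seed=s) for s in range(trials)) / trials

noise_levels = [0.0, 0.25, 0.5, 0.75, 1.0]
returns = [average_return(n) for n in noise_levels]
for n, r in zip(noise_levels, returns):
    print(f"noise={n:.2f}  average return={r:.2f}")
```

In this toy setting the average return drops as the noise level rises, which is the kind of relationship Chen describes.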
The result is that you now have a sigmoid function that can predict the relationship between noise and performance. This kind of mathematical function is known for its characteristic S-shaped curve, which flattens out at both extremes and changes most steeply in between. It’s a relationship that continues if you apply “negative noise” to the function.
“You can’t literally add ‘negative noise’ to the motion, but the robot essentially infers what the reward would be if you did,” Gombolay explained. “We train the robot controller to basically have more and more negative noise, allowing it to continue to improve upon the human’s performance without ever asking the person what ‘better’ looks like.”
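The idea of fitting a sigmoid and then reading it at “negative noise” can be sketched in a few lines. Everything here is an illustrative assumption rather than the researchers' implementation: the four-parameter curve, the synthetic data points (generated from a known curve, not measured on the robot), and the simple grid-search fit.

```python
import math

def sigmoid(noise, c, k, x0, y0):
    """Four-parameter S-curve: height c, steepness k, midpoint x0, offset y0."""
    return c / (1.0 + math.exp(-k * (noise - x0))) + y0

def fit_sigmoid(noises, returns):
    """Grid-search k and x0; solve c and y0 by ordinary least squares."""
    n = len(noises)
    r_mean = sum(returns) / n
    best = None
    for k in (-float(i) for i in range(1, 41)):  # performance falls with noise
        for j in range(21):
            x0 = j / 20
            s = [1.0 / (1.0 + math.exp(-k * (x - x0))) for x in noises]
            s_mean = sum(s) / n
            var = sum((si - s_mean) ** 2 for si in s)
            if var == 0.0:
                continue
            c = sum((si - s_mean) * (ri - r_mean)
                    for si, ri in zip(s, returns)) / var
            y0 = r_mean - c * s_mean
            err = sum((c * si + y0 - ri) ** 2 for si, ri in zip(s, returns))
            if best is None or err < best[0]:
                best = (err, c, k, x0, y0)
    return best[1:]

# Synthetic (noise, performance) pairs drawn from a known curve, for illustration.
noises = [i / 10 for i in range(11)]
observed = [sigmoid(x, 10.0, -8.0, 0.5, -10.0) for x in noises]

c, k, x0, y0 = fit_sigmoid(noises, observed)
for level in (-0.2, 0.0, 0.5, 1.0):
    print(f"noise={level:+.1f}  predicted performance={sigmoid(level, c, k, x0, y0):.3f}")
```

Because the fitted curve keeps rising to the left of zero, it predicts a slightly higher performance at a noise level of -0.2 than at 0, which is the sense in which a model can estimate what “better than the demonstration” would score without anyone demonstrating it.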
Gombolay envisions applications across a range of robotics fields, from helping humans perform tasks in the home to entering into dangerous disaster areas in place of humans to performing tasks on other planets.
There are still challenges to overcome. Right now, it may take a few hours for the robot to master the assigned task, but Gombolay would like to decrease that to a few minutes.
“I think you can do that as our algorithms get better,” he said. “This robot was only ever taught this one thing. But what if it had already learned other tasks that it could then apply to this that could help it learn faster – like swinging a baseball bat, for example?”
This research was published in a paper titled “Learning from Suboptimal Demonstration via Self-Supervised Reward Regression” (Chen, Paleja, Gombolay).