Two men working with robot

CoRL 2023: Large Language Models Help Humans to Communicate with Robots

Roboticists have a new way to communicate instructions to robots — large language models (LLMs).

Georgia Tech robotics researchers Animesh Garg and Danfei Xu think LLMs like ChatGPT make the best human-to-robot translators when it comes to training robots.

Garg said LLMs have a healthy understanding of human language and can communicate instantly with robots by translating instructions into code. This expedites the training process by weeks and, in some cases, months.

“It’s something that has taken us by surprise,” Garg said. “These models can be applied to many tasks with little or no user input and without retraining. They don’t need specific data for the task that we need them to do. Traditional training methods take a substantial amount of time and data for training.”

For a robot to make coffee, it must understand the steps involved. It also must be familiar with different coffee machines and coffee brands — information that is obtainable on the internet.

“LLMs are going to take the job of high-level reasoning,” Xu said. “They will be part of the decision making about what needs to be done in a setting with all the knowledge a robot has about the world. ChatGPT is trained on the internet, so it knows a lot about what the world should look like and how a normal human would perform a task. It will most likely figure out a reasonable action to take to achieve a goal.”

The researchers emphasized this method is a generalized approach and does not cater to user preferences. Because LLMs tend to be biased, they will do the job, but they will do it on their own terms.

“Bias can play a role in this setting,” Xu said. “It may be more biased toward certain procedures like cooking recipes certain ways. It may be biased toward a certain way to do things instead of the preference of the human working with the robot.”

Danfei Xu, Animesh Garg
From left, Danfei Xu and Animesh Garg are using large language models to train robots to perform assistive tasks like making coffee. Xu and Garg will present multiple papers Nov. 6 to Nov. 9 at the 2023 Conference on Robotic Learning in Atlanta. Photos by Terence Rushin/College of Computing.

Xu said another example would be a robot tasked with tidying a living room. It may need to understand the specific room it is in, but an LLM will only provide a general understanding of a typical living room.

Garg and Xu, assistant professors in the School of Interactive Computing, are long-term collaborators, having worked on the idea since their time together at Stanford and, more recently, Nvidia.

Garg’s and Xu’s work created buzz last year when they published their main body of work on the topic. They co-authored ProgPrompt: Generating Situated Robot Task Plans using Large Language Models with Nvidia Senior Director of Robotics Research Dieter Fox, among others.

Garg said that paper has been cited over a hundred times since publication. He and Xu both worked at Nvidia when LLMs were in early development. They both saw then the impact the models would have on robotics.

The researchers tried to achieve a working communications method between humans and robots using deep learning models. However, Garg said the scope of the work was limited by the models’ lack of prior knowledge of the world. Adding LLMs to the equation supercharged the process.

“Many people have tried many different ideas on how to use this in robotics,” Garg said. “Robotics is converging to the idea that LLMs outputting code is the right thing to do. We have been pushing this idea for many years before LLMs became widely known. Program synthesis with LLMs is a very generalizable and scalable approach to robotics.”

Garg and Xu are presenting multiple papers next week at the 2023 Conference on Robot Learning (CoRL) at the Starling Hotel in Atlanta. The conference spotlights research from emerging junior faculty from academic institutions around the world.

For more information on their papers and presentation times, visit the CoRL website.

Friendly Hacking Helps Robots Boost Defense Strategies
matthew gombolay

Multi-agent robotic communication systems used by first responders, traffic and air-traffic control, and search and rescue teams are getting a security boost thanks to a Georgia Tech research team.

Matthew Gombolay, an assistant professor in the School of Interactive Computing, said all those systems are vulnerable to hacking. And the best way to demonstrate it is by hacking them himself.

Friendly hacking allows researchers like Gombolay to develop new training models with enhanced hacking defense strategies. He believes the best way to build more secure multi-agent communication systems is to predict how an adversary might hack them.

Gombolay and his team of researchers simulate attacks on systems that require multi-agent robot learning (MARL). This helps identify vulnerabilities so researchers can build in safeguards to protect from real-world malicious actors.

“I don’t think anyone has looked at an outside attack in quite the same way we have,” Gombolay said. “If we allow robots access to previous experiences with attackers, they could learn a response strategy. They can adjust their communication scheme or alter their behavior.”

Gombolay said his research focuses on how robot teams might be attacked after they have been deployed instead of the developmental stage. He and his team operate on the premise that an attacker cannot sabotage the learning model used to train the robots.

“We can’t give them bad data, and we don’t have access to the copy of the neural networks controlling these robots,” he said. “Instead, we must figure out how to interfere with their ability to accomplish their mission or be able to control them to work against their mission and do so from the outside in.”