Sihao Hu, Ling Liu, and Tiansheng Huang.

New LLM-based AI Agent Achieves Human Performance Levels in Pokémon

A new creation from Georgia Tech researchers explores the capabilities of large language models (LLMs) with an action-based artificial intelligence (AI) agent through online Pokémon battles. 

The invention, PokéLLMon, uses LLMs with a suite of optimizations to autonomously turn text into actions. It is the first LLM-embodied AI agent that achieves human-parity performance in tactical battle games. 

School of Computer Science Professor Ling Liu and two of her Ph.D. students, Sihao Hu and Tiansheng Huang, created PokéLLMon.

The group says that they chose Pokémon battles as the testbed for two key reasons. First, the game involves many strategies that can easily be translated into text. Second, the win rate of games can be directly measured and used to evaluate the AI agent’s performance.
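Because a win rate is a simple ratio of battles won to battles played, it is easy to compute from a log of outcomes. The short sketch below illustrates the idea only; it is not taken from the PokéLLMon code, and the `win_rate` function and the sample outcomes are hypothetical.

```python
from typing import Sequence


def win_rate(outcomes: Sequence[bool]) -> float:
    """Return the fraction of battles won; each entry is True for a win."""
    if not outcomes:
        raise ValueError("need at least one battle to compute a win rate")
    # True counts as 1 and False as 0, so sum() gives the number of wins.
    return sum(outcomes) / len(outcomes)


# Hypothetical results, for illustration only: 3 wins out of 5 battles.
sample = [True, False, True, True, False]
print(f"win rate: {win_rate(sample):.0%}")
```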

Depending on the gameplay and battle strategies of its human opponent, the PokéLLMon agent can draw on three optimization techniques:

· In-context reinforcement: With this technique, PokéLLMon uses text-based feedback from battles to refine its strategy and learn what strategies are effective. 

· Knowledge-augmented generation: This optimization uses game knowledge that the team coded into the model to enable just-in-time decision making and combat potential hallucinations of LLMs. 

· Consistent-action generation: When facing a powerful opponent, this optimization empowers PokéLLMon to take timely and effective actions instead of resorting to panic-induced strategy switching.
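The three techniques can be pictured with a small sketch. This is an illustration under stated assumptions, not the researchers' implementation: the `BattleContext` class, `build_prompt`, and `consistent_action` are hypothetical names, the prompt wording is invented, and treating consistent-action generation as a majority vote over several sampled answers is only one plausible reading of the description above. No real LLM is called.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BattleContext:
    """Hypothetical container for the text an agent would show the LLM."""
    state: str                                                # current battle state as text
    feedback: List[str] = field(default_factory=list)         # text feedback from earlier turns
    knowledge: Dict[str, str] = field(default_factory=dict)   # hand-written game facts


def build_prompt(ctx: BattleContext, max_feedback: int = 3) -> str:
    """Combine state, recent feedback, and game knowledge into one prompt."""
    lines = ["Current state: " + ctx.state]
    # In-context reinforcement (assumed form): show the outcome of recent turns.
    recent = ctx.feedback[-max_feedback:] if max_feedback > 0 else []
    for note in recent:
        lines.append("Feedback from an earlier turn: " + note)
    # Knowledge augmentation (assumed form): state game facts explicitly
    # so the model does not have to guess them.
    for topic, fact in ctx.knowledge.items():
        lines.append(f"Known fact ({topic}): {fact}")
    lines.append("Choose the next action.")
    return "\n".join(lines)


def consistent_action(candidates: List[str]) -> str:
    """Return the most frequent action among several sampled answers."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    return Counter(candidates).most_common(1)[0][0]


# Hypothetical usage with made-up battle text.
ctx = BattleContext(
    state="Your Pokémon is at 40% HP; the opponent is poisoned.",
    feedback=["The last move was not very effective."],
    knowledge={"type chart": "Water moves are super effective against Fire types."},
)
print(build_prompt(ctx))
print(consistent_action(["attack", "switch", "attack"]))
```

In this sketch, voting across several answers is what damps a single erratic "switch" response, which is the kind of behavior the third technique is described as preventing.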

In this battle, PokéLLMon exhibited a human-like attrition strategy by poisoning the opposing Pokémon and frequently recovering against the human player.

These strategies enable PokéLLMon to achieve human performance levels in games against human players. 

“Our LLM-embodied AI agent is the first one that leverages LLM to play online Pokémon battles in real time, achieving a win-rate of 49% in ladder competitions, and 56% in invited battles through our testing of over 100 battles,” said Liu.

Hu said that the first version of their PokéLLMon agent can still be tricked by expert human players and some of their attrition strategies.

The research group will continue exploring LLM-embodied AI action agents for online games. One plan involves creating an LLM-enhanced AI agent that can play fully autonomously in open-world video games.

For more information about PokéLLMon, read the full research paper PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models, or visit the project’s GitHub page.