AI Falls Short on 'Simple' Chemistry Tasks, New Study Finds
A new testing framework designed to evaluate AI's chemistry skills has revealed gaps in what today's most powerful models can and cannot do.
A multidisciplinary team of chemists and computer scientists, including School of Computer Science Professor Ling Liu, has developed a benchmark to evaluate how well artificial intelligence (AI) models perform chemistry-related tasks. Liu served as an AI expert on the project.
The team’s paper, “MolLangBench: A Comprehensive Benchmark for Language-Prompted Molecular Structure Recognition, Editing, and Generation,” details its test results using popular large language models (LLMs), including GPT-4o, DeepSeek-R1, and GPT-5.
The team focused on three core tasks in developing its benchmark. These tasks include:
- Molecular structure recognition
- Molecular structure editing
- Molecular structure generation
These fundamental tasks are considered the building blocks for more complicated chemical problems. Chemists currently accomplish these tasks using automated cheminformatics tools.
As one of two computer scientists who contributed to the paper, Liu helped train the models using a comprehensive chemistry dataset. This fine-tuning provided the LLMs with the information needed to solve the problems. The chemistry researchers then offered guidance on how useful the models’ results could be to chemists.
GPT-5 was the best-performing model among those tested, but even it struggled with tasks that are considered “intuitively simple” for humans. The model achieved 86.2% accuracy on recognition problems, but only 43% on molecular generation tasks.
The researchers made MolLangBench’s dataset and code open source so that others can build on their work, a step they hope will encourage the development of reliable AI tools for scientists.
“The ultimate goal is to have models generate something that can help chemists produce more innovative scientific discoveries,” Liu said.
The paper will be presented at the Fourteenth International Conference on Learning Representations (ICLR 2026), held April 23-27 at the Riocentro Convention and Event Center in Rio de Janeiro, Brazil.
For complete coverage of Georgia Tech at ICLR 2026, visit https://sites.gatech.edu/research/iclr-2026/.