Researchers Discover New Solution to Improve AI Memory
It’s been said that artificial intelligence (AI) agents have short memories.
When an AI model is trained on new information, it’s not uncommon for it to forget most of what it already knows.
A discovery by a Georgia Tech Ph.D. student, however, could unlock a new generation of AI agents capable of avoiding this catastrophic forgetting.
These agents would operate and stay up to date through continual learning — a method that allows models to learn from new data and perform new tasks over time without retraining.
This means optimized, low-cost model performance for AI companies, and it means more engaging interactions with AI agents for users.
AI companies currently retrain models because of frequent catastrophic forgetting. The AI agent forgets what it has learned about a task when new data is introduced, and its performance declines.
“You can think of it as the model’s current knowledge overrides the previous one,” said Zekun Wang, a Ph.D. student in Georgia Tech’s School of Interactive Computing. “Sometimes you have mild forgetting, and that’s OK, but catastrophic means it completely overrides what it has already learned. What it was once good at, it is now awful at.”
In a new paper he will present at the 2026 International Conference on Learning Representations (ICLR), Wang asserts that it’s possible to avoid catastrophic forgetting in diffusion models.
Diffusion models are generative AI models used to create high-quality images and videos. Wang discovered and mathematically proved that diffusion models inherently remember important data from previous training.
When Wang and his advisor, Associate Professor Christopher MacLellan, applied what Wang calls the Rank-1 Fisher method to diffusion models, they observed a sharp decline in catastrophic forgetting.
How Catastrophic Forgetting Happens
When new data feeds into an AI model, the model prioritizes and assigns weights to the information it deems most important to perform the desired task.
Catastrophic forgetting occurs when the model blindly updates its weights to fit the new data, overwriting the weights it had tuned for earlier tasks. The knowledge needed to perform the previous tasks is now treated as less important or irrelevant.
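The interference described above can be shown in a toy example. The sketch below is illustrative only (it is not the researchers' setup): a two-weight linear model is trained on one task, then trained on a second task whose targets conflict with the first, with nothing protecting the old weights.

```python
import numpy as np

def train(w, X, y, steps=500, lr=0.1):
    """Plain gradient descent on mean-squared error, with no
    protection for weights that mattered to earlier tasks."""
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

w = np.zeros(2)

# Task A: targets demand w[0] ~= 2.
X_a = np.array([[1.0, 0.0], [2.0, 0.0]])
y_a = np.array([2.0, 4.0])
w = train(w, X_a, y_a)
err_a_before = np.mean((X_a @ w - y_a) ** 2)

# Task B reuses the same feature but demands w[0] ~= -3, so fitting it
# freely drags w[0] away from task A's solution.
X_b = np.array([[1.0, 0.0], [2.0, 0.0]])
y_b = np.array([-3.0, -6.0])
w = train(w, X_b, y_b)
err_a_after = np.mean((X_a @ w - y_a) ** 2)

print(err_a_before, err_a_after)  # task A error jumps after training on B
```

After the second round of training, the model's error on the first task climbs sharply: the new task's gradient updates have overwritten the shared weight.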
Engineers have traditionally used elastic weight consolidation (EWC) to prevent models from forgetting old tasks by penalizing changes to important parameters. Wang said the more often the model needs updates, the less likely this method will succeed.
EWC must be informed by accurate Fisher information to be effective. Fisher information is named after Sir Ronald Aylmer Fisher, who is considered the father of modern statistical science.
“Fisher information intuitively computes the importance of weights relative to the previous task,” Wang said. “You don’t want the weights that are important to the previous task to change at all, but when you introduce new data for new tasks, they do.”
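The EWC idea Wang describes can be sketched in a few lines. This is a minimal, hypothetical illustration (the function and variable names are ours): the loss on the new task is augmented with a quadratic penalty that anchors each weight to its old value, scaled by that weight's Fisher importance on the previous task.

```python
import numpy as np

def ewc_loss(w, new_task_loss, w_old, fisher_diag, lam=1.0):
    """New-task loss plus the EWC penalty: weights with high Fisher
    importance are pulled back toward their old values; weights with
    low importance are free to move."""
    penalty = 0.5 * lam * np.sum(fisher_diag * (w - w_old) ** 2)
    return new_task_loss(w) + penalty

# Weight 0 mattered for the old task (high Fisher value); weight 1 did not.
w_old = np.array([2.0, 0.0])
fisher = np.array([10.0, 0.0])
loss_b = lambda w: (w[0] + 3.0) ** 2  # new task wants w[0] = -3

# Moving w[0] all the way to the new task's optimum incurs a large penalty.
print(ewc_loss(np.array([-3.0, 0.0]), loss_b, w_old, fisher))
```

As Wang notes, the catch is that this penalty is only as good as the Fisher estimate feeding it: if the importance values are inaccurate, or the model is updated many times, the protection degrades.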
The goal is to get the model to produce an accurate Rank-1 Fisher estimate of which weights are critical to performing the desired tasks.
Wang is the first researcher to claim that diffusion models are unique because they naturally produce a Rank-1 approximation. This counters the traditional method of estimating the value through what Wang refers to as a diagonal Fisher.
Wang said there have been no prior research publications, to his knowledge, that make that claim.
He said this discovery means continual learning is a matter of copying the model each time it receives a training update.
“We keep a copy of the model in its memory, and that dramatically brings down the computational costs,” he said. “It means we can use a cheap formula to faithfully recover the weight importance of prior tasks.”
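To see what distinguishes the two estimates discussed above, here is a toy numerical contrast (our illustration, not the paper's derivation). Given a per-sample gradient g of the log-likelihood, the traditional diagonal Fisher keeps only the squared entries of g, while a rank-1 estimate keeps the full outer product g gᵀ, preserving the cross-weight structure the diagonal discards.

```python
import numpy as np

# A single gradient vector of the log-likelihood (toy values).
g = np.array([1.0, 2.0, 0.5])

fisher_diag = g ** 2           # diagonal approximation: per-weight importance
fisher_rank1 = np.outer(g, g)  # rank-1 approximation: full outer product

# The diagonal estimate is exactly the diagonal of the rank-1 matrix;
# what it drops are the off-diagonal terms that couple weights.
print(np.allclose(np.diag(fisher_rank1), fisher_diag))
print(np.linalg.matrix_rank(fisher_rank1))  # rank is 1 by construction
```

Wang's claim is that diffusion models naturally yield this rank-1 structure, which is why a stored copy of the model is enough to cheaply recover the weight importance of prior tasks.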
Retraining Costs
Wang said it’s still unclear how his theory would apply to other AI models like transformer-based large language models, but it could be a significant step toward universal continual learning.
MacLellan said this would have a wide impact on AI companies racing to develop more efficient AI technologies.
“It can cost millions of dollars to train a model on a set of data, which is what the frontier AI companies are doing right now,” MacLellan said.
“Our hope with continual learning is that you can take the model and update it with the next task without having to retrain and without forgetting. The cost of updating the model each time will be less than the cost of retraining it from scratch.”