
New Video Dataset Aims to Teach Human Skills to AI Agents

AI models will have a new window into everyday human skills, one that could lead to new tools for teaching those skills to humans and robots alike.

A project led by the Fundamental Artificial Intelligence Research (FAIR) team at Meta offers the most extensive video dataset yet for teaching human skills to artificial intelligence (AI).

Working with Georgia Tech and 14 other university partners, the FAIR team collected more than 1,400 hours of video for its Ego-Exo4D dataset. The dataset contains video demonstrations of skills such as cooking, applying first aid, playing basketball, and rock climbing, captured from both first-person (egocentric) and third-person (exocentric) points of view.

With Ego-Exo4D, an AI agent learns these skills in a way that mirrors human learning.

“We know the foundation of visual learning is to observe others’ behavior from an ‘Exo’ view and map it onto our own actions in an ‘Ego’ view,” said Kristen Grauman, FAIR research director at Meta. “We’ve created a dataset of simultaneously captured first-person perspective data and third-person video of skilled human activities.”

Assistant Professor Judy Hoffman contributed to the project and is Georgia Tech’s point of contact. She is a faculty member with the School of Interactive Computing and a member of the Machine Learning Center at Georgia Tech.

Hoffman co-led Ego-Exo4D’s translation benchmark and served as an advisor during the development of its corresponding baseline models. The translation benchmark tasks models with generating the egocentric (first-person) view of an activity from video captured in the exocentric (third-person) view.
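
To make the idea concrete, below is a minimal, purely illustrative sketch of exo-to-ego view translation in PyTorch. It is not the Ego-Exo4D baseline: the ExoToEgoTranslator model, its layer sizes, the pixel reconstruction loss, and the synthetic frames are all assumptions chosen for brevity. A real system would train on the dataset’s time-synchronized exocentric/egocentric frame pairs.

```python
# Illustrative sketch only, NOT the Ego-Exo4D baseline: a toy
# convolutional encoder-decoder that maps a third-person (exocentric)
# frame to a predicted first-person (egocentric) frame.

import torch
import torch.nn as nn


class ExoToEgoTranslator(nn.Module):
    """Toy encoder-decoder: exo frame in, predicted ego frame out."""

    def __init__(self, channels: int = 3):
        super().__init__()
        # Encoder: downsample the exocentric frame into a compact feature map.
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to image resolution in the ego view.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, channels, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # pixel values in [0, 1]
        )

    def forward(self, exo_frame: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(exo_frame))


# One training step on synthetic stand-in data; real training would use
# synchronized exo/ego frame pairs from the dataset.
model = ExoToEgoTranslator()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()  # simple per-pixel reconstruction loss

exo_batch = torch.rand(4, 3, 64, 64)  # stand-in exocentric frames
ego_batch = torch.rand(4, 3, 64, 64)  # stand-in synchronized ego frames

pred_ego = model(exo_batch)
loss = loss_fn(pred_ego, ego_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"reconstruction loss: {loss.item():.4f}")
```

The point of the sketch is the supervision signal: because every exocentric frame in Ego-Exo4D is captured simultaneously with an egocentric frame, the paired views can serve directly as input and target for a translation model.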

Ego-Exo4D will be publicly available to the AI research community later this month.

For more information on Ego-Exo4D, visit the Meta blog or the Ego-Exo4D website.