New Algorithm Follows Human Intuition to Make Visual Captioning More Grounded

Wednesday, August 12, 2020

Allie McFadden

College of Computing Machine Learning Center

Annotating and labeling datasets for machine learning problems is an expensive and time-consuming process for computer vision and natural language researchers. However, a new deep learning approach can decode, localize, and reconstruct image and video captions in seconds, making machine-generated captions more reliable and trustworthy.

To address this cost, researchers at the Machine Learning Center at Georgia Tech (ML@GT) and Facebook have created the first cyclical algorithm that can be applied to visual captioning models. The algorithm applies the three steps (decode, localize, and reconstruct) during training to make a captioning model more visually grounded, without requiring human annotations or adding computation at deployment. This saves researchers time and money on their datasets.

Read the full story

https://mlatgt.blog/2020/08/12/new-algorithm-follows-human-intuition-to-make-vi…