New Algorithm Follows Human Intuition to Make Visual Captioning More Grounded
Annotating and labeling datasets for machine learning is an expensive and time-consuming process for computer vision and natural language scientists. A new deep learning approach decodes, localizes, and reconstructs image and video captions in seconds, making machine-generated captions more reliable and trustworthy.
To address this problem, researchers at the Machine Learning Center at Georgia Tech (ML@GT) and Facebook have created the first cyclical algorithm that can be applied to visual captioning models. The algorithm applies its three steps (decoding, localization, and reconstruction) during training to make a captioning model more visually grounded, without requiring human annotations or adding computation at deployment. This saves researchers time and money on their datasets.