Zsolt Kira

General Information

Email:
zkira@gatech.edu
Phone:
404-894-3152
Location - Building:
Coda
Location - Room:
S1181B
Roles:
Professor (any rank)
Primary Unit:
School of Interactive Computing

Details

Degrees with subject and Postdoc Experience:
Degree Type | Subject | Year | Institution | Location
B.S. | Computer Engineering and Computer Science (dual) | 2002 | University of Miami | Miami, FL
M.S. | Computer Science | 2008 | Georgia Institute of Technology | Atlanta, GA
Ph.D. | Computer Science | 2010 | Georgia Institute of Technology | Atlanta, GA
Statement of Research Interests:

Our work lies at the intersection of machine learning and artificial intelligence for perception and robotics, with a focus on generalization and robustness. Recent work includes robust fine-tuning of vision-language models that preserves their out-of-distribution generalization, open-world generalization, long-horizon RL, 3D processing, and adapting multi-modal foundation models into vision-language-action models via supervised fine-tuning, reinforcement learning, and post-training.

Statement of Teaching Interests:

I teach the Deep Learning course, both on campus and through OMSCS. I also teach the Vision-Language Foundation Models course, which covers the fundamentals of multi-modal large language models and the latest research advances.

Selection of recent research, scholarly, and creative activities:
M. Andrade, J. Cha, B. Ho, V. Srihari, K. Yadav, and Z. Kira
"Let's Think in Two Steps: Mitigating Agreement Bias in MLLMs with Self-Grounded Verification"
International Conference on Learning Representations (ICLR)
Also appeared in NeurIPS Workshop on Multi-Turn Interactions in Large Language Models (MT-LLM), 2025.
 
S. Halbe, J. Tian, K.J. Joseph, J.S. Smith, K. Stevo, V.N. Balasubramanian, and Z. Kira
"Grounding Descriptions in Images informs Zero-Shot Visual Recognition"
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2026. 

G. Gupta, K. Yadav, Z. Kira, Y. Gal, and R. Aljundi
"Memo: Training Memory-Efficient Embodied Agents with Reinforcement Learning"
International Conference on Neural Information Processing Systems (NeurIPS), 2025. 

K. Yadav, Y. Ali, G. Gupta, Y. Gal, and Z. Kira
"FindingDory: A Benchmark to Evaluate Memory in Embodied Agents"
NeurIPS Workshop on SPACE in Vision, Language, and Embodied AI (SpaVLE), 2025.

G. Chhablani, X. Ye, R. Grover, M.Z. Irshad, and Z. Kira
"EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device"
International Conference on Computer Vision (ICCV), 2025. 
Also appeared in CVPR Embodied AI Workshop, 2025.

A. Szot, B. Mazoure, O. Attia, A. Timofeev, H. Agrawal, D. Hjelm, Z. Gan, Z. Kira, and A.T. Toshev
"From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons"
Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

C. Huang*, B. Maneechotesuwan*, S. Chopra, and Z. Kira
"FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering"
Conference on Computer Vision and Pattern Recognition (CVPR), 2025.

C. Huang, J. Tian, B. Maneechotesuwan, S. Chopra, and Z. Kira
"Directional Gradient Projection for Robust Fine-tuning of Foundation Models"
International Conference on Learning Representations (ICLR), 2025.