Stefan Lee
About
I am a Research Scientist II in the School of Interactive Computing at Georgia Tech collaborating with Dhruv Batra and Devi Parikh. My research focus is the development of agents that can perceive their environment and communicate about this understanding with humans in order to coordinate their actions to achieve mutual goals -- in short, agents that can see, talk, and act. Consequently, I work on problems in computer vision, natural language processing, and deep learning in general.

Career

August 2017
Research Scientist II @ Georgia Tech - School of Interactive Computing
August 2016
Bradley Postdoctoral Associate @ Virginia Tech - Bradley Department of Electrical and Computer Engineering
July 2016
Earned PhD @ Indiana University - School of Informatics & Computing
Thesis: Data-Driven Computer Vision for Science and the Humanities
July 2013
Earned MS @ Indiana University - School of Informatics & Computing
Recent Press Coverage
- Facebook helped create an AI scavenger hunt that could lead to the first useful home robots - MIT Technology Review 2018
- How A Virtual Scavenger Hunt Could Train Robots To Find Things In Your Home - FastCompany 2018
- Facebook is training AI to answer questions like humans do - Digital Journal 2018
- Research Scientist, Assistant Professor Represent IC in DARPA Risers Event - ML@GT Blog 2018
- What is Graph R-CNN? - ML@GT Blog 2018
- Choose Your Neuron: Incorporating Domain Knowledge through Neuron-Importance - ML@GT Blog 2018
- Embodied Question Answering - ML@GT Blog 2018
Teaching
Georgia Tech
CS8903 - Special Problems (Fall 2017 - Present)
(Faculty Advisor)

Virginia Tech
EECE 5424/4425 CS 5824/4824 - Introduction to Machine Learning (Fall 2016)
(Instructor)

Indiana University
B659 - Image Processing and Recognition (Fall 2014)
(Assistant Instructor)
I399 - Research Methods for Informatics and Computing (Fall 2013)
(Graduate Mentor)
C211 - Introduction to Computer Science (Fall 2011 - Summer 2012)
(Assistant Instructor)
Publications

2018

Overcoming Language Priors in Visual Question Answering with Adversarial Regularization

Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee

NIPS 2018

@inproceedings{advregvqa_nips_2018,
  title={Overcoming Language Priors in Visual Question Answering with Adversarial Regularization},
  author={Sainandan Ramakrishnan and Aishwarya Agrawal and Stefan Lee},
  booktitle={Neural Information Processing Systems (NIPS)},
  year={2018}
}

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

CORL 2018

Oral
@inproceedings{viscur_corl_2018,
  title={Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition},
  author={Jianwei Yang and Jiasen Lu and Stefan Lee and Dhruv Batra and Devi Parikh},
  booktitle={Conference on Robot Learning (CORL)},
  year={2018}
}

Neural Modular Control for Embodied Question Answering

Abhishek Das, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

CORL 2018

@inproceedings{eqamodular_corl_2018,
  title={Neural Modular Control for Embodied Question Answering},
  author={Abhishek Das and Georgia Gkioxari and Stefan Lee and Devi Parikh and Dhruv Batra},
  booktitle={Conference on Robot Learning (CORL)},
  year={2018}
}

Choose Your Neuron: Incorporating Domain Knowledge through Neuron Importance

Ramprasaath R. Selvaraju, Prithvijit Chattopadhyay, Mohamed Elhoseiny, Tilak Sharma, Dhruv Batra, Devi Parikh, Stefan Lee

ECCV 2018

@inproceedings{niwt_eccv_2018,
  title={Choose Your Neuron: Incorporating Domain Knowledge through Neuron Importance},
  author={Ramprasaath R. Selvaraju and Prithvijit Chattopadhyay and Mohamed Elhoseiny and Tilak Sharma and Dhruv Batra and Devi Parikh and Stefan Lee},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2018}
}

Graph R-CNN for Scene Graph Generation

Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, Devi Parikh

ECCV 2018

@inproceedings{grcnn_eccv_2018,
  title={Graph R-CNN for Scene Graph Generation},
  author={Jianwei Yang and Jiasen Lu and Stefan Lee and Dhruv Batra and Devi Parikh},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2018}
}

Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

Ashwin K Vijayakumar, Stefan Lee, Anitha Kannan, and Dhruv Batra

ICML 2018

Oral (Long)
@inproceedings{multiprob_icml_2018,
  title={Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations},
  author={Ashwin K Vijayakumar and Stefan Lee and Anitha Kannan and Dhruv Batra},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2018}
}

Embodied Question Answering

Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra

CVPR 2018

Oral
@inproceedings{embodiedqa,
  title={{E}mbodied {Q}uestion {A}nswering},
  author={Abhishek Das and Samyak Datta and Georgia Gkioxari and Stefan Lee and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

Diverse Beam Search for Improved Description of Complex Scenes

Ashwin K Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, and Dhruv Batra

AAAI 2018

@article{vijayakumar_complex_2018,
  author = {Ashwin K. Vijayakumar and Michael Cogswell and Ramprasaath R. Selvaraju and Qing Sun and Stefan Lee and David J. Crandall and Dhruv Batra},
  title = {Diverse Beam Search for Improved Description of Complex Scenes},
  journal = {AAAI Conference on Artificial Intelligence (AAAI)},
  year = {2018}}
2017

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

Abhishek Das*, Satwik Kottur*, José M.F. Moura, Stefan Lee, and Dhruv Batra

ICCV 2017

Oral
@article{visdial_rl,
  title = {Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning},
  author = {Abhishek Das and Satwik Kottur and Jos{\'{e}} M. F. Moura and Stefan Lee and Dhruv Batra},
  journal = {International Conference on Computer Vision (ICCV)},
  year = {2017}}

Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog

Satwik Kottur, José M.F. Moura, Stefan Lee, and Dhruv Batra

EMNLP 2017

Oral / Best Short Paper Award
@article{kottur_emnlp_2017,
  title = {Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog},
  author = {Satwik Kottur and Jos{\'{e}} M. F. Moura and Stefan Lee and Dhruv Batra},
  journal = {Conference on Empirical Methods in Natural Language Processing},
  year = {2017}}

The Promise of Premise: Harnessing Question Premises in Visual Question Answering

Aroma Mahendru*, Viraj Prabhu*, Akrit Mohapatra*, Dhruv Batra, and Stefan Lee

EMNLP 2017

@article{mahendru_emnlp_2017,
  author    = {Aroma Mahendru and Viraj Prabhu and Akrit Mohapatra and Dhruv Batra and Stefan Lee},
  title     = {The Promise of Premise: Harnessing Question Premises in Visual Question Answering},
  journal   = {Conference on Empirical Methods in Natural Language Processing},
  year      = {2017}}

Evaluating Visual Conversational Agents via Cooperative Human-AI Games

Prithvijit Chattopadhyay, Deshraj Yadav, Viraj Prabhu, Arjun Chandrasekaran, Abhishek Das, Stefan Lee, Dhruv Batra, and Devi Parikh

HCOMP 2017

@inproceedings{visdial_eval_hcomp_2017,
	title={Evaluating Visual Conversational Agents via Cooperative Human-AI Games},
	author={Prithvijit Chattopadhyay and Deshraj Yadav and Viraj Prabhu and Arjun Chandrasekaran and Abhishek Das and Stefan Lee and Dhruv Batra and Devi Parikh},
	booktitle={Proceedings of the Fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP)},
	year={2017}}

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

Qing Sun, Stefan Lee, and Dhruv Batra

CVPR 2017

@article{sun_cvpr_2017,
  title = {Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning},
  author = {Qing Sun and Stefan Lee and Dhruv Batra},
  journal = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2017}}

2016

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles

Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David J. Crandall, and Dhruv Batra

NIPS 2016

@article{lee_nips_2016,
  author = {Stefan Lee and Senthil Purushwalkam and Michael Cogswell and Viresh Ranjan and David J. Crandall and Dhruv Batra},
  title = {Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles},
  journal = {Conference and Workshop on Neural Information Processing Systems (NIPS)},
  year = {2016}}

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

Ashwin K Vijayakumar, Michael Cogswell, Ramprasaath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, and Dhruv Batra

arXiv 2016

@article{vijayakumar_dbs_2016,
  author = {Ashwin K. Vijayakumar and Michael Cogswell and Ramprasaath R. Selvaraju and Qing Sun and Stefan Lee and David J. Crandall and Dhruv Batra},
  title = {Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models},
  journal = {CoRR},
  volume = {abs/1610.02424},
  year = {2016}}

2015

Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions

Sven Bambach, Stefan Lee, David Crandall, and Chen Yu

ICCV 2015

@article{egohands2015iccv, 
    title = {Lending A Hand: Detecting Hands and Recognizing Activities in Complex Egocentric Interactions},
    author = {Sven Bambach and Stefan Lee and David Crandall and Chen Yu},
    journal = {IEEE International Conference on Computer Vision (ICCV)},
    year = {2015}}

Linking Past to Present: Discovering Style in Two Centuries of Architecture

Stefan Lee, Nicolas Maisonneuve, David Crandall, Josef Sivic, and Alexei A. Efros

ICCP 2015

@article{lee_linking_2015, 
    title = {Linking Past to Present: Discovering Style in Two Centuries of Architecture},
    author = {Stefan Lee and Nicolas Maisonneuve and David Crandall and Josef Sivic and Alexei A. Efros},
    journal = {IEEE International Conference on Computational Photography (ICCP)},
    year = {2015}}

Predicting Geo-informative Attributes in Large-scale Image Collections using Convolutional Neural Networks

Stefan Lee, Haipeng Zhang, and David Crandall

WACV 2015

@article{lee_geoattributes_2015, 
    title = {Predicting Geo-informative Attributes in Large-scale Image Collections using Convolutional Neural Networks},
    author = {Stefan Lee and Haipeng Zhang and David Crandall},
    journal = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
    year = {2015}}

Detecting and Classifying Hands in Social and Driving Contexts

Sven Bambach, Stefan Lee, David Crandall, and Chen Yu

VIVA Challenge and Workshop, IEEE Intelligent Vehicles Symposium 2015

@article{bambach_viva_2015, 
    title = {Detecting and Classifying Hands in Social and Driving Contexts},
    author = {Sven Bambach and Stefan Lee and David Crandall and Chen Yu},
    journal = {Vision for Intelligent Vehicles and Applications (VIVA) Challenge and Workshop, IEEE Intelligent Vehicles Symposium},
    year = {2015}}

Tracking Hands of Interacting People in Egocentric Video

Sven Bambach, Stefan Lee, David Crandall, John Franchak, and Chen Yu

Workshop on Observing and Understanding Hands in Action, CVPR 2015

@article{bambach_track_2015, 
    title = {Tracking Hands of Interacting People in Egocentric Video},
    author = {Sven Bambach and Stefan Lee and David Crandall and John Franchak and Chen Yu},
    journal = {Workshop on Observing and Understanding Hands in Action, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2015}}

2014

This Hand Is My Hand: A Probabilistic Approach to Hand Disambiguation in Egocentric Video

Stefan Lee, Sven Bambach, David Crandall, John Franchak, and Chen Yu

3rd Workshop on Egocentric Vision, CVPR 2014

Best Paper Award
@article{bambach_egohand_2014, 
    title = {This Hand Is My Hand: A Probabilistic Approach to Hand Disambiguation in Egocentric Video},
    author = {Stefan Lee and Sven Bambach and David Crandall and John Franchak and Chen Yu},
    journal = {3rd Workshop on Egocentric Vision, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2014}}

Estimating Bedrock and Surface Layer Boundaries And Confidence Intervals In Ice Sheet Radar Imagery Using MCMC

Stefan Lee, Jerome Mitchell, David Crandall, and Geoffrey Fox

ICIP 2014

@article{lee_icemcmc_2014, 
    title = {Estimating Bedrock and Surface Layer Boundaries And Confidence Intervals In Ice Sheet Radar Imagery Using MCMC},
    author = {Stefan Lee and Jerome Mitchell and David Crandall and Geoffrey Fox},
    journal = {International Conference on Image Processing (ICIP)},
    year = {2014}}

Learning to Identify Local Flora with Human Feedback

Stefan Lee and David Crandall

Workshop on Computer Vision and Human Computation, CVPR 2014

@article{lee_flora_2014, 
    title = {Learning to Identify Local Flora with Human Feedback},
    author = {Stefan Lee and David Crandall},
    journal = {Workshop on Computer Vision and Human Computation, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2014}}

Other Work
David J. Crandall, Yunpeng Li, Stefan Lee, and Daniel P. Huttenlocher. “Recognizing Landmarks in Large-Scale Social Image Collections”. Large-Scale Visual Geo-Localization. Ed. Amir R. Zamir, Asaad Hakeem, Luc Van Gool, Mubarak Shah, Richard Szeliski. Springer, 2016.
Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David J. Crandall, and Dhruv Batra. Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks. arXiv:1511.06314, 2015. [PDF]
Recent Talks
Watch online:

Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition

Conference on Robot Learning (CoRL) 2018

Training Embodied Agents in Semantically and Perceptually Rich Simulations

DARPA 60th Anniversary (D60) - DARPA Riser 2018


Slides:
- Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
- Training Embodied Agents in Semantically and Perceptually Rich Simulations
- Towards Goal-Driven, Visually Grounded Dialog Agents
- Training Diverse Ensembles of Deep Networks