Zsolt Kira - Research

Beyond Supervised Learning: We are interested in a range of machine learning problems that move beyond needing large amounts of explicit human annotation. This includes continual learning, cross-task learning (i.e. unlabeled datasets with entirely new categories), semi-supervised learning, one/few-shot learning, and domain adaptation.

Semi-supervised Learning:

C.W. Kuo, C.Y. Ma, J.B. Huang, and Z. Kira"Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification", arXiv 1906.05202, 2019. [arxiv] [github (coming soon)] [project]

Few-shot Learning:

W. Chen, Y.C. Liu, Z. Kira, Y.C. Wang, J. Huang, "A Closer Look at Few-shot Classification", accepted to the International Conference on Learning Representations (ICLR), 2019. [pdf]

Cross-task learning

Y.C. Hsu, Z. Lv, J. Schlosser, P. Odom, and Z. Kira, "Multi-class classification without multi-class labels ", accepted to the International Conference on Learning Representations (ICLR), 2019. [pdf]
Y.C. Hsu, Z. Lv, Z. Kira, Learning to Cluster in order to Transfer Across Domains and Tasks", accepted to the International Conference on Learning Representations (ICLR), 2018. [arxiv]

Domain Adaptation:

M.H. Chen, B. Li, Y. Bao, G. AlRegib, and Z. Kira"Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation"
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. [arxiv]] [github
M.H. Chen, Z. Kira, G. AlRegib, J. Yoo, R. Chen, and J. Zheng "Temporal Attentive Alignment for Large-Scale Video Domain Adaptation"International Conference in Computer Vision (ICCV), 2019. [paper and code coming soon!]
Y.C. Hsu, Z. Lv, Z. Kira, Learning to Cluster in order to Transfer Across Domains and Tasks", accepted to the International Conference on Learning Representations (ICLR), 2018. [arxiv]

Continual Learning:

Y.C. Hsu, Y.C. Liu , and Z. Kira, "Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines", accepted to the NIPS Workshop on Continual Learning, 2018. [arxiv]

Distributed Perception: We are interested in the integration of information from multiple sensor modalities and robots in a principled way, including when a heterogeneous set of sensors are on different platforms.

Y.C. Liu, J. Tian, N. Glaser, and Z. Kira"When2com: Multi-agent perception via Communication Graph Grouping" IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. [PDF coming soon!]
Y.C. Liu, J. Tian, C.Y. Ma, N. Glaser, C.W. Kuo, and Z. Kira "Who2com: Collaborative Perception via Learnable Handshake communication"IEEE International Conference on Robotics and Automation (ICRA), 2020. [arxiv]
J. Tian, W. Cheung, N. Glaser, Y.C. Liu, and Z. Kira "UNO: Uncertainty-aware Noisy-Or Multimodal Fusion for Unanticipated Input Degradation" IEEE International Conference on Robotics and Automation (ICRA), 2020.
Also in IROS Workshop on the Importance of Uncertainty in Deep Learning for Robotics, 2019. [arxiv]

Cross-Task Learning, clustering, and Object Discovery: As part of the National Robotics Initiative project, we are developing methods for automatically discovering object categories in unlabeled data, using cross-task learning and a novel deep learning-based clustering loss.

Y.C. Hsu, Z. Lv, Z. Kira, Learning to Cluster in order to Transfer Across Domains and Tasks", accepted to the International Conference on Learning Representations (ICLR), 2018. [arxiv]
Y.C. Hsu, Z. Lv, J. Schlosser, P. Odom, Z. Kira, A probabilistic constrained clustering for transfer learning and image category discovery", in the 2018 CVPR Deep-Vision Workshop, 2018. [arxiv] [code]
Y.C. Hsu, Z. Xu, Z. Kira, and J. Huang, Learning to Cluster for Proposal-Free Instance Segmentation", accepted to the International Conference on Neural Networks (IJCNN), 2018. [arxiv]
Y.C. Hsu and Z. Kira, , "Neural network-based clustering using pairwise constraints", in International Conference on Learning Representations Workshop Track (ICLR), 2016. [pdf] [arxiv (extended paper)] [project] [code]
Y.C. Hsu, Z. Lv, and Z. Kira, , “Deep Image Category Discovery using a Transferred Similarity Function,” arXiv:1612.01253 [cs], Dec. 2016. [pdf]

Goal-Driven Perception: Incorporating decision-making, and specifically top-down cues and goal reasoning, to dynamically change how perceptual inputs are processed.

C.Y. Ma, Z. Wu, G. AlRegib, C. Xiong, and Z. Kira"The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation"IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. [arxiv] [github] [project]
C.Y. Ma, J. Lu, Z. Wu, G. AlRegib, Z. Kira, R. Socher, and C. Xiong, "Self-Monitoring Navigation Agent via Auxiliary Progress Estimation", accepted to the International Conference on Learning Representations (ICLR), 2019. [pdf] [github]

Scene Flow: As part of the National Robotics Initiative project, we are developing methods for estimating scene flow (dense 3D motion field) using factor graphs (using continuous optimization) and more recently deep learning-based methods. Joint work with Frank Dellaert's group (Zhaoyang Lv, Chris Beall) as well as other collaborators.

Z. Lv, C. Beall, P.F. Alcantarilla, F. Li, Z. Kira, and F. Dellaert, "A Continuous Optimization Approach for Efficient and Accurate Scene Flow", in proceedings of the European Conference on Computer Vision (ECCV), 2016.

Project Page

Multi-modal/Multi-cue fusion: We have been developing methods to combine multiple modalities (e.g. LIDAR and images) including through cueing and fusion at multiple levels. Our latest paper showed inmproved performance with mid-level fusion as well as stable training, all with only a small increase in the number of parameters for RGB-only networks.

Schlosser, J., Chow, C., and Kira, Z., "Fusing LIDAR and Images for Pedestrian Detection using Convolutional Neural Networks ", in proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016.
Kira, Z., Hadsell, R., Salgian, G., and Samarasekera, S., "Long-Range Pedestrian Detection using Stereo and a Cascade of Convolutional Network Classifiers", in proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012. [pdf]
Kira, Z., Southall, B., Kuthirummal, S., and Eledath, J., "Multi-Sensor Fusion for Pedestrian Detection on the Move ", in proceedings of the IEEE International Conference on Technologies for Practical Robot Applications (poster), 2012.

Fine-grained Video Analysis: We are investigating recurrent and convolutional neural networks to better exploit spatio-temporal data in videos. In the arxiv paper we show tha LSTMs can achieve state of art results (with some work) although CNNs can achieve good results as well. We are expanding on this to model what is happening in videos in a more fine-grained manner, as described in the latest accepted NIPS workshop paper and arxiv paper in submission. Joint work with Prof. AlRegib's lab (Chih-Yao Ma and Min-Hung Chen) and NEC Labs.

C.Y. Ma, A. Kadav, I. Melvin, Z. Kira, G. AlRegib, and H. Peter Graf, Attend and Interact: Higher-Order Object Interactions for Video Understanding", in submission. [arxiv]
C.Y. Ma, A. Kadav, I. Melvin, Z. Kira, G. AlRegib, and H. Peter Graf, Grounded Objects and Interactions for Video Captioning", accepted to the NIPS 2017 Workshop on Visually-Grounded Interaction and Language (ViGIL), 2017. [arxiv]
C.Y., Ma, M.H. Chen, Z., Kira, and G. AlRegib, "TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition", in submission, [arxiv] [code]

Game theory for implicit generative models: We are working on viewing various machine learning problems through the lens of game theory, starting with implicit generative modeling (e.g. GANs). We have shown interesting connections to online learning and inspired by them developed a new regularization method (DRAGAN) that makes training GANs more stable across divergences and architecures. Joint work with Naveen Kodali and Prof. James Hays and Jake Abernethy.

N., Kodali, J., Abernethy, J., Hays, and Z., Kira, "How to Train Your DRAGAN", in submission, [arxiv] [code]

Knowledge transfer across heterogeneous robots: In my thesis I showed that mid-level representations are useful for learning object models (in the form of Gaussian Mixture Models) and transferring them across heterogeneous robots with differing sensors. Of course, feature learning has come a long way since then (starting wih sparse coding) and we extended these mehods. These principles have been shown relevant in the age of deep learning, where a hierarchy of features have been shown to be extremely transferrable, albeit with some fine-tuning on labeled examples. Our clustering work for cross-task learning (see above) extends this line of work when there is no labeled data, and will be applied to heterogeneous robot teams in future work.

Kira, Z., "Inter-Robot Transfer Learning for Perceptual Classification", in proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010. [pdf]
Kira, Z., Communication and Alignment of Grounded Symbolic Knowledge Among Heterogeneous Robots, Ph.D. Dissertation, College of Computing, Georgia Institute of Technology, May 2010. [pdf]

Contact:
zkira at gatech dot edu

Selected Projects

Past Projects