Selected Projects

Cross-Task Learning and Object Discovery: As part of the National Robotics Initiative project, we are developing methods for automatically discovering object categories in unlabeled data using cross-task learning and a novel deep learning-based clustering loss (a minimal sketch of such a pairwise loss appears after the references below).
  • Y.C. Hsu and Z. Kira, "Neural network-based clustering using pairwise constraints", in International Conference on Learning Representations Workshop Track (ICLR), 2016. [pdf] [arxiv (extended paper)] [project] [code]
  • Y.C. Hsu, Z. Lv, and Z. Kira, "Deep Image Category Discovery using a Transferred Similarity Function", arXiv:1612.01253 [cs], Dec. 2016. [pdf]
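
For concreteness, here is a minimal PyTorch sketch of a pairwise-constraint clustering loss in the spirit of the ICLR workshop paper: must-link pairs are pulled toward the same softmax cluster assignment, cannot-link pairs are pushed apart. The hinge margin and the exact form of the divergence are illustrative assumptions, not the precise loss from the paper.

    import torch
    import torch.nn.functional as F

    def pairwise_cluster_loss(logits_a, logits_b, similar, margin=2.0):
        # Softmax outputs are treated as soft cluster assignments.
        # similar: 1.0 for must-link pairs, 0.0 for cannot-link pairs, shape (B,)
        p = F.softmax(logits_a, dim=1)
        log_q = F.log_softmax(logits_b, dim=1)
        # Per-pair KL(p || q); the small epsilon guards against log(0).
        kl = (p * (torch.log(p + 1e-8) - log_q)).sum(dim=1)
        pull = similar * kl                           # must-link: shrink the divergence
        push = (1.0 - similar) * F.relu(margin - kl)  # cannot-link: hinge it above the margin
        return (pull + push).mean()

The loss takes the network's outputs for the two images in a pair plus the binary constraint, so it needs no class labels at all, only pairwise similarity judgments.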


Scene Flow: As part of the National Robotics Initiative project, we are developing methods for estimating scene flow (a dense 3D motion field) using factor graphs solved by continuous optimization, and more recently deep learning-based methods (a toy version of the optimization is sketched below). Joint work with Frank Dellaert's group (Zhaoyang Lv, Chris Beall) as well as other collaborators.
  • Z. Lv, C. Beall, P.F. Alcantarilla, F. Li, Z. Kira, and F. Dellaert, "A Continuous Optimization Approach for Efficient and Accurate Scene Flow", in proceedings of the European Conference on Computer Vision (ECCV), 2016. [project page]

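The ECCV paper formulates scene flow as continuous optimization over a factor graph; the NumPy toy below keeps only the quadratic essence of that idea (a data term on observed per-point displacements plus a smoothness term over a neighbor graph), which reduces to a single linear solve. The neighbor graph, the uniform weighting, and the assumption of known dense correspondences are all simplifications for illustration.

    import numpy as np

    def smooth_scene_flow(x1, x2, neighbors, lam=1.0):
        # Minimizes  sum_i ||v_i - (x2_i - x1_i)||^2
        #          + lam * sum_(i,j) ||v_i - v_j||^2
        # which is quadratic in V, so the optimum solves (I + lam*L) V = D.
        # x1, x2: (N, 3) corresponding points; neighbors: list of (i, j) pairs.
        n = x1.shape[0]
        d = x2 - x1                      # observed per-point displacements
        L = np.zeros((n, n))             # graph Laplacian of the neighbor graph
        for i, j in neighbors:
            L[i, i] += 1; L[j, j] += 1
            L[i, j] -= 1; L[j, i] -= 1
        A = np.eye(n) + lam * L          # normal equations of the joint objective
        return np.linalg.solve(A, d)     # (N, 3) smoothed flow field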


Multi-modal/Multi-cue fusion: We have been developing methods to combine multiple modalities (e.g. LIDAR and images), including through cueing and fusion at multiple levels of the network. Our latest paper showed improved performance with mid-level fusion as well as stable training, all with only a small increase in the number of parameters over RGB-only networks (see the fusion sketch after the references below).
  • Schlosser, J., Chow, C., and Kira, Z., "Fusing LIDAR and Images for Pedestrian Detection using Convolutional Neural Networks", in proceedings of the IEEE International Conference on Robotics and Automation (ICRA), 2016.
  • Kira, Z., Hadsell, R., Salgian, G., and Samarasekera, S., "Long-Range Pedestrian Detection using Stereo and a Cascade of Convolutional Network Classifiers", in proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012. [pdf]
  • Kira, Z., Southall, B., Kuthirummal, S., and Eledath, J., "Multi-Sensor Fusion for Pedestrian Detection on the Move", in proceedings of the IEEE International Conference on Technologies for Practical Robot Applications (poster), 2012.
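
As a rough illustration of mid-level fusion (not the exact ICRA architecture; the layer widths and the single-channel LIDAR input are assumptions), the PyTorch sketch below concatenates RGB and LIDAR feature maps partway through the network, so the extra parameter cost over an RGB-only model is limited to the small LIDAR stem.

    import torch
    import torch.nn as nn

    class MidLevelFusionNet(nn.Module):
        # Two convolutional stems (RGB, and a LIDAR-derived image such as an
        # upsampled depth map) fused by channel concatenation mid-network; only
        # the post-fusion layers see both modalities.
        def __init__(self, num_classes=2):
            super().__init__()
            def stem(in_ch):
                return nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                    nn.MaxPool2d(2))
            self.rgb_stem = stem(3)      # 3-channel RGB input
            self.lidar_stem = stem(1)    # 1-channel LIDAR/depth input
            self.head = nn.Sequential(
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(128, num_classes))
        def forward(self, rgb, lidar):
            # Mid-level fusion: concatenate the two 64-channel feature maps.
            f = torch.cat([self.rgb_stem(rgb), self.lidar_stem(lidar)], dim=1)
            return self.head(f)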

Fine-grained Video Analysis: We are investigating recurrent and convolutional neural networks to better exploit spatio-temporal data in videos. In the arXiv paper we show that LSTMs can achieve state-of-the-art results (with some work), although CNNs can achieve good results as well. We are expanding on this to model what is happening in videos in a more fine-grained manner, as described in the latest accepted NIPS workshop paper and the arXiv paper in submission (a minimal temporal-aggregation sketch follows the references below). Joint work with Prof. AlRegib's lab (Chih-Yao Ma and Min-Hung Chen) and NEC Labs.
  • C.Y. Ma, A. Kadav, I. Melvin, Z. Kira, G. AlRegib, and H. Peter Graf, "Attend and Interact: Higher-Order Object Interactions for Video Understanding", in submission. [arxiv]
  • C.Y. Ma, A. Kadav, I. Melvin, Z. Kira, G. AlRegib, and H. Peter Graf, "Grounded Objects and Interactions for Video Captioning", accepted to the NIPS 2017 Workshop on Visually-Grounded Interaction and Language (ViGIL), 2017. [arxiv]
  • C.Y. Ma, M.H. Chen, Z. Kira, and G. AlRegib, "TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition", in submission. [arxiv] [code]
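
A minimal sketch of the recurrent half of this line of work, assuming per-frame CNN features have already been extracted. The feature and hidden dimensions are placeholders, and the actual TS-LSTM model additionally uses temporal segments and pooling; this only shows the basic frame-sequence aggregation.

    import torch
    import torch.nn as nn

    class FrameFeatureLSTM(nn.Module):
        # An LSTM aggregates a sequence of precomputed per-frame CNN features;
        # a linear layer classifies the activity from the final hidden state.
        def __init__(self, feat_dim=2048, hidden=512, num_classes=101):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.fc = nn.Linear(hidden, num_classes)
        def forward(self, feats):              # feats: (B, T, feat_dim)
            _, (h, _) = self.lstm(feats)       # h: (1, B, hidden)
            return self.fc(h[-1])              # logits: (B, num_classes)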

Game theory for implicit generative models: We are working on viewing various machine learning problems through the lens of game theory, starting with implicit generative modeling (e.g. GANs). We have shown interesting connections to online learning and, inspired by these connections, developed a new regularization method (DRAGAN) that makes GAN training more stable across divergences and architectures (a sketch of the gradient penalty appears below). Joint work with Naveen Kodali, Prof. James Hays, and Jake Abernethy.
  • N. Kodali, J. Abernethy, J. Hays, and Z. Kira, "How to Train Your DRAGAN", in submission. [arxiv] [code]
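
A PyTorch sketch of the DRAGAN regularizer: the discriminator's gradient norm is penalized toward 1 in a noise neighborhood of the real data. The noise scale (half the batch standard deviation) and the penalty weight follow one common implementation and should be treated as assumptions rather than the paper's exact settings.

    import torch

    def dragan_gradient_penalty(discriminator, real, lam=10.0):
        # Perturb real samples with uniform noise to cover a local
        # neighborhood of the data manifold.
        noise = 0.5 * real.std() * torch.rand_like(real)
        x_hat = (real + noise).detach().requires_grad_(True)
        d_out = discriminator(x_hat)
        # Gradient of the discriminator output w.r.t. the perturbed inputs.
        grads = torch.autograd.grad(d_out.sum(), x_hat, create_graph=True)[0]
        grad_norm = grads.flatten(1).norm(2, dim=1)
        # Penalize deviation of the per-sample gradient norm from 1.
        return lam * ((grad_norm - 1.0) ** 2).mean()

The penalty is simply added to the discriminator loss each step; because it is applied only around real data (rather than along real-fake interpolations), it is architecture-agnostic.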

Knowledge transfer across heterogeneous robots: In my thesis I showed that mid-level representations are useful for learning object models (in the form of Gaussian Mixture Models) and transferring them across heterogeneous robots with differing sensors. Of course, feature learning has come a long way since then (starting with sparse coding), and we extended these methods accordingly. These principles have proven relevant in the age of deep learning, where a hierarchy of features has been shown to be highly transferable, albeit with some fine-tuning on labeled examples. Our clustering work for cross-task learning (see above) extends this line of work to the case where there is no labeled data, and will be applied to heterogeneous robot teams in future work (a toy GMM transfer example follows the references below).
  • Kira, Z., "Inter-Robot Transfer Learning for Perceptual Classification", in proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2010. [pdf]
  • Kira, Z., Communication and Alignment of Grounded Symbolic Knowledge Among Heterogeneous Robots, Ph.D. Dissertation, College of Computing, Georgia Institute of Technology, May 2010. [pdf]
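
A toy example of the underlying idea using scikit-learn: fit a Gaussian Mixture Model object model on one robot's mid-level features and score another robot's features after mapping them into the first robot's feature space. The synthetic data, the identity map W, and the threshold are all placeholders standing in for learned quantities.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    # Robot A: labeled features for one object category (toy data).
    feats_a = rng.normal(size=(500, 16))
    gmm = GaussianMixture(n_components=3).fit(feats_a)

    # Robot B: features from different sensors, mapped into A's space by a
    # cross-robot map W (identity here as a placeholder for a learned map).
    W = np.eye(16)
    feats_b = rng.normal(size=(100, 16))
    scores = gmm.score_samples(feats_b @ W.T)       # log-likelihood under A's model
    is_object = scores > np.percentile(scores, 50)  # toy detection threshold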