This project develops techniques that advance the state of the art in creating semantic map representations with robots, addressing the scalability and semantic interpretability of such maps. The activities advance knowledge in multiple fields, including computer vision, structure from motion, robotics, and semantic mapping. The results have potential for many societal applications, including city planning, asset management, creation of historical records, and support for autonomous driving. Demonstrating the developed techniques for real-time interaction between humans and robots, facilitated by a semantic map, enables even greater societal benefit, for example in emergency management, crime prevention, and traffic management. Direct educational impact is anticipated for graduate students, and the results are disseminated through both publications and software, allowing the community to build on them.

This research program advances real-time, large-scale, distributed semantic mapping of outdoor environments. Specifically, the research team enables real-time large-scale semantic mapping through unsupervised object discovery, obviating the need for large sets of annotated videos for each object category, which becomes prohibitive when dealing with hundreds of object categories. The research team frames this process within the structure from motion optimization framework, thereby leveraging geometric and multi-view constraints and features to increase the reliability of object track association as well as category clustering. In addition, to address scalability, the project develops a distributed multi-robot system that allows large teams of air and ground vehicles to cooperatively build a map of large geographic areas in reasonable time frames. Furthermore, the project develops techniques to make the maps more semantically meaningful and hence interpretable by humans. To accomplish this objective, the research team uses automatic techniques to attach semantic labels to objects discovered in an unsupervised manner. Moreover, humans can interact with the system at multiple levels: users can refine both the object categories and the semantic labels to increase their accuracy, designate dynamic targets of interest, and task robots to track them.

News


ECCV 2018 Conference Paper: Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation

CVPR 2018 DeepVision Workshop Paper: A probabilistic constrained clustering for transfer learning and image category discovery

IJCNN 2018 Conference Paper: Learning to Cluster for Proposal-Free Instance Segmentation

CVPR 2018 Conference Paper: Attend and Interact: Higher-Order Object Interactions for Video Understanding

ICLR 2018 Conference Paper: Learning to Cluster in order to Transfer Across Domains and Tasks

NIPS 2017 ViGIL Workshop: Grounded Objects and Interactions for Video Captioning

ECCV 2016: A Continuous Optimization Approach for Efficient and Accurate Scene Flow

We propose a continuous optimization method for solving dense 3D scene flow problems from stereo imagery. As in recent work, we represent the dynamic 3D scene as a collection of rigidly moving planar segments. The scene flow problem then becomes the joint estimation of pixel-to-segment assignment, 3D position, normal vector, and rigid motion parameters for each segment, leading to a complex and expensive discrete-continuous optimization problem. In contrast, we propose a purely continuous formulation that can be solved more efficiently. Using a fine superpixel segmentation that is fixed a priori, we propose a factor graph formulation that decomposes the problem into photometric, geometric, and smoothing constraints. We initialize the solution with a novel, high-quality initialization method, then independently refine the geometry and motion of the scene, and finally perform a global nonlinear refinement using Levenberg-Marquardt. We evaluate our method on the challenging KITTI Scene Flow benchmark, ranking in third position, while being 3 to 30 times faster than the top competitors (x37 and x3.75).
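
As a rough illustration of the final refinement step named above, the following is a minimal sketch of a Levenberg-Marquardt loop in plain Python. It is not the scene flow solver from the paper: it fits a toy two-parameter curve, y = a * exp(b * x), and the function names, the damping schedule (multiply or divide by 10), and the stopping tolerance are all assumptions made for this example. The point it shows is the damped normal-equation update that distinguishes Levenberg-Marquardt from plain Gauss-Newton.

```python
import math


def residuals(params, xs, ys):
    """Residuals of the toy model y = a * exp(b * x) (hypothetical example model)."""
    a, b = params
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(params, xs):
    """Rows of partial derivatives (d r / d a, d r / d b) for each sample."""
    a, b = params
    return [(math.exp(b * x), a * x * math.exp(b * x)) for x in xs]


def cost(res):
    """Half the sum of squared residuals."""
    return 0.5 * sum(v * v for v in res)


def levenberg_marquardt(params, xs, ys, lam=1e-3, iters=200, tol=1e-10):
    """Toy two-parameter Levenberg-Marquardt; all defaults are illustrative."""
    params = tuple(params)
    res = residuals(params, xs, ys)
    for _ in range(iters):
        jac = jacobian(params, xs)
        # Gauss-Newton approximation of the Hessian (J^T J) and gradient (J^T r).
        h11 = sum(j[0] * j[0] for j in jac)
        h12 = sum(j[0] * j[1] for j in jac)
        h22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * v for j, v in zip(jac, res))
        g2 = sum(j[1] * v for j, v in zip(jac, res))
        # Damped system: (J^T J + lam * diag(J^T J)) * delta = -J^T r.
        a11 = h11 * (1.0 + lam)
        a22 = h22 * (1.0 + lam)
        det = a11 * a22 - h12 * h12
        if det == 0.0:
            break
        d1 = (-a22 * g1 + h12 * g2) / det
        d2 = (h12 * g1 - a11 * g2) / det
        if max(abs(d1), abs(d2)) < tol:
            break  # proposed step is negligible, treat as converged
        candidate = (params[0] + d1, params[1] + d2)
        cand_res = residuals(candidate, xs, ys)
        if cost(cand_res) < cost(res):
            # Step reduced the cost: accept it and trust the model more.
            params, res = candidate, cand_res
            lam /= 10.0
        else:
            # Step increased the cost: reject it and increase damping.
            lam *= 10.0
    return params


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]
    print(levenberg_marquardt((1.0, 0.0), xs, ys))
```

A full scene flow system would replace the toy residuals with the photometric, geometric, and smoothing factors described in the abstract and would use a sparse solver rather than a hand-inverted 2x2 system.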

ICLR 2016 Workshop Track: Neural-network Based Clustering using Pairwise Constraints

We present a neural network-based end-to-end clustering framework that uses a novel strategy for applying contrastive criteria to push data toward forming clusters directly from raw inputs, in addition to learning a feature embedding suitable for such clustering. The network is trained with partial pairwise relationships between instances. The experiments show that the approach beats the conventional two-stage method (feature embeddings clustered via traditional clustering such as k-means) by a significant margin, especially when the number of clusters is unknown.
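
To make the idea of training from pairwise relationships concrete, here is a minimal sketch of a contrastive cost over cluster-assignment distributions, written in plain Python. It assumes the network ends in a softmax over candidate clusters and that the distance between two outputs is a symmetric KL divergence with a hinge margin for dissimilar pairs. The abstract above does not spell out these details, so the distance, the margin value of 2.0, and the function names are assumptions for illustration only.

```python
import math


def softmax(logits):
    """Turn raw network outputs into a cluster-assignment distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def pairwise_constraint_loss(p, q, similar, margin=2.0):
    """Contrastive cost on two cluster-assignment distributions.

    similar=True  : pull the two distributions together (minimise divergence).
    similar=False : push them apart until the divergence exceeds the margin.
    The margin value is an assumption for this sketch.
    """
    divergence = kl_divergence(p, q) + kl_divergence(q, p)
    if similar:
        return divergence
    return max(0.0, margin - divergence)


if __name__ == "__main__":
    p = softmax([10.0, 0.0, 0.0])
    q = softmax([0.0, 10.0, 0.0])
    print(pairwise_constraint_loss(p, p, similar=True))
    print(pairwise_constraint_loss(p, q, similar=False))
```

Because the cost acts on the output distributions themselves, minimising it over many labelled pairs drives instances of a similar pair toward the same output node, which is what lets the cluster assignment be read directly from the network without a separate k-means stage.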
