Kihwan Kim

    Staff Research Scientist at NVIDIA Research

    Ph.D. in Computer Science

    Georgia Institute of Technology, CoC/GVU

    Computational Perception Laboratory (CPL)

   Contact:

    email)  kihwan23 at   

   (nvresearch) kihwank at nvidia dot com    

    2788 San Tomas Expressway   

    Santa Clara, CA 95050

My field of research is Computer Vision and Machine Learning, specifically 3D vision and scene perception problems in intelligent (AI) systems, including autonomous driving, AR/VR, and smart surveillance. I am currently at NVIDIA Research; my NVResearch page is [HERE], but I will keep maintaining this main page as well. Here is my [Curriculum Vitae](CV) (updated Jan. 2019) and [Google Scholar].

News, recently released code, talks, and datasets

[PlaneRCNN] (GitHub) : Plane detection and reconstruction from single RGB image, CVPR 2019 (Oral).
[Neural RGB->D Sensor] (GitHub) : Per-pixel depth estimation from an RGB video, CVPR 2019 (Oral).
*CVPR 2019 Best paper finalist.
[Competitive Collaboration] (GitHub) : Joint unsupervised learning of depth, motion and flow, CVPR 2019.
[3D Human affordance (TBD)]: Putting human in a scene: Human affordance for 3D scene reasoning, CVPR 2019.
[Intrinsic3D] (GitHub): Finally released our ICCV 2017 paper on 3D reconstruction with a joint optimization of appearance, geometry and lighting.
[3D Vision and beyond] (slide) : My Stanford SCIEN talk about state-of-the-art 3D Computer vision techniques.
2017 -- 2018
[HGMM and HGMR] (TBD, will be released with the ISAAC SDK): Point cloud processing and registration, CVPR 2016 (Spotlight Oral), ECCV 2018.
[Learning rigidity] (GitHub): Learning rigidity for 3D scene flow estimation, ECCV 2018.
[3D Scene flow and rigidity] (slide): My GTC talk about scene flow and learning rigidity, ECCV 2018.
[GeoMapNet] (GitHub): Learning-based 6DOF camera pose estimation, CVPR 2018 (Spotlight Oral).
[LearningBRDF] (NVR): Dataset for learning-based reflectance estimation, ICCV 2017 (Oral).
[Dynamic Hand Gesture] (NVR): Dataset for online gesture recognition with R3DCNN, CVPR 2016.
[DTSLAM] (GitHub): SLAM, camera pose estimation and mapping, 3DV 2014.
*See more details about old projects and their code below.

Research projects

[CVPR19] Oral

 Plane detection, segmentation and 3D reconstruction

PlaneRCNN: 3D Plane Detection and Reconstruction from a Single Image
In CVPR 2019 [PDF] [Video] [Project page] [Code]

This paper proposes a deep neural architecture, PlaneRCNN, that detects and reconstructs piecewise planar surfaces from a single RGB image. PlaneRCNN employs a variant of Mask R-CNN to detect planes with their plane parameters and segmentation masks. It then jointly refines all the segmentation masks with a novel loss enforcing consistency with a nearby view during training.

with Chen Liu, Jinwei Gu, Yasutaka Furukawa, and Jan Kautz

[CVPR19] Oral *Best paper finalist.

  Neural RGB-D Sensing: Depth estimation from a video

Neural RGB-D Sensing: Depth estimation from a video
In CVPR 2019 [PDF] [Video] [Project page] [Code]

In this paper, we propose a deep learning (DL) method to estimate per-pixel depth and its uncertainty continuously from a monocular video stream, with the goal of effectively turning an RGB camera into an RGB-D camera. Unlike prior DL-based methods, we estimate a depth probability distribution for each pixel rather than a single depth value, leading to an estimate of a 3D depth probability volume for each input frame.
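The core idea, estimating a per-pixel probability distribution over depth rather than a single value, can be sketched with a toy NumPy example (shapes, values, and the softmax over candidate depth planes are illustrative, not the paper's network):

```python
import numpy as np

# Toy depth probability volume: for each pixel, a distribution over D
# candidate depth planes (here 2x2 pixels, 8 depth hypotheses).
H, W, D = 2, 2, 8
depth_planes = np.linspace(0.5, 4.0, D)          # candidate depths in meters

rng = np.random.default_rng(0)
logits = rng.normal(size=(H, W, D))              # stand-in for negative matching cost
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)  # softmax

# Expected depth and its uncertainty (std) per pixel, instead of an argmax.
depth = (probs * depth_planes).sum(axis=-1)
var = (probs * (depth_planes - depth[..., None]) ** 2).sum(axis=-1)
uncertainty = np.sqrt(var)

print(depth.shape, uncertainty.shape)            # (2, 2) (2, 2)
```

Keeping the full distribution is what allows the uncertainty to be propagated and refined over time as new frames arrive.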

with Chao Liu, Jinwei Gu, Srinivasa Narasimhan, and Jan Kautz


  Putting Humans in a Scene: 3D Human Affordance

Putting Humans in a Scene: Learning Affordance in 3D Indoor Environments
In CVPR 2019 [PDF] [Video] [Project page] [Code (TBD)]

In this paper, we aim to predict affordances of 3D indoor scenes, specifically what human poses are afforded by a given indoor environment, such as sitting on a chair or standing on the floor. We build a fully automatic 3D pose synthesizer that fuses semantic knowledge from a large number of 2D poses extracted from TV shows as well as 3D geometric knowledge from voxel representations of indoor scenes.

with Xueting Li, Sifei Liu, Xiaolong Wang, Ming-Hsuan Yang , and Jan Kautz


  Unsupervised Joint Learning of Depth, Pose, Flow and Motion

Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation
In CVPR 2019 [PDF] [Project page] [Code]

Single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions are challenging but coupled problems. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Thus, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems.
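The alternating training schedule behind Competitive Collaboration can be illustrated with a toy 1D example (hypothetical scalar models standing in for the actual networks): two "competitors" fit the pixels that a "moderator" mask assigns to them, and the moderator is then updated to favor whichever competitor explains each pixel better.

```python
import numpy as np

# Toy image: 60% "static" pixels (value 1.0) and 40% "moving" pixels (3.0).
target = np.where(np.arange(100) < 60, 1.0, 3.0)

static_model, moving_model = 1.5, 2.5            # competitor parameters
for _ in range(30):
    # Moderator phase: soft-assign each pixel to the better competitor.
    err_s = (target - static_model) ** 2
    err_m = (target - moving_model) ** 2
    mask = np.exp(-err_s) / (np.exp(-err_s) + np.exp(-err_m))
    # Competition phase: each competitor fits the pixels it was assigned.
    static_model = np.average(target, weights=mask)
    moving_model = np.average(target, weights=1.0 - mask)

print(round(static_model, 2), round(moving_model, 2))
```

The alternation converges to one model explaining each population, mirroring how the depth/ego-motion network and the flow network split the image via the motion segmentation mask.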

with Anurag Ranjan, Varun Jampani, Deqing Sun, Jonas Wulff, and Michael Black


  Learning Rigidity for 3D Scene Flow Estimation

Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation
In ECCV 2018 [PDF] [Talk slide] [Video] [Project page] [Code]

In a dynamic scene, the main challenge is the disambiguation of the camera motion from scene motion, which becomes more difficult as the amount of rigidity observed decreases. In this paper we propose to learn the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and directly infer a rigidity mask from two sequential images with depths.

with Zhaoyang Lv, Alejandro Troccoli, Deqing Sun , James M. Rehg , and Jan Kautz


  Hierarchical GMM for 3D Point Cloud Registration

HGMR: Hierarchical Gaussian Mixtures for Adaptive 3D Registration
In ECCV 2018 [PDF] [Video] [Project page]

We present a new registration algorithm that is able to achieve state-of-the-art speed and accuracy through its use of an adaptive hierarchical Gaussian Mixture Model (GMM) representation. Our method constructs a top-down multi-scale representation of point cloud data by recursively running many small-scale data likelihood segmentations in parallel on a GPU. It performs a pointwise data association in logarithmic-time while dynamically adjusting the level of detail to best match the complexity and spatial distribution of geometry.
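The logarithmic-time association can be sketched as a greedy top-down descent through a hand-built Gaussian hierarchy (a simplified illustration, not the paper's parallel GPU implementation):

```python
import numpy as np

# At each level we only evaluate the children of the best-matching node,
# so associating a point costs O(branching * depth) instead of O(#leaves).
class Node:
    def __init__(self, mean, cov, children=()):
        self.mean = np.asarray(mean, float)
        self.cov = np.asarray(cov, float)
        self.children = list(children)

    def loglik(self, x):
        d = x - self.mean
        inv = np.linalg.inv(self.cov)
        _, logdet = np.linalg.slogdet(self.cov)
        return -0.5 * (d @ inv @ d + logdet + len(x) * np.log(2 * np.pi))

def associate(root, x):
    node = root
    while node.children:                 # greedy descent: log-time in #leaves
        node = max(node.children, key=lambda c: c.loglik(x))
    return node

# Tiny hand-built 2-level hierarchy over 2D space.
leaves = [Node([0, 0], np.eye(2)), Node([2, 0], np.eye(2)),
          Node([10, 0], np.eye(2)), Node([12, 0], np.eye(2))]
left = Node([1, 0], 2 * np.eye(2), leaves[:2])
right = Node([11, 0], 2 * np.eye(2), leaves[2:])
root = Node([6, 0], 30 * np.eye(2), [left, right])

best = associate(root, np.array([9.5, 0.2]))
print(best.mean)                         # nearest leaf: mean [10, 0]
```

Stopping the descent at a coarser level is also what lets the level of detail adapt to the local complexity of the geometry.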

with Ben Eckart, and Jan Kautz

[CVPR18] Spotlight Oral

  Learning-based Camera Localization (MapNet)

Geometry-Aware Learning of Maps for Camera Localization (MapNet)
In CVPR 2018 [PDF] [Video] [Project page] [Code]

We propose to represent a map as a deep neural net called MapNet, which enables learning a data-driven map representation. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference.
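A minimal sketch of such geometric loss terms, assuming simplified planar 3-DoF poses and L1 losses rather than the paper's full 6-DoF formulation with learned weighting:

```python
import numpy as np

def relative(p1, p2):
    # relative pose of p2 in p1's frame (planar [x, y, theta] case)
    dx, dy = p2[:2] - p1[:2]
    c, s = np.cos(-p1[2]), np.sin(-p1[2])
    return np.array([c * dx - s * dy, s * dx + c * dy, p2[2] - p1[2]])

def mapnet_loss(pred, gt):
    # absolute term: per-frame pose error
    abs_loss = np.abs(pred - gt).sum()
    # relative term: consistency of pose differences between frame pairs,
    # the kind of constraint classically used in pose-graph optimization
    rel_loss = sum(np.abs(relative(pred[i], pred[j]) - relative(gt[i], gt[j])).sum()
                   for i in range(len(pred)) for j in range(i + 1, len(pred)))
    return abs_loss + rel_loss

gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.1], [2.0, 0.5, 0.2]])
pred = gt + 0.01
print(mapnet_loss(gt, gt), mapnet_loss(pred, gt) > 0)
```

The relative term is what injects the pose-graph-style geometric constraint into training; it can also use unlabeled data, since relative poses can come from visual odometry rather than ground truth.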

with Samarth Brahmbhatt, Jinwei Gu, James Hays, and Jan Kautz

[ICCV17] Oral

  Deep Learning-based Reflectance Estimation On-the-fly

A Lightweight Approach for On-the-Fly Reflectance Estimation
In ICCV 2017 [PDF] [Talk slides] [Video] [Project page] [Dataset]

We propose a lightweight, learning-based approach for estimating surface reflectance on the fly, designed to run in real time rather than relying on expensive offline inverse rendering, together with a dataset for learning-based reflectance estimation.

with Jinwei Gu, Stephen Tyree, Pavlo Molchanov, Matthias Niessner , and Jan Kautz


  Joint Optimization of Geometry, Color and Lighting for 3D Reconstruction

Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting
In ICCV 2017 [PDF] [Talk slides] [Project page] [Dataset] [Code]

We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. We simultaneously optimize a geometry encoded in a signed distance field, textures from automatically selected keyframes, and their camera poses along with material and scene lighting estimated from spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene.

with Robert Maier, Daniel Cremers , Jan Kautz, and Matthias Niessner

[3DV17] Oral

  Multi-frame 3D Scene Flow Estimation

Multiframe Scene Flow with Piecewise Rigid Motion
In IEEE 3DV 2017 [PDF] [Talk slides] [Video] [Project page]

We introduce a novel multiframe scene flow approach that jointly optimizes the consistency of the patch appearances and their local rigid motions from RGB-D image sequences. We formulate scene flow recovery as a global non-linear least squares problem which is iteratively solved by a damped Gauss-Newton approach. As a result, we obtain a qualitatively new level of accuracy in RGB-D based scene flow estimation which can potentially run in real-time.
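A damped Gauss-Newton solver of the kind described can be sketched on a small toy nonlinear least-squares problem (the paper's actual energy and parameterization are far richer; this only shows the solver family):

```python
import numpy as np

def residuals(p, x, y):
    a, b = p
    return y - a * np.exp(b * x)

def jacobian(p, x):
    a, b = p
    e = np.exp(b * x)
    return np.column_stack([-e, -a * x * e])     # d(residual)/d(params)

def damped_gauss_newton(p, x, y, lam=1e-3, iters=100):
    for _ in range(iters):
        r = residuals(p, x, y)
        J = jacobian(p, x)
        H = J.T @ J + lam * np.eye(len(p))       # damping keeps steps stable
        p = p - np.linalg.solve(H, J.T @ r)
    return p

x = np.linspace(0, 1, 20)
y = 2.0 * np.exp(0.5 * x)                        # ground truth a=2, b=0.5
p = damped_gauss_newton(np.array([1.5, 0.3]), x, y)
print(np.round(p, 3))                            # converges to [2.  0.5]
```

The damping term lam * I is the "damped" part: it regularizes the normal equations when the Gauss-Newton approximation of the Hessian is poorly conditioned.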

with Vladislav Golyanik, Robert Maier, Matthias Niessner , Jan Kautz

[CVPR16] Oral

  Accelerated Generative model (GMM) for 3D Vision

Accelerated Generative Models for 3D Point Cloud Data
In IEEE CVPR 2016 [PDF] [Talk slides] [Video] [Project page]

In this paper we introduce a method for constructing compact generative representations of point cloud data at multiple levels of detail using a hierarchical Gaussian Mixture Model (hGMM). As opposed to deterministic structures such as voxel grids or octrees, we propose probabilistic subdivisions of the data through local mixture modeling, and show how these subdivisions can provide a maximum-likelihood segmentation of the data.

with Benjamin Eckart, Alejandro Troccoli , Alonzo Kelly, Jan Kautz


  Online classification of Dynamic Hand Gestures with R3DCNN

Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks
In IEEE CVPR 2016 [PDF] [Video] [Project page]

Automatic detection and classification of dynamic hand gestures is challenging as: 1) there is a large diversity in how people perform gestures, making detection and classification difficult; 2) the system must work online in order to avoid noticeable lags between performing a gesture and its classification. We address these challenges with a recurrent three-dimensional convolutional neural network that outperforms state-of-the-art methods.

with Pavlo Molchanov, Shalini Gupta, Xiaodong Yang, Stephen Tyree, and Jan Kautz


  VirtualEye: Real-time 3D Reconstruction for Fast Free View Video

NVIDIA VirtualEye: Real-time Fast Free View Video
In DARPA Wait What: A Future Technology Forum 2015, [Media] [Video]

We introduce a live, real-time, full-HD visualization of scenes containing both dynamic non-rigid objects and rigid static background structure, using commodity depth and stereo cameras. This demo was introduced at DARPA's Wait What? Future Technology Forum 2015. The project targets real-time (30+ fps) visualization of free-view video streams from multiple cameras. The entire pipeline (preprocessing, capturing, fusion, and meshification) benefits from NVIDIA's CUDA.

with Alejandro Troccoli , Xiaodong Yang , Natesh Srinivasan , Jan Kautz

[3DV15] Oral

  Fast and accurate PCD registration with GMM

MLMD: Maximum Likelihood Mixture Decoupling for Fast and Accurate Point Cloud Registration
In IEEE 3D Vision (3DV 2015) [PDF] [Video]

We introduce a PCD registration algorithm that utilizes Gaussian Mixture Models (GMM) and a novel dual-mode parameter optimization technique which we call mixture decoupling. We show how this decoupling technique facilitates both faster and more robust registration by first optimizing over the mixture parameters (decoupling the mixture weights, means, and covariances from the points) before optimizing over the 6DOF registration parameters.

with Benjamin Eckart, Alejandro Troccoli , Alonzo Kelly, Jan Kautz

[EGSR15] Oral

  Physically-based Rendering for Mixed and Augmented Reality

Filtering Environment Illumination for Interactive Physically-Based Rendering in Mixed Reality
In Eurographics Symposium on Rendering (EGSR) 2015 [PDF] [Supp] [Video]

We propose a photo-realistic augmented and mixed reality system that runs in interactive rates. Our primary contribution is an axis-aligned filtering scheme that preserves the frequency content of the illumination. We then demonstrate a novel two-mode path tracing approach that allows ray-tracing a scene with image-based real geometry (captured from commodity depth camera) and mesh-based virtual geometry.

with Soham Mehta, Dawid Pajak , Kari Pulli, Jan Kautz, and Ravi Ramamoorthi

[CVPRW15] Oral

  3D CNN for Dynamic Hand Gesture Recognition

Hand Gesture Recognition with 3D Convolutional Neural Networks
In IEEE CVPR 2015 Workshop on Hand gesture recognition
Winner of the first HANDS challenge competition, 2015. [PDF]

We propose an algorithm for drivers' hand gesture recognition from challenging depth and intensity data using 3D convolutional neural networks. Our solution combines information from multiple spatial scales for the final prediction. It also employs spatio-temporal data augmentation for more effective training and to reduce potential overfitting. Our method achieves a correct classification rate of 77.5% on the VIVA challenge dataset.

with Pavlo Molchanov, Shalini Gupta , and Jan Kautz

[FG15] [RIDARCon15]

  Multi-sensor Deep Learning architecture for Gesture recognition

Multi-sensor System for Driver's Hand-Gesture Recognition
In IEEE Automatic Face and Gesture Recognition (FG 2015) [PDF]

Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing
In IEEE International Radar conference 2015 [PDF]

We propose a novel multi-sensor system for accurate and power-efficient dynamic car-driver hand-gesture recognition, using a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions.

with Pavlo Molchanov, Shalini Gupta , Kari Pulli

[3DV14] Oral

  DT-SLAM: Robust SLAM with Adaptive Triangulation for Rotation

DT-SLAM: Deferred Triangulation for Robust SLAM
IEEE 3D Vision Conference (3DV 2014) [PDF] [Video] [Code]

We introduce a real-time visual SLAM system that incrementally tracks individual 2D features, and estimates camera pose by using matched 2D features, regardless of the length of the baseline. Triangulating 2D features into 3D points is deferred until keyframes with sufficient baseline for the features are available. Our method can also deal with pure rotational motions, and fuse the two types of measurements in a bundle adjustment step.
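The deferral criterion can be sketched as a simple parallax test between two camera centers and a candidate point (threshold and geometry here are illustrative, not the paper's exact criterion):

```python
import numpy as np

def parallax_deg(cam1, cam2, point):
    # angle subtended at the 3D point by the two camera centers
    r1 = point - cam1
    r2 = point - cam2
    cosang = r1 @ r2 / (np.linalg.norm(r1) * np.linalg.norm(r2))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def should_triangulate(cam1, cam2, point, min_parallax_deg=2.0):
    # defer triangulation until the baseline yields enough parallax
    return parallax_deg(cam1, cam2, point) >= min_parallax_deg

p = np.array([0.0, 0.0, 10.0])                            # a far scene point
c1 = np.array([0.0, 0.0, 0.0])
print(should_triangulate(c1, np.array([0.01, 0, 0]), p))  # tiny baseline: False
print(should_triangulate(c1, np.array([1.0, 0, 0]), p))   # wide baseline: True
```

Until the test passes, the feature is kept as a 2D (direction-only) measurement, which is exactly what makes pure rotational motion tractable.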

with Daniel C. Herrera, and Kari Pulli


  WYSIWYG Viewfinder: Real-time Segmentation and Editing

WYSIWYG Computational Photography via Viewfinder Editing
ACM Transaction on Graphics, SIGGRAPH Asia 2013 [PDF] [Video] [Project page]

We introduce a WYSIWYG viewfinder editing, which makes the viewfinder more accurately reflect the final image the user intends to create. We allow the user to alter the local or global appearance (tone, color, or focus) via stroke-based input, and propagate the edits spatiotemporally. The system then delivers a real-time visualization of these modifications to the user, and drives the camera control routines to select better capture parameters.

with Jongmin Baek, Dawid Pajak , Kari Pulli, and Marc Levoy


  Prediction of Camera Motions with Gaussian Process Regression

Detecting Regions of Interest in Dynamic Scenes with Camera Motions
In IEEE CVPR 2012 [PDF] [Video] [Project page]

We use stochastic fields for predicting important future regions of interest as the scene evolves dynamically. We evaluate our approach on a variety of videos of team sports. We show that our approach can detect where to move the camera based on observations in the scene, and compare the detected/predicted regions of interest to the camera motion generated by actual camera operators.

with Dongryeol Lee and Irfan Essa


  Gaussian Process Regression Flow (GPRF)

Gaussian Process Regression Flow for Analysis of Motion Trajectories
In IEEE ICCV 2011 [PDF] [Video] [Project page]

In this paper, we introduce a new representation specifically aimed at matching motion trajectories. We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Our approach works well on various types of complete and incomplete trajectories from a variety of video data sets with different frame rates.
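The underlying mechanism, Gaussian Process Regression from sparse samples to a continuous prediction, can be sketched in 1D with a minimal NumPy implementation (a toy, not the paper's vector-field formulation over trajectories):

```python
import numpy as np

def rbf(a, b, ell=0.5):
    # squared-exponential kernel between two sets of 1D inputs
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_predict(t_train, y_train, t_query, noise=1e-4):
    # GP posterior mean: k(q, T) @ K(T, T)^-1 @ y
    K = rbf(t_train, t_train) + noise * np.eye(len(t_train))
    k_star = rbf(t_query, t_train)
    return k_star @ np.linalg.solve(K, y_train)

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # sparse observed timestamps
y = t ** 2                                # observed positions along a path
pred = gp_predict(t, y, np.array([0.75, 1.25]))
print(np.round(pred, 2))                  # smoothly interpolates y = t^2
```

In GPRF the same machinery regresses a dense flow field (velocity vectors) from sparse trajectory samples, so that partial and differently-sampled trajectories can still be matched.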

with Dongryeol Lee and Irfan Essa


  Global Motion Prediction for Automated Broadcasting System

Motion Fields to Predict Play Evolution in Dynamic Sports Scenes
In IEEE CVPR 2010 [PDF] [Video] [Project page]

Player actions and interactions in dynamic sports scenes are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that such constrained multi-agent events can be analyzed, and even predicted, by estimating the global movements of all players in the scene at any time and used to predict play evolution.

with Matthias Grundmann, Ariel Shamir, Iain Matthews, Jessica Hodgins and Irfan Essa


  Player Tracking and Localization with Multiple Cameras

Player Localization using Multiple Static Cameras for Sports
In IEEE CVPR 2010 [PDF] [Video] [Project page]

We model the problem of fusing corresponding players' positional information as finding minimum-weight K-length cycles in complete K-partite graphs. We use our proposed algorithm class in an end-to-end sports visualization framework, and demonstrate its robustness on over 60,000 frames of real soccer footage captured under five different illumination conditions, play types, and team attires.
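The K-partite cycle formulation can be illustrated with a tiny brute-force toy, one part per camera and one node per candidate detection (the paper uses efficient algorithms; exhaustive search here is only for clarity):

```python
import itertools
import numpy as np

def min_weight_cycle(detections):
    # detections: list of K lists of candidate 2D positions, one per camera.
    # Pick exactly one detection per camera so the K-cycle through the
    # picks has minimum total length (mutually consistent detections).
    best_cost, best_pick = np.inf, None
    for pick in itertools.product(*detections):
        cost = sum(np.linalg.norm(np.array(pick[i]) - np.array(pick[(i + 1) % len(pick)]))
                   for i in range(len(pick)))
        if cost < best_cost:
            best_cost, best_pick = cost, pick
    return best_pick, best_cost

cams = [
    [(0.0, 0.0), (5.0, 5.0)],     # camera 1: two candidate detections
    [(0.1, 0.1), (5.2, 4.9)],     # camera 2
    [(0.0, 0.2), (4.9, 5.1)],     # camera 3
]
pick, cost = min_weight_cycle(cams)
print(pick[0])                    # the consistent cluster near (0, 0) wins
```

A low-weight cycle means the selected detections agree across all K cameras, which is what identifies one physical player.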

with Raffay Hamid, Ram Krishan Kumar, Matthias Grundmann, Jessica Hodgins and Irfan Essa

[ISMAR09] Oral

  Augmenting Earth-Maps with Dynamic Information

Augmenting Aerial Earth Maps with Dynamic Information
In ISMAR 2009, Journal of Virtual Reality 2011 [PDF] [Video] [Slide] [Project page]

We augment aerial Earth maps, such as satellite imagery of cities, with dynamic information extracted from videos, including traffic, pedestrians, and other live activity, registered into the geo-located scene so that otherwise static maps reflect what is happening in them.

with Irfan Essa, Sangmin Oh and Jeonggyu Lee. [Media]: CNN, New Scientist, Popular Science, Discovery Channel, MIT Tech Review, Engadget, Vizworld, Revolution Magazine, etc.


  Real-time Transparent-Colored Shadow Volume

A Shadow Volume Algorithm for Opaque and Transparent Non-Manifold Casters
Journal of Graphics Tools 2008 [PDF] [Video]

We present a novel shadow volume algorithm that extends to general non-manifold meshes, with an additional extension to shadows of transparent casters. To achieve this, we first introduce a generalization of an object's silhouette to non-manifold meshes. We then compute the light intensity arriving at receiver fragments after the light has traveled through multiple colored transparent surfaces.

with Byungmoon Kim, Greg Turk

[ISWC08] Oral

  GPSRay: 3D Reconstruction of Urban Scenes using GPS

Localization and 3D Reconstruction of Urban Scenes Using GPS
IEEE ISWC 2008 [PDF] [Video] [Slide]

Using off-the-shelf Global Positioning System (GPS) units, we reconstruct buildings in 3D by exploiting the reduction in signal-to-noise ratio (SNR) that occurs when the buildings obstruct the line-of-sight between the moving units and the orbiting satellites. We measure the size and height of skyscrapers, and automatically construct a density map representing the locations of multiple buildings in an urban landscape.
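The occlusion-voting idea can be sketched on a toy 2D grid (the threshold, step count, and geometry are all illustrative, not values from the paper):

```python
import numpy as np

def vote_ray(density, receiver, direction, snr, snr_threshold=30.0, steps=20):
    """When a satellite's SNR is low, vote 'blocked' along the line-of-sight ray."""
    if snr >= snr_threshold:
        return                                    # clear line of sight: no vote
    for s in range(1, steps + 1):
        x, y = np.round(receiver + direction * s).astype(int)
        if 0 <= x < density.shape[0] and 0 <= y < density.shape[1]:
            density[x, y] += 1.0                  # something along this ray blocks

density = np.zeros((10, 10))
receiver = np.array([0.0, 0.0])

vote_ray(density, receiver, np.array([1.0, 1.0]) / np.sqrt(2), snr=12.0)  # low SNR
vote_ray(density, receiver, np.array([1.0, 0.0]), snr=45.0)               # high SNR

print(density[1, 1] > 0, density[1, 0] == 0)      # True True
```

Accumulating such votes from many receiver positions and satellites is what turns SNR drops into a density map of building locations.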

with Jay Summet, Thad Starner, Mrunal Kapade, Daniel Ashbrook, and Irfan Essa


  Video based Non-Photorealistic Rendering

Video based Non-Photorealistic Rendering
Samsung STAR/SAIT 2008 [PDF] [Video1] [Video2]

We build a non-photorealistic rendering (NPR) system using a global gradient field from radial basis interpolation and dispersion filters (water-colorization). For temporal coherence we adopt Michael Black's piecewise-smooth flow fields (robust regularization). The dispersion filter is also designed to mimic pigment dispersion on water.

with Irfan Essa

[ACMMM 06]

  Interactive Mosaic Generation for Video Navigation

Interactive Mosaic Generation for Video Navigation
ACM Multimedia (ACMMM) 2006 [PDF] [Project page]

We introduce a novel mosaicing method using a multi-scale tiling algorithm. The method allows users to create a mosaic from a collection of videos, and to navigate and edit the video scenes. In the matching process we use annotated information from the Family Video collection.

with Irfan Essa, Gregory Abowd


  Face Recognition using Generalized SVD

Face Recognition using Generalized Singular Value Decomposition
Tech report [Project page]

We propose a face recognition algorithm using GSVD. We use Linear Discriminant Analysis with the Generalized Singular Value Decomposition (GSVD), which effectively reduces the dimension of the input images while maintaining classification performance.

with Sangmin Lee, James M. Rehg , Haesun Park


  Real-time Face Detection

Face Detection with Adaboost and Morphology Operators
Tech report [Project page]

We demonstrate implementations of two state-of-the-art face detection algorithms. We first demonstrate Viola's AdaBoost-based approach, then Han's morphology-based approach, and show how we fuse the two methods, with various evaluations.

Research and Development at Samsung SDS IT R&D Center



  ViaFace: Face Identification System

ViaFace: Samsung's Face Identification System
Tech report Details

Samsung IT R&D Center released a face recognition system (ViaFace) in 2002, after three years of research. It was demonstrated at COMDEX Las Vegas 2001, and later deployed in various fields and industries, including the well-known Korean apartment franchise 'Raemian' and an airport in Mexico. The system covers both verification and identification.


  Syncbiz: Real-time Collaboration System

Samsung Real-time Collaboration System
Tech report Details

Syncbiz is a real-time collaboration system that includes an application-sharing module, text chat, a video/audio conferencing module, a shared virtual directory, a multi-user whiteboard, and a real-time agenda scheduler. A single Syncbiz local server can sustain 50 concurrent sessions.
This project won the 2003 Samsung Best Solution Award.


  IP-STB Framework : LivingWise CS (LWCS)

LivingWiseCS (LWCS) Samsung's Smart City STB Framework
Tech report Details

LWCS is a framework for IP set-top boxes used in Smart City projects by Samsung Electronics and KT (Korea Telecom). It manages overall I/O and controllers on top of the Microsoft Windows CE environment.

with Taesoo Jun, Hanchoel Kim and Joonsung Park.