Kihwan Kim

   Senior Research Scientist at NVIDIA Research

   Ph.D. in Computer Science

   Georgia Institute of Technology, CoC/GVU/CPL  

    Advisor : Dr. Irfan Essa

   Member of  CPL and Graphics Group 

  Contact :

   email)  kihwan23 at   

   (nvresearch) kihwank at nvidia dot com    

    2701 San Tomas Expressway   

    Santa Clara, CA 95050

  I am currently at NVIDIA Research; my new page is here. However, I will keep maintaining this main page.

Main research

 Curriculum Vitae [PDF] (updated Mar. 2018)

  Learning Rigidity for 3D Scene Flow

  A dynamic scene is commonly captured by a moving camera, which increases the task complexity because the scene is observed from different viewpoints. The main challenge is disambiguating the camera motion from the scene motion, which becomes more difficult as the amount of observed rigidity decreases. In contrast to other state-of-the-art 3D scene flow estimation methods, in this paper we propose to learn the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and to directly infer a rigidity mask from two sequential images with depth.

 - "Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation " PDF (arXiv)   ECCV 2018 Project page -with Zhaoyang Lv, Alejandro Troccoli , Deqing Sun , James M. Rehg , and Jan Kautz

  Hierarchical GMM for 3D Point Cloud Registration

  We present a new registration algorithm that is able to achieve state-of-the-art speed and accuracy through its use of a hierarchical Gaussian Mixture Model (GMM) representation. Our method constructs a top-down multi-scale representation of point cloud data by recursively running many small-scale data likelihood segmentations in parallel on a GPU. Compared to previous Iterative Closest Point and GMM-based techniques, our tree-based point association algorithm performs data association in logarithmic-time while dynamically adjusting the level of detail to best match the complexity and spatial distribution characteristics of geometry.

 - "Fast and Accurate Point Cloud Registration using Trees of Gaussian Mixtures " PDF (arXiv)   ECCV 2018 Project page -with Ben Eckart, and Jan Kautz
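The logarithmic-time tree association above can be sketched in a few lines of Python. This is a toy 1D illustration under assumed structures (a hypothetical `Node` class, isotropic Gaussians), not the parallel GPU implementation from the paper:

```python
import math

def gauss_logpdf(x, mean, var):
    # Log-density of a 1D Gaussian; a stand-in for the anisotropic
    # 3D Gaussians used in the actual hierarchical GMM.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

class Node:
    """A node in a toy Gaussian mixture tree (hypothetical structure)."""
    def __init__(self, mean, var, children=None):
        self.mean, self.var, self.children = mean, var, children or []

def associate(point, root):
    # Descend the tree, greedily choosing the most likely child at each
    # level: O(depth) = O(log N) work instead of scanning all leaves.
    node = root
    while node.children:
        node = max(node.children, key=lambda c: gauss_logpdf(point, c.mean, c.var))
    return node

# Two-level tree: coarse clusters on top, finer Gaussians below.
tree = Node(0.0, 100.0, [
    Node(-5.0, 4.0, [Node(-6.0, 1.0), Node(-4.0, 1.0)]),
    Node( 5.0, 4.0, [Node( 4.0, 1.0), Node( 6.0, 1.0)]),
])

leaf = associate(3.7, tree)  # lands in the finest Gaussian near 4.0
```

The "level of detail" adjustment in the paper corresponds to stopping this descent early when a coarser Gaussian already explains the point well.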

  Learning-based Camera Localization (MapNet)

  We propose to represent maps as a deep neural net called MapNet, which enables learning a data-driven map representation. Unlike prior work on learning maps, MapNet exploits cheap and ubiquitous sensory inputs like visual odometry and GPS in addition to images and fuses them together for camera localization. Geometric constraints expressed by these inputs, which have traditionally been used in bundle adjustment or pose-graph optimization, are formulated as loss terms in MapNet training and also used during inference.

 - "Geometry-Aware Learning of Maps for Camera Localization (MapNet) " PDF (arXiv)   Spotlight presentation in CVPR 2018 Project page , Code ,

  -with Samarth Brahmbhatt, Jinwei Gu, James Hays, and Jan Kautz
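The idea of expressing geometric constraints as loss terms can be sketched as follows. This is a hedged toy version with 2D poses `(x, y, theta)` and an illustrative weight `alpha`; the real MapNet operates on 3D translations and log-quaternion rotations predicted by a deep network:

```python
import math

def pose_diff(p, q):
    # Difference between two toy 2D poses (x, y, theta), with the
    # angle wrapped to (-pi, pi].
    dth = (p[2] - q[2] + math.pi) % (2 * math.pi) - math.pi
    return [p[0] - q[0], p[1] - q[1], dth]

def l1(v):
    return sum(abs(x) for x in v)

def mapnet_style_loss(pred, gt, odom, alpha=1.0):
    """Sketch of a geometry-aware loss: absolute pose error plus a
    relative-pose term that asks consecutive predictions to agree with
    odometry measurements. Names and weights are illustrative."""
    absolute = sum(l1(pose_diff(p, g)) for p, g in zip(pred, gt))
    relative = 0.0
    for i in range(len(pred) - 1):
        pred_rel = pose_diff(pred[i + 1], pred[i])
        relative += l1(pose_diff(pred_rel, odom[i]))
    return absolute + alpha * relative

# Perfect predictions consistent with odometry give zero loss.
loss = mapnet_style_loss([(0, 0, 0), (1, 0, 0)],
                         [(0, 0, 0), (1, 0, 0)],
                         [(1, 0, 0)])
```

The relative term is what lets noisy but drift-free inputs (GPS) and drifting but locally accurate inputs (visual odometry) constrain the same network.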

 Learning-based Reflectance Estimation On-the-fly

  We propose a lightweight approach for surface reflectance estimation directly from 8-bit RGB images in real-time, which can be easily plugged into any 3D scanning-and-fusion system with a commodity RGBD sensor. Our method is learning-based, and we propose two novel deep neural network architectures, HemiCNN and Grouplet, to deal with the unstructured input data from multiple viewpoints under unknown illumination.

 - "A Lightweight Approach for On-the-Fly Reflectance Estimation " Paper (PDF) , Supplementary document , and PDF (arXiv)   Oral presentation in IEEE ICCV 2017
Project page , Dataset , Talk slides

  -with Jinwei Gu, Stephen Tyree, Pavlo Molchanov, Matthias Niessner , and Jan Kautz

 Intrinsic3D: 3D Reconstruction with SVSH

  We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. Our core idea is to simultaneously optimize for geometry encoded in a signed distance field, textures from automatically selected keyframes, and their camera poses along with material and scene lighting. To this end, we propose a joint surface reconstruction approach that is based on shape-from-shading (SfS) techniques and utilizes the estimation of spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene.

 - "Intrinsic3D: High-Quality 3D Reconstruction by Joint Appearance and Geometry Optimization with Spatially-Varying Lighting " Paper (PDF) , Supplementary document,   Talk Slides    in IEEE ICCV 2017 Project page, Dataset

  -Collaborators: Robert Maier, Daniel Cremers , Jan Kautz, and Matthias Niessner
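Evaluating shading from low-order spherical harmonics, the building block behind the SVSH estimation above, can be sketched in a few lines. This toy version uses only band-0/band-1 SH (4 coefficients) with the basis constants folded into the coefficients; Intrinsic3D estimates such coefficients per subvolume:

```python
def sh_shade(normal, coeffs):
    """Shading from first-order (4-coefficient) spherical harmonics for
    a unit normal n = (nx, ny, nz). Illustrative: SH basis constants are
    assumed to be folded into the coefficients."""
    nx, ny, nz = normal
    basis = [1.0, ny, nz, nx]  # band-0 constant term, then band-1 linear terms
    return sum(c * b for c, b in zip(coeffs, basis))

# Ambient-only lighting shades every normal equally...
ambient = sh_shade((0.0, 0.0, 1.0), [0.5, 0.0, 0.0, 0.0])
# ...while a +z band-1 component brightens surfaces facing up.
lit_up = sh_shade((0.0, 0.0, 1.0), [0.2, 0.0, 0.6, 0.0])
```

In the joint optimization, these coefficients (per subvolume) are solved together with geometry and camera poses so that predicted shading matches the observed keyframe intensities.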

 Multi-frame 3D Scene Flow Estimation

  We introduce a novel multiframe scene flow approach that jointly optimizes the consistency of the patch appearances and their local rigid motions from RGB-D image sequences. We formulate scene flow recovery as a global non-linear least squares problem which is iteratively solved by a damped Gauss-Newton approach. As a result, we obtain a qualitatively new level of accuracy in RGB-D based scene flow estimation which can potentially run in real-time. Extensive experiments on synthetic and real data show that our method outperforms the state of the art.

 - "Multiframe Scene Flow with Piecewise Rigid Motion" in IEEE Conference on 3D Vision (3DV 2017)    Paper (PDF), Video , Slide, Project page

  -Collaborators: Vladislav Golyanik, Robert Maier, Matthias Niessner , Jan Kautz
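The damped Gauss-Newton iteration mentioned above can be sketched on a scalar toy problem. This is a minimal single-variable illustration (Levenberg-style damping on the normal equations), not the large multi-variable system the paper solves:

```python
import math

def damped_gauss_newton(residual, jacobian, x0, lam=1e-3, iters=20):
    """Scalar sketch of damped Gauss-Newton for non-linear least squares:
    solve (J^T J + lam) dx = -J^T r at each step. The damping term `lam`
    stabilizes steps where the linearization is poor."""
    x = x0
    for _ in range(iters):
        r, J = residual(x), jacobian(x)
        x -= J * r / (J * J + lam)  # normal equations with damping
    return x

# Toy problem: minimize (exp(x) - 2)^2, i.e. find x = ln 2.
root = damped_gauss_newton(lambda x: math.exp(x) - 2.0,
                           lambda x: math.exp(x), x0=0.0)
```

In the paper, the residuals stack patch-appearance consistency and local rigid-motion terms over multiple frames, but each iteration has this same shape.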

 Accelerated Generative model (GMM) for 3D Vision

  In this paper we introduce a method for constructing compact generative representations of point cloud data (PCD) at multiple levels of detail. As opposed to deterministic structures such as voxel grids or octrees, we propose probabilistic subdivisions of the data through local mixture modeling, and show how these subdivisions can provide a maximum likelihood segmentation of the data. We explore the trade-offs between model fidelity and model size at various levels of detail, with our tests showing favorable performance when compared to octree and NDT-based methods.

 - "Accelerated Generative Models for 3D Point Cloud Data" in IEEE CVPR 2016
   Paper (PDF), Supplementary Document (PDF), Video , Spotlight Oral Slide, Project page

  -Collaborators: Benjamin Eckart, Alejandro Troccoli , Alonzo Kelly, Jan Kautz

 Online detection/classification of Dynamic Gestures

  Automatic detection and classification of dynamic hand gestures in real-world systems intended for human-computer interaction is challenging as: 1) there is a large diversity in how people perform gestures, making detection and classification difficult; 2) the system must work online in order to avoid noticeable lags between performing a gesture and its classification. In this paper, we address these challenges with a recurrent three-dimensional convolutional neural network that performs simultaneous detection and classification of dynamic hand gestures from unsegmented multi-modal input streams.

 - "Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks " in IEEE CVPR 2016 Paper (PDF), Project page

 - "Towards Selecting Robust Hand Gestures for Automotive Interfaces " in IEEE Intelligent Vehicles Symposium (IV16) 2016 Paper (PDF), Project page

  -with Pavlo Molchanov, Shalini Gupta , Xiaodong Yang , Stephen Tyree, and Jan Kautz

 Real-time reconstruction for Fast Free View Video

  We introduce a live, real-time, full-HD visualization of scenes with both dynamic non-rigid objects and rigid static background structure, using commodity depth and stereo cameras. This demo was introduced at DARPA's Wait What: A Future Technology Forum 2015. The project aims at real-time (30+ fps) visualization of free-view video streams from multiple cameras. The pipeline (preprocessing, capturing, fusion, and meshing) is built entirely on NVIDIA CUDA.

 - "NVIDIA VirtualEye: Real-time Fast Free View Video " In DARPA Wait What: A Future Technology Forum 2015, St. Louis Media coverage

  -Collaborators: Alejandro Troccoli , Xiaodong Yang , Natesh Srinivasan , Jan Kautz

 Fast and accurate PCD registration

  We introduce a PCD registration algorithm that utilizes Gaussian Mixture Models (GMM) and a novel dual-mode parameter optimization technique which we call mixture decoupling. We show how this decoupling technique facilitates both faster and more robust registration by first optimizing over the mixture parameters (decoupling the mixture weights, means, and covariances from the points) before optimizing over the 6DOF registration parameters.

 - "MLMD: Maximum Likelihood Mixture Decoupling for Fast and Accurate Point Cloud Registration " In IEEE 3D Vision (3DV 2015) PDF

  -Collaborators: Benjamin Eckart, Alejandro Troccoli , Jan Kautz, and Alonzo Kelly
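The mixture side of the decoupled optimization above starts from the standard E-step: softly associating points with mixture components. This toy 1D version sketches that step (MLMD alternates such mixture updates with a separate 6-DOF pose optimization); the actual method works on anisotropic 3D Gaussians:

```python
import math

def responsibilities(point, weights, means, variances):
    """E-step of a toy 1D GMM: posterior probability that `point` was
    generated by each mixture component."""
    dens = [w * math.exp(-0.5 * (point - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    return [d / total for d in dens]

# A point near the first component is almost entirely assigned to it.
r = responsibilities(0.1, weights=[0.5, 0.5], means=[0.0, 5.0],
                     variances=[1.0, 1.0])
```

Decoupling means these weights/means/covariances are optimized first, so the subsequent registration step matches a compact mixture rather than every raw point.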

 Physically-based Rendering for Augmented Reality

  We propose a photo-realistic augmented and mixed reality system that runs in interactive rates. Our primary contribution is an axis-aligned filtering scheme that preserves the frequency content of the illumination. We then demonstrate a novel two-mode path tracing approach that allows ray-tracing a scene with image-based real geometry (captured from commodity depth camera) and mesh-based virtual geometry.

 - "Filtering Environment Illumination for Interactive Physically-Based Rendering in Mixed Reality " In Eurographics Symposium on Rendering (EGSR) 2015 PDF

 - Implementation details, derivations and proofs : PDF

  -Collaborators: Soham Mehta, Dawid Pajak , Kari Pulli, Jan Kautz, and Ravi Ramamoorthi

 3D CNN for Dynamic Hand Gesture Recognition

  We propose an algorithm for drivers' hand gesture recognition from challenging depth and intensity data using 3D convolutional neural networks. Our solution combines information from multiple spatial scales for the final prediction. It also employs spatio-temporal data augmentation for more effective training and to reduce potential overfitting.
Our method achieves a correct classification rate of 77.5% on the VIVA challenge dataset.

 - "Hand Gesture Recognition with 3D Convolutional Neural Networks " PDF

   IEEE CVPR 2015 Workshop on Hand gesture recognition

  -Collaborators: Pavlo Molchanov, Shalini Gupta , and Jan Kautz

 DNN-Gesture recognition with multi-modal sensors

  We propose a novel multi-sensor system for accurate and power-efficient dynamic car-driver hand-gesture recognition, using a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. We present a procedure to jointly calibrate the radar and depth sensors. We employ convolutional deep neural networks to fuse data from multiple sensors and to classify the gestures.

 - "Multi-sensor System for Driver's Hand-Gesture Recognition " PDF

   IEEE Automatic Face and Gesture Recognition (FG 2015) accepted as Oral

 - "Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing " PDF

   IEEE International Radar conference 2015

  -Collaborators: Pavlo Molchanov, Shalini Gupta , and Kari Pulli

 DT-SLAM: SLAM with Deferred Triangulation

  We introduce a real-time visual SLAM system that incrementally tracks individual 2D features, and estimates camera pose by using matched 2D features, regardless of the length of the baseline. Triangulating 2D features into 3D points is deferred until keyframes with sufficient baseline for the features are available. Our method can also deal with pure rotational motions and fuses the two types of measurements in a bundle adjustment step.

 - "DT-SLAM: Deferred Triangulation for Robust SLAM " PDF

   IEEE 3D Vision Conference (3DV 2014) in Tokyo, Japan

  Source code (C++) of the system under BSD license is available : GITHUB

 -Collaborators : Daniel C. Herrera, and Kari Pulli
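The "defer until sufficient baseline" decision above can be sketched as a parallax gate on the two viewing rays of a feature. This is a hedged illustration: the 2-degree threshold is illustrative, not the value used by DT-SLAM:

```python
import math

def parallax_deg(ray1, ray2):
    # Angle between two viewing rays, used as a proxy for whether the
    # baseline is large enough for a well-conditioned triangulation.
    dot = sum(a * b for a, b in zip(ray1, ray2))
    n1 = math.sqrt(sum(a * a for a in ray1))
    n2 = math.sqrt(sum(b * b for b in ray2))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / (n1 * n2)))))

def should_triangulate(ray1, ray2, min_parallax_deg=2.0):
    """Keep the feature as a 2D measurement until its rays from two
    keyframes subtend enough parallax; only then promote it to a 3D
    point. Threshold is illustrative."""
    return parallax_deg(ray1, ray2) >= min_parallax_deg

ok = should_triangulate((0.0, 0.0, 1.0), (0.1, 0.0, 1.0))     # ~5.7 degrees
bad = should_triangulate((0.0, 0.0, 1.0), (0.001, 0.0, 1.0))  # ~0.06 degrees
```

Features that fail the gate still contribute: their 2D reprojection terms constrain rotation, which is how the system handles pure rotational motion.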

 WYSIWYG Computational Photography

  This paper explores the notion of viewfinder editing, which makes the viewfinder more accurately reflect the final image the user intends to create. We allow the user to alter the local or global appearance (tone, color, saturation, or focus) via stroke-based input, and propagate the edits spatiotemporally. The system then delivers a real-time visualization of these modifications to the user, and drives the camera control routines to select better capture parameters.

 - "WYSIWYG Computational Photography via Viewfinder Editing" PDF

  ACM Transactions on Graphics (SIGGRAPH Asia 2013) : PROJECT PAGE  

 -Collaborators : Jongmin Baek, Dawid Pajak , Kari Pulli, and Marc Levoy

 Prediction of ROI in Scenes with Camera Motions

 We use stochastic fields for predicting important future regions of interest as the scene evolves dynamically. We evaluate our approach on a variety of videos of team sports. We show that our approach can detect where to move the camera based on observations in the scene, and compare the detected/predicted regions of interest to the camera motion as generated by actual camera operators.

 - "Detecting Regions of Interest in Dynamic Scenes with Camera Motions" PDF

  The paper appeared in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012)


 -Collaborators : Dongryeol Lee and Dr. Irfan Essa

 Gaussian Process Regression Flow

 We model a trajectory as a continuous dense flow field learned from a sparse set of vector sequences using Gaussian process regression. The mean flows and confidences (from the variances) allow us to incrementally predict possible paths and detect anomalous events from online trajectories. We evaluate on various types of video data with complete and incomplete trajectories.

 - "Gaussian Process Regression Flow for Analysis of Motion Trajectories" PDF

  The paper was published in the IEEE International Conference on Computer Vision (ICCV 2011)



 -Collaborators : Dongryeol Lee and Dr. Irfan Essa
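The Gaussian process regression at the heart of the method above can be sketched in 1D: given sparse observations, predict a mean (the flow) and a variance (the confidence). This is a minimal pure-Python illustration with an RBF kernel; the paper fits GPs to trajectory vector fields:

```python
import math

def rbf(a, b, length=1.0):
    return math.exp(-0.5 * (a - b) ** 2 / length ** 2)

def solve(A, b):
    # Naive Gaussian elimination with partial pivoting, enough for the
    # small kernel systems in this sketch.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def gp_predict(xs, ys, x_star, noise=1e-6):
    """Predictive mean and variance of a 1D GP at x_star: the mean plays
    the role of the dense flow, the variance of the confidence."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var

# At an observed input, the GP reproduces the observation with near-zero variance.
mean, var = gp_predict([0.0, 1.0, 2.0], [0.0, 1.0, 0.0], 1.0)
```

Low predicted variance marks regions where a new trajectory can be judged normal or anomalous with confidence; high variance marks unexplored regions.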

 Dynamic Scene Analysis using Motion Field

 Player actions and interactions in dynamic sports scenes are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that such constrained multi-agent events can be analyzed, and even predicted, by estimating the global movement of all players in the scene at any time, which is then used to predict play evolution.

 - "Motion Fields to Predict Play Evolution in Dynamic Sports Scenes" PDF


 -Collaborators: Matthias Grundmann, Dr. Ariel Shamir, Dr. Iain Matthews, Dr. Jessica Hodgins and Dr. Irfan Essa

 Player Localization Using Multiple Static Cameras

 We model the problem of fusing corresponding players' positional information as finding minimum-weight K-length cycles in complete K-partite graphs. We use our proposed class of algorithms in an end-to-end sports visualization framework, and demonstrate its robustness by presenting results over 60,000 frames of real soccer footage captured over five different illumination conditions, play types, and team attire.

 - "Player Localization using Multiple Static Cameras for Sports Visualization" PDF


 -Collaborators: Raffay Hamid, Ram Krishan Kumar, Matthias Grundmann, Dr. Jessica Hodgins and Dr. Irfan Essa
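The K-partite cycle formulation above can be sketched by brute force for tiny instances: one detection per camera forms a K-length cycle, and the minimum-weight cycle picks the mutually consistent detections. This exhaustive search is exponential in K and only for illustration; the paper proposes an efficient class of algorithms:

```python
from itertools import product

def min_weight_k_cycle(parts, w):
    """Brute-force minimum-weight K-length cycle visiting one node from
    each part of a complete K-partite graph (each part = one camera's
    candidate detections of a player). Illustration only."""
    best, best_cycle = float("inf"), None
    for choice in product(*parts):
        cost = sum(w(choice[i], choice[(i + 1) % len(choice)])
                   for i in range(len(choice)))
        if cost < best:
            best, best_cycle = cost, choice
    return best, best_cycle

def dist(a, b):
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

# Toy example: 3 cameras, each with two candidate (x, y) detections.
# The three mutually close detections form the cheapest cycle.
parts = [[(0.0, 0.0), (9.0, 9.0)],
         [(0.1, 0.0), (9.0, 0.0)],
         [(0.0, 0.1), (0.0, 9.0)]]
cost, cycle = min_weight_k_cycle(parts, dist)
```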

 Augmenting Earth-Maps with Dynamic Information

 Augmented Earth Maps visualize live broadcasts of dynamic scenes within a city. We propose different approaches to analyze videos of pedestrians and cars under differing conditions, and then augment Aerial Earth Maps (AEMs) with live and dynamic information. We also analyze natural phenomena (clouds) and project this information onto the AEMs to add visual realism. PROJECT HOMEPAGE

 - Virtual Reality journal (Springer), 2011    [PDF]

 - IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2009    [PDF](TBA)  - Presentation [PPT]

 - Media coverage : CNN, New Scientist, Popular Science, Discovery Channel,    Technology Review (MIT), Engadget, Vizworld, Revolution Magazine, etc.

 -Collaborators : Dr. Irfan Essa , Dr. Sangmin Oh and Jeonggyu Lee

 Real-time Transparent-Colored Shadow

 We present shadows for general non-manifold meshes and an additional extension to shadows of transparent casters. We first introduce a generalization of an object's silhouette to non-manifold meshes. Using this generalization, we can compute the number of caster surfaces between the light and the receiver, and furthermore the light intensity arriving at the receiver fragments after the light has traveled through multiple colored transparent surfaces.


 - Journal of Graphics Tools (JGT) 2008 [PDF]

 - Technical Report in GT-IC-07-04 [PDF]

 - GT-CMU Graphics retreat 2007 [PDF] 

 - Collaborators : Dr. Byungmoon Kim and Dr. Greg Turk

 GPS-Ray: Reconstruction of Urban Scene using GPS

  The main idea of this research starts from the assumption that changes in SNR at a GPS receiver at a given location can discriminate between obstructed and unobstructed structure. Using this evidence, we can localize and reconstruct building structures using only off-the-shelf GPS receivers.


- IEEE ISWC 2008 : [PDF]  - Presentation: [PPT]

- Gatech Technical Report GT-IC-08-06 : [PDF]

 SNR test (Test 20070602) and Heightmap/EM (Test 20070621)

  -Collaborators : Dr. Jay Summet, Dr. Thad Starner , and Dr. Irfan Essa

 Video-based Non-Photorealistic Rendering

  We build a non-photorealistic rendering (NPR) system using a global gradient field from radial basis interpolation and dispersion filters (water-colorization). For temporal coherence, we adopt Michael Black's piecewise-smooth flow fields (robust regularization). The dispersion filter is also designed to mimic pigment dispersion in water.


 Project for Samsung STAR/SAIT 2008

 [PDF] [Video]

  -Collaborator : Dr. Irfan Essa

 Multi-scale Mosaic Generation for Video navigation

  We create a mosaic using a labeled multi-scale tiling algorithm. The mosaic enables users to navigate easily and remix video scenes for their convenience. In the matching process, we used annotated information from the Family Video Archive's (Aware Home) XML architecture. This work was presented at the Living Game World Symposium 2006 and also appeared in ACM Multimedia 2006.


 - ACM International Conference on Multimedia 2006  [PDF] 

- Project page

- Previous version of KMosaic

 - Computational Photography project link

  -Collaborators : Dr. Irfan Essa and Dr. Gregory Abowd

 Face Recognition using GSVD

   We build a face recognition application using the generalized singular value decomposition (GSVD). We use linear discriminant analysis (LDA) with the GSVD, which reduces the dimensionality of the input images

   ( to k-1 dimensions, where k is the number of classes in the training step )

    The application was developed with OpenCV in a Visual C++ environment.


  - Face Recognition using LDA with generalized SVD

 -Collaborators : Sangmin Lee, Dr. James M. Rehg , and Dr. Haesun Park

 Real-time Face Detection

 - Implementation of two well-known face detection algorithms.

   The link below contains an explanation of the algorithms, source code, and    executable binary files.

   Face Detection by Viola-Jones and Morphological operator page link   


  If you want a more reliable Viola-Jones module, check this link:

  Intel OpenCV's Viola-Jones face-detector link. It is easy to use :)

- Short cut link for demo movie

  Morphological operator detection demo movie avi

  Viola-Jones detection demo movie avi
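The core trick that makes Viola-Jones detection real-time is the integral image (summed-area table): after one pass over the image, any rectangular Haar-like feature sums in four lookups. A minimal sketch:

```python
def integral_image(img):
    """Summed-area table with a one-pixel zero border: ii[y][x] holds the
    sum of all pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum over the w-by-h rectangle at (x, y) from four table lookups.
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
s = rect_sum(integral_image(img), 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Haar features are differences of such rectangle sums, so each feature costs a constant number of lookups regardless of its size.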


Research and Development at Samsung SDS IT R&D Center



 Face Recognition System

 Samsung IT R&D Center built a 'Face Recognition System' in 2002, after 3 years of research. It was a prototype system named 'ViaFace'. It was later used in various fields and industries, including the well-known Korean apartment franchise 'Raemian', some airports in Mexico, etc. The system consists of 2 modes: verification (one-to-one) and identification (one-to-many, or surveillance).


 - More details..

 - Presented at Comdex 2001, Las Vegas


 Real-time Collaboration System : Syncbiz

 Syncbiz is a real-time collaboration system that includes an application sharing module, a text chat module, a video/audio conferencing module, a shared virtual directory module, a multi-user whiteboard module, and a real-time agenda manager (scheduler) module. One session permits 10 concurrent users, and all users share each module. One Syncbiz local server can sustain 50 concurrent sessions, under a main server that controls the 50-session local servers.


 - More details..

 - Samsung IT R&D Best eSolution award 2003

 - Witzwell introduction page

 - Syncbiz introduction page in Romanian
  (Thanks to Alexandra Seremina at Novosibirsk State Univ.)

 -Collaborator : Taesoo Jun, Yongho Woo and Joonsung Park

 IP-STB Framework : LivingWise CS

  LWCS is a framework for IP set-top boxes made by Samsung Electronics and KT (Korea Telecom). It manages overall I/O and controllers and offers powerful applications built on the Microsoft Windows CE environment.

 It contains an MP3 and media player for the IP-STB, plus real-time news feeds and weather forecasts via RSS.


 - More details..

  -Collaborators : Taesoo Jun and Joonsung Park


Miscellaneous research


 Illumination Subspace

 Reconstructing images under arbitrary lighting conditions using at least three different light-source directions. [PDF]

 Building Recognition with SIFT features

 Building recognition by classifying Scale-Invariant Feature Transform (SIFT) features.  [PDF]

 Quick-time VR

  Making a QuickTime VR movie using SIFT, warping, and RANSAC.

  For every cube face, 25 image warps are applied.

  To estimate the homography, RANSAC is applied to fit the matrix.

- Creating cubic VR Movie Part1

- Creating cubic VR Movie Part2 Homography

- Creating cubic VR Movie Part3 : Automatic fitting by RANSAC Algorithm
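The RANSAC fitting step in Part 3 can be sketched on the simpler problem of 2D line fitting; fitting the homography works the same way, just with a 3x3 matrix estimated from 4-point samples instead of a 2-point line. A hedged toy version:

```python
import random

def ransac_line(points, iters=200, thresh=0.1, seed=0):
    """RANSAC illustrated on line fitting (y = m*x + c): repeatedly fit a
    model to a minimal random sample and keep the model with the most
    inliers. Thresholds and iteration count are illustrative."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # degenerate sample, skip
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) < thresh]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers

# 20 points on y = 2x + 1 plus two gross outliers (bad matches).
pts = [(x * 0.1, 2 * (x * 0.1) + 1) for x in range(20)] + [(0.5, 9.0), (1.2, -4.0)]
(m, c), inliers = ransac_line(pts)
```

The recovered model ignores the outliers entirely, which is exactly why RANSAC is used to reject bad SIFT correspondences before the final homography estimate.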