Kihwan Kim

   Senior Research Scientist at NVIDIA Research

   Ph.D. in Computer Science

   Georgia Institute of Technology, CoC/GVU/CPL  

    Advisor : Dr. Irfan Essa

   Member of  CPL and Graphics Group 

  Contact :

   email)  kihwan23 at   

   (nvresearch) kihwank at nvidia dot com    

    2701 San Tomas Expressway   

    Santa Clara, CA 95050

  I am currently at NVIDIA Research. My new page is here; however, I will keep maintaining this main page.

Main research

 Curriculum Vitae [PDF] (updated Jan. 2015)

 Physically-based Rendering for Augmented Reality

  We propose a photo-realistic augmented and mixed reality system that runs at interactive rates. Our primary contribution is an axis-aligned filtering scheme that preserves the frequency content of the illumination. We then demonstrate a novel two-mode path tracing approach that allows ray-tracing a scene with image-based real geometry (captured from a commodity depth camera) and mesh-based virtual geometry.

 - "Filtering Environment Illumination for Interactive Physically-Based Rendering in Mixed Reality " In Eurographics Symposium on Rendering (EGSR) 2015 PDF

 - Implementation details, derivations and proofs : PDF

  -Collaborators: Soham Mehta, Dawid Pajak , Kari Pulli, Jan Kautz, and Ravi Ramamoorthi

 3D CNN for Dynamic Hand Gesture Recognition

  We propose an algorithm for drivers' hand gesture recognition from challenging depth and intensity data using 3D convolutional neural networks. Our solution combines information from multiple spatial scales for the final prediction. It also employs spatio-temporal data augmentation for more effective training and to reduce potential overfitting.
Our method achieves a correct classification rate of 77.5% on the VIVA challenge dataset.

 - "Hand Gesture Recognition with 3D Convolutional Neural Networks " PDF

   IEEE CVPR 2015 Workshop on Hand gesture recognition

  -Collaborators: Pavlo Molchanov, Shalini Gupta , and Jan Kautz

 DNN-Gesture recognition with multi-modal sensors

  We propose a novel multi-sensor system for accurate and power-efficient dynamic car-driver hand-gesture recognition, using a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. We present a procedure to jointly calibrate the radar and depth sensors. We employ convolutional deep neural networks to fuse data from multiple sensors and to classify the gestures.

 - "Multi-sensor System for Driver's Hand-Gesture Recognition " PDF

   IEEE Automatic Face and Gesture Recognition (FG 2015) accepted as Oral

 - "Short-Range FMCW Monopulse Radar for Hand-Gesture Sensing " PDF

   IEEE International Radar conference 2015

  -Collaborators: Pavlo Molchanov, Shalini Gupta , and Kari Pulli

 DT-SLAM: SLAM with Deferred Triangulation

  We introduce a real-time visual SLAM system that incrementally tracks individual 2D features, and estimates camera pose by using matched 2D features, regardless of the length of the baseline. Triangulating 2D features into 3D points is deferred until keyframes with sufficient baseline for the features are available. Our method can also deal with pure rotational motions, and fuse the two types of measurements in a bundle adjustment step.

 - "DT-SLAM: Deferred Triangulation for Robust SLAM " PDF

   IEEE 3D Vision Conference (3DV 2014) in Tokyo, Japan

  Source code (C++) of the system under BSD license is available : GITHUB

 -Collaborators : Daniel C. Herrera, and Kari Pulli

 WYSIWYG Computational Photography

  This paper explores the notion of viewfinder editing, which makes the viewfinder more accurately reflect the final image the user intends to create. We allow the user to alter the local or global appearance (tone, color, saturation, or focus) via stroke-based input, and propagate the edits spatiotemporally. The system then delivers a real-time visualization of these modifications to the user, and drives the camera control routines to select better capture parameters.

 - "WYSIWYG Computational Photography via Viewfinder Editing" PDF

  ACM Transactions on Graphics (SIGGRAPH Asia 2013) : PROJECT PAGE  

 -Collaborators :Jongmin Baek, Dawid Pajak , Kari Pulli, and Marc Levoy

 Prediction of ROI in Scenes with Camera Motions

 We use stochastic fields for predicting important future regions of interest as the scene evolves dynamically. We evaluate our approach on a variety of videos of team sports. We show that our approach can detect where to move the camera based on observations in the scene, and we compare the detected/predicted regions of interest to the camera motion generated by actual camera operators.

 - "Detecting Regions of Interest in Dynamic Scenes with Camera Motions" PDF

  The paper will appear in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2012)


 -Collaborators: Dongryeol Lee and Dr. Irfan Essa

 Gaussian Process Regression Flow

 We model a trajectory as a continuous dense flow field from a sparse set of vector sequences using Gaussian Process Regression. Mean flows and confidences (from variances) allow for incrementally predicting possible paths and detecting anomalous events from online trajectories. We evaluate the approach on various types of video data having complete and incomplete trajectories.

 - "Gaussian Process Regression Flow for Analysis of Motion Trajectories" PDF

  The paper was published in IEEE International Conference on Computer Vision



 -Collaborators: Dongryeol Lee and Dr. Irfan Essa
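
 As a rough illustration of how Gaussian Process Regression yields both a mean estimate and a confidence, here is a minimal sketch on toy 1-D data. It is not the paper's implementation: the squared-exponential kernel, the hyperparameters, and the names `rbf_kernel` and `gp_predict` are assumptions chosen for the example, and the real system regresses a flow field over space and time rather than a single scalar sequence.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict(t_train, v_train, t_query, noise_var=1e-4):
    """Posterior mean and variance of a zero-mean GP at t_query."""
    K = rbf_kernel(t_train, t_train) + noise_var * np.eye(len(t_train))
    K_s = rbf_kernel(t_train, t_query)
    K_ss = rbf_kernel(t_query, t_query)
    L = np.linalg.cholesky(K)
    # alpha = K^{-1} v, computed via two triangular systems
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, v_train))
    mean = K_s.T @ alpha
    w = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(w ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

# Toy data (hypothetical): sparse samples of one velocity component.
t_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v_train = np.sin(t_train)
t_query = np.array([2.0, 2.5, 10.0])
mean, var = gp_predict(t_train, v_train, t_query)
```

 The variance stays small near observed samples and grows toward the prior variance far from them, which is the kind of signal that can be thresholded to flag an observation as unusual.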

 Dynamic Scene Analysis using Motion Field

 Player actions and interactions in dynamic sports scenes are complex as they are driven by many factors, such as the short-term goals of the individual player, the overall team strategy, the rules of the sport, and the current context of the game. We show that such constrained multi-agent events can be analyzed, and even predicted, by estimating the global movements of all players in the scene at any time, and that these estimates can be used to predict play evolution.

 - "Motion Fields to Predict Play Evolution in Dynamic Sports Scenes" PDF


 -Collaborators:Matthias Grundmann, Dr. Ariel Shamir, Dr. Iain Matthews, Dr. Jessica Hodgins and Dr. Irfan Essa

 Player Localization Using Multiple Static Cameras

 We model the problem of fusing corresponding players' positional information as finding minimum-weight K-length cycles in complete K-partite graphs. We use our proposed class of algorithms in an end-to-end sports visualization framework, and demonstrate its robustness by presenting results on over 60,000 frames of real soccer footage captured under five different illumination conditions, play types, and team attire.

 - "Player Localization using Multiple Static Cameras for Sports Visualization" PDF


 -Collaborators: Raffay Hamid, Ram Krishan Kumar, Matthias Grundmann, Dr. Jessica Hodgins and Dr. Irfan Essa
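
 To make the graph formulation concrete, the sketch below treats each camera as one partition of candidate ground-plane positions and searches for the single cheapest cycle that visits one candidate per camera. This is a brute-force illustration only, not the algorithm from the paper: the function name `min_weight_cycle`, the Euclidean edge weights, the fixed visiting order, and the toy coordinates are all assumptions.

```python
import itertools
import math

def min_weight_cycle(partitions):
    """Pick one 2-D point per partition so that the closed cycle
    p0 -> p1 -> ... -> p(K-1) -> p0 has minimum total Euclidean length.
    Exhaustive search, so only suitable for tiny inputs."""
    best_cost, best_choice = math.inf, None
    for choice in itertools.product(*partitions):
        k = len(choice)
        cost = sum(math.dist(choice[i], choice[(i + 1) % k]) for i in range(k))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_choice, best_cost

# Hypothetical detections of two players as seen by three cameras.
cameras = [
    [(10.0, 5.0), (30.0, 20.0)],
    [(30.2, 19.9), (10.1, 5.2)],
    [(9.9, 4.9), (29.8, 20.3)],
]
choice, cost = min_weight_cycle(cameras)
```

 A low-cost cycle means the selected detections agree across cameras, so they plausibly belong to the same player. Exhaustive search grows exponentially with the number of cameras, which is why a dedicated algorithm is needed at scale.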

 Augmenting Earth-Maps with Dynamic Information

 Augmented Earth Maps visualize the live broadcast of dynamic sceneries within a city. We propose different approaches to analyze videos of pedestrians and cars under differing conditions, and then augment Aerial Earth Maps (AEMs) with live and dynamic information. We also analyze natural phenomena (clouds) and project information from these onto the AEMs to add visual realism. PROJECT HOMEPAGE

 - Journal of Virtual Reality Springer 2011    [PDF]

 - IEEE International Symposium on Mixed and Augmented Reality (ISMAR) 2009    [PDF](TBA)  - Presentation [PPT]

 - Media coverage : CNN, New Scientist, Popular Science, Discovery Channel,    Technology Review (MIT), Engadget, Vizworld, Revolution Magazine, etc.

 -Collaborator : Dr. Irfan Essa , Dr. Sangmin Oh and Jeonggyu Lee

 Real-time Transparent-Colored Shadow

 We provide a shadow method for general non-manifold meshes and an additional extension to shadows of transparent casters. We first introduce a generalization of an object's silhouette to non-manifold meshes. By using this generalization, we can compute the number of caster surfaces between the light and receiver, and furthermore, we can compute the light intensity arriving at the receiver fragments after the light has traveled through multiple colored transparent receiver surfaces.


 - Journal of Graphics Tools(JGT) 2008 [PDF]

 - Technical Report in GT-IC-07-04 [PDF]

 - GT-CMU Graphics retreat 2007 [PDF] 

 - Collaborator : Dr. Byungmoon Kim, Dr. Greg Turk

 GPS-Ray: Reconstruction of Urban Scene using GPS

  This research starts from the assumption that changes in the SNR of a GPS receiver at a given location can discriminate between obstructed and unobstructed structures. Using this evidence, we can localize and reconstruct building structures using only off-the-shelf GPS receivers.


- IEEE ISWC 2008 : [PDF]  - Presentation: [PPT]

- Gatech Technical Report GT-IC-08-06 : [PDF]

 SNR test (Test 20070602) and Heightmap/EM (Test 20070621)

  -Collaborator : Dr. Jay Summet, Dr. Thad Starner, Dr. Irfan Essa

 Video based Non-Photorealistic Rendering

  We build a non-photorealistic rendering (NPR) system using a global gradient field from radial basis interpolation and dispersion filters (water-colorization). For temporal coherence we adopt Michael Black's piecewise-smooth flow fields (robust regularization). The dispersion filter is also designed to mimic pigment dispersion in water.


 Project for Samsung STAR/SAIT 2008

 [PDF] [Video]

  -Collaborator : Dr. Irfan Essa

 Multi-scale Mosaic Generation for Video navigation

  We build mosaics using a labeled multi-scale tiling algorithm. The mosaic enables users to navigate easily and remix video scenes at their convenience. In the matching process, we used the annotated information from the Family Video Archive's (Aware Home) XML architecture. This work was presented at the Living Game World Symposium 2006 and also appeared in ACM Multimedia 2006.


 - ACM International Conference on Multimedia 2006  [PDF] 

- Project page

- Previous version of KMosaic

 - Computational Photography project link

  -Collaborator : Dr. Irfan Essa, Dr. Gregory Abowd

 Face Recognition using GSVD

   We built a face recognition application using Linear Discriminant Analysis with the Generalized Singular Value Decomposition (GSVD), which reduces the dimension of the input image data (to k-1 dimensions, where k is the number of classes in the training step).

    The application was developed with OpenCV in a Visual C++ environment.


  - Face Recognition using LDA with generalized SVD

 -Collaborator : Sangmin Lee, Dr. James M. Rehg , Dr. Haesun Park
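
 The k-1 figure follows from the between-class scatter matrix having rank at most k-1. The sketch below checks this on synthetic data using classical LDA with a pseudo-inverse; it is a simplified stand-in and does not implement the GSVD-based variant used in the project. The helper name `lda_projection` and the toy data are assumptions.

```python
import numpy as np

def lda_projection(X, y):
    """Classical LDA: return a projection with k-1 columns and the
    between-class scatter matrix S_b."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class scatter
    S_b = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    # Pseudo-inverse as a simple fallback when S_w is singular (not GSVD).
    evals, evecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(-evals.real)
    n_comp = len(classes) - 1
    return evecs.real[:, order[:n_comp]], S_b

# Synthetic stand-in for vectorized face images: k classes in d dimensions.
rng = np.random.default_rng(0)
k, per_class, d = 4, 30, 10
centers = rng.normal(scale=5.0, size=(k, d))
X = np.vstack([c + rng.normal(size=(per_class, d)) for c in centers])
y = np.repeat(np.arange(k), per_class)
W, S_b = lda_projection(X, y)
Z = X @ W  # reduced representation with k-1 columns
```

 With real face images the number of pixels usually exceeds the number of training samples, so the within-class scatter is singular; handling that case is the motivation for the GSVD formulation.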

 Real-time Face Detection

 - Implementation of two well-known face detection algorithms.

   The link below contains an explanation of the algorithms, source code, and executable binary files.

   Face Detection by Viola-Jones and Morphological operator page link   


  If you want to use a more reliable Viola-Jones module, check this link:

  Intel OpenCV's Viola-Jones face-detector link. It is easy to use :)

- Short cut link for demo movie

  Morphological operator detection demo movie avi

  Viola-Jones detection demo movie avi


Research and Development at Samsung SDS IT R&D Center



 Face Recognition System

 Samsung IT R&D Center built a 'Face Recognition System' in 2002 after 3 years of research. It was a prototype system named 'ViaFace'. Later, it was used in various fields and industries, including the well-known Korean apartment franchise 'Raemian' and some airports in Mexico. The system has two modes: verification (one-to-one) and identification (one-to-many, or surveillance).


 - More details..

 - Presented at Comdex 2001, Las Vegas


 Real-time Collaboration System : Syncbiz

 Syncbiz is a real-time collaboration system that includes an application sharing module, a text chat module, a video/audio conferencing module, a shared virtual directory module, a multi-user whiteboard module, and a real-time agenda manager (scheduler) module. One session permits 10 concurrent users, and all users share every module. One Syncbiz local server can sustain 50 concurrent sessions under the main server, which controls the 50-capacity local servers.


 - More details..

 - Samsung IT R&D Best eSolution award 2003

 - Witzwell introduction page

 - Syncbiz introduction page in Romanian
  (Thanks to Alexandra Seremina at Novosibirsk State Univ.)

 -Collaborator : Taesoo Jun, Yongho Woo and Joonsung Park

 IP-STB Framework : LivingWise CS

  LWCS is a framework for IP set-top boxes made by Samsung Electronics and KT (Korea Telecommunication). It manages overall I/O and controllers and has powerful applications built on the Microsoft Windows CE environment.

 It contains an MP3 and media player for the IP-STB, plus real-time news feeds and weather forecasts via RSS.


 - More details..

  -Collaborator : Taesoo Jun and Joonsung Park


Miscellaneous research


 Illumination Subspace

 Reconstructing images under arbitrary lighting conditions using images taken with at least three different light-source directions. [PDF]
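
 The reason three light directions can suffice is that, for a Lambertian surface with no shadows, every image of the scene is a linear function of the light direction. The sketch below relights synthetic data under that assumption; it is an illustration, not the method from the linked report, and the function name `relight` and all the data are hypothetical.

```python
import numpy as np

def relight(images, lights, new_light):
    """images: (m, n_pixels), one row per input image.
    lights: (m, 3), one light direction per input image, m >= 3.
    Returns the image predicted under new_light (shadows ignored)."""
    # Find weights c so that sum_i c[i] * lights[i] == new_light.
    coeffs, *_ = np.linalg.lstsq(lights.T, new_light, rcond=None)
    # Image formation is linear in the light, so reuse the same weights.
    return coeffs @ images

rng = np.random.default_rng(1)
B = rng.normal(size=(50, 3))            # albedo-scaled normals per pixel
lights = np.array([[1.0, 0.0, 1.0],
                   [0.0, 1.0, 1.0],
                   [-1.0, -1.0, 1.0]])  # three non-coplanar directions
images = lights @ B.T                   # shadow-free Lambertian images
new_light = np.array([0.3, -0.2, 1.0])
relit = relight(images, lights, new_light)
```

 Real images clamp negative shading to zero and contain cast shadows, so this linear model is only an approximation in practice.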

 Building Recognition with SIFT features

 Building recognition by classifying features from Scale-Invariant Feature Transform (SIFT) detection  [PDF]

 Quick-time VR

  Making a Quick-time VR movie using SIFT, warping, and RANSAC algorithms.

  For every face, 25 image warps are applied.

  To estimate the homography, RANSAC is applied to fit the matrix.

- Creating cubic VR Movie Part1

- Creating cubic VR Movie Part2 Homography

- Creating cubic VR Movie Part3 : Automatic fitting by RANSAC Algorithm
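
A minimal sketch of RANSAC-based homography fitting is shown below, assuming point correspondences (for example from SIFT matching) are already available. It is not the project's code: the function names, the iteration count, the pixel threshold, and the synthetic data are all assumptions, and the direct linear transform is used without coordinate normalization to keep it short.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform from >= 4 point pairs; src, dst are (n, 2)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # null-space vector, defined up to scale

def project(H, pts):
    """Apply homography H to (n, 2) points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def ransac_homography(src, dst, n_iters=500, thresh=1.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the full consensus set and fix the scale.
    H = fit_homography(src[best_inliers], dst[best_inliers])
    return H / H[2, 2], best_inliers

# Synthetic correspondences: 30 exact matches plus 10 gross outliers.
rng = np.random.default_rng(42)
H_true = np.array([[1.0, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-4, 2e-4, 1.0]])
src = rng.uniform(0, 100, size=(40, 2))
dst = project(H_true, src)
dst[:10] += rng.uniform(20, 50, size=(10, 2))
H_est, inliers = ransac_homography(src, dst)
```

Each iteration fits a candidate from four random matches and keeps the candidate that agrees with the most correspondences, so mismatched features have little influence on the final matrix.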