Dong Ryeol Lee
Curriculum vitae

Contact Information:
School of Computational Science and Engineering
Georgia Institute of Technology
266 Ferst Drive
Atlanta, GA 30332 USA
E-mail: dongryel@cc.gatech.edu
WWW: www.cc.gatech.edu/~dongryel

Objective:
Full-time position that allows for advanced research in data analysis and scientific computing with a particular focus on practical large-scale implementations.

Citizenship: USA

Research Interests:
Non-parametric statistics, kernel methods, optimization, numerical linear algebra, parallel computation, heterogeneous computing.

Education:
Georgia Institute of Technology, Atlanta, Georgia USA. August 2005 -- Present
Ph.D., Computer Science, Expected May 2012. GPA: 3.85 (4.0 scale)
* Minor: Optimization and Statistics.
* Thesis: A Distributed Kernel Summation Framework for Machine Learning and Scientific Simulations.
* Adviser: Professor Alexander G. Gray.
* Area of Study: Machine learning, high-performance computing, numerical linear algebra.
M.S., Mathematics, May 2011
* Area of Study: Numerical methods for PDE, linear algebra, and geometry.

Carnegie Mellon University, Pittsburgh, Pennsylvania USA. August 2001 -- May 2005
B.S., Computer Science, May 2005 GPA: 3.87 (4.0 scale)
* Graduation with university and college honors.
B.S., Mathematical Sciences, May 2005

Tenafly High School, Tenafly, New Jersey USA. August 1997 -- May 2001
Graduated 3rd out of 237 students with highest honors.

Submitted Journal Publications:
[JCP 2011] D. Lee, A. Ozakin, and A. G. Gray. Multibody Multipole Methods. Journal of Computational Physics. 2011. Submitted.

[SISC 2011] D. Lee, A. G. Gray, and A. W. Moore. Dual-Tree Fast Gauss Transforms. SIAM Journal on Scientific Computing. 2011. Submitted.

Book Chapters:
[AMLDMA 2012] W. March, A. Ozakin, D. Lee, R. Riegel, and A. G. Gray. Multi-Tree Algorithms for Large-Scale Astrostatistics. In: Advances in Machine Learning and Data Mining for Astronomy, Chapman & Hall/CRC Press, 2012.
Conference Publications:
[SDM 2012B] P. Ram, D. Lee, and A. G. Gray. Nearest-Neighbor Search on a Time Budget via Max-Margin Trees. In: SIAM International Conference on Data Mining, 2012.
Proposed a max-margin-based hierarchical data structure for improving the efficiency of nearest-neighbor search in a time-constrained setting.

[SDM 2012A] D. Lee, R. Vuduc, and A. G. Gray. A Distributed Kernel Summation Framework for General-Dimension Machine Learning. In: SIAM International Conference on Data Mining, 2012, Best Paper Award (1 out of 363 submissions, 1 out of 99 accepted submissions).
A general parallel framework utilizing MPI/OpenMP for computing the kernel summation operations ubiquitous in many machine learning methods, including kernel density estimation, kernel regression, kernel SVM, kernel PCA, and Gaussian process regression. Parallel construction of multi-dimensional trees. Utilized up to 6,144 cores at NERSC for computing density estimates on a subset of the Sloan Digital Sky Survey dataset.

[ICCV 2011] K. Kim, D. Lee, and I. Essa. Gaussian Process Regression Flow for Analysis of Motion Trajectories. In: Proceedings of 2011 IEEE International Conference on Computer Vision, 2011.
Proposed a representation for modelling and matching motion trajectories based on Gaussian processes.

[NIPS 2009B] P. Ram, D. Lee, W. March, and A. G. Gray. Linear-time Algorithms for Pairwise Statistical Problems. In: Advances in Neural Information Processing Systems, 2009, selected for a poster spotlight (top 8%).
Proof of linear-time complexity for machine learning methods involving pairwise distances.

[NIPS 2009A] P. Ram, D. Lee, H. Ouyang, and A. G. Gray. Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions. In: Advances in Neural Information Processing Systems, 2009.
Proposed a new notion of probabilistically controlling the rank quality of a nearest neighbor search operation.

[NIPS 2008] D. Lee and A. G. Gray.
Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method. In: Advances in Neural Information Processing Systems, 2008.
Proposed two new extensions to FMM for handling high-dimensional kernel sums: 1) a Monte Carlo-based probabilistic approximation; 2) a new hierarchical data structure for recording subspaces.

[AISTATS 2007] P. Wang, D. Lee, A. G. Gray, and J. M. Rehg. Fast Mean Shift with Accurate and Stable Convergence. In: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007.
Developed a fast algorithm for non-parametric clustering based on an earlier work.

[UAI 2006] D. Lee and A. G. Gray. Faster Gaussian Summation: Theory and Experiment. In: Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence, 2006.
The second derivation of the hierarchical Gaussian kernel summation using a different type of series expansion.

[NIPS 2005] D. Lee and A. G. Gray. Dual-Tree Fast Gauss Transforms. In: Advances in Neural Information Processing Systems, 2005.
The first derivation of an FMM-like algorithm for the Gaussian kernel. Corrected theorems published in earlier literature.

Other Publications:
[BS THESIS 2005] D. Lee. New Algorithmic Techniques for Generalized N-body Problems. Undergraduate Senior Honors Thesis, Carnegie Mellon University, 2005.

Professional Experience:
Georgia Institute of Technology, Atlanta, Georgia USA.
Research Assistant at FASTLab (http://fast-lab.org). August 2005 -- Present

Analytics 1305, Atlanta, Georgia USA.
Co-founder and Senior Developer. February 2009 -- August 2010
* Worked part-time while continuing Ph.D. studies.
* Implemented scalable analytics back-end software providing state-of-the-art statistical modeling and optimization algorithms.
* Provided documentation support for clients including LogicBlox/Predictix LLC.
* Deployed machine learning algorithms on Amazon EC2 Cloud, including nearest-neighbor, k-means, QuicSVD, non-negative matrix factorization, kernel density estimation, LASSO, kernel regression, linear regression, kernel SVM, kernel PCA, maximum variance unfolding, decision/regression trees, logistic regression, and orthogonal range search.

Lawrence Berkeley National Laboratory, Berkeley, California USA
Intern for KDD workflow project. May 2007 -- August 2007
* Implemented a workflow that lets analysts define a set of searches, filters, and refinement operations to be performed.

Teaching Experience:
Carnegie Mellon University, Pittsburgh, Pennsylvania USA
Grader for Department of Mathematical Sciences. August 2004 -- December 2004
Grader for 21-355 Principles of Real Analysis I.
Teaching Assistant for School of Computer Science. January 2004 -- May 2004
Held office hours for an introductory programming course (15-113 System Skills in C) and graded assignments.

Carnegie Mellon University Academic Development. August 2002 -- May 2005
* Tutored introductory/advanced courses in mathematics and computer science.
* College Reading & Learning Association Level 3 Master certification (http://www.crla.net/tutorcert.htm).

Service:
Maintaining the open-source software project, MLPACK:
* C++ implementation of fast multipole methods for statistical computations.
* CMake-based build system.
* Archived at http://mloss.org/software/view/152/
* Demonstration at NIPS 2008 (http://nips.cc/Conferences/2008/Program/event.php?ID=1479).

Awards:
U.S. Department of Homeland Security
* Graduate Fellowship, August 2006 -- August 2009.

Carnegie Mellon University
* School of Computer Science Dean's List for 6 out of 8 semesters, Fall 2001 -- Spring 2005.
* National Society of Collegiate Scholars inductee, Fall 2002.
* Phi Beta Kappa inductee, Spring 2005.
* Phi Kappa Phi inductee, Spring 2005.
* Senior Leadership Award, Spring 2005.
* University Scholarship, Fall 2001 -- Spring 2005.
Tenafly High School
* National Merit Scholar Finalist.
* Advanced Placement Scholar.

Hardware and Software Skills:
Computer Programming:
* C, C++, Intel Assembly, Java, Python, Standard ML, Scheme, HTML, and others.
Development Environment and Tools:
* Visual Studio, GNU Makefile/CMake build system.
* Performance profilers such as gprof, TotalView, PAPI (Performance Application Programming Interface), CUDA Visual Profiler.
* Debuggers such as GNU Debugger, Valgrind, Electric Fence.
* OpenCV, Boost Library, Python wrappers, Hadoop.
Scientific Computing:
* Trilinos, BLAS/LAPACK, Matlab, Mathematica.
* Parallel programming experience with OpenMP, MPI, CUDA, Intel Threading Building Blocks; experience utilizing national resources such as XSEDE and NERSC.
Version Control and Software Configuration Management:
* VCS (CVS, SVN), and others.

Languages:
English (fluent), Korean (fluent).

References:
Available upon request.