Da Kuang

Postdoctoral Researcher
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology

Email:
(before @) da.kuang
(after @) cc.gatech.edu

CV [pdf]

(Olympic National Park, 2012)


Welcome!

I am a postdoctoral researcher in the School of Computational Science and Engineering at the Georgia Institute of Technology, working with Prof. Haesun Park. I am creating and co-teaching a new course, CSE 6040 -- Computing for Data Analysis: Methods and Tools, with Prof. Polo Chau.

My research area is numerical methods for large-scale machine learning. My research interests include numerical computing, machine learning, data analytics, numerical optimization, text mining, genomic analysis, and high-performance computing.

I received my PhD in Computational Science and Engineering from Georgia Tech, advised by Prof. Haesun Park. My thesis topic is nonnegative matrix factorization (NMF) for clustering.

I created an algorithm based on hierarchical rank-2 NMF for large-scale topic modeling that is about 20 times faster than latent Dirichlet allocation and 100 times faster than the original NMF, with comparable quality. The algorithm is now available as open-source software called smallk.

Previously, I obtained my Bachelor's degree in computer science from Tsinghua University in Beijing, China. I started my college years in the Department of Mathematics, later transferred to the Department of Computer Science, and joined the Yao Class. I worked with Prof. Min Zhang on learning-to-rank algorithms for information retrieval.


News

2014.8 I will be co-teaching CSE 6040 -- Computing for Data Analysis: Methods and Tools for the new MS Analytics Program in Fall 2014.

2013.5 I will be joining the eBay research lab as a research intern this summer.

2012.5 I will be joining Amazon.com as a software engineer intern on the recommendations team this summer.


Publications

Da Kuang, Jaegul Choo, and Haesun Park, Nonnegative matrix factorization for interactive topic modeling and document clustering (book chapter), in Partitional Clustering Algorithms, Springer, 2015.

Nicolas Gillis, Da Kuang, and Haesun Park, Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization, IEEE Transactions on Geoscience and Remote Sensing, 53(4):2066-2078, 2015. [Arxiv]

Da Kuang, Sangwoon Yun, and Haesun Park, SymNMF: Nonnegative low-rank approximation of a similarity matrix for graph clustering, Journal of Global Optimization, 2014.

Da Kuang, Nonnegative matrix factorization for clustering, PhD Dissertation, Georgia Institute of Technology, 2014. [pdf]

Da Kuang and Haesun Park, Fast rank-2 nonnegative matrix factorization for hierarchical document clustering, Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '13), pp. 739-747, Chicago, IL, 2013. [pdf] [Matlab code] [C++ code on github]

Da Kuang, Chris Ding, and Haesun Park, Symmetric nonnegative matrix factorization for graph clustering, Proceedings of the 2012 SIAM International Conference on Data Mining (SDM '12), pp. 106-117, Anaheim, CA, 2012. [pdf] [slides] [code]

Min Zhang, Da Kuang, Guichun Hua, Yiqun Liu, and Shaoping Ma, Is learning to rank effective for web search?, SIGIR 2009 Workshop on Learning to Rank for Information Retrieval, Boston, MA, 2009. [pdf]

Preprints

Da Kuang and Raffay Hamid, piCholesky: Polynomial Interpolation of Multiple Cholesky Factors for Efficient Approximate Cross-Validation. [Arxiv]


Software

High-performance NMF on github

Hierarchical Rank-2 NMF for document clustering and topic discovery

Symmetric NMF

kmeans3: Accelerating Matlab K-means with Simple Patches


Courses

Fall 2009:
CSE 6643 Numerical Linear Algebra
CSE 6740 Foundations of Machine Learning and Data Mining

Spring 2010:
MATH 6644 Iterative Methods for Systems of Equations
ISYE 6416 Computational Statistics

Fall 2010:
CSE 6140 Computational Science and Engineering Algorithms
CSE 6230 High Performance Parallel Computing

Spring 2011:
CSE 6220 High Performance Computing
CSE 8001 Solvers for Scientific Computations
BIOL 7111 Molecular Evolution
BIOL 4755 Introduction to Systems Biology


Matlab Tips

About the efficiency of growing and shrinking sparse matrices
Bug-fix for kmeans in the statistics toolbox


Last modified: Sep. 5, 2014