Nonparametric Graphical Models

Probabilistic graphical models are powerful tools for representing structured dependencies between random variables in challenging tasks in social network analysis, natural language processing, computer vision, and beyond. Most existing applications of graphical models are restricted to cases where each random variable takes only a relatively small number of values or, in continuous domains, where the joint distribution is Gaussian.

I developed a novel nonparametric representation for graphical models based on the concept of kernel embeddings of distributions. This new representation allows one to conduct learning and inference in graphical models with much more general distributions.

Nonparametric graphical models have been applied to various learning problems, such as cross-language document retrieval, depth estimation from a single image, and classification and forecasting with dynamical system models of video, speech, and sensor time series. In these applications, the new method outperforms state-of-the-art techniques.

Modeling, Analyzing and Visualizing Networks and Spatial/Temporal Dynamics

Much of the world's information has a relational structure and can be modeled mathematically as networks and graphs. Examples include biological networks, web graphs, and social networks. Many of these large and complex networks exhibit rich spatial and temporal phenomena that traditional graph modeling, analysis, and visualization algorithms cannot capture. I designed new modeling, analysis, and visualization tools to better understand complex networks.

Dynamic processes occur in social, biological, biomedical and sustainability contexts over networks, space and time. I am interested in modeling and analyzing problems and data arising from these contexts. For instance, I have been studying information diffusion in social networks using continuous-time diffusion models, and developing methods to control or steer dynamics of social events based on these models.
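To make the idea of a continuous-time diffusion model concrete, here is a minimal sketch. It assumes one common formulation, in which each edge carries an independent exponentially distributed transmission delay and a node becomes infected at the earliest time any infected neighbor reaches it. The function name `simulate_cascade`, the graph encoding, and the rates are all hypothetical and chosen for illustration; they are not taken from the work described above.

```python
import heapq
import random


def simulate_cascade(graph, source, horizon, rng):
    """Simulate one cascade and return {node: infection_time}.

    graph maps each node to a list of (neighbor, rate) pairs, where rate is
    the parameter of an exponential transmission delay (an assumption made
    for this sketch). Only infections at times <= horizon are recorded.
    """
    infection_time = {}
    queue = [(0.0, source)]  # (candidate infection time, node)
    while queue:
        t, node = heapq.heappop(queue)
        if node in infection_time or t > horizon:
            continue  # already infected earlier, or outside the time window
        infection_time[node] = t
        for neighbor, rate in graph.get(node, []):
            if neighbor not in infection_time:
                delay = rng.expovariate(rate)  # sampled transmission delay
                heapq.heappush(queue, (t + delay, neighbor))
    return infection_time


# Toy network with hypothetical transmission rates.
graph = {
    "a": [("b", 1.0), ("c", 0.5)],
    "b": [("d", 2.0)],
    "c": [("d", 1.0)],
    "d": [],
}
times = simulate_cascade(graph, "a", horizon=1e9, rng=random.Random(0))
```

Because candidate infection times are processed in increasing order, each node's recorded time is the length of the shortest path from the source under the sampled delays.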

Learning via Kernel Embedding of Distributions

In this work, distributions are embedded into Hilbert spaces via the expected feature map of a kernel, and all subsequent operations on distributions are carried out in the Hilbert space. This allows one to compute distances between distributions in terms of distances between their embeddings. We have developed a learning framework based on this idea that includes density estimation, clustering, feature selection, two-sample tests, independence tests, nonparametric sorting, and dimensionality reduction. A large number of existing methods appear in this framework as special cases. Furthermore, this often leads to algorithms that are simpler and more effective than information-theoretic methods in a broad range of applications.
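The distance between two embeddings can be estimated directly from samples. The sketch below computes the squared distance between empirical mean embeddings (often called the maximum mean discrepancy) using only kernel evaluations. The Gaussian RBF kernel, the bandwidth value, and the function names are assumptions made for illustration, not details taken from the framework described above.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of ||mu_x - mu_y||^2 in the kernel's Hilbert space.

    Expanding the squared norm gives three terms, each an average of
    kernel evaluations between samples.
    """
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(200, 2))
same = mmd_squared(base, rng.normal(0.0, 1.0, size=(200, 2)))
shifted = mmd_squared(base, rng.normal(2.0, 1.0, size=(200, 2)))
```

Here `same` compares two samples from one distribution and `shifted` compares samples whose means differ, so `shifted` should come out noticeably larger. A two-sample test would compare such a statistic against a threshold, for example one obtained by permuting the pooled samples.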

Applications to Computational Biology and Other Sciences

I bring state-of-the-art statistical learning and modeling techniques to the study of complex data in real-world applications, with the aim of accelerating the understanding of increasingly challenging problems in modern science. For instance, in the life sciences, the deluge of inter-related genome-transcriptome-phenome data offers an unprecedented opportunity for statistical modeling to explore questions such as how higher-level organism functions respond to molecular-level alterations. Answers to these questions are essential to the understanding, diagnosis, and treatment of complex diseases such as asthma and cancer. I developed methods to address problems such as selecting informative genes, understanding time-varying gene regulatory networks, and analyzing dynamic mental processes.