Research
Principal Collaborators
- Alex Smola,
Principal Researcher, Yahoo! Research, Santa Clara
- Arthur Gretton,
Lecturer, Gatsby Computational Neuroscience Unit, University College London, London
- Bernhard Schölkopf,
Professor, Max-Planck Institute for Biological Cybernetics, Tübingen
- Carlos Guestrin,
Associate Professor, Carnegie Mellon University, Pittsburgh
- Eric Xing,
Associate Professor, Carnegie Mellon University, Pittsburgh
- Karsten Borgwardt,
Assistant Professor, Max-Planck Institute for Developmental Biology, Tübingen
- Kenji Fukumizu,
Professor, The Institute of Statistical Mathematics, Tokyo
Nonparametric Graphical Models
Probabilistic graphical models are powerful tools
for representing structured dependencies between random
variables in challenging tasks in social networks,
natural language processing, computer vision, and beyond.
Most existing applications of graphical models are restricted to cases
where each random variable can take on only
a relatively small number of values, or, in continuous
domains, where the joint distributions are Gaussians.
I developed a novel nonparametric representation
for graphical models based on the concept of
kernel embeddings of distributions.
This new representation allows one to conduct learning and
inference in graphical models with much more general distributions.
Nonparametric graphical models have been applied to
various learning problems, such as cross-language document retrieval,
estimating depth from a single image, and classification and forecasting for
dynamical system models of video, speech, and sensor time series.
In these applications, this new method outperforms state-of-the-art techniques.
Modeling, Analyzing and Visualizing Networks
Much of the world's information has a relational structure and can be modelled mathematically as networks and graphs. Examples include biological networks, web graphs, and social networks. Many of these large and complex networks exhibit rich spatial and temporal phenomena. Traditional graph modeling, analysis, and visualization algorithms cannot capture this complex spatial and temporal behavior. I designed new modeling, analysis, and visualization tools to better understand complex networks.
Learning via Kernel Dependence
In this work, distributions are embedded into Hilbert spaces via the expected feature map of a kernel, and all subsequent operations on distributions are then carried out in the Hilbert space.
This allows one to compute distances between distributions in
terms of distances between their embeddings. Based on this idea, we have developed a learning framework that includes
density estimation, clustering, feature selection, two-sample tests,
independence tests, nonparametric sorting, and dimensionality reduction. A large number of existing methods appear in this framework as special cases.
Furthermore, the framework often leads to algorithms that are simpler and more effective than information-theoretic methods in a broad range of applications.
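
As a minimal illustration of comparing two distributions through the distance between their embeddings, the sketch below estimates the squared distance between the empirical kernel mean embeddings of two samples (commonly called the maximum mean discrepancy). It is a sketch under stated assumptions and not the implementation used in this work: the Gaussian RBF kernel, the bandwidth value, the biased estimator, and the function names rbf_kernel and mmd_squared are all illustrative choices.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of x and the rows of y.

    The kernel choice and the bandwidth are assumptions made for illustration.
    """
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased estimate of the squared distance between the empirical
    kernel mean embeddings of the samples x and y.

    ||mu_x - mu_y||^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)
    """
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()
```

Calling mmd_squared on two samples drawn from the same distribution should give a value close to zero, while samples from clearly different distributions should give a larger value. Turning this statistic into a two-sample test additionally requires a rejection threshold, for example from a permutation procedure, which is omitted here.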
Applications to Computational Biology and Other Sciences
I bring state-of-the-art
statistical learning and modeling techniques to the study of complex data in real-world applications, to accelerate the
understanding of increasingly challenging problems in modern science. For instance, in the life sciences, the deluge of
inter-related genome-transcriptome-phenome data offers an unprecedented opportunity for statistical modeling
to explore questions such as how higher organism functions respond to molecular-level alterations. Clues to these
questions are essential to the understanding, diagnosis, and treatment of complex diseases such as asthma and
cancer. I developed methods to address problems such as selecting informative genes, understanding time-varying gene regulatory networks, and analyzing dynamic mental processes.