Following is a list of some of the projects underway at CPL. Many of these projects overlap with one another, so several appear under more than one heading.
Aware Environments | DVFX | Activity Recognition | Tracking | Nonphotorealism | Reconstruction | Rendering | Segmentation | Speech
Aware Environments
- The Aware Home project is an attempt at building a large-scale living laboratory that is aware of its inhabitants and their activities.
- EClass (formerly Classroom 2000): Capture your whole classroom experience: a room that takes notes.
- The Automated Understanding of Captured Experiences project aims to substantially reduce the human input necessary for creating and accessing large collections of multimedia, particularly multimedia created by capturing what is happening in an environment.
- i. a. e. Intelligent and Aware Environments (a.k.a. Smart Spaces) are spaces that have been transformed into smart work areas where color CCD cameras, big-screen displays, microphones, and other sensors are fused with computers. Real-time analysis and tracking of lab inhabitants takes place, and an intelligent agent interfaces with lab dwellers and other devices in the room.
- Ubiquitous Video and Audio: The aim of this project is to explore and prototype a spectrum of applications for ubiquitous video and audio processing in our daily environments. We are developing methodologies to instrument spaces with video and audio sensors and studying modeling techniques for interpretation and analysis of the signals these sensors capture.
Digital Video Special Effects
- DVFX: Combining video, graphics, and computer vision to generate digital effects.
Human Activity Recognition
- Segmentally Boosted HMM (SBHMM) is a discriminative feature selection method for time-sequence classification tasks that predict a single label for an entire sequence, such as sign language recognition, gait recognition, lip reading, and speech recognition. Traditional feature selection methods require the data to be independently and identically distributed (i.i.d.), but the frames in time sequences are usually correlated. Furthermore, features in time sequences may be "sometimes informative", that is, discriminative only in some segment of a sequence. SBHMMs address both temporal correlation and segmentally informative features by assuming the data are "piecewise i.i.d." Experiments show that SBHMMs consistently reduce the error of traditional HMM recognition algorithms. (A sketch of the idea appears after this list.)
- ObjectSpaces
is a vision-based methodology for detecting and recognizing
physical interactions between a person and objects in the surroundings. Human
action recognition as well as object classification is performed using an
object-oriented framework. The goal is to make computers aware of people and
their activities. This research has applications in automatic video annotation
and surveillance as well as in embedded environments.
- Expectation Grammars represent high-level expectations about an activity in the form of a parameterized stochastic grammar. Such grammars are used to recognize an activity with strong temporal constraints and to decompose the activity into its constituent sub-tasks. (A toy parsing sketch appears after this list.)
- Propagation Net is a new modeling tool for representing and recognizing parallel temporal sequences. It extracts temporal information as well as logical relations between events from high-level knowledge and uses these constraints to integrate noisy low-level evidence. Experiments on a real-world glucose meter calibration process demonstrate its power.
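A minimal Python sketch of the SBHMM idea, assuming hmmlearn and scikit-learn as stand-ins for the authors' implementation; the per-segment boosting step is simplified to a single boosted classifier trained on Viterbi state labels, and all names and parameters are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.ensemble import AdaBoostClassifier

def sbhmm_feature_selection(X, n_states=5, keep=10):
    """X: (n_frames, n_features) observation sequence (illustrative)."""
    # 1. Fit a plain generative HMM and Viterbi-align frames to states.
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag").fit(X)
    states = hmm.predict(X)                     # per-frame segment labels

    # 2. Boost a classifier that separates the segments; frames within a
    #    segment are treated as roughly i.i.d. -- the "piecewise i.i.d."
    #    assumption described above.
    booster = AdaBoostClassifier(n_estimators=50).fit(X, states)

    # 3. Keep the features boosting found discriminative, i.e. the ones
    #    informative in at least some segment of the sequence.
    selected = np.argsort(booster.feature_importances_)[-keep:]

    # 4. Retrain the HMM in the reduced, discriminative feature space.
    hmm2 = GaussianHMM(n_components=n_states, covariance_type="diag")
    return hmm2.fit(X[:, selected]), selected
```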
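And a toy illustration of the Expectation Grammars idea using NLTK's stochastic-grammar tools; the event vocabulary and production probabilities below are invented for the example, not taken from the project.

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

# Low-level detectors emit a symbol stream; a stochastic grammar encodes
# the expected temporal structure of the activity.
grammar = PCFG.fromstring("""
    Activity -> Prepare Execute Finish [1.0]
    Prepare  -> 'enter' 'pickup'       [1.0]
    Execute  -> 'use' Execute          [0.4]
    Execute  -> 'use'                  [0.6]
    Finish   -> 'putdown' 'leave'      [1.0]
""")

events = ["enter", "pickup", "use", "use", "putdown", "leave"]
for tree in ViterbiParser(grammar).parse(events):
    tree.pretty_print()   # the parse tree is the sub-task decomposition
```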
People Tracking and Modeling
- Segmentally Boosted HMM (SBHMM): a discriminative feature selection method for time-sequence classification; see the full description under Human Activity Recognition above.
- The Animated Speakers project is aimed at the analysis and synthesis of facial motions associated with speech.
- Human Identification at a Distance: A project aimed at recognizing people at a distance.
- Model-based Head Tracking: A robust method for tracking heads in video. The head is tracked by finding the six translation and rotation parameters that register a rendered image of the textured head model with the head in the video image. (A registration sketch appears after this list.)
- Appearance-based Tracking of Head Yaw: A real-time system for tracking head yaw (rotation about the vertical axis) by interpolating template responses.
- Pupil Detection and Tracking: Reliable pupil detection and tracking are integral to more attentive user interfaces. The physiological properties, dynamics, and appearance of pupils are used to robustly find and track them. Once pupils are found, they can be used to extract higher-level information, such as faces, from the scene. We are currently looking at what other higher-level information can be determined about users with this technique as a foundation.
- Perceptual User Interfaces using Vision-based Eye Tracking: Head pose and eye-gaze information are valuable cues in face-to-face interactions between people. We propose enhancements to various vision-based eye-tracking approaches, including (a) the use of multiple cameras to estimate head pose and increase sensor coverage, and (b) the use of probabilistic measures incorporating Fisher's linear discriminant to robustly track the eyes in real time under varying lighting conditions. (An LDA scoring sketch appears after this list.)
- Tracking Multiple Objects through Occlusions: Tracking a varying number of objects through temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusion. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm searches for optimal region tracks, which limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions, and inter-occlusion relationships. The proposed architecture can track objects even through long periods of full occlusion.
- A Modular Approach to the Analysis and Evaluation of Particle Filters for Figure Tracking: We present the first systematic empirical study of particle filter (PF) algorithms for human figure tracking in video. Our analysis and evaluation follow a modular approach based on the underlying statistical principles and computational concerns that govern the performance of PF algorithms. Based on our analysis, we propose a novel PF algorithm for figure tracking with superior performance, the Optimized Unscented PF. We examine the role of edge and template features, introduce computationally equivalent sample sets, and describe a method for the automatic acquisition of reference data using standard motion-capture hardware. The software and test data are publicly available on our project website. (A baseline particle filter sketch appears after this list.)
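A minimal sketch of model-based head tracking as 6-DOF registration, under the assumption that a textured-model renderer is available; render_head is a hypothetical caller-supplied function, and SciPy's least-squares optimizer stands in for whatever search the project actually used.

```python
import numpy as np
from scipy.optimize import least_squares

def track_head_pose(frame, render_head, model, pose0):
    """frame: grayscale image; pose = (rx, ry, rz, tx, ty, tz).
    render_head(model, pose) -> (rendered_image, boolean_mask) is a
    hypothetical renderer supplied by the caller."""
    def residual(pose):
        rendered, mask = render_head(model, pose)
        return (rendered - frame)[mask]          # photometric error
    # Refine locally from the previous frame's pose: inter-frame head
    # motion is small, so a local least-squares search suffices.
    return least_squares(residual, np.asarray(pose0, float), method="lm").x
```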
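The Fisher-discriminant component mentioned in the eye-tracking item, sketched with scikit-learn; the training patches and candidate extraction are assumed to exist elsewhere.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_eye_scorer(patches, labels):
    """patches: (N, D) flattened image patches; labels: 1 = eye, 0 = not."""
    return LinearDiscriminantAnalysis().fit(patches, labels)

def best_eye_candidates(scorer, candidates, k=2):
    """Rank candidate patches by their signed LDA score, keep the top k."""
    scores = scorer.decision_function(candidates)
    return np.argsort(scores)[-k:]
```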
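For the particle-filter study, here is the generic sequential importance resampling (SIR) step that such trackers share; dynamics and likelihood are placeholders for figure-specific motion and observation models (e.g., edge or template scores).

```python
import numpy as np

def sir_step(particles, weights, observation, dynamics, likelihood, rng):
    """One resample-propagate-reweight cycle of a basic particle filter."""
    n = len(particles)
    particles = particles[rng.choice(n, size=n, p=weights)]  # resample
    particles = dynamics(particles, rng)     # motion model + process noise
    weights = likelihood(observation, particles)             # reweight
    return particles, weights / weights.sum()
```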
Nonphotorealism
Multimedia Editing and Synthesis
- Content-Based Image Synthesis: An application for editing images that synthesizes new regions of the image from a database of annotated image regions. High- and low-level annotations in the database guide the user to regions that would fit in the new image.
- Photo Collage Authoring: An application and novel interface for structuring and authoring photo collages based on their associated metadata.
Image/Video-based Modeling, Rendering, and Animation
- Graphcut Textures: A patch-based method for image and video texture synthesis. An algorithm finds optimal seams between the input and output texture (for an automatically computed texture placement/offset). An interactive application for image compositing is also demonstrated. (A toy seam-finding sketch appears after this list.)
- Image-based Method for Generating Motion Blur: An approach that generates motion blur as a video-based post-process. Applications are shown for blurring stop-motion footage and for real video.
- Learning Video Processing by Example: We approximate the output of an arbitrary video processing algorithm from a pair of input and output exemplars. Our algorithm learns the mapping between the input and output exemplars to model the processing that has taken place. (An illustrative regression sketch appears after this list.)
- Rendering Skin: Fine-scale skin structure has traditionally been ignored when rendering human skin. This technique adds the missing detail and significantly increases photo-realism.
- Video Textures: A new pictorial medium in between photographs (static) and video (which has a well-defined beginning and end). Video textures allow the generation of infinitely long videos from a finite set of frames. The technique can be used for video-based rendering and video-based animation. (A frame-similarity sketch appears after this list.)
- Texture Optimization: A global optimization-based technique for texture synthesis that combines local neighborhood-based similarity measures into a global metric. Controllable synthesis is also demonstrated by animating textures guided by flow fields.
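A toy sketch of the seam-finding step in graph-cut texture synthesis: in the overlap between existing output A and a new patch B, a minimum s-t cut over a grid graph whose capacities encode color mismatch yields the least-visible seam. networkx stands in for a production max-flow solver, and pinning only the left and right columns is a simplification.

```python
import numpy as np
import networkx as nx

def graphcut_seam(A, B):
    """A, B: (H, W, 3) float arrays over the same overlap region.
    Returns a boolean mask, True where the output should copy from B."""
    H, W, _ = A.shape
    diff = np.linalg.norm(A - B, axis=2)          # per-pixel color mismatch
    G = nx.DiGraph()
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):       # 4-connected grid
                v, w = (y, x), (y + dy, x + dx)
                if w[0] < H and w[1] < W:
                    # Seam cost between neighbors: cheap where patches agree.
                    c = diff[v] + diff[w]
                    G.add_edge(v, w, capacity=c)
                    G.add_edge(w, v, capacity=c)
    # Pin the left column to come from A and the right column from B
    # (edges without a capacity attribute are infinite in networkx).
    for y in range(H):
        G.add_edge("A", (y, 0))
        G.add_edge((y, W - 1), "B")
    _, (_, b_side) = nx.minimum_cut(G, "A", "B")
    mask = np.zeros((H, W), dtype=bool)
    for node in b_side:
        if node != "B":
            mask[node] = True
    return mask
```

Copying pixels from B wherever the mask is true stitches the patch in along the computed seam.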
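An illustrative stand-in for the learn-by-example mapping (the project's actual model is not specified here): regress each output pixel from its input neighborhood with k-nearest neighbors, then apply the learned mapping to new footage.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def neighborhoods(img, r=2):
    """img: 2-D grayscale frame. Stack each pixel's (2r+1)^2 neighborhood
    as one feature row, giving an (H*W, (2r+1)^2) matrix."""
    H, W = img.shape
    pad = np.pad(img, r, mode="edge")
    feats = [pad[y:y + H, x:x + W].ravel()
             for y in range(2 * r + 1) for x in range(2 * r + 1)]
    return np.stack(feats, axis=1)

def learn_and_apply(src, dst, new_src, k=5):
    """Learn the src -> dst exemplar mapping, apply it to new_src."""
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(neighborhoods(src), dst.ravel())
    return model.predict(neighborhoods(new_src)).reshape(new_src.shape)
```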
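A compact sketch of the video-textures machinery: a jump from frame i to frame j is seamless when frame i+1 resembles frame j, so transition probabilities fall off with that distance (this roughly follows the published video-textures formulation; the exact weighting is simplified).

```python
import numpy as np
from scipy.spatial.distance import cdist

def video_texture_transitions(frames, sigma=None):
    """frames: (N, H, W, C) array. Returns P where P[i, j] is the
    probability of showing frame j right after frame i."""
    flat = frames.reshape(len(frames), -1).astype(float)
    # A jump i -> j is invisible when frame i+1 looks like frame j, so
    # compare each frame's successor against every candidate target.
    D = cdist(flat[1:], flat[:-1])        # D[i, j] = ||f[i+1] - f[j]||
    sigma = sigma or D.mean()
    P = np.exp(-D / sigma)                # small distance => likely jump
    return P / P.sum(axis=1, keepdims=True)
```

Endless playback then just samples the next frame from P at every step.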
Audio-Visual Speech Recognition
- Speech Reading: Using visual cues to assist acoustic speech recognition. (A simple fusion sketch appears after this list.)
- Segmentally Boosted HMM (SBHMM): a discriminative feature selection method for time-sequence classification; see the full description under Human Activity Recognition above.
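The write-up does not say how the visual and acoustic streams are combined; a common baseline, shown here purely as an assumption, is late fusion with a stream weight over per-hypothesis log-likelihoods.

```python
import numpy as np

def fuse_and_decode(audio_loglik, visual_loglik, lam=0.7):
    """audio_loglik, visual_loglik: per-hypothesis log-likelihoods from
    separately trained acoustic and lip-reading models. lam weights the
    (usually more reliable) audio stream."""
    combined = (lam * np.asarray(audio_loglik)
                + (1.0 - lam) * np.asarray(visual_loglik))
    return int(np.argmax(combined))   # index of the winning hypothesis
```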
Complete List of Current / Ongoing Projects
- The Animated Speakers project is aimed at the analysis and synthesis of facial motions associated with speech.
- The Automated Understanding of Captured Experiences project aims to substantially reduce the human input necessary for creating and accessing large collections of multimedia, particularly multimedia created by capturing what is happening in an environment.
- The Aware Home project is an attempt at building a large-scale living laboratory that is aware of its inhabitants and their activities.
- Digital Video Special Effects (DVFX): Combining video, graphics, and computer vision to generate digital effects.
- EClass (formerly Classroom 2000): Capture your whole classroom experience: a room that takes notes.
- Human Identification at a Distance: A project aimed at recognizing people at a distance.
- Image-based Method for Generating Motion Blur: An approach that generates motion blur as a video-based post-process. Applications are shown for blurring stop-motion footage and for real video.
- i. a. e. Intelligent and Aware Environments (a.k.a. Smart Spaces) are spaces that have been transformed into smart work areas where color CCD cameras, big-screen displays, microphones, and other sensors are fused with computers. Real-time analysis and tracking of lab inhabitants takes place, and an intelligent agent interfaces with lab dwellers and other devices in the room.
- Layering: Motion-based decompositing of video. Motion reveals which objects are in front and which are in back.
- Learning Video Processing by Example: We approximate the output of an arbitrary video processing algorithm from a pair of input and output exemplars. Our algorithm learns the mapping between the input and output exemplars to model the processing that has taken place.
- Model-based Head Tracking: A robust method for tracking heads in video. The head is tracked by finding the six translation and rotation parameters that register a rendered image of the textured head model with the head in the video image.
- Multi-dimensional Texture Synthesis using Graph Cuts: A patch-based method for texture synthesis in multiple dimensions. An algorithm finds optimal seams between the input and output texture (for a given texture placement/offset). Our technique allows iterative refinement of already-generated textures.
- ObjectSpaces is a vision-based methodology for detecting and recognizing physical interactions between a person and objects in the surroundings. Human action recognition as well as object classification is performed using an object-oriented framework. The goal is to make computers aware of people and their activities. This research has applications in automatic video annotation and surveillance as well as in embedded environments.
- Pupil Detection and Tracking: Reliable pupil detection and tracking are integral to more attentive user interfaces. The physiological properties, dynamics, and appearance of pupils are used to robustly find and track them. Once pupils are found, they can be used to extract higher-level information, such as faces, from the scene. We are currently looking at what other higher-level information can be determined about users with this technique as a foundation.
- Perceptual User Interfaces using Vision-based Eye Tracking: Head pose and eye-gaze information are valuable cues in face-to-face interactions between people. We propose enhancements to various vision-based eye-tracking approaches, including (a) the use of multiple cameras to estimate head pose and increase sensor coverage, and (b) the use of probabilistic measures incorporating Fisher's linear discriminant to robustly track the eyes in real time under varying lighting conditions.
- Rendering Skin: Fine-scale skin structure has traditionally been ignored when rendering human skin. This technique adds the missing detail and significantly increases photo-realism.
- Scene Reconstruction from Extended Video Sequences: We are developing efficient methods for model/scene reconstruction from extended (long) video sequences.
- Speech Reading: Using visual cues to assist acoustic speech recognition.
- Ubiquitous Video and Audio: The aim of this project is to explore and prototype a spectrum of applications for ubiquitous video and audio processing in our daily environments. We are developing methodologies to instrument spaces with video and audio sensors and studying modeling techniques for interpretation and analysis of the signals these sensors capture.
- Video Textures: A new pictorial medium in between photographs (static) and video (which has a well-defined beginning and end). Video textures allow the generation of infinitely long videos from a finite set of frames. This technique can be used for video-based rendering and video-based animation.
Past Projects
- Ballet In A Box is HCI research in the domain of ballet. While good dance instructors are masters in their own right, we are collaborating with the Psychology department to build and evaluate a computer system that could be used to teach dance. The system will provide users with views of each dance motion from various angles and at different speeds, and is being tested against video instruction tapes and books.
- DFacs: A research project aimed at the recognition, modeling, and realistic animation of facial expressions. Various methods for recognizing facial expressions have been developed. At present our interest is in creating a database for testing our system and in working toward realistic animation of facial expressions.
- Euripides' The Bacchae is a PTRL production of a Greek tragedy performed at the Dramatech theater in March 1998. As our first step toward applying our research in the world of entertainment f/x, we provided parts of a computer vision system, in the form of video-augmenting software, to represent the world as seen through the eyes of a blind soothsayer.
- The PePe (PErsonal PEt) project is a long-term research project to build an intelligent, adaptive, user-friendly agent that displays different emotional states and a wide range of behaviors. More specifically, we are trying to build an agent that exhibits pet-like behaviors and emotions. Our goal is to make the interaction between the user and the agent as natural as possible, thus making the user perceive the agent more as a friend or a pet than as a toy.
- Smart Floor: We are creating a system that identifies and tracks people based on their footfalls. We are instrumenting a space in order to track and identify multiple people across a large indoor area. We are also integrating video identification and tracking technologies into the project. Applications we are exploring include home activities, art and dance performance, and entertainment.