Following is a list of some of the projects underway at CPL. Most of the
projects listed below overlap with one another, so several appear under
more than one category.
Aware Environments
- The Aware Home project is an attempt at building a large-scale living laboratory that is aware of its inhabitants and their activities.
- EClass (formerly Classroom 2000): Capture your whole classroom experience with a room that takes notes.
- The Automated Understanding of Captured Experiences project aims to substantially reduce the human input needed to create and access large collections of multimedia, particularly multimedia created by capturing what is happening in an environment.
- i.a.e. (Intelligent and Aware Environments, a.k.a. Smart Spaces) are spaces that have been transformed into smart work areas, where color CCD cameras, big-screen displays, microphones, and other sensors are fused with computers. Real-time analysis and tracking of lab inhabitants takes place, and an intelligent agent interfaces with lab dwellers and other devices in the room.
- Ubiquitous Video and Audio: The aim of this project is to explore and prototype a spectrum of applications for ubiquitous video and audio processing in our daily environments. We are developing methodologies to instrument spaces with video and audio sensors and studying modeling techniques for interpreting and analyzing the signals these sensors capture.
Digital Video Special Effects
- DVFX: Combining video, graphics, and computer vision to generate digital effects.
Human Activity Recognition
- Segmentally-Boosted HMM (SBHMM) is a discriminative feature selection method for time sequence classification, which predicts a single label for an entire sequence, with applications including sign language recognition, gait recognition, lip reading, and speech recognition. Traditional feature selection methods require the data to be independent and identically distributed (i.i.d.), but the frames in time sequences are usually correlated. Furthermore, features in time sequences may be "sometimes informative", that is, discriminative only in some segment of a sequence. SBHMMs address both temporal correlation and segmentally-informative features by assuming the data is "piecewise i.i.d." Experiments show that SBHMMs consistently reduce the error of traditional HMM recognition algorithms. (A sketch of the idea follows this list.)
- ObjectSpaces is a vision-based methodology
for detecting and recognizing physical interactions between a person and
objects in the surroundings. Human action recognition as well as object
classification is performed using an object-oriented framework. The goal
is to make computers aware of people and their activities. This research
has applications in automatic video annotation and surveillance as well as
in embedded environments.
- Expectation Grammars represent high-level expectations about an
activity in the form of a parameterized stochastic grammar. Such grammars
are used to recognize an activity with strong temporal constraints and
decompose the activity into its constituent sub-tasks.
- Propagation Net is a new modeling tool to represent and recognize parallel temporal sequences. It extracts temporal information as well as logical relations between events from high-level knowledge and uses these constraints to integrate noisy low-level evidence. Experiments on a real-world glucose meter calibration process demonstrate its power.
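
As a rough illustration of the SBHMM idea above, here is a minimal sketch, not the project's actual algorithm: align frames to HMM states, boost a classifier to find state-discriminative features, and retrain the HMM on the reduced feature set. It assumes the hmmlearn and scikit-learn packages; the function name and the hard top-k selection are illustrative simplifications.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.ensemble import AdaBoostClassifier

def sbhmm_style_selection(X, lengths, n_states=5, n_keep=10):
    """X: stacked frames (n_frames, n_features); lengths: frames per sequence."""
    # 1. Fit a conventional HMM and align every frame to a hidden state.
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag")
    hmm.fit(X, lengths)
    states = hmm.predict(X, lengths)  # per-frame state labels

    # 2. Boost weak learners to discriminate the states; frames within a
    #    state segment are treated as (piecewise) i.i.d. samples.
    booster = AdaBoostClassifier(n_estimators=50)
    booster.fit(X, states)

    # 3. Keep the features the ensemble found most discriminative.
    keep = np.argsort(booster.feature_importances_)[-n_keep:]

    # 4. Retrain the recognizer in the selected feature subspace.
    hmm2 = GaussianHMM(n_components=n_states, covariance_type="diag")
    hmm2.fit(X[:, keep], lengths)
    return hmm2, keep
```

The actual SBHMM work couples boosting and HMM training more tightly than this; the hard top-k cut is deliberately simplified.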
People Tracking and Modeling
- Segmentally-Boosted HMM (SBHMM) is a discriminative feature selection method for time sequence classification, which predicts a single label for an entire sequence, with applications including sign language recognition, gait recognition, lip reading, and speech recognition. Traditional feature selection methods require the data to be independent and identically distributed (i.i.d.), but the frames in time sequences are usually correlated. Furthermore, features in time sequences may be "sometimes informative", that is, discriminative only in some segment of a sequence. SBHMMs address both temporal correlation and segmentally-informative features by assuming the data is "piecewise i.i.d." Experiments show that SBHMMs consistently reduce the error of traditional HMM recognition algorithms. (A sketch of the idea appears under Human Activity Recognition.)
- The Animated Speakers project is aimed at analysis and synthesis of facial motions associated with speech.
- Human Identification at a Distance: A project aimed at recognizing people at a distance.
- Model-based Head Tracking: A robust method for tracking heads in video. The head is tracked by finding the six translation and rotation parameters that register a rendered image of the textured model with the head in the video image.
- Appearance-based Tracking of Head Yaw: A real-time system for tracking head yaw (rotation about the vertical axis) by interpolation of template responses (see the sketch after this list).
- Pupil Detection and Tracking: Reliable pupil detection and tracking are integral to more attentive user interfaces. The physiological properties, dynamics, and appearance of pupils are used to robustly find and track them. Once pupils are found, they can be used to extract higher-level information, such as faces, from the scene. We are currently looking at what other higher-level information can be determined about users using this technique as a foundation.
- Perceptual User Interfaces using Vision-based Eye
Tracking : Head pose and eye gaze
information are very valuable cues in face-to-face interactions between people.
We propose enhancements to various vision-based eye-tracking approaches,
which include (a) the use of multiple cameras to estimate head pose and
increase coverage of the sensors and (b) the use of probabilistic measures
incorporating Fisher's linear discriminant to robustly track the eyes
under varying lighting conditions in real time.
- Tracking Multiple Objects through Occlusions: Tracking a varying number of objects through temporally and spatially significant occlusions. Our method builds on
the idea of object permanence to reason about occlusion. To this end,
tracking is performed at both the region level and the object level. At
the region level, a customized Genetic Algorithm is used to search for
optimal region tracks. This limits the scope of object trajectories. At
the object level, each object is located based on adaptive appearance
models, spatial distributions and inter-occlusion relationships. The
proposed architecture is capable of tracking objects even in the presence
of long periods of full occlusions.
- A Modular Approach to the Analysis and Evaluation of Particle Filters for Figure Tracking: We present the first systematic empirical study of particle filter (PF) algorithms for human figure tracking in video. Our analysis and evaluation follow a modular approach based on the underlying statistical principles and computational concerns that govern the performance of PF algorithms. Based on our analysis, we propose a novel PF algorithm for figure tracking with superior performance, the Optimized Unscented PF. We examine the role of edge and template features, introduce computationally-equivalent sample sets, and describe a method for the automatic acquisition of reference data using standard motion capture hardware. The software and test data are publicly available on our project website. (A generic PF sketch follows this list.)
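
For the appearance-based head-yaw entry above, a minimal sketch of one plausible reading of "interpolation of template responses": correlate the head patch against templates captured at known yaw angles and take a response-weighted angle around the best match. The `templates` and `angles` inputs are illustrative, not the project's data.

```python
import numpy as np

def estimate_yaw(patch, templates, angles):
    """patch: grayscale head image; templates[i]: same-size image at angles[i]."""
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    responses = []
    for t in templates:
        t = (t - t.mean()) / (t.std() + 1e-8)
        responses.append(float((p * t).mean()))  # normalized correlation
    r = np.asarray(responses)
    # Interpolate: weight the angles neighboring the best-matching template.
    i = int(np.argmax(r))
    lo, hi = max(i - 1, 0), min(i + 2, len(angles))
    w = r[lo:hi] - r[lo:hi].min() + 1e-8
    return float(np.dot(w, np.asarray(angles)[lo:hi]) / w.sum())
```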
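
The particle-filter study above is modular; the sketch below shows only the generic bootstrap predict-weigh-resample loop, not the Optimized Unscented PF it proposes. The `dynamics` and `likelihood` callables stand in for a figure motion model and an image-based observation model.

```python
import numpy as np

def particle_filter(observations, dynamics, likelihood, x0, n=500, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    particles = np.tile(np.asarray(x0, dtype=float), (n, 1))
    estimates = []
    for z in observations:
        # Predict: push each particle through the stochastic motion model.
        particles = np.array([dynamics(p, rng) for p in particles])
        # Weigh: score each state hypothesis against the current observation.
        w = np.array([likelihood(z, p) for p in particles])
        w = w / w.sum()
        estimates.append(w @ particles)  # posterior-mean state estimate
        # Resample: concentrate particles on high-likelihood states.
        particles = particles[rng.choice(n, size=n, p=w)]
    return np.array(estimates)
```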
Nonphotorealism
Multimedia Editing and Synthesis
- Content-Based Image Synthesis: An application for editing images that synthesizes new regions of the image from a database of annotated image regions. High- and low-level annotations in the database guide the user to regions that would fit in the new image.
- Photo Collage Authoring: An application and novel interface for structuring and authoring photo collages based on their associated metadata.
Image/Video-based Modeling, Rendering, Animation
- Graphcut Textures: A patch-based method for image and video texture synthesis. An algorithm for finding optimal seams between input and output texture (for an automatically computed texture placement/offset) is developed. An interactive application for image compositing is also demonstrated. (A seam-computation sketch follows this list.)
- Image-based Method for Generating Motion Blur: An approach for generating motion blur as a video-based post-process. Applications are shown for blurring stop-motion footage and for real video.
- Learning Video Processing by Example: We approximate the output of an arbitrary video processing algorithm based on a pair of input and output exemplars. Our algorithm learns the mapping between the input and output exemplars to model the processing that has taken place. (A toy sketch follows this list.)
- Rendering Skin: Fine-scale skin structure has traditionally been ignored when rendering human skin. This technique adds the missing detail and significantly increases photo-realism.
- Video Textures: A new pictorial medium in between photographs (static) and video (with a well-defined beginning and end). Video textures allow generation of infinitely long videos from a finite set of frames. This technique can be used for video-based rendering and video-based animation. (A compact sketch follows this list.)
- Texture Optimization: A global optimization-based technique for texture synthesis that combines local neighborhood-based similarity measures into a global metric. Controllable synthesis is also demonstrated by animating textures guided by flow fields.
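
To make the Graphcut Textures seam step concrete, here is a small sketch under simplifying assumptions: the overlap of two already-placed patches becomes a graph, neighboring pixels are linked with the sum of their per-pixel color differences, and a min-cut decides which patch each pixel copies from. It uses networkx for clarity rather than speed, and pinning the left column to one patch and the right column to the other is an illustrative constraint.

```python
import numpy as np
import networkx as nx

def seam_labels(A, B):
    """A, B: overlapping grayscale float patches (H, W).
    Returns a bool mask: True where the output should copy from B."""
    H, W = A.shape
    cost = np.abs(A - B)  # per-pixel disagreement between the patches
    G = nx.Graph()
    for y in range(H):
        for x in range(W):
            if x + 1 < W:
                G.add_edge((y, x), (y, x + 1),
                           capacity=cost[y, x] + cost[y, x + 1])
            if y + 1 < H:
                G.add_edge((y, x), (y + 1, x),
                           capacity=cost[y, x] + cost[y + 1, x])
        # Constrain the left column to patch A and the right column to B.
        G.add_edge("A", (y, 0), capacity=float("inf"))
        G.add_edge((y, W - 1), "B", capacity=float("inf"))
    _, (_, side_b) = nx.minimum_cut(G, "A", "B")
    mask = np.zeros((H, W), dtype=bool)
    for node in side_b:
        if node != "B":
            mask[node] = True
    return mask
```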
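
For Learning Video Processing by Example, a toy sketch of the general idea rather than the project's method: fit a regressor from input-frame patches to output pixels using one exemplar pair, then apply it to new frames. The 3x3 patch features and the RandomForestRegressor are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def patch_features(img, r=1):
    """Stack each pixel's (2r+1)^2 neighborhood into a feature row."""
    H, W = img.shape
    P = np.pad(img, r, mode="edge")
    cols = [P[y:y + H, x:x + W].ravel()
            for y in range(2 * r + 1) for x in range(2 * r + 1)]
    return np.stack(cols, axis=1)  # (H*W, (2r+1)^2)

def learn_processing(src, dst):
    """Learn dst = f(src) from one grayscale exemplar pair."""
    model = RandomForestRegressor(n_estimators=30)
    model.fit(patch_features(src), dst.ravel())
    # The returned function applies the learned mapping to a new frame.
    return lambda img: model.predict(patch_features(img)).reshape(img.shape)
```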
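
And for Video Textures, a compact sketch of the core mechanism: measure how well frame j could follow frame i (by comparing j to i's natural successor), turn the scores into transition probabilities, and random-walk through the input frames to play an arbitrarily long stream. The exponential weighting and the `sigma` default are illustrative.

```python
import numpy as np

def play_video_texture(frames, n_out=300, sigma=None, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    F = np.array([np.asarray(f, dtype=float).ravel() for f in frames])
    # D[i, j]: cost of jumping i -> j, i.e. distance from frame i's
    # natural successor (frame i + 1) to frame j.
    D = np.array([np.linalg.norm(F - F[i + 1], axis=1)
                  for i in range(len(F) - 1)])
    sigma = sigma if sigma is not None else D.mean()
    P = np.exp(-D / sigma)
    P /= P.sum(axis=1, keepdims=True)  # rows become transition probabilities
    out, i = [], 0
    for _ in range(n_out):
        out.append(frames[i])
        # The last frame has no successor row; reuse the previous one.
        i = int(rng.choice(len(frames), p=P[min(i, len(P) - 1)]))
    return out
```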
Audio-Visual Speech Recognition
- Speech Reading: Using visual cues to assist acoustic speech recognition.
- Segmentally-Boosted HMM (SBHMM) is a discriminative feature selection method for time sequence classification, which predicts a single label for an entire sequence, with applications including sign language recognition, gait recognition, lip reading, and speech recognition. Traditional feature selection methods require the data to be independent and identically distributed (i.i.d.), but the frames in time sequences are usually correlated. Furthermore, features in time sequences may be "sometimes informative", that is, discriminative only in some segment of a sequence. SBHMMs address both temporal correlation and segmentally-informative features by assuming the data is "piecewise i.i.d." Experiments show that SBHMMs consistently reduce the error of traditional HMM recognition algorithms. (A sketch of the idea appears under Human Activity Recognition.)
Complete List of Current / Ongoing Projects
- Animated Speakers: Aimed at analysis and synthesis of facial motions associated with speech.
- Automated Understanding of Captured Experiences: Aimed at substantially reducing the human input needed to create and access large collections of multimedia, particularly multimedia created by capturing what is happening in an environment.
- The Aware Home: An attempt at building a large-scale living laboratory that is aware of its inhabitants and their activities.
- Digital Video Special Effects (DVFX): Combining video, graphics, and computer vision to generate digital effects.
- EClass (formerly Classroom 2000): Capture your whole classroom experience with a room that takes notes.
- Human Identification at a Distance: A project aimed at recognizing people at a distance.
- Image-based Method for Generating Motion Blur: An approach for generating motion blur as a video-based post-process. Applications are shown for blurring stop-motion footage and for real video.
- i.a.e. (Intelligent and Aware Environments, a.k.a. Smart Spaces): Spaces that have been transformed into smart work areas, where color CCD cameras, big-screen displays, microphones, and other sensors are fused with computers. Real-time analysis and tracking of lab inhabitants takes place, and an intelligent agent interfaces with lab dwellers and other devices in the room.
- Layering: Motion-based decompositing of video. Motion reveals which objects are in front and which are in back.
- Learning Video Processing by Example: We approximate the output of an arbitrary video processing algorithm based on a pair of input and output exemplars. Our algorithm learns the mapping between the input and output exemplars to model the processing that has taken place.
- Model-based Head Tracking: A robust method for tracking heads in video. The head is tracked by finding the six translation and rotation parameters that register a rendered image of the textured model with the head in the video image.
- Multi-dimensional Texture Synthesis using Graph Cuts: A patch-based method for texture synthesis in multiple dimensions. An algorithm for finding optimal seams between input and output texture (for a given texture placement/offset) is developed. Our technique allows iterative refinement of already generated textures.
- ObjectSpaces: A vision-based methodology for detecting and recognizing physical interactions between a person and objects in the surroundings. Human action recognition as well as object classification is performed using an object-oriented framework. The goal is to make computers aware of people and their activities. This research has applications in automatic video annotation and surveillance as well as in embedded environments.
- Pupil Detection and Tracking: Reliable pupil detection and tracking are integral to more attentive user interfaces. The physiological properties, dynamics, and appearance of pupils are used to robustly find and track them. Once pupils are found, they can be used to extract higher-level information, such as faces, from the scene. We are currently looking at what other higher-level information can be determined about users using this technique as a foundation.
- Perceptual User Interfaces using Vision-based Eye Tracking: Head pose and eye gaze information are very valuable cues in face-to-face interactions between people. We propose enhancements to various vision-based eye-tracking approaches, which include (a) the use of multiple cameras to estimate head pose and increase coverage of the sensors and (b) the use of probabilistic measures incorporating Fisher's linear discriminant to robustly track the eyes under varying lighting conditions in real time.
- Rendering Skin: Fine-scale skin structure has traditionally been ignored when rendering human skin. This technique adds the missing detail and significantly increases photo-realism.
- Scene Reconstruction from Extended Video Sequences: We are developing efficient methods for model/scene reconstruction from extended (long) video sequences.
- Speech Reading: Using visual cues to assist acoustic speech recognition.
- Ubiquitous Video and Audio: The aim of this project is to explore and prototype a spectrum of applications for ubiquitous video and audio processing in our daily environments. We are developing methodologies to instrument spaces with video and audio sensors and studying modeling techniques for interpreting and analyzing the signals these sensors capture.
- Video Textures: A new pictorial medium in between photographs (static) and video (with a well-defined beginning and end). Video textures allow generation of infinitely long videos from a finite set of frames. This technique can be used for video-based rendering and video-based animation.
Past Projects
- Ballet in a Box: HCI research in the domain of ballet. While good dance instructors are masters in their own right, we are collaborating with the Psychology department to build and evaluate a computer system that could be used to teach dance. The system will provide users with views of each dance motion from various angles and at different speeds, and is being tested against video instruction tapes and books.
- DFacs: A research project aimed at the recognition, modeling, and realistic animation of facial expressions. Various methods for recognizing facial expressions have been developed. At present our interest is in creating a database for testing our system and in moving toward realistic animation of facial expressions.
- Euripides' The Bacchae: A PTRL production of a Greek tragedy performed at the Dramatech theater in March 1998. As our first step toward applying our research in the world of entertainment f/x, we provided parts of a computer vision system, in the form of video-augmenting software, to represent the world as seen through the eyes of a blind soothsayer.
- PePe (PErsonal PEt): A long-term research project to build an intelligent, adaptive, user-friendly agent that displays different emotional states and a wide range of behaviors. More specifically, we are trying to build an agent that exhibits pet-like behaviors and emotions. Our goal is to make the interaction between the user and the agent as natural as possible, thus making the user perceive the agent more as a friend or a pet than as a toy.
- Smart Floor: We are creating a system that identifies and tracks people based on their footfalls. We are instrumenting a space in order to track and identify multiple people across a large indoor area, and we are also integrating video identification and tracking technologies into the project. Applications we are exploring include home activities, art and dance performance, and entertainment.