Following is a list of some of the projects underway at CPL. Many of these projects overlap with one another, so several appear under more than one heading.
Aware Environments | DVFX | Activity Recognition | Tracking | Nonphotorealism | Reconstruction | Rendering | Segmentation | Speech
Aware Environments
- The Aware Home project is an attempt at building a large-scale living laboratory that is aware of its inhabitants and their activities.
- EClass (formerly Classroom 2000): Capture your whole classroom experience: a room that takes notes.
- The Automated Understanding of Captured Experiences project aims to substantially reduce the human input necessary for creating and accessing large collections of multimedia, particularly multimedia created by capturing what is happening in an environment.
- i. a. e. Intelligent and Aware Environments (a.k.a. Smart Spaces) are spaces that have been transformed into smart work areas where color CCD cameras, big-screen displays, microphones, and other sensors are fused with computers. Real-time analysis and tracking of lab inhabitants takes place, and an intelligent agent interfaces with lab dwellers and other devices in the room.
- Ubiquitous Video and Audio: The aim of this project is to explore and prototype a spectrum of applications for ubiquitous video and audio processing in our daily environments. We are developing methodologies to instrument spaces with video and audio sensors and studying modeling techniques for interpretation and analysis of the signals these sensors capture.
Digital Video Special Effects
- DVFX: Combining video, graphics, and computer vision to generate digital effects.
Human Activity Recognition
- Segmentally Boosted HMM (SBHMM) is a discriminative feature selection method for time-sequence classification tasks that predict a single label for an entire sequence, such as sign language recognition, gait recognition, lip reading, and speech recognition. Traditional feature selection methods require the data to be independently and identically distributed (i.i.d.), but the frames in time sequences are usually correlated. Furthermore, features in time sequences may be "sometimes informative", that is, discriminative only in some segment of a sequence. SBHMMs address both temporal correlation and segmentally informative features by assuming the data are "piecewise i.i.d." Experiments show that SBHMMs consistently reduce the error of traditional HMM recognition algorithms. (A sketch of the idea appears after this list.)
- ObjectSpaces
is a vision-based methodology for detecting and recognizing
physical interactions between a person and objects in the surroundings. Human
action recognition as well as object classification is performed using an
object-oriented framework. The goal is to make computers aware of people and
their activities. This research has applications in automatic video annotation
and surveillance as well as in embedded environments.
- Expectation Grammars represent high-level expectations about an activity in the form of a parameterized stochastic grammar. Such grammars are used to recognize an activity with strong temporal constraints and to decompose the activity into its constituent sub-tasks. (A toy parsing sketch appears after this list.)
- Propagation Net is a new modeling tool for representing and recognizing parallel temporal sequences. It extracts temporal information as well as logical relations between events from high-level knowledge and uses these constraints to integrate noisy low-level evidence. Experiments on a real-world glucose meter calibration process demonstrate its power.
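A minimal Python sketch of the SBHMM idea, assuming hmmlearn and scikit-learn as stand-ins for the authors' implementation; the per-segment boosting step is simplified to a single boosted classifier trained on Viterbi state labels, and all names and parameters are illustrative.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM
from sklearn.ensemble import AdaBoostClassifier

def sbhmm_feature_selection(X, n_states=5, keep=10):
    """X: (n_frames, n_features) observation sequence (illustrative)."""
    # 1. Fit a plain generative HMM and Viterbi-align frames to states.
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag").fit(X)
    states = hmm.predict(X)                     # per-frame segment labels

    # 2. Boost a classifier that separates the segments; frames within a
    #    segment are treated as roughly i.i.d. -- the "piecewise i.i.d."
    #    assumption described above.
    booster = AdaBoostClassifier(n_estimators=50).fit(X, states)

    # 3. Keep the features boosting found discriminative, i.e. the ones
    #    informative in at least some segment of the sequence.
    selected = np.argsort(booster.feature_importances_)[-keep:]

    # 4. Retrain the HMM in the reduced, discriminative feature space.
    hmm2 = GaussianHMM(n_components=n_states, covariance_type="diag")
    return hmm2.fit(X[:, selected]), selected
```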
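And a toy illustration of the Expectation Grammars idea using NLTK's stochastic-grammar tools; the event vocabulary and production probabilities below are invented for the example, not taken from the project.

```python
from nltk import PCFG
from nltk.parse import ViterbiParser

# Low-level detectors emit a symbol stream; a stochastic grammar encodes
# the expected temporal structure of the activity.
grammar = PCFG.fromstring("""
    Activity -> Prepare Execute Finish [1.0]
    Prepare  -> 'enter' 'pickup'       [1.0]
    Execute  -> 'use' Execute          [0.4]
    Execute  -> 'use'                  [0.6]
    Finish   -> 'putdown' 'leave'      [1.0]
""")

events = ["enter", "pickup", "use", "use", "putdown", "leave"]
for tree in ViterbiParser(grammar).parse(events):
    tree.pretty_print()   # the parse tree is the sub-task decomposition
```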
People Tracking and Modeling
- Segmentally Boosted HMM (SBHMM): a discriminative feature selection method for time-sequence classification; see the full description under Human Activity Recognition above.
- The Animated Speakers project is aimed at the analysis and synthesis of facial motions associated with speech.
- Human Identification at a Distance: A project aimed at recognizing people at a distance.
- Model-based Head Tracking: A robust method for tracking heads in video. The head is tracked by finding the six translation and rotation parameters that register a rendered image of the textured head model with the head in the video image. (A registration sketch appears after this list.)
- Appearance-based Tracking of Head Yaw: A real-time system for tracking head yaw (rotation about the vertical axis) by interpolating template responses.
- Pupil Detection and Tracking: Reliable pupil detection and tracking are integral to more attentive user interfaces. The physiological properties, dynamics, and appearance of pupils are used to robustly find and track them. Once pupils are found, they can be used to extract higher-level information, such as faces, from the scene. We are currently looking at what other higher-level information can be determined about users with this technique as a foundation.
- Perceptual User Interfaces using Vision-based Eye Tracking: Head pose and eye-gaze information are valuable cues in face-to-face interactions between people. We propose enhancements to various vision-based eye-tracking approaches, including (a) the use of multiple cameras to estimate head pose and increase sensor coverage, and (b) the use of probabilistic measures incorporating Fisher's linear discriminant to robustly track the eyes in real time under varying lighting conditions. (An LDA scoring sketch appears after this list.)
- Tracking Multiple Objects through Occlusions: Tracking a varying number of objects through temporally and spatially significant occlusions. Our method builds on the idea of object permanence to reason about occlusion. To this end, tracking is performed at both the region level and the object level. At the region level, a customized genetic algorithm searches for optimal region tracks, which limits the scope of object trajectories. At the object level, each object is located based on adaptive appearance models, spatial distributions, and inter-occlusion relationships. The proposed architecture can track objects even through long periods of full occlusion.
- A Modular Approach to the Analysis and Evaluation of Particle Filters for Figure Tracking: We present the first systematic empirical study of particle filter (PF) algorithms for human figure tracking in video. Our analysis and evaluation follow a modular approach based on the underlying statistical principles and computational concerns that govern the performance of PF algorithms. Based on our analysis, we propose a novel PF algorithm for figure tracking with superior performance, the Optimized Unscented PF. We examine the role of edge and template features, introduce computationally equivalent sample sets, and describe a method for the automatic acquisition of reference data using standard motion-capture hardware. The software and test data are publicly available on our project website. (A baseline particle filter sketch appears after this list.)
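A minimal sketch of model-based head tracking as 6-DOF registration, under the assumption that a textured-model renderer is available; render_head is a hypothetical caller-supplied function, and SciPy's least-squares optimizer stands in for whatever search the project actually used.

```python
import numpy as np
from scipy.optimize import least_squares

def track_head_pose(frame, render_head, model, pose0):
    """frame: grayscale image; pose = (rx, ry, rz, tx, ty, tz).
    render_head(model, pose) -> (rendered_image, boolean_mask) is a
    hypothetical renderer supplied by the caller."""
    def residual(pose):
        rendered, mask = render_head(model, pose)
        return (rendered - frame)[mask]          # photometric error
    # Refine locally from the previous frame's pose: inter-frame head
    # motion is small, so a local least-squares search suffices.
    return least_squares(residual, np.asarray(pose0, float), method="lm").x
```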
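The Fisher-discriminant component mentioned in the eye-tracking item, sketched with scikit-learn; the training patches and candidate extraction are assumed to exist elsewhere.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_eye_scorer(patches, labels):
    """patches: (N, D) flattened image patches; labels: 1 = eye, 0 = not."""
    return LinearDiscriminantAnalysis().fit(patches, labels)

def best_eye_candidates(scorer, candidates, k=2):
    """Rank candidate patches by their signed LDA score, keep the top k."""
    scores = scorer.decision_function(candidates)
    return np.argsort(scores)[-k:]
```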
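For the particle-filter study, here is the generic sequential importance resampling (SIR) step that such trackers share; dynamics and likelihood are placeholders for figure-specific motion and observation models (e.g., edge or template scores).

```python
import numpy as np

def sir_step(particles, weights, observation, dynamics, likelihood, rng):
    """One resample-propagate-reweight cycle of a basic particle filter."""
    n = len(particles)
    particles = particles[rng.choice(n, size=n, p=weights)]  # resample
    particles = dynamics(particles, rng)     # motion model + process noise
    weights = likelihood(observation, particles)             # reweight
    return particles, weights / weights.sum()
```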
Nonphotorealism
Multimedia Editing and Synthesis
- Content-Based Image Synthesis: An application for editing images that synthesizes new regions of the image from a database of annotated image regions. High- and low-level annotations in the database guide the user to regions that would fit in the new image.
- Photo Collage Authoring: An application and novel interface for structuring and authoring photo collages based on their associated metadata.
Image/Video-based Modeling, Rendering, and Animation
- Graphcut Textures: A patch-based method for image and video texture synthesis. An algorithm finds optimal seams between the input and output texture (for an automatically computed texture placement/offset). An interactive application for image compositing is also demonstrated. (A toy seam-finding sketch appears after this list.)
- Image-based Method for Generating Motion Blur: An approach that generates motion blur as a video-based post-process. Applications are shown for blurring stop-motion footage and for real video.
- Learning Video Processing by Example: We approximate the output of an arbitrary video processing algorithm from a pair of input and output exemplars. Our algorithm learns the mapping between the input and output exemplars to model the processing that has taken place. (An illustrative regression sketch appears after this list.)
- Rendering Skin: Fine-scale skin structure has traditionally been ignored when rendering human skin. This technique adds the missing detail and significantly increases photo-realism.
- Video Textures: A new pictorial medium in between photographs (static) and video (which has a well-defined beginning and end). Video textures allow the generation of infinitely long videos from a finite set of frames. The technique can be used for video-based rendering and video-based animation. (A frame-similarity sketch appears after this list.)
- Texture Optimization: A global optimization-based technique for texture synthesis that combines local neighborhood-based similarity measures into a global metric. Controllable synthesis is also demonstrated by animating textures guided by flow fields.
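A toy sketch of the seam-finding step in graph-cut texture synthesis: in the overlap between existing output A and a new patch B, a minimum s-t cut over a grid graph whose capacities encode color mismatch yields the least-visible seam. networkx stands in for a production max-flow solver, and pinning only the left and right columns is a simplification.

```python
import numpy as np
import networkx as nx

def graphcut_seam(A, B):
    """A, B: (H, W, 3) float arrays over the same overlap region.
    Returns a boolean mask, True where the output should copy from B."""
    H, W, _ = A.shape
    diff = np.linalg.norm(A - B, axis=2)          # per-pixel color mismatch
    G = nx.DiGraph()
    for y in range(H):
        for x in range(W):
            for dy, dx in ((0, 1), (1, 0)):       # 4-connected grid
                v, w = (y, x), (y + dy, x + dx)
                if w[0] < H and w[1] < W:
                    # Seam cost between neighbors: cheap where patches agree.
                    c = diff[v] + diff[w]
                    G.add_edge(v, w, capacity=c)
                    G.add_edge(w, v, capacity=c)
    # Pin the left column to come from A and the right column from B
    # (edges without a capacity attribute are infinite in networkx).
    for y in range(H):
        G.add_edge("A", (y, 0))
        G.add_edge((y, W - 1), "B")
    _, (_, b_side) = nx.minimum_cut(G, "A", "B")
    mask = np.zeros((H, W), dtype=bool)
    for node in b_side:
        if node != "B":
            mask[node] = True
    return mask
```

Copying pixels from B wherever the mask is true stitches the patch in along the computed seam.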
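An illustrative stand-in for the learn-by-example mapping (the project's actual model is not specified here): regress each output pixel from its input neighborhood with k-nearest neighbors, then apply the learned mapping to new footage.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def neighborhoods(img, r=2):
    """img: 2-D grayscale frame. Stack each pixel's (2r+1)^2 neighborhood
    as one feature row, giving an (H*W, (2r+1)^2) matrix."""
    H, W = img.shape
    pad = np.pad(img, r, mode="edge")
    feats = [pad[y:y + H, x:x + W].ravel()
             for y in range(2 * r + 1) for x in range(2 * r + 1)]
    return np.stack(feats, axis=1)

def learn_and_apply(src, dst, new_src, k=5):
    """Learn the src -> dst exemplar mapping, apply it to new_src."""
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(neighborhoods(src), dst.ravel())
    return model.predict(neighborhoods(new_src)).reshape(new_src.shape)
```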
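A compact sketch of the video-textures machinery: a jump from frame i to frame j is seamless when frame i+1 resembles frame j, so transition probabilities fall off with that distance (this roughly follows the published video-textures formulation; the exact weighting is simplified).

```python
import numpy as np
from scipy.spatial.distance import cdist

def video_texture_transitions(frames, sigma=None):
    """frames: (N, H, W, C) array. Returns P where P[i, j] is the
    probability of showing frame j right after frame i."""
    flat = frames.reshape(len(frames), -1).astype(float)
    # A jump i -> j is invisible when frame i+1 looks like frame j, so
    # compare each frame's successor against every candidate target.
    D = cdist(flat[1:], flat[:-1])        # D[i, j] = ||f[i+1] - f[j]||
    sigma = sigma or D.mean()
    P = np.exp(-D / sigma)                # small distance => likely jump
    return P / P.sum(axis=1, keepdims=True)
```

Endless playback then just samples the next frame from P at every step.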
Audio-Visual Speech Recognition
- Speech Reading: Using visual cues to assist acoustic speech recognition. (A simple fusion sketch appears after this list.)
- Segmentally Boosted HMM (SBHMM): a discriminative feature selection method for time-sequence classification; see the full description under Human Activity Recognition above.
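The write-up does not say how the visual and acoustic streams are combined; a common baseline, shown here purely as an assumption, is late fusion with a stream weight over per-hypothesis log-likelihoods.

```python
import numpy as np

def fuse_and_decode(audio_loglik, visual_loglik, lam=0.7):
    """audio_loglik, visual_loglik: per-hypothesis log-likelihoods from
    separately trained acoustic and lip-reading models. lam weights the
    (usually more reliable) audio stream."""
    combined = (lam * np.asarray(audio_loglik)
                + (1.0 - lam) * np.asarray(visual_loglik))
    return int(np.argmax(combined))   # index of the winning hypothesis
```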
Complete List of Current / Ongoing Projects
- The Animated Speakers project is aimed at the analysis and synthesis of facial motions associated with speech.
- The Automated Understanding of Captured Experiences project aims to substantially reduce the human input necessary for creating and accessing large collections of multimedia, particularly multimedia created by capturing what is happening in an environment.
- The Aware Home project is an attempt at building a large-scale living laboratory that is aware of its inhabitants and their activities.
- Digital Video Special Effects (DVFX): Combining video, graphics, and computer vision to generate digital effects.
- EClass (formerly Classroom 2000): Capture your whole classroom experience: a room that takes notes.
- Human Identification at a Distance: A project aimed at recognizing people at a distance.
- Image-based Method for Generating Motion Blur: An approach that generates motion blur as a video-based post-process. Applications are shown for blurring stop-motion footage and for real video.
- i. a. e. Intelligent and Aware Environments (a.k.a. Smart Spaces) are spaces that have been transformed into smart work areas where color CCD cameras, big-screen displays, microphones, and other sensors are fused with computers. Real-time analysis and tracking of lab inhabitants takes place, and an intelligent agent interfaces with lab dwellers and other devices in the room.
- Layering: Motion-based decompositing of video. Motion reveals which objects are in front and which are in back.
- Learning Video Processing by Example: We approximate the output of an arbitrary video processing algorithm from a pair of input and output exemplars. Our algorithm learns the mapping between the input and output exemplars to model the processing that has taken place.
- Model-based Head Tracking: A robust method for tracking heads in video. The head is tracked by finding the six translation and rotation parameters that register a rendered image of the textured head model with the head in the video image.
- Multi-dimensional Texture Synthesis using Graph Cuts: A patch-based method for texture synthesis in multiple dimensions. An algorithm finds optimal seams between the input and output texture (for a given texture placement/offset). Our technique allows iterative refinement of already-generated textures.
- ObjectSpaces is a vision-based methodology for detecting and recognizing physical interactions between a person and objects in the surroundings. Human action recognition as well as object classification is performed using an object-oriented framework. The goal is to make computers aware of people and their activities. This research has applications in automatic video annotation and surveillance as well as in embedded environments.
- Pupil Detection and Tracking: Reliable pupil detection and tracking are integral to more attentive user interfaces. The physiological properties, dynamics, and appearance of pupils are used to robustly find and track them. Once pupils are found, they can be used to extract higher-level information, such as faces, from the scene. We are currently looking at what other higher-level information can be determined about users with this technique as a foundation.
- Perceptual User Interfaces using Vision-based Eye Tracking: Head pose and eye-gaze information are valuable cues in face-to-face interactions between people. We propose enhancements to various vision-based eye-tracking approaches, including (a) the use of multiple cameras to estimate head pose and increase sensor coverage, and (b) the use of probabilistic measures incorporating Fisher's linear discriminant to robustly track the eyes in real time under varying lighting conditions.
- Rendering Skin: Fine-scale skin structure has traditionally been ignored when rendering human skin. This technique adds the missing detail and significantly increases photo-realism.
- Scene Reconstruction from Extended Video Sequences: We are developing efficient methods for model/scene reconstruction from extended (long) video sequences.
- Speech Reading: Using visual cues to assist acoustic speech recognition.
- Ubiquitous Video and Audio: The aim of this project is to explore and prototype a spectrum of applications for ubiquitous video and audio processing in our daily environments. We are developing methodologies to instrument spaces with video and audio sensors and studying modeling techniques for interpretation and analysis of the signals these sensors capture.
- Video Textures: A new pictorial medium in between photographs (static) and video (which has a well-defined beginning and end). Video textures allow the generation of infinitely long videos from a finite set of frames. This technique can be used for video-based rendering and video-based animation.
Past Projects
- Ballet In A Box is HCI research in the domain of ballet. While good dance instructors are masters in their own right, we are collaborating with the Psychology department to build and evaluate a computer system that could be used to teach dance. The system will provide users with views of each dance motion from various angles and at different speeds, and is being tested against video instruction tapes and books.
- DFacs: A research project aimed at the recognition, modeling, and realistic animation of facial expressions. Various methods for recognizing facial expressions have been developed. At present our interest is in creating a database for testing our system and in working toward realistic animation of facial expressions.
- Euripides' The Bacchae is a PTRL production of a Greek tragedy performed at the Dramatech theater in March 1998. As our first step toward applying our research in the world of entertainment f/x, we provided parts of a computer vision system, in the form of video-augmenting software, to represent the world as seen through the eyes of a blind soothsayer.
- The PePe (PErsonal PEt) project is a long-term research project to build an intelligent, adaptive, user-friendly agent that displays different emotional states and a wide range of behaviors. More specifically, we are trying to build an agent that exhibits pet-like behaviors and emotions. Our goal is to make the interaction between the user and the agent as natural as possible, thus making the user perceive the agent more as a friend or a pet than as a toy.
- Smart Floor: We are creating a system that identifies and tracks people based on their footfalls. We are instrumenting a space in order to track and identify multiple people across a large indoor area. We are also integrating video identification and tracking technologies into the project. Applications we are exploring include home activities, art and dance performance, and entertainment.