Abstracts

[Home] [Schedule] [Speakers & Topics] [Goals and Format] [Abstracts] [NIPS 2004 Conference] [NIPS 2004 Workshops]

Talks

Irfan Essa and Aaron Bobick, GVU Center, GA Tech

"Activity Recognition: From HMMs to Grammars to Network Representations."

I will present a brief overview of methods that have been studied for visual activity recognition. These methods range from using action and object context to recognize activities, to simple and extended grammar representations, to various network representations. The presentation will focus less on the methods themselves and more on highlighting the open problems and the need for new representations.

Jianbo Shi, Department of Computer and Information Science, University of Pennsylvania

"Finding Unusual Activity in Video"

Imagine you are given a long video, possibly thousands of hours long, and you are asked to analyze it to detect unusual events. By definition, unusual events are rare, difficult to describe, and impossible to predict. Any system that detects unusual events must sift through an extremely large amount of statistical detail to find the few relevant bits.

We present an unsupervised technique for detecting unusual activity in a large video set using many simple features. No complex activity models and no supervised feature selection are used. We divide the video into equal-length segments and classify the extracted features into prototypes, from which a prototype-segment co-occurrence matrix is computed. Motivated by a similar problem in document-keyword analysis, we analyze the co-clustering between the documents (videos) and keywords (features). We define a simultaneous clustering and feature selection criterion using the transitive closure constraint. We show that an important sub-family of correspondence functions can be reduced to co-embedding prototypes and segments in an N-D Euclidean space. We prove that an efficient, globally optimal algorithm exists for this co-embedding problem.

Experimentally, we have tested our algorithm on a variety of videos, ranging from nursing-home monitoring and poker-game cheating to roadway surveillance.

This is joint work with Mirko Visontai at U. Penn. and Hua Zhong at CMU.
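To make the segment/prototype co-occurrence idea concrete, here is a minimal sketch. It builds the segment-by-prototype count matrix and co-embeds segments and prototypes with the standard bipartite spectral co-embedding (a normalized SVD of the co-occurrence matrix). That is a swapped-in, well-known technique, not the transitive-closure criterion described above. The function names, the embedding dimension `k`, and the "mean distance to other segments" unusualness score are all hypothetical choices made for illustration.

```python
import numpy as np


def cooccurrence(segment_labels, n_prototypes):
    """Count how often each prototype occurs in each segment (segments x prototypes)."""
    C = np.zeros((len(segment_labels), n_prototypes))
    for s, labels in enumerate(segment_labels):
        for p in labels:
            C[s, p] += 1
    return C


def co_embed(C, k=2):
    """Embed segments and prototypes in one k-D space via a normalized SVD.

    Assumption: bipartite spectral co-embedding, not the authors' exact criterion.
    k should not exceed rank(C) - 1; later columns carry no structure.
    """
    d1 = C.sum(axis=1)  # segment degrees
    d2 = C.sum(axis=0)  # prototype degrees
    D1 = 1.0 / np.sqrt(np.maximum(d1, 1e-12))
    D2 = 1.0 / np.sqrt(np.maximum(d2, 1e-12))
    Cn = D1[:, None] * C * D2[None, :]
    U, S, Vt = np.linalg.svd(Cn, full_matrices=False)
    # The leading singular pair is trivial (singular value 1); keep the next k.
    seg = D1[:, None] * U[:, 1:k + 1]
    proto = D2[:, None] * Vt.T[:, 1:k + 1]
    return seg, proto


def unusualness(seg_emb):
    """Score each segment by its mean Euclidean distance to all other segments."""
    diff = seg_emb[:, None, :] - seg_emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist.sum(axis=1) / (len(seg_emb) - 1)
```

For example, eight segments dominated by prototypes 0 and 1 plus one segment dominated by prototype 3 give `C = cooccurrence([[0, 0, 0, 1, 1, 1, 2]] * 8 + [[2, 3, 3, 3]], 4)`, and with `k=1` the last segment receives the highest unusualness score. Because segments and prototypes share one space, the prototypes lying near an unusual segment also hint at which features made it unusual.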

Chris Wren, MERL

"Event and Activity Discovery Work at MERL"

There is significant interest in activity discovery at MERL, spanning both fundamental research projects and projects tightly coupled with business units. The work covers many types of signals: visual, auditory, and impoverished sensors. It also spans a wide range of problem domains: summarization of sporting events and newscasts, motion understanding, and novelty detection for traffic monitoring and for safety and security applications. Business units within our parent company have expressed significant interest in this technology, including units responsible for surveillance equipment, public works facilities construction, transportation infrastructure monitoring equipment, home entertainment equipment, and public safety systems. I will give an overview of this work, with an emphasis on the existing and potential connections we see with industry.

Posters

"Activity recognition and abnormality detection with the Switching Hidden Semi Markov Model"

T. Duong, H. Bui, D. Phung and S. Venkatesh

This paper addresses the problem of learning and recognizing human activities of daily living (ADL), an important research issue in building pervasive and smart environments. In dealing with ADL, we argue that it is beneficial to exploit both the inherent hierarchical organization of the activities and their typical duration. To this end, we introduce the Switching Hidden Semi-Markov Model (S-HSMM), a two-layered extension of the Hidden Semi-Markov Model (HSMM), for the modeling task. Activities are modeled in the S-HSMM in two ways: the bottom layer represents atomic activities and their duration using HSMMs; the top layer represents a sequence of high-level activities, where each high-level activity is made of a sequence of atomic activities. We consider two methods for modeling duration: the classic explicit duration model using a multinomial distribution, and the novel use of the discrete Coxian distribution. In addition, we propose an effective scheme to detect abnormality without the need for training on abnormal data. Experimental results show that the S-HSMM performs better than existing models, including the flat HSMM and the Hierarchical HMM, in both classification and abnormality detection tasks, alleviating the need for pre-segmented training data. Furthermore, our discrete Coxian duration model yields better computation time and generalization error than the classic explicit duration model.
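The discrete Coxian duration model can be illustrated by computing its probability mass function. This sketch assumes one common parameterization, which may differ from the poster's exact definition: the process enters phase i with probability `mu[i]`, stays in phase i for another time step with probability `a[i]`, otherwise advances to phase i + 1, and leaving the last phase ends the duration. One plausible reason for its computational advantage is that M phases need only about 2M parameters, whereas an explicit multinomial needs one parameter per possible duration.

```python
import numpy as np


def coxian_pmf(mu, a, max_d):
    """Return P(D = d) for d = 1..max_d under an assumed discrete Coxian duration model.

    mu: probability of entering the chain at each of the M phases (sums to 1).
    a:  self-transition probability of each phase; on leaving phase i the process
        moves to phase i + 1, and leaving the last phase ends the duration.
    """
    mu = np.asarray(mu, dtype=float)
    a = np.asarray(a, dtype=float)
    m = len(a)
    # Transitions between phases that do not yet end the duration.
    T = np.diag(a)
    for i in range(m - 1):
        T[i, i + 1] = 1.0 - a[i]
    # Probability of ending the duration from each phase: only the last phase can.
    exit_prob = np.zeros(m)
    exit_prob[-1] = 1.0 - a[-1]
    pmf = np.empty(max_d)
    occupancy = mu.copy()  # distribution over phases at the current time step
    for d in range(max_d):
        pmf[d] = occupancy @ exit_prob
        occupancy = occupancy @ T
    return pmf
```

With a single phase, `coxian_pmf([1.0], [0.5], 3)` reduces to the geometric duration of a plain HMM state, `[0.5, 0.25, 0.125]`. Adding phases lets the model put low probability on very short durations, which a geometric distribution cannot do.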

"Activity Recognition in the Driving Domain"

Kari Torkkola, Motorola, Intelligent Systems Lab, Tempe, AZ

Future intelligent systems in automobiles need to be aware of the driving and driver context. To do so, the available sensor data stream has to be modeled and monitored. We are interested in developing intelligent driver assistance systems that, for example, manage the presentation of information to the driver from various devices or subsystems in the car, essentially managing the driver's workload, or alert the driver when his or her attention is not where it should be. One necessary sub-component of such an intelligent assistance system is a driving situation detector that recognizes difficult driving situations requiring the driver's full attention and then acts as a gate for information presented to the driver by other devices. Another component could be a system that detects where the driver's attention is directed, or what is happening in the cockpit.

In both cases, the key is detecting a particular driving or driver activity from the available sensor stream. We describe problems in this domain, the sensors available both in simulators and in real automobiles, and some initial data-driven solutions to driver activity recognition based on machine learning techniques. The system has been partially implemented as a prototype built on a high-fidelity driving simulator, allowing us to run experiments on the interaction between the system and human users.
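The gating role of the driving situation detector can be sketched as a small component that sits between in-car devices and the driver. Everything here is hypothetical: the class and method names, the difficulty score in [0, 1] assumed to come from some detector that is not shown, the 0.7 threshold, and the rule that urgent alerts bypass the gate.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PresentationGate:
    """Hypothetical gate between in-car devices and the driver.

    Non-urgent messages are queued and released only when the (assumed)
    driving-situation detector reports a difficulty score below `threshold`.
    """
    threshold: float = 0.7
    pending: deque = field(default_factory=deque)

    def submit(self, message: str, urgent: bool = False) -> list:
        """Queue a message; urgent alerts bypass the gate and are shown immediately."""
        if urgent:
            return [message]
        self.pending.append(message)
        return []

    def poll(self, difficulty: float) -> list:
        """Called once per detector update; returns the messages that may be shown now."""
        if difficulty >= self.threshold:
            return []  # difficult situation: keep holding everything
        released = list(self.pending)
        self.pending.clear()
        return released
```

Separating `submit` from `poll` means queued messages are released as soon as the detector reports an easier situation, even if no new message arrives at that moment.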

"Activity Mining for Sensor Networks"

Chris Wren (MERL), David Minnen (GA Tech)

We present results from our exploration of activity discovery based on impoverished sensors. Networks of low-cost, low-power, low-bandwidth sensors are a practical way of providing context awareness in buildings. They are more widely applicable than dense networks of cameras because of their low component cost, low installation cost, and low privacy cost. However, impoverished sensors pose a significant challenge for activity monitoring due to their limited capabilities. We build on our work on behavior understanding with impoverished sensors to show some results relating to behavior discovery and novel event detection.

D. C. Minnen, C. R. Wren, "Finding Temporal Patterns by Data Decomposition", IEEE International Conference on Automatic Face and Gesture Recognition (FG), May 2004.
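As an illustration of novel event detection with impoverished sensors, the sketch below models each motion sensor's firing count in each hour of the day as a Poisson variable, learns the rates from ordinary days, and scores a new day's hours by negative log-likelihood. This generic baseline is an assumption for illustration only; it is not the decomposition method of the cited paper, and the function names, hourly binning, and smoothing prior are hypothetical.

```python
import math

import numpy as np


def fit_rates(train_counts, prior=0.5):
    """Mean firing count per (hour, sensor) cell from training days.

    train_counts: array of shape (days, hours, sensors). The prior acts like one
    extra day with `prior` firings per cell, so no rate is ever zero.
    """
    train = np.asarray(train_counts, dtype=float)
    return (train.sum(axis=0) + prior) / (train.shape[0] + 1)


def novelty(counts, rates):
    """Negative Poisson log-likelihood of one day's counts, summed over sensors per hour."""
    counts = np.asarray(counts, dtype=float)
    log_fact = np.vectorize(lambda k: math.lgamma(k + 1.0))(counts)  # log(k!)
    nll = rates - counts * np.log(rates) + log_fact
    return nll.sum(axis=1)  # one score per hour; higher means more unusual
```

For example, if every sensor fired twice per hour on five training days, a test day on which one sensor fires 20 times at hour 3 gets its highest score at hour 3. Scoring per hour rather than per day keeps the detection localized in time.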

"Discovery of Multiple Resolution Activity"

D. Ashbrook, T. Westeyn, D. Minnen, T. Starner (GA Tech)

We are interested in discovering activities from real-life data collected by human-worn sensors.

In earlier work, we discovered patterns of travel activity in GPS data collected over several weeks of users' daily lives. Our methods allowed the automatic discovery of physical locations that appeared to be significant to the users, the creation of a hierarchy of locations (such as a campus with sub-locations inside the campus) and a rudimentary predictive model of where the user might travel to next.
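A "rudimentary predictive model of where the user might travel to next" can be sketched as a first-order Markov model over already-discovered locations. The location discovery step (finding significant places in the GPS data) is not shown, and the Markov assumption, function names, and place names are hypothetical illustrations rather than the authors' implementation.

```python
from collections import Counter, defaultdict


def fit_transitions(visits):
    """Count location-to-location transitions in a time-ordered list of visited places."""
    counts = defaultdict(Counter)
    for here, there in zip(visits, visits[1:]):
        if here != there:  # ignore repeated reports of the same place
            counts[here][there] += 1
    return counts


def predict_next(counts, current):
    """Return (most likely next location, its estimated probability), or None if unseen."""
    nxt = counts.get(current)
    if not nxt:
        return None
    place, n = nxt.most_common(1)[0]
    return place, n / sum(nxt.values())
```

For the visit sequence `["home", "lab", "cafe", "lab", "home", "lab", "cafe"]`, the model predicts `("cafe", 2/3)` after `"lab"`. A location hierarchy such as a campus and its sub-locations could be handled by fitting one such model per level.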

In our current work, we are investigating smaller-scale patterns using body-worn motion sensors. We have collected over 10 hours of data of people cooking a meal, eating it, and cleaning up afterwards. The activities were completely unscripted aside from having the same meal prepared each time. We used seven small, wired sensors from Xsens that give 100 Hz readings from a 3-axis accelerometer, a 3-axis magnetometer, a 3-axis gyroscope, a temperature sensor, and a 4-variable quaternion-based orientation reading. This gives 14 readings, 100 times a second, for each of the seven sensors, or 588,000 sensor readings per minute.
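The per-minute figure follows directly from the sensor specification above; this short check makes the arithmetic explicit. The variable names are illustrative only.

```python
# Channels per Xsens sensor, as listed in the abstract.
channels = {
    "accelerometer": 3,
    "magnetometer": 3,
    "gyroscope": 3,
    "temperature": 1,
    "orientation_quaternion": 4,
}
readings_per_sample = sum(channels.values())  # 14 readings per sensor per sample
sample_rate_hz = 100
n_sensors = 7
seconds_per_minute = 60

readings_per_minute = readings_per_sample * sample_rate_hz * n_sensors * seconds_per_minute
print(readings_per_minute)  # 588000
```

At this rate, 10 hours of recording amounts to 588,000 × 600 = 352,800,000 readings, which is why automatic pattern discovery rather than manual labeling is the goal.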

Our intention is to develop techniques to automatically discover recurring patterns at various resolutions in this data. Our ultimate goal is to assign meaningful labels to these patterns, such as "stirring sauce", "going to the refrigerator", "cooking", "eating", and "cleaning up".

"Propagation Networks for Recognition of Partially Ordered Sequential Action"

Yifan Shi, Yan Huang, David Minnen, Aaron Bobick, Irfan Essa, Georgia Institute of Technology

We present Propagation Networks (P-Nets), a novel approach for representing and recognizing sequential activities that include parallel streams of action. We represent each activity using partially ordered intervals. Each interval is restricted by both temporal and logical constraints, including information about its duration and its temporal relationship with other intervals. P-Nets associate one node with each temporal interval. Each node is triggered according to a probability density function that depends on the state of its parent nodes. Each node also has an associated observation function that characterizes supporting perceptual evidence. To facilitate real-time analysis, we introduce a particle filter framework to explore the conditional state space. We modify the original Condensation algorithm to sample a discrete state space more efficiently (D-Condensation). Experiments in the domain of blood glucose monitor calibration demonstrate both the representational power of P-Nets and the effectiveness of the D-Condensation algorithm.

Yifan Shi, Yan Huang, David Minnen, Aaron F. Bobick, Irfan A. Essa: Propagation Networks for Recognition of Partially Ordered Sequential Action. CVPR (2) 2004: 862-869
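For readers unfamiliar with Condensation, the sketch below shows one standard predict/weight/resample step of a particle filter whose particles are discrete states. It does not reproduce the D-Condensation modification or the P-Net node structure; the simple Markov transition matrix, the per-state observation likelihood, and the function names are assumptions made for illustration.

```python
import numpy as np


def condensation_step(particles, transition, likelihood, rng):
    """One predict / weight / resample step of a particle filter over discrete states.

    particles:  int array of current state indices, one per particle.
    transition: (S, S) row-stochastic matrix, transition[i, j] = P(next = j | current = i).
    likelihood: length-S array, P(current observation | state); assumed not all zero.
    rng:        a numpy Generator, e.g. np.random.default_rng(0).
    """
    n, s = len(particles), transition.shape[0]
    # Predict: sample each particle's next state by inverting its row's CDF.
    cdf = np.cumsum(transition[particles], axis=1)
    u = rng.random(n)[:, None]
    predicted = np.minimum((cdf <= u).sum(axis=1), s - 1)
    # Weight: how well each predicted state explains the observation.
    weights = likelihood[predicted]
    weights = weights / weights.sum()
    # Resample particles in proportion to their weights.
    return predicted[rng.choice(n, size=n, p=weights)]


def state_posterior(particles, n_states):
    """Approximate P(state | observations so far) by particle frequencies."""
    return np.bincount(particles, minlength=n_states) / len(particles)
```

With a discrete state space, many particles end up in the same state, so the particle set can be summarized by `state_posterior`; exploiting this redundancy is one natural target for a more efficient discrete variant.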

Last Updated: 10/03/2005 10:04:58 AM