Learning Variations of a Single Predefined Activity

Amos Johnson, Aaron Bobick, Muhammad Raffay Hamid, Samir Batta

In general, automatic activity recognition may be defined as a computer-vision system that recognizes which activities, from a predefined set of activities to look for, are occurring in a video-monitored environment. It is a cutting-edge research field that could enable many interesting applications, such as automatic video monitoring and robust surveillance.

 

However, instead of detecting which activities are occurring in an environment, we are interested in creating a system that learns, from a limited amount of visual data, the various ways in which a single predefined activity may occur.

 

Using this information, the system will classify a new instance of the activity either as one of the learned variations of the activity or as an anomaly. Currently, we are developing a system that learns the variations of a loading-dock activity and detects when an anomaly occurs.
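The classification step described above can be sketched as a nearest-centroid rule with a rejection threshold. This is a minimal illustration under stated assumptions, not the method used in this project: each instance of the activity is assumed to be already summarized as a fixed-length feature vector, the training examples are assumed to be already grouped into variations (for example, by clustering), and the labels, features, and `threshold` value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def centroid(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def learn_variations(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Summarize each (hypothetical) variation by the centroid of its examples."""
    return {label: centroid(vectors) for label, vectors in training.items()}


def classify(models: Dict[str, List[float]], x: Sequence[float], threshold: float) -> str:
    """Return the nearest variation, or 'anomaly' if even the nearest
    centroid is farther than threshold (Euclidean distance)."""
    best_label, best_dist = "anomaly", math.inf
    for label, c in models.items():
        d = math.dist(x, c)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist <= threshold else "anomaly"


# Usage with two hypothetical variations of an activity.
models = learn_variations({
    "variation_a": [[0.0, 0.0], [0.0, 2.0]],
    "variation_b": [[10.0, 10.0], [10.0, 12.0]],
})
print(classify(models, [0.5, 1.0], threshold=3.0))
print(classify(models, [5.0, 5.0], threshold=3.0))
```

A new instance far from every learned variation is reported as an anomaly; the threshold trades missed anomalies against false alarms.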

 


 

Toward a General Framework for Activity Recognition and Anomaly Detection

Yan Huang, Muhammad Raffay Hamid, Irfan Essa  

The system provides robust tracking of objects in cluttered environments under varying illumination and short-term full occlusion. For robust tracking, we rely on operations performed on regions of the scene that are likely to contain significant activity. Statistical color and shape features are extracted within these regions for tracking. In addition, we employ optimized maximum likelihood estimation (OMLE), as well as spatio-temporal trajectory coherence, within a particle filter framework.
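The particle-filter component can be illustrated with a generic bootstrap particle filter. In this sketch, a constant-velocity motion model stands in for trajectory coherence, and `likelihood` is a hypothetical placeholder for the color and shape similarity computed within regions of activity; OMLE is not modeled. All names, noise parameters, and the target used in `demo` are assumptions.

```python
import math
import random
from typing import Callable, List, Tuple

Particle = Tuple[float, float, float, float]  # (x, y, vx, vy)


def track_step(particles: List[Particle],
               likelihood: Callable[[float, float], float],
               rng: random.Random,
               pos_noise: float = 2.0,
               vel_noise: float = 0.5) -> Tuple[List[Particle], Tuple[float, float]]:
    """One predict-weight-resample step; returns new particles and a position estimate."""
    # Predict: a noisy constant-velocity model keeps trajectories coherent.
    moved = []
    for x, y, vx, vy in particles:
        vx += rng.gauss(0.0, vel_noise)
        vy += rng.gauss(0.0, vel_noise)
        moved.append((x + vx + rng.gauss(0.0, pos_noise),
                      y + vy + rng.gauss(0.0, pos_noise), vx, vy))
    # Weight: score each position hypothesis with the (hypothetical) appearance likelihood.
    weights = [likelihood(p[0], p[1]) for p in moved]
    total = sum(weights)
    if total <= 0:
        weights, total = [1.0] * len(moved), float(len(moved))  # uniform fallback
    estimate = (sum(w * p[0] for w, p in zip(weights, moved)) / total,
                sum(w * p[1] for w, p in zip(weights, moved)) / total)
    # Resample hypotheses in proportion to their weights.
    return rng.choices(moved, weights=weights, k=len(moved)), estimate


def demo(steps: int = 5, n: int = 500, seed: int = 1):
    """Track a stationary, hypothetical target at (50, 30)."""
    rng = random.Random(seed)

    def score(x: float, y: float) -> float:
        # Stand-in for color/shape similarity: a Gaussian bump around the target.
        return math.exp(-((x - 50.0) ** 2 + (y - 30.0) ** 2) / 50.0)

    particles = [(rng.uniform(0, 100), rng.uniform(0, 60), 0.0, 0.0) for _ in range(n)]
    estimate = (0.0, 0.0)
    for _ in range(steps):
        particles, estimate = track_step(particles, score, rng)
    return particles, estimate


print(demo()[1])
```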

Example videos: Writing in English, Move in Subspace, Bag Exchanging.

Expectation Grammars
Leveraging High-Level Expectations for Activity Recognition

David Minnen, Irfan Essa, Thad Starner

Video-based recognition and prediction of a temporally extended activity can benefit from a detailed description of high-level expectations about the activity. Stochastic grammars allow for an efficient representation of such expectations and are well-suited for the specification of temporally well-ordered activities. In this paper, we extend stochastic grammars by adding event parameters, state checks, and sensitivity to an internal scene model. We present an implemented system that uses human-specified grammars to recognize a person performing the Towers of Hanoi task from a video sequence by analyzing object interaction events. Experimental results from several videos show robust recognition of the full task and its constituent sub-tasks even though no appearance models of the objects in the video are provided. These experiments include videos of the task performed with different shaped objects and with distracting and extraneous interactions.
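To illustrate how event parameters and state checks can constrain a grammar, the sketch below encodes the recursive structure of the Towers of Hanoi task as a single production, Hanoi(n, src, aux, dst) -> Hanoi(n-1, src, dst, aux) move(n, src, dst) Hanoi(n-1, aux, src, dst), and checks each parameterized move event against a simple internal scene model of the pegs. It is a deliberately simplified, deterministic recognizer: event detection confidences are multiplied into a score, but the stochastic parsing, sub-task recognition, and tolerance of extraneous interactions described above are not shown. The event format, peg names, and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Move = Tuple[int, str, str]  # (disk, from_peg, to_peg); disk 1 is the smallest
Event = Tuple[int, str, str, float]  # a detected move plus a detection confidence


def expand(n: int, src: str, aux: str, dst: str) -> List[Move]:
    """Expand Hanoi(n, src, aux, dst) ->
    Hanoi(n-1, src, dst, aux) move(n, src, dst) Hanoi(n-1, aux, src, dst)."""
    if n == 0:
        return []
    return expand(n - 1, src, dst, aux) + [(n, src, dst)] + expand(n - 1, aux, src, dst)


def state_check(pegs: Dict[str, List[int]], disk: int, src: str, dst: str) -> bool:
    """The moved disk must be on top of src and smaller than the top disk on dst."""
    if not pegs[src] or pegs[src][-1] != disk:
        return False
    return not pegs[dst] or pegs[dst][-1] > disk


def recognize(events: List[Event], n_disks: int,
              scene: Optional[Dict[str, List[int]]] = None) -> Tuple[bool, float]:
    """Accept events only if they match the expansion and pass every state check.
    Returns (accepted, product of detection confidences)."""
    # Internal scene model: peg -> disks from bottom to top.
    if scene is None:
        scene = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}
    pegs = {peg: list(disks) for peg, disks in scene.items()}
    expected = expand(n_disks, "A", "B", "C")
    if len(events) != len(expected):
        return False, 0.0
    score = 1.0
    for (disk, src, dst, conf), move in zip(events, expected):
        if (disk, src, dst) != move or not state_check(pegs, disk, src, dst):
            return False, 0.0
        pegs[dst].append(pegs[src].pop())  # keep the scene model in sync
        score *= conf
    return True, score


# Usage: a correctly performed two-disk task.
observed = [(1, "A", "B", 0.9), (2, "A", "C", 0.8), (1, "B", "C", 0.5)]
print(recognize(observed, 2))
```

The state check matters when the scene model disagrees with the grammar's assumptions: if the tracked scene shows a disk somewhere other than where the production expects it, the event is rejected even though its parameters match.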


Automated Tracking and Modeling of Social Animal Behavior

Frank Dellaert and Tucker Balch

Several leading robotics researchers (for instance, Brooks, Beer, and Arkin) tell us that their work was inspired in part by the behavior of insects. More recently, ants have captured the imagination of computer-science and network-systems researchers as an inspiration for optimization algorithms (Bonabeau).

We are particularly interested in learning about social insects because they provide an existence proof of successful, large-scale, robust behavior arising from the interaction of many simple agents.

Ant behavior offers a wealth of ideas on how to organize a cooperating colony of agents. For example, even though ants are capable only of very short-range communication, they carry out complex scouting and retrieval operations over tens of meters. The techniques social insects use to stage such complex operations could also be employed in the design of robust multi-robot systems, which is why it is important for us to learn what insects have to offer.

Representing and Recognizing Activities Based upon Temporally Ordered Intervals

Yifan Shi and Aaron Bobick 

 

We present a method of representing and recognizing activities based upon temporally ordered intervals. Each interval has both temporal constraints (before, after, duration) and logical relationships. Recognizing such activities requires processing multiple parallel streams. Accordingly, we devise the Propagation Net (P-Net) as a new mechanism for the representation and recognition of multi-stream action. A P-Net associates a node with each interval. Each node is probabilistically triggered according to a probability density function that depends upon the state of its precursor nodes. Each node also has an associated observation distribution function that describes positive perceptual evidence. By their nature, P-Nets describe an exponentially large state space with a limited branching factor. To facilitate real-time video analysis, we use a particle filter to explore the conditional state space, modifying the original Condensation algorithm to sample a discrete state space more efficiently (D-Condensation). Experiments on video and motion-capture data demonstrate both the capacity of the P-Net representation and the effectiveness of the D-Condensation algorithm.
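A rough sketch of particle filtering over a P-Net's discrete state space follows. Each node is assumed to be WAITING, ACTIVE, or DONE, and a waiting node may start only once all of its precursors are done. Constant per-frame start and end probabilities (`p_start`, `p_end`) stand in for the probability density functions described above, and per-node evidence values stand in for the observation distributions. This is a plain Condensation-style filter over discrete joint states, not the D-Condensation algorithm itself; all names and parameters are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

WAITING, ACTIVE, DONE = 0, 1, 2
State = Tuple[int, ...]


class PNet:
    """Hypothetical minimal P-Net: one node per interval, with precursor lists."""

    def __init__(self, precursors: List[List[int]], p_start: float, p_end: float):
        self.precursors = precursors  # precursors[i] = nodes that must be DONE before i starts
        self.p_start = p_start  # per-frame chance that an enabled node starts
        self.p_end = p_end  # per-frame chance that an active node ends

    def step(self, state: State, rng: random.Random) -> State:
        """Sample the next joint state from the current one (DONE is absorbing)."""
        new = list(state)
        for i, s in enumerate(state):
            enabled = all(state[j] == DONE for j in self.precursors[i])
            if s == WAITING and enabled and rng.random() < self.p_start:
                new[i] = ACTIVE
            elif s == ACTIVE and rng.random() < self.p_end:
                new[i] = DONE
        return tuple(new)


def likelihood(state: State, evidence: Sequence[float]) -> float:
    """evidence[i] is taken as P(observation | node i active); 1 - evidence[i] otherwise."""
    w = 1.0
    for s, e in zip(state, evidence):
        w *= e if s == ACTIVE else 1.0 - e
    return w


def particle_filter(net: PNet, evidence_seq, n_particles: int = 200, seed: int = 0) -> List[State]:
    """Predict, weight, and resample discrete joint states frame by frame."""
    rng = random.Random(seed)
    particles = [tuple([WAITING] * len(net.precursors))] * n_particles
    for evidence in evidence_seq:
        particles = [net.step(p, rng) for p in particles]
        weights = [likelihood(p, evidence) for p in particles]
        if sum(weights) == 0:
            weights = [1.0] * n_particles  # fall back to uniform if all weights vanish
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return particles


def map_state(particles: List[State]) -> State:
    """Most frequent joint state among the particles."""
    return Counter(particles).most_common(1)[0][0]


# Usage: nodes 0 and 1 run in parallel; node 2 follows both.
net = PNet([[], [], [0, 1]], p_start=0.3, p_end=0.2)
evidence = [[0.9, 0.9, 0.1]] * 5 + [[0.1, 0.1, 0.9]] * 10
print(map_state(particle_filter(net, evidence)))
```

Because the state space is discrete, duplicate particles can be counted directly, and `map_state` returns the most frequent joint state as the current interpretation.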