Learning Variations of a Single Predefined Activity
Amos Johnson, Aaron Bobick, Muhammad Raffay Hamid, Samir Batta
|
In general, automatic activity recognition may be defined as a computer-vision system that recognizes which activities, from a predefined set, are occurring in a video-monitored environment. It is an active research field with potential applications such as automatic video monitoring and robust surveillance.
However, instead of detecting which activities are occurring in an environment, we are interested in creating a system that learns the various ways a single predefined activity may occur from a limited amount of visual data of that activity.
Using this information, the system will classify a new instance of the activity either as belonging to one of the known variations of the activity or as an abnormality. Currently we are developing a system to learn the variations in a loading-dock activity and to detect when an abnormality occurs. |
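The project description does not say how variations are represented or compared, so the following is only a minimal sketch of the classify-or-flag idea. It assumes each instance of the activity has already been reduced to a numeric feature vector, that each variation is summarized by the centroid of its examples, and that a distance cutoff separates known variations from abnormalities. The variation names, the features, and the `threshold` value are all hypothetical.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(instance, variations, threshold):
    """Return the name of the closest variation, or "anomaly" if none is near.

    variations: dict mapping a variation name to its example feature vectors.
    threshold:  hypothetical distance cutoff; it would need tuning on real data.
    """
    best_name, best_dist = None, math.inf
    for name, examples in variations.items():
        d = math.dist(instance, centroid(examples))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else "anomaly"

# Hypothetical loading-dock variations with made-up 2-D features.
variations = {
    "truck_backs_in": [[1.0, 0.0], [1.2, 0.1]],
    "truck_pulls_through": [[0.0, 1.0], [0.1, 1.1]],
}
near = classify([1.1, 0.0], variations, threshold=0.5)
far = classify([5.0, 5.0], variations, threshold=0.5)
```

With only a few examples per variation, a centroid is one of the few summaries that can be estimated reliably, which is why the sketch uses it; a real system would likely need a richer model of each variation.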
|
Toward A General Framework for
Activity Recognition and Anomaly Detection
Yan Huang, Muhammad Raffay Hamid, Irfan Essa
| The system provides robust tracking of objects in cluttered environments under varying
illumination conditions and short-term full occlusion. For robust tracking, we rely on operations that are performed on
likely regions of significant activity within the scene. Statistical color and shape features are extracted for tracking within these
regions. In addition, we employ optimized maximum likelihood estimation (OMLE), as well as spatio-temporal trajectory coherence
within a particle filter framework. |
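For readers unfamiliar with particle filters, the sketch below shows one predict, weight, and resample cycle of a generic bootstrap particle filter tracking a one-dimensional position. It is not the system described above: the OMLE step, the color and shape features, and the trajectory-coherence term are omitted, and the random-walk motion model, the Gaussian observation model, and the noise parameters are assumptions made for illustration.

```python
import math
import random

def particle_filter_step(particles, observation, rng, motion_std=1.0, obs_std=2.0):
    """One predict-weight-resample cycle of a bootstrap particle filter (1-D)."""
    # Predict: diffuse each particle with an assumed random-walk motion model.
    predicted = [p + rng.gauss(0.0, motion_std) for p in particles]
    # Weight: assumed Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

rng = random.Random(0)
particles = [rng.uniform(0.0, 100.0) for _ in range(500)]
for observed_x in [50.0, 51.0, 52.0, 53.0]:   # made-up measurements
    particles = particle_filter_step(particles, observed_x, rng)
estimate = sum(particles) / len(particles)
```

Because the filter keeps many hypotheses alive at once, a short occlusion can be handled by skipping the weighting step for a few frames and letting the motion model carry the particles forward.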
Writing in English (Video) |
|
Move in Subspace (Video)
|
Bag Exchanging (Video)
|
Expectation Grammars
Leveraging High-Level Expectations for Activity Recognition
David Minnen, Irfan Essa, Thad Starner
| Video-based recognition and prediction of a temporally extended activity can benefit from a detailed description of high-level expectations about the activity. Stochastic grammars allow for an efficient representation of such expectations and are well-suited for the specification of temporally well-ordered activities. In this paper, we extend stochastic grammars by adding event parameters, state checks, and sensitivity to an internal scene model. We present an implemented system that uses human-specified grammars to recognize a person performing the Towers of Hanoi task from a video sequence by analyzing object interaction events. Experimental results from several videos show robust recognition of the full task and its constituent sub-tasks even though no appearance models of the objects in the video are provided. These experiments include videos of the task performed with differently shaped objects and with distracting and extraneous interactions. |
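The idea of a state check against an internal scene model can be illustrated with a small sketch. It is not the paper's grammar or parser: it assumes that low-level detection has already produced a sequence of move events, each a hypothetical `(source_peg, destination_peg)` pair, and it only checks those events against the Towers of Hanoi rules while updating a model of the pegs.

```python
def apply_move(pegs, src, dst):
    """Check a move event against the scene model and update it if legal.

    pegs: dict mapping a peg name to its disk sizes, bottom first.
    Returns True if the move is consistent with the Towers of Hanoi rules.
    """
    if not pegs[src]:
        return False              # nothing to pick up from the source peg
    disk = pegs[src][-1]
    if pegs[dst] and pegs[dst][-1] < disk:
        return False              # a larger disk may not sit on a smaller one
    pegs[dst].append(pegs[src].pop())
    return True

def recognize(events, n_disks=2):
    """Return True if the events move the whole tower from peg A to peg C."""
    pegs = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}
    for src, dst in events:
        if not apply_move(pegs, src, dst):
            return False
    return pegs["C"] == list(range(n_disks, 0, -1))

solved = recognize([("A", "B"), ("A", "C"), ("B", "C")])
violated = recognize([("A", "C"), ("A", "C")])
```

This sketch rejects a sequence at the first illegal event. Tolerating distracting or extraneous interactions, as the experiments describe, would require the probabilistic parsing that the sketch leaves out.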
Videos |
| Electronic copy of the paper: [PDF version] [Postscript version] |
Automated Tracking and Modeling of Social Animal Behavior
Frank Dellaert and Tucker Balch
|
A number of leading robotics researchers (including, for instance, Brooks, Beer, and Arkin) tell us their work was inspired in part by the behavior of insects. Recently, ants have captured the imagination of computer science and network systems researchers as an inspiration for their optimizing algorithms (Bonabeau). We are particularly interested in learning about social insects, as they provide an existence proof of successful large-scale robust behavior forged from the interaction of many simple agents. Ant behavior can offer a wealth of ideas on how to organize a cooperating colony of agents. As an example, even though they are only capable of very short-range communication, ants are able to carry out complex scouting and retrieval operations over tens of meters. The techniques social insects use for staging such complex operations could also be employed in the design of robust multi-robot systems, so it is important for us to learn what insects have to offer. |
|
Representing and Recognizing Activities Based upon Temporally Ordered Intervals
Yifan Shi and Aaron Bobick
We present a method of representing and recognizing activities based upon temporally ordered intervals. Each interval has both temporal constraints (i.e., before/after/duration) and logical relationships. Recognizing such activity requires the processing of multiple, parallel streams. Accordingly, we devise a Propagation Net (P-Net) as a new mechanism for the representation and recognition of multi-stream action. P-Nets associate a node with each interval. Each node is probabilistically triggered according to a probability density function that depends upon the state of its precursor nodes. Each node also has an associated observation distribution function that describes positive perceptual evidence. By their nature, P-Nets describe an exponential state space with limited branching factors. To facilitate real-time video analysis, we use a particle filter to explore the conditional state space. We modify the original Condensation algorithm to more efficiently sample a discrete state space (D-Condensation). Experiments on video and motion-capture data demonstrate both the capacity of the P-Net representation and the effectiveness of the D-Condensation algorithm.
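The following sketch illustrates the general shape of this approach under strong simplifications. Each interval is a node that is inactive, active, or done; a node may start only once all of its precursors are done; and a plain sampling-importance-resampling particle filter explores the joint discrete state. The three-node activity, the start and end probabilities, and the boolean evidence model are all hypothetical, and the resampling shown is the standard scheme, not the D-Condensation modification.

```python
import random

INACTIVE, ACTIVE, DONE = 0, 1, 2

# Hypothetical activity: "reach" must finish before "grasp" or "look" can start.
PRECURSORS = {"reach": [], "grasp": ["reach"], "look": ["reach"]}
NODES = list(PRECURSORS)

def propagate(state, rng, p_start=0.5, p_end=0.3):
    """Sample the next joint state from the previous one."""
    nxt = dict(state)
    for node in NODES:
        if state[node] == INACTIVE:
            # A node is triggered only after all its precursors are done.
            ready = all(state[p] == DONE for p in PRECURSORS[node])
            if ready and rng.random() < p_start:
                nxt[node] = ACTIVE
        elif state[node] == ACTIVE and rng.random() < p_end:
            nxt[node] = DONE
    return nxt

def likelihood(state, evidence, p_hit=0.9):
    """Probability of per-node boolean evidence given which nodes are active."""
    prob = 1.0
    for node in NODES:
        agrees = (state[node] == ACTIVE) == evidence[node]
        prob *= p_hit if agrees else 1.0 - p_hit
    return prob

def step(particles, evidence, rng):
    """Propagate every particle, weight it by the evidence, and resample."""
    moved = [propagate(s, rng) for s in particles]
    weights = [likelihood(s, evidence) for s in moved]
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(1)
particles = [{n: INACTIVE for n in NODES} for _ in range(200)]
particles = step(particles, {"reach": True, "grasp": False, "look": False}, rng)
```

Because "grasp" and "look" share a precursor but do not constrain each other, they can be active in parallel, which is the multi-stream case that a single linear state sequence cannot express.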