Expectation Grammars
Leveraging High-Level Expectations for Activity Recognition
by David Minnen, Irfan Essa, and Thad Starner


Presented at CVPR 2003 in Madison, WI.


Abstract: Video-based recognition and prediction of a temporally extended activity can benefit from a detailed description of high-level expectations about the activity. Stochastic grammars allow for an efficient representation of such expectations and are well-suited for the specification of temporally well-ordered activities. In this paper, we extend stochastic grammars by adding event parameters, state checks, and sensitivity to an internal scene model. We present an implemented system that uses human-specified grammars to recognize a person performing the Towers of Hanoi task from a video sequence by analyzing object interaction events. Experimental results from several videos show robust recognition of the full task and its constituent sub-tasks even though no appearance models of the objects in the video are provided. These experiments include videos of the task performed with differently shaped objects and with distracting and extraneous interactions.
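To make the three extensions more concrete, here is a small, deterministic Python toy. It is not the paper's grammar formalism or parser: production probabilities are omitted, and the names Move, expand_hanoi, SceneModel and recognize, as well as the (src, dst) event encoding, are hypothetical. It assumes each observed interaction is reported only as a pair of pegs, so the disk parameter of an event must be bound from an internal scene model, and a state check rejects moves that would place a larger disk on a smaller one.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Move:
    """Parameterized terminal event Move(disk, src, dst) (hypothetical encoding)."""
    disk: int
    src: int
    dst: int


def expand_hanoi(n: int, src: int, dst: int, aux: int) -> List[Move]:
    """Expand the nonterminal Hanoi(n, src, dst, aux) into expected Move events:
    Hanoi(n, s, d, a) -> Hanoi(n-1, s, a, d) Move(n, s, d) Hanoi(n-1, a, d, s)
    """
    if n == 0:
        return []
    return (expand_hanoi(n - 1, src, aux, dst)
            + [Move(n, src, dst)]
            + expand_hanoi(n - 1, aux, dst, src))


class SceneModel:
    """Internal scene model: one stack of disk sizes per peg, largest at the bottom."""

    def __init__(self, n_disks: int, n_pegs: int = 3):
        self.pegs = [list(range(n_disks, 0, -1))] + [[] for _ in range(n_pegs - 1)]

    def top(self, peg: int) -> Optional[int]:
        return self.pegs[peg][-1] if self.pegs[peg] else None

    def legal(self, src: int, dst: int) -> bool:
        """State check: src is non-empty and its top disk is smaller than dst's top."""
        disk, below = self.top(src), self.top(dst)
        return disk is not None and (below is None or below > disk)

    def apply(self, src: int, dst: int) -> None:
        self.pegs[dst].append(self.pegs[src].pop())


def recognize(observed: List[Tuple[int, int]], n_disks: int) -> Tuple[bool, int]:
    """Match (src, dst) interaction events against the expanded expectation.

    The disk parameter is not observed; it is bound from the scene model.
    Legal events that do not match the next expected Move are counted as
    extraneous but still update the scene; illegal events are ignored.
    Returns (task completed, number of extraneous events).
    """
    expected = expand_hanoi(n_disks, 0, 2, 1)
    scene = SceneModel(n_disks)
    i = extraneous = 0
    for src, dst in observed:
        if not scene.legal(src, dst):
            extraneous += 1  # state check failed: illegal in the current scene model
            continue
        event = Move(scene.top(src), src, dst)  # bind the disk parameter from the scene
        scene.apply(src, dst)
        if i < len(expected) and event == expected[i]:
            i += 1
        else:
            extraneous += 1
    return i == len(expected), extraneous
```

For two disks, recognize([(0, 1), (0, 2), (1, 2)], 2) follows the expectation exactly. In the sequence [(0, 1), (1, 0), (0, 2)], the third event has the expected peg pair, but the scene model binds it to disk 1 instead of disk 2, so it is not accepted as the expected move; a matcher that looked only at peg pairs would have accepted it.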


Videos from the paper:

Experiment 1

Experiment 2

Experiment 3

Experiment 4

