Classifying Gestures using Multiple Modalities


Sponsor Sunil Mishra <smishra@cc.gatech.edu>
Area Intelligent Systems

Problem

Human communication is complex. When talking, we try to use a low-bandwidth medium (speech) to convey information effectively. In a conversation with someone who is physically present, speech is not the only means used: the entire body takes part. In particular, the head, face, hands and fingers are involved in various types of gestures.

Various researchers (see below) have suggested that gestures and speech are linked; that is, the type and occurrence of gestures are tied to the speech we produce. We have narrowed our focus to monologues as a special case of conversation. This steers us clear of many issues that conversations raise, such as rules of discourse and disambiguating the speech of multiple speakers.

We have outlined a methodology that uses machine learning techniques to study monologues as communication. Many issues remain unresolved, including the selection of features to use for learning.

Background Materials

Come see me if you have trouble locating any of these papers.

Task Description

We have large quantities of raw data available, from which we programmatically extract features for learning. You will investigate the utility of two features: the symmetry of the lecturer's hand positions, and an estimate of the direction in which the lecturer faces. You will:

  1. Propose how these features ought to be calculated given our raw data.
  2. Evaluate the utility of these features using a variety of techniques, including the observed accuracy in classifying various events.
  3. Write up a short report detailing the feature calculation and the observed accuracy.
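
As a starting point for step 1, one possible way to score hand symmetry is sketched below. It is only an illustration under stated assumptions, not the project's method: it assumes the raw data can be reduced to 2D (x, y) positions for each hand plus the x-coordinate of a vertical body midline, none of which is specified above. The function name hand_symmetry and its parameters are hypothetical.

```python
import math


def hand_symmetry(left, right, midline_x):
    """Score how closely two hand positions mirror each other.

    Assumptions (not taken from the project's data format):
      left, right -- (x, y) positions of the lecturer's hands
      midline_x   -- x-coordinate of a vertical line through the body

    Returns a value in (0, 1]; 1.0 means the hands are exact mirror
    images of each other across the midline.
    """
    # Reflect the right hand across the vertical line x = midline_x.
    mirrored_right = (2.0 * midline_x - right[0], right[1])
    # Distance between the left hand and the reflected right hand.
    gap = math.dist(left, mirrored_right)
    # Map the distance to a bounded score; larger gaps score lower.
    return 1.0 / (1.0 + gap)


# Hands at equal height and equal distance from the midline.
mirrored_score = hand_symmetry((-1.0, 2.0), (1.0, 2.0), 0.0)
# Right hand raised 3 units above the left hand.
offset_score = hand_symmetry((-1.0, 2.0), (1.0, 5.0), 0.0)
```

A bounded score is used here so that frames with widely separated hands do not dominate the feature's range; whether that choice helps classification is exactly what step 2 would need to evaluate.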