Presented by: Sunil Mishra
I shall present my work on speech and gesture analysis in the context of a monologue. It is as yet unclear how this fits into a larger software agents context, though I am certain there is a strong relationship. I hope to focus the discussion toward this issue.
I have been working on establishing and describing the correlations between auditory and gestural data in monologues. There are, of course, many difficult low-level issues involved in the capture (low-level perceptual) processes. These, though, are not my focus. Rather, the question I want to ask is: assuming that we can get good processed auditory and visual data for a speaker, what kinds of patterns, if any, can we recognize in the monologues?
In our studies we have used unsupervised learning to discover regularities in the data we have gathered. The data has so far come from two very distinct sources: a short classroom lecture, and standup routines of professional entertainers, viz. Jay Leno and David Letterman. Our results appear to provide good evidence for our hypothesis that a correlation between the auditory and gestural data exists. The clusters we get for any given speaker in a particular situation tend to be quite stable, and we are able to meaningfully apply the clusters obtained from one data set to a new data set. These results, though, are quite preliminary and require further study.
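As a rough illustration of what applying clusters from one data set to a new one can look like, here is a minimal sketch. It assumes k-means as the clustering method, which is a stand-in and not necessarily the technique used in the studies; the two-dimensional features (pitch and hand speed) and all values are hypothetical.

```python
import math
import random

def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in method): returns k fitted centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids

# Hypothetical per-frame features: (normalized pitch, normalized hand speed).
first_set = [(0.10, 0.20), (0.15, 0.10), (0.20, 0.15),
             (0.90, 0.80), (0.85, 0.95), (0.80, 0.90)]
second_set = [(0.12, 0.18), (0.88, 0.86)]

# Learn clusters on the first data set only.
centroids = kmeans(first_set, k=2)

# Apply the learned clusters, unchanged, to the new data set.
labels = [nearest(p, centroids) for p in second_set]
```

In this toy setup, "stable" clusters would mean that frames from the second set fall into the groups learned from the first set in a consistent way, without refitting.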
This is largely an exploratory study in two ways. I do not yet know what techniques to apply for exploring the structure of the discourse, nor do I have a good feel for the types of applications where such an approach might work best. I hope to get feedback on both of these issues from those who attend. I believe this type of approach toward data can be fruitful in a variety of agent contexts, but I have no evidence yet to back up this belief.
A good description of the approach we started with can be found in:
Michael A. Casey and Joshua S. Wachman (1996). "Unsupervised Cross-Modal Analysis of Professional Monologue Discourse." In: Workshop on the Integration of Gesture in Language and Speech (WIGLS) 96.
Last modified: Thu Oct 15 17:45:58 EDT 1998