GVU
Technical Report Number: GIT-GVU-01-16
Title: Visual Coding and Tracking of Speech Related Facial Motion
Authors: Lionel Reveret, Irfan Essa
Abstract:
This article presents a visual characterization of the facial motions
inherent in speaking. We propose a set of four Facial Speech Parameters
(FSP): jaw opening, lips rounding, lips closure, and lips raising,
to represent the primary visual gestures of speech articulation
as a multidimensional linear manifold. This manifold is initially
generated as a statistical model, obtained by analyzing accurate
3D data of a reference human subject. The FSP are then associated
with the linear modes of this statistical model, resulting in a 3D
parametric facial mesh. We have tested the hypothesis that this manifold
is speaker-independent with a model-based video tracking task applied
to different subjects. First, the parametric model is adapted
and aligned to a subject's face for a single shape. Then the face
motion is tracked by optimally aligning the incoming video frames
with the face model, textured with the first image, and deformed
by varying the FSP, head rotations, and translations. We show results
of tracking different subjects using our method. Finally,
we demonstrate the encoding of facial activity into the four FSP values
to represent speaker-independent phonetic information.
Keywords: articulatory modelling, facial animation, lip tracking, lip synching
You can access this technical report via: PDF, PostScript