Facial Modeling/Animation: Predicting visual information from the acoustic signal for Lip Synch


Sponsor: Irfan Essa
206 CCB
http://www.cc.gatech.edu/~irfan
Area: GVU / Intelligent Systems (and Computational Perception Lab / FCE)

Problem
In facial animation, lip synch is the process of synchronizing the lip and facial movements of a synthetic character with a pre-recorded sound track. A successful animation requires accurate time alignment between the acoustic signal and the visual rendering. The animated film "Alien Song" by Victor Navone is a good example of lip synch. The traditional approach in computer graphics is to label the sound track with visual targets that correspond to the pronounced phonemes (the sound units of speech). Speech recognition can speed up this time-consuming step, but the available lip synch systems that use it still require some manual "art work" to correct the automatic labeling. An alternative approach is to estimate the facial geometry directly from the acoustic signal. The goal of this project is to investigate one example of this alternative approach.
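
For contrast with the approach investigated here, the traditional labeling step can be pictured roughly as follows. This is only an illustrative sketch: the phoneme-to-target table, the target names, and the helper label_track are hypothetical and are not taken from any existing lip synch system.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-visual-target table (an assumption for this
# sketch); a real system would use a much larger, carefully designed set.
PHONEME_TO_TARGET = {
    "p": "closed", "b": "closed", "m": "closed",
    "a": "open",
    "o": "rounded", "u": "rounded",
    "sil": "rest",
}


@dataclass
class Keyframe:
    time: float   # seconds from the start of the sound track
    target: str   # name of the visual target (mouth shape)


def label_track(segments):
    """Turn (phoneme, start_time) pairs into visual-target keyframes.

    Consecutive segments that map to the same target are merged, and
    unknown phonemes fall back to the "rest" shape.
    """
    keyframes = []
    for phoneme, start in segments:
        target = PHONEME_TO_TARGET.get(phoneme, "rest")
        if keyframes and keyframes[-1].target == target:
            continue  # same mouth shape as before: no new keyframe
        keyframes.append(Keyframe(start, target))
    return keyframes


# A made-up labeling of a short utterance, as a recognizer might produce it.
segments = [("sil", 0.0), ("m", 0.12), ("a", 0.20),
            ("m", 0.35), ("a", 0.42), ("sil", 0.60)]
for kf in label_track(segments):
    print(f"{kf.time:.2f}s -> {kf.target}")
```

The manual "art work" mentioned above corresponds to correcting the segment times and targets by hand when the automatic labeling is wrong.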

For this project, the visual rendering will be supported by the facial animation system available at the CPL (ask Lionel Reveret). This system characterizes facial movements in speech production with a set of visual speech parameters (see the article by Reveret et al. below). The work in this project is to implement a method that predicts these visual speech parameters from an acoustic signal (multilinear regression, neural networks, HMM, ...).
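
As a starting point, the simplest of the listed options, multilinear regression, might look like the sketch below. It assumes the acoustic signal has already been converted to one feature vector per frame and that each frame is paired with a vector of visual speech parameters. The dimensions (12 acoustic features, 6 visual parameters), the synthetic data, and the function names fit_linear_map and predict are all assumptions made for illustration; they do not describe the CPL system or its parameter set.

```python
import numpy as np


def add_bias(features):
    """Append a constant column so the fit includes an offset term."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_map(acoustic, visual):
    """Least-squares fit of visual ~ [acoustic, 1] @ W.

    acoustic: (n_frames, n_acoustic) array of per-frame acoustic features
    visual:   (n_frames, n_visual) array of visual speech parameters
    """
    W, *_ = np.linalg.lstsq(add_bias(acoustic), visual, rcond=None)
    return W


def predict(acoustic, W):
    """Predict visual speech parameters for new acoustic frames."""
    return add_bias(acoustic) @ W


# Synthetic stand-in data (NOT real speech): a random linear relation
# plus a little noise, just to exercise the code.
rng = np.random.default_rng(0)
n_frames, n_acoustic, n_visual = 200, 12, 6
acoustic = rng.normal(size=(n_frames, n_acoustic))
true_W = rng.normal(size=(n_acoustic + 1, n_visual))
visual = add_bias(acoustic) @ true_W
visual += 0.01 * rng.normal(size=visual.shape)

# Fit on the first 150 frames, evaluate on the remaining 50.
W = fit_linear_map(acoustic[:150], visual[:150])
pred = predict(acoustic[150:], W)
rmse = np.sqrt(np.mean((pred - visual[150:]) ** 2))
print(f"held-out RMSE: {rmse:.4f}")
```

With real recordings, the same held-out comparison would give a baseline against which a neural network or HMM-based predictor could be judged.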
 

Here is what you will need to do.

Background

Deliverables

Evaluation
Based on the report turned in to the project sponsor by the due date.