Facial Animation: Using Speech Recognition for Lip Synch
Problem
In facial animation, lip synch is the process of synchronizing
a pre-recorded sound track with the lip and facial movements of a synthetic
character. A successful animation requires an accurate time alignment
between the acoustic signal and the visual
rendering. The animated film "Alien
Song" by Victor Navone is a good example of lip synch. The traditional
approach in computer graphics consists of labeling the sound track with
visual targets that correspond to the pronounced phonemes
(the sound units of speech). This time-consuming step can be assisted by
speech recognition software, but the available lip synch systems that use
speech recognition still require some "art work" to correct the automatic
labeling. The goal of this project is to reduce this manual correction by
using powerful speech recognition software.
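To make the labeling step concrete, here is a minimal sketch of how a phoneme-labeled sound track could be turned into timed visual targets. The phoneme symbols, the target names, and the table PHONEME_TO_TARGET are illustrative assumptions, not the inventory used by the CPL facial animation system.

```python
# Hypothetical phoneme-to-visual-target table (illustrative only; a real
# system would define its own phoneme set and target shapes).
PHONEME_TO_TARGET = {
    "p": "closed", "b": "closed", "m": "closed",
    "f": "lip_teeth", "v": "lip_teeth",
    "aa": "open", "ae": "open",
    "uw": "rounded", "ow": "rounded",
    "sil": "rest",
}


def label_track(segments, default="neutral"):
    """Convert (phoneme, start_s, end_s) labels into visual-target keys.

    Adjacent segments that map to the same target are merged, so the
    animation receives one key per change of mouth shape.
    """
    keys = []
    for phoneme, start, end in segments:
        target = PHONEME_TO_TARGET.get(phoneme, default)
        if keys and keys[-1][0] == target and keys[-1][2] == start:
            # Same target as the previous key and contiguous in time:
            # extend the previous key instead of adding a new one.
            keys[-1] = (target, keys[-1][1], end)
        else:
            keys.append((target, start, end))
    return keys


if __name__ == "__main__":
    # Hand-written labels for a short utterance (times in seconds).
    track = [("sil", 0.0, 0.1), ("m", 0.1, 0.2), ("aa", 0.2, 0.4),
             ("m", 0.4, 0.5), ("sil", 0.5, 0.7)]
    for key in label_track(track):
        print(key)
```

The time stamps in such a track are what the artist normally places by hand; the project aims to obtain them automatically.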
A facial animation system is available at the CPL (see L. Reveret). It
generates a visual animation of a talking head model from a text
input. The work of this project is to align a text with any recorded
sound track of a speaker, using the speech recognition software available
at the CPL (ViaVoice, HTK, ...). The result of this project will allow
any speaker to produce a visual animation from his or her own voice.
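Aligning a known text with a recording is often called forced alignment. The sketch below shows the idea as a small dynamic program, assuming the text has already been converted to a phoneme sequence and that some acoustic model supplies a log-likelihood for every (frame, phoneme) pair. The function names, the one-state-per-phoneme simplification, and the 10 ms frame period are assumptions made for illustration; a toolkit such as HTK would use trained HMMs with several states per phoneme.

```python
import math


def forced_align(log_lik):
    """Assign each frame to one phoneme of the transcript, in order.

    log_lik[t][k] is the log-likelihood of frame t under the k-th phoneme
    of the transcript. Each frame either stays in the current phoneme or
    advances to the next one (left-to-right Viterbi alignment).
    Returns a list giving the phoneme index chosen for every frame.
    """
    n_frames, n_phones = len(log_lik), len(log_lik[0])
    if n_frames < n_phones:
        raise ValueError("need at least one frame per phoneme")
    neg_inf = -math.inf
    score = [[neg_inf] * n_phones for _ in range(n_frames)]
    back = [[0] * n_phones for _ in range(n_frames)]
    score[0][0] = log_lik[0][0]  # the path must start in the first phoneme
    for t in range(1, n_frames):
        for k in range(n_phones):
            stay = score[t - 1][k]
            move = score[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                best, back[t][k] = move, k - 1
            else:
                best, back[t][k] = stay, k
            if best > neg_inf:
                score[t][k] = best + log_lik[t][k]
    # Backtrace from the last phoneme at the last frame.
    path = [n_phones - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def path_to_segments(path, phonemes, frame_s=0.01):
    """Collapse a frame-level path into (phoneme, start_s, end_s) segments."""
    segments, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((phonemes[path[start]], start * frame_s, t * frame_s))
            start = t
    return segments


if __name__ == "__main__":
    # Toy scores: frames 0-2 favour the first phoneme, frames 3-4 the second.
    toy = [[0, -5], [0, -5], [0, -5], [-5, 0], [-5, 0]]
    print(path_to_segments(forced_align(toy), ["m", "aa"]))
```

The resulting segments have the same (phoneme, start, end) form as a hand-labeled track, so they can drive the animation directly.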
Here is what you need to do (and will learn)
-
Read the following papers, which give an excellent review of methods used
in speech recognition.
-
Reddy, R., "Speech Recognition by Machine: A Review", In Proceedings of
the IEEE, 64(4), 501--531, 1976.
-
Rabiner, L., "A Tutorial on Hidden Markov Models and Selected Applications
in Speech Recognition", In Proceedings of the IEEE, 77(2), 257--286, 1989.
-
Read the following paper for a description of the facial animation system
available :
-
Reveret, L., Bailly, G., Badin, P., "MOTHER: A new generation of talking
heads providing a flexible articulatory control for video-realistic speech
animation", Proc. of the 6th Int. Conference on Spoken Language Processing,
ICSLP'2000, Beijing, China, Oct. 16-20, 2000.
-
Study the HTK, Waves and Transcriber systems by Entropic Inc., which are
toolkits for speech recognition available in the Computational Perception
Lab (one of the grad students will show you how), and how they can be
combined with other commercial software.
-
Develop a speech recognition demo for phoneme recognition. This demo
will be connected to the facial animation system available at the CPL.
-
Evaluate the results and discuss the differences, strengths, and weaknesses
of the approach.
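One possible way to quantify the evaluation step is to compare the automatically found phoneme boundaries with a hand-corrected labeling of the same recording. The sketch below is only a suggestion: the function names are hypothetical, and the 20 ms tolerance is an assumed threshold that you should choose and justify yourself.

```python
def boundary_errors(reference, automatic):
    """Absolute start-time error (in seconds) for each phoneme segment.

    Both inputs are lists of (phoneme, start_s, end_s) and must contain
    the same phoneme sequence, since only the timing is being compared.
    """
    if [p for p, _, _ in reference] != [p for p, _, _ in automatic]:
        raise ValueError("labelings must contain the same phoneme sequence")
    return [abs(a_start - r_start)
            for (_, r_start, _), (_, a_start, _) in zip(reference, automatic)]


def summarize(errors, tolerance_s=0.02):
    """Return (mean error, fraction of boundaries within the tolerance)."""
    mean = sum(errors) / len(errors)
    within = sum(e <= tolerance_s for e in errors) / len(errors)
    return mean, within


if __name__ == "__main__":
    hand = [("m", 0.0, 0.25), ("aa", 0.25, 0.5)]   # hand-corrected labels
    auto = [("m", 0.0, 0.5), ("aa", 0.5, 0.75)]    # automatic labels
    print(summarize(boundary_errors(hand, auto)))
```

Reporting both numbers is useful because a low mean error can hide a few badly placed boundaries that are clearly visible in the animation.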
Background
Deliverables
-
Write a 4-5 page Report addressing the following issues:
-
Give a brief overview of your understanding of the speech recognition
problem and its specific application to Lip Synch. Is it tractable?
-
Explain Hidden Markov Models (HMMs).
-
Report your progress with the HTK package and the recognition demo for
Lip Synch.
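As a starting point for the Hidden Markov Model part of the report, the following minimal sketch computes the probability of an observation sequence under a discrete HMM with the forward procedure described in the Rabiner tutorial. The two-state model and its numbers are made up for illustration, and the code omits the scaling (or log-domain arithmetic) that real recognizers need for long sequences.

```python
def forward_probability(pi, A, B, observations):
    """Return P(observations | model) for a discrete HMM.

    pi[i]   : probability of starting in state i
    A[i][j] : probability of moving from state i to state j
    B[i][o] : probability that state i emits symbol o
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(o_{t+1})
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
                 for j in range(n)]
    # Termination: sum over all states at the final time step.
    return sum(alpha)


if __name__ == "__main__":
    # Made-up model: two hidden states, two observable symbols (0 and 1).
    pi = [0.6, 0.4]
    A = [[0.7, 0.3],
         [0.4, 0.6]]
    B = [[0.9, 0.1],
         [0.2, 0.8]]
    print(forward_probability(pi, A, B, [0, 0, 1]))
```

A quick sanity check on any implementation is that the probabilities of all possible observation sequences of a given length sum to one.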
Evaluation
Based on the report turned in to the sponsor of the project by the
due date.