In this project we want to analyze lip motion together with the associated spoken words, in order to develop a lipreading system that uses visual features to recognize speech.
The questions we want to answer are which visual features of the lips carry important speech information, and what representation captures the characteristics of lip motion most accurately for speech recognition. We have several ideas for different approaches.
Speech recognition is a complex problem. Since we cannot solve the whole problem in this project, we want to explore some basic ideas (and apply some of the methods described in the paper) for extracting visual information that can aid speech recognition when combined with audio input.
To keep the problem tractable, we decided that the goal of this project is for the program to recognize a finite set of vowels, mainly because each vowel produces a fairly distinct mouth shape when pronounced. We will also place some restrictions on the data we use.
To simplify data processing, we decided that every motion sequence, for every mouth shape, will have a fixed beginning and a fixed end.
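Fixing the beginning and end of each sequence makes recordings comparable, but two recordings of the same vowel may still contain different numbers of frames. One possible way to handle this, which is our own assumption and not part of the plan above, is to resample every per-frame feature sequence to the same number of frames by linear interpolation. The helper name `resample_sequence` is hypothetical:

```python
from typing import List, Sequence


def resample_sequence(frames: Sequence[Sequence[float]], n: int) -> List[List[float]]:
    """Linearly resample a sequence of per-frame feature vectors to exactly n frames.

    Hypothetical helper: the first and last frames (the fixed begin and end of
    the motion) are kept, and intermediate frames are interpolated between
    neighbouring input frames.
    """
    if n < 2 or len(frames) < 2:
        raise ValueError("need at least 2 input frames and n >= 2")
    step = (len(frames) - 1) / (n - 1)
    out: List[List[float]] = []
    for i in range(n):
        t = i * step                      # fractional position in the input sequence
        j = min(int(t), len(frames) - 2)  # index of the left neighbour
        w = t - j                         # interpolation weight toward frame j + 1
        out.append([(1 - w) * a + w * b for a, b in zip(frames[j], frames[j + 1])])
    return out


# Example: a 3-frame sequence stretched to 5 frames.
print(resample_sequence([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]], 5))
# → [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
```

Linear interpolation is the simplest choice; it assumes the motion between the fixed endpoints progresses at a roughly uniform rate.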
2. Data Processing
For now, we plan to use an optical flow method, as described in the paper
"Recognition of Facial Expression from Optical Flow" by Kenji Mase (IEICE
Transactions, Special Issue on Computer Vision and Its Applications, No. 10,
October 1991). That is, we estimate the direction of muscle actions around
the lip region from optical flow data: we compute the optical flow in each
of four windows around the mouth and examine the resulting velocity
patterns to define a feature vector for each vowel.
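The windowed velocity measurement can be sketched as follows. This is a minimal illustration under our own assumptions, not Mase's exact procedure: the four windows are taken to be the quadrants of a hand-chosen mouth bounding box, one velocity (u, v) per window is estimated with a Lucas-Kanade style least-squares fit to the brightness-constancy equation Ix*u + Iy*v + It = 0, and the feature vector simply concatenates the window velocities over all consecutive frame pairs. The names `window_velocity` and `mouth_feature_vector` are hypothetical, and frames are plain lists of grayscale rows so the sketch has no dependencies.

```python
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale frame indexed as image[row][col]


def window_velocity(prev: Image, curr: Image,
                    r0: int, r1: int, c0: int, c1: int) -> Tuple[float, float]:
    """Least-squares (Lucas-Kanade style) velocity (u, v) for rows r0..r1-1 and
    columns c0..c1-1, solving Ix*u + Iy*v + It = 0 over the whole window."""
    sxx = sxy = syy = sxt = syt = 0.0
    rows, cols = len(curr), len(curr[0])
    # Skip the 1-pixel image border so central differences stay in bounds.
    for r in range(max(r0, 1), min(r1, rows - 1)):
        for c in range(max(c0, 1), min(c1, cols - 1)):
            # Spatial gradients: central differences averaged over both frames.
            ix = (curr[r][c + 1] - curr[r][c - 1] + prev[r][c + 1] - prev[r][c - 1]) / 4.0
            iy = (curr[r + 1][c] - curr[r - 1][c] + prev[r + 1][c] - prev[r - 1][c]) / 4.0
            it = curr[r][c] - prev[r][c]  # temporal gradient
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return 0.0, 0.0  # too little texture (aperture problem): no estimate
    # Solve the 2x2 normal equations [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt].
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def mouth_feature_vector(frames: Sequence[Image],
                         top: int, bottom: int, left: int, right: int) -> List[float]:
    """Concatenate (u, v) of the four mouth-box quadrants for every frame pair."""
    mid_r, mid_c = (top + bottom) // 2, (left + right) // 2
    windows = [
        (top, mid_r, left, mid_c),      # upper-left
        (top, mid_r, mid_c, right),     # upper-right
        (mid_r, bottom, left, mid_c),   # lower-left
        (mid_r, bottom, mid_c, right),  # lower-right
    ]
    features: List[float] = []
    for prev, curr in zip(frames, frames[1:]):
        for r0, r1, c0, c1 in windows:
            features.extend(window_velocity(prev, curr, r0, r1, c0, c1))
    return features


# Synthetic check: a textured pattern that moves one pixel to the right per frame,
# so every window should report u close to 1 and v close to 0.
frames = [[[float((c - k) ** 2 + r ** 2) for c in range(20)] for r in range(20)]
          for k in range(3)]
print(mouth_feature_vector(frames, 2, 18, 2, 18))  # 2 frame pairs x 4 windows x (u, v)
```

Estimating a single velocity per window, rather than a dense flow field, keeps the feature vector short (8 values per frame pair) and averages out pixel-level noise; a real system would more likely use an established optical flow routine and a tracked mouth region.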
3. Recognition of Unknown Data
We will then define a similarity function to help us recognize unknown
data: we calculate the feature vector for the unknown data and match it
against the feature vectors of the known vowels.
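A minimal sketch of this step, assuming cosine similarity as the similarity function and one template per vowel obtained by averaging the feature vectors of that vowel's training sequences (nearest-template matching). Neither choice is fixed by the plan above, and the names `build_templates`, `similarity` and `recognize` are hypothetical. All feature vectors are assumed to have the same length, for example because every sequence has the same number of frames.

```python
import math
from typing import Dict, List, Sequence, Tuple


def build_templates(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Average the training feature vectors of each vowel into one template."""
    templates = {}
    for vowel, vectors in training.items():
        n = len(vectors)
        templates[vowel] = [sum(column) / n for column in zip(*vectors)]
    return templates


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize(unknown: Sequence[float], templates: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the vowel whose template is most similar to the unknown vector."""
    best = max(templates, key=lambda vowel: similarity(unknown, templates[vowel]))
    return best, similarity(unknown, templates[best])


# Usage with made-up 2-D feature vectors (illustration only, not measured data).
templates = build_templates({"a": [[1.0, 0.1], [0.9, -0.1]], "i": [[0.0, 1.0], [0.2, 0.8]]})
print(recognize([0.8, 0.2], templates))  # best match is "a"
```

Cosine similarity compares the direction of the velocity pattern and ignores its overall magnitude, so a vowel spoken with larger or smaller mouth movements can still match; Euclidean distance would be an equally simple alternative if magnitude matters.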
4. Performance Measurement
To test whether our approach works, we will need to collect additional
unknown (test) data. However, this is the very last part of the project,
and it might be skipped.
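If labelled test data becomes available, performance could be summarized as the recognition rate together with a confusion table showing which vowels are mistaken for which. The sketch below assumes a list of (true vowel, predicted vowel) pairs produced by whatever recognizer is used; the function name `evaluate` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def evaluate(results: List[Tuple[str, str]]) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Return (recognition rate, confusion counts keyed by (true, predicted))."""
    if not results:
        raise ValueError("no test results to evaluate")
    confusion = Counter(results)
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(results), dict(confusion)


# Made-up outcomes for four test sequences (illustration only, not results).
rate, confusion = evaluate([("a", "a"), ("i", "i"), ("u", "o"), ("o", "o")])
print(rate)  # → 0.75
print(confusion)
```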