CS7322: High Level Computer Vision: Project Proposal

Title: Lipreading
Group Members:

In this project we want to analyze lip motion together with the spoken words it accompanies, in order to develop a lipreading system that uses visual features to recognize speech.

The questions we want to answer are: which visual features of the lips carry important speech information, and what representation most accurately captures the characteristics of lip motion for speech recognition? We have several ideas for different approaches.

Speech recognition is a complex problem. Since we cannot solve the whole problem in this project, we want to understand the basic ideas behind extracting visual information that aids speech recognition when combined with audio input, and to apply some of the methods described in the literature.

To make the problem more tractable, we decided that the goal for this project is a program that recognizes a finite set of vowels. We chose vowels mainly because each one, when pronounced, produces a fairly distinct mouth shape. We will also place some restrictions on the data we use.




Project stages:

1. Get Data
We first have to determine how much data we need for this project. We may also limit the recordings to the mouth region, so that we do not have to locate the mouth within the face. Since a speech recognition system is usually applied to unknown speakers, we are aiming for a speaker-independent system. For this reason we will try to recruit at least 10 people and collect data from them.

To simplify the processing of the data, we have decided that the motion sequence for every mouth shape will have a fixed beginning and a fixed ending.


2. Data Processing
For now we plan to use an optical flow method, as described in the paper "Recognition of Facial Expression from Optical Flow" by Kenji Mase (IEICE Transactions, Special Issue on Computer Vision and Its Applications, No. 10, October 1991). That is, we estimate the direction of muscle actions around the lip region from optical flow measurements. We then compute the optical flow in each of four windows around the mouth and examine the velocity patterns to define a feature vector for each vowel.
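
As a rough illustration of the windowed flow idea, the sketch below estimates one velocity per window and concatenates the four results into a feature vector. It is not taken from Mase's paper: it substitutes a single Lucas-Kanade-style least-squares estimate per window, and the names window_flow and flow_features, the (top, left, height, width) window format, and the use of plain nested lists for grayscale frames are all assumptions made for this example. A real implementation would more likely use array libraries.

```python
def window_flow(prev, curr, top, left, height, width):
    """Estimate one (vx, vy) velocity for a window by least squares.

    Solves Ix*vx + Iy*vy = -It over all pixels of the window
    (Lucas-Kanade style). Frames are 2-D lists of grayscale values,
    x runs along columns and y along rows. The window must stay at
    least one pixel away from the frame border.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(top, top + height):
        for c in range(left, left + width):
            # Central differences, averaged over both frames.
            ix = (prev[r][c + 1] - prev[r][c - 1]
                  + curr[r][c + 1] - curr[r][c - 1]) / 4.0
            iy = (prev[r + 1][c] - prev[r - 1][c]
                  + curr[r + 1][c] - curr[r - 1][c]) / 4.0
            it = curr[r][c] - prev[r][c]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        # Untextured or one-directional window: no reliable estimate.
        return (0.0, 0.0)
    vx = (-syy * sxt + sxy * syt) / det
    vy = (sxy * sxt - sxx * syt) / det
    return (vx, vy)


def flow_features(prev, curr, windows):
    """Concatenate the per-window velocities into one feature vector.

    `windows` is a list of (top, left, height, width) tuples, e.g. four
    hypothetical windows placed around the mouth.
    """
    features = []
    for top, left, height, width in windows:
        features.extend(window_flow(prev, curr, top, left, height, width))
    return features
```

With four windows this gives eight numbers per pair of consecutive frames. How these per-frame vectors are combined over a whole motion sequence (for example by averaging or concatenating them) is left open here.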

3. Take unknown data and try recognition.
We will then define a similarity function for recognizing unknown data: we calculate the feature vector of the unknown data and match it against the feature vectors of the known vowels.
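
One simple candidate for the matching step is nearest-template classification. The sketch below assumes that each vowel is represented by a single reference feature vector and that Euclidean distance serves as the (dis)similarity measure. Both are assumptions for illustration, and the names distance and classify are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(feature, templates):
    """Return (vowel, distance) for the template closest to `feature`.

    `templates` maps each vowel label to its reference feature vector.
    """
    best = min(templates, key=lambda vowel: distance(feature, templates[vowel]))
    return best, distance(feature, templates[best])
```

For example, with templates {"a": [1.0, 0.0], "i": [0.0, 1.0]}, the call classify([0.9, 0.2], templates) picks "a". Other similarity functions, such as correlation between velocity patterns, could be swapped in without changing the interface.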

4. Performance measurement.
To test whether our approach works, we will need to collect additional unknown data. However, this is the very last part of the project, and it may have to be skipped.
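
If this stage is carried out, one straightforward measure is the recognition rate on the held-out data, together with a count of which vowels get confused with which. The sketch below is only one possible way to compute this. The function name evaluate and the label lists are hypothetical.

```python
from collections import Counter


def evaluate(true_labels, predicted_labels):
    """Return (recognition_rate, confusion) for paired label lists.

    `confusion` counts each (true vowel, predicted vowel) pair.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    confusion = Counter(zip(true_labels, predicted_labels))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    rate = correct / len(true_labels) if true_labels else 0.0
    return rate, confusion
```

The off-diagonal entries of the confusion counts show which pairs of vowels the feature vectors fail to separate.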




Division of Tasks


Time Line (Tentative)

Week 1
Week 2-3 (with progress report)
Week 4-6