CS 7636
Computational Perception
Spring 2003
Howey-Physics S107
MWF 11:00 - 12:00 noon
This course will cover statistical and algorithmic methods for sensing people
using cameras and microphones. We will develop video and audio models and
explore their application to complex sensing tasks. The bulk of the syllabus is
devoted to vision-based human sensing, a branch of computer vision concerned
with "looking at people". We will also cover topics in speech
recognition and multi-modal sensing. We will emphasize unifying statistical
models and techniques, primarily graphical models such as Bayesian networks.
Instructor
Jim Rehg
Email: rehg@cc.gatech.edu
Office: CoC Bldg (CCB) 253
Office hours: 11-12 Tues and Thurs
Phone: 404-894-9105 (email preferred)
Prerequisites
Some previous experience or coursework in computer vision, image processing,
or computer graphics (such as CS 4495/7495) and in pattern recognition or machine
learning (such as CS 4803 or 4640). Familiarity with Matlab and with basic linear
algebra and statistics. Permission of the instructor.
Organization
Grades will be assessed as follows:
| Problem Sets | 55% |
| Final Project | 40% |
| Participation | 5% |
Problem sets will be largely based on Matlab and will be distributed weekly
or every two weeks. Collaboration on problem sets is encouraged at the
"white board interaction" level. That is, share ideas and technical
conversation, but write your own code. A few problem sets may require you to
work in teams of 2-3. I plan to grade and return problem sets promptly, so all
problem sets must be turned in on time. No late submissions will be accepted
without prior permission of the instructor.
Undergrads and grads will be graded on separate curves; more is expected from
a graduate project than an undergraduate project.
Text
There is no required text. The following supplemental texts may be helpful:
- Dynamic Vision: From Images to Face Recognition by S. Gong, S.
McKenna, and A. Psarrou, Imperial College Press, 2000.
This has some useful material on face perception. The table of contents can
be viewed on-line at Amazon.
- Introduction to Graphical Models by M. Jordan, 2003.
[Available as class handouts]
This text provides an excellent introduction to graphical models, which serve as a
unifying statistical framework for the course material. Also see Kevin
Murphy's on-line tutorial.
- Learning in Graphical Models by M. Jordan (editor), MIT Press,
1999.
An excellent collection of papers on graphical models, including topics such
as variational inference.
- Sequential Monte Carlo Methods in Practice by A. Doucet, N. de Freitas,
and N. Gordon (editors). Springer-Verlag, 2001.
This collection of papers has good coverage of sampling methods for dynamic
systems (particle filters).
- Statistical Methods for Speech Recognition by F. Jelinek. MIT
Press, 1998.
Fundamentals of Speech Recognition by L. Rabiner and B.-H. Juang.
Prentice-Hall, 1993.
These are standard speech recognition texts.
Background texts:
- Computer Vision: A Modern Approach by D. Forsyth and J. Ponce.
Prentice-Hall, 2002.
A recent general computer vision text.
- Neural Networks for Pattern Recognition by C. Bishop. Oxford
University Press, 1995.
This is a classic text for density estimation and pattern recognition.
- Pattern Classification by R. Duda, P. Hart, and D. Stork. Wiley-Interscience,
2000.
Another classic, recently brought up to date.
- Information Theory, Inference, and Learning Algorithms by D.
MacKay. [online]
Excellent treatment of information-theoretic methods in learning.
- Linear Algebra and Its Applications or Introduction to Linear
Algebra by G. Strang.
- Matrix Reference Manual [online]
- Introduction to Probability by D. P. Bertsekas and J. N. Tsitsiklis.
[online]
- Probability, Random Variables, and Stochastic Processes by A.
Papoulis.
Classic text on probability theory and its applications. Also see the on-line lecture
notes for EE178 at Stanford.
- Probability Theory: The Logic of Science by E. T. Jaynes. [online]
Classic text on probability theory; chapters 1 and 2 in particular are good
background reading.
- Applied Optimal Estimation by A. Gelb (editor). MIT Press, 1974.
Classic text on the Kalman filter and linear estimation.
Problem Sets
PS 1: Out Jan 14; Due Jan 20 (review of background material)
Syllabus
- Skin color modeling and detection [1/6-17/03]
- Resources: IJCV
paper
- Tools: graphical model, mixture density, likelihood ratio test, ROC curve
- Histogram color model as discrete variable graphical model [1/6/03]
- Separability of skin and nonskin distributions in web images [1/8/03]
- Adult image detection [1/10/03]
- Mixture densities [1/15/03] (missed lecture 1/13/03)
- Generalization and performance analysis [1/17/03]
- Modeling facial appearance
- Face recognition, tracking, and synthesis
- Speech recognition
- Resources: Tony Robinson's Speech
Analysis page.
- Tools: Hidden Markov Model, Expectation-Maximization
- HMM introduction [2/13,15/02]
- Viterbi algorithm [2/18/02]
- EM algorithm [2/20-22/02]
- Speech recognition architecture [2/25/02]
- Language modeling [2/27/02]
- Acoustic modeling [3/01/02]
- Viterbi, beam, and tree search [3/11-13/02]
- Facial Expressions and Gesture Recognition
- Resources: EM (R. Neal, T. Minka); Thad ASL; Revert Speech
- Tools: HMM, EM
- EM and mixture density learning [3/15/02]
- Analysis and synthesis of facial expressions and speech reading [3/18/02]
- HMM-based recognition of ASL [3/20/02]
- EM as lower-bound maximization [3/22/02]
- Modeling Human Motion
- Resources: M. Brand (HMM, Puppetry, Style)
- Tools: ARMA models
- Introduction to stochastic processes and linear Gaussian models
[3/25/02]
- ARMA models [3/27/02]
- Yule-Walker equations [3/29/02]
- Class presentations of final project proposals [4/1,3/02]
- Head and hand tracking
- Tools: State space models, Kalman filter
- Introduction to state space modeling [4/5/02]
- 3-D model-based tracking
- Tracking with morphable models
- Contour-based hand tracking
- Figure tracking
- Tools: Gaussian sum filter, particle filter
- Kinematic modeling
- Appearance-based tracking
- Visual singularities
- Monocular reconstruction
- Action recognition
- Tools: Stochastic CFG, Bayesian networks
- Connections between actions and language
- Modeling paradigms
- Action recognition
- Multi-modal sensing
- Tools: Bayesian networks, "Flexible" models, maximum entropy
- Speaker detection
- Speaker tracking
- Speech reading
Final Projects
Project Ideas
- Skin color
- Using a calibrated skin color dataset, which includes a reference color
chart for each example image, to study color-based skin detection. In
particular, can compensating for the illuminant lead to better detection
rates?
- Reproducing Michael Tarr's recent work on gender classification using
red/green skin color ratios.
- Face analysis
- Learning a distance measure for "Separated at Birth" (TM).
Given a set of facial image pairings which combine facial similarity
with some text attributes, is it possible to learn a distance measure
which would produce similar pairings on a test dataset?
- Face detection or recognition using boosted decision trees.
- Human motion
- Gesture recognition (for example, American Sign Language) using HMMs.
- Synthesizing human motion by learning dynamic models from motion
capture data.