CS 7635
Computational Perception
Spring 2002
College of Computing 102
MWF 11:00 - 12:00 noon
This course will cover statistical and algorithmic methods for sensing people
using cameras and microphones. We will develop video and audio models and
explore their application to complex sensing tasks. The bulk of the syllabus is
devoted to vision-based human sensing, a branch of computer vision concerned
with "looking at people". We will also cover topics in speech
recognition and multi-modal sensing. We will emphasize unifying statistical
models and techniques.
Instructor
Jim Rehg
Email: rehg@cc.gatech.edu
Office: CoC Bldg (CCB) 253
Office hours: 11-12 Tues and Thurs
Phone: 404-894-9105 (email preferred)
Teaching Assistant
Hao Wang
Email: wanghao@cc.gatech.edu
Prerequisites
Some previous experience or coursework in computer vision, image processing,
or computer graphics. Familiarity with Matlab and basic linear algebra,
statistics, and pattern recognition techniques. Permission of the instructor.
Organization
Grades will be assessed as follows:
| Problem Sets | 40% |
| Midterm Exam | 15% |
| Final Project | 40% |
| Participation | 5% |
There will be approximately 5 problem sets based on Matlab. Collaboration on
problem sets is encouraged at the "white board interaction" level.
That is, share ideas and technical conversation, but write your own code. All
problem sets should be turned in on time. One problem set will be accepted late
(but before the next one is due) without excuse. After that, get prior permission.
There will be a take-home mid-term exam.
Undergrads and grads will be graded on separate curves; more is expected from
a graduate project than an undergraduate project.
Text
There is no required text. The following supplemental texts may be helpful:
- Dynamic Vision: From Images to Face Recognition by S. Gong, S.
McKenna, and A. Psarrou, Imperial College Press, 2000. [Amazon]
This has some useful material on face perception. The table of contents can
be viewed on-line at Amazon.
- Introduction to Graphical Models by M. Jordan and C. Bishop, 2002.
[Available as class handouts]
This text provides an excellent introduction to graphical models, which provide a
unifying statistical framework for the course material. Also see Kevin
Murphy's excellent on-line tutorial.
- Learning in Graphical Models by M. Jordan (editor), MIT Press,
1999.
An excellent collection of papers on graphical models, including topics such
as variational inference.
- Sequential Monte Carlo Methods in Practice by A. Doucet, N. de Freitas,
and N. Gordon (editors). Springer-Verlag 2001. [Amazon].
This collection of papers has good coverage of sampling methods for dynamic
systems (particle filters).
- Statistical Methods for Speech Recognition by F. Jelinek. MIT
Press, 1998. [Amazon]
- Fundamentals of Speech Recognition by L. Rabiner and B.-H. Juang.
Prentice-Hall, 1993. [Amazon]
These are standard speech recognition texts.
Background texts:
- Computer Vision: A Modern Approach by D. Forsyth and J. Ponce. [online]
A recent general computer vision text.
- Neural Networks for Pattern Recognition by C. Bishop. Oxford
University Press, 1995. [Amazon]
This is a classic text for density estimation and pattern recognition.
- Pattern Classification by R. Duda, P. Hart, and D. Stork. Wiley-Interscience,
2000. [Amazon]
Another classic, recently brought up-to-date.
- Information Theory, Inference, and Learning Algorithms by D.
MacKay. [online]
Excellent treatment of information theoretic methods in learning.
- Linear Algebra and Its Applications or Introduction to Linear
Algebra by G. Strang. [Amazon]
- Matrix Reference Manual [online]
- Introduction to Probability by D. P. Bertsekas and J. N. Tsitsiklis.
[online]
- Probability, Random Variables, and Stochastic Processes by A.
Papoulis.
Classic text for probability theory and its application. Also see on-line lecture
notes for EE178 at Stanford.
- Probability Theory: The Logic of Science by E. T. Jaynes. [online]
Classic text on probability theory; chapters 1 and 2 in particular are good
background reading.
- Applied Optimal Estimation by A. Gelb (editor). MIT Press, 1974. [Amazon]
Classic text for Kalman filter and linear estimation.
Help with Matlab
Problem Sets
PS 1: [ps] [pdf]
Out Jan 07; Due Jan 14: Solutions [ps] [pdf] [tex] (Review of background
material) (2% of grade)
PS 2: [ps] [pdf] [empca, facedata]
Out Feb 5; Due Feb 25: (Face detection and recognition using PPCA)
(23% of grade)
Midterm: [ps] [pdf]
Out Mar 13; Due Mar 14: Solutions [ps] [pdf] [tex] (15% of grade)
PS 3: [ps] [pdf] [digits, hmm, hmm2, epmt]
Out Mar 15; Due Mar 29: (Isolated digit recognition using HMM)
(15% of grade)
Final Project: [swiki]
In-class presentations 8:00-10:50 am on Wed. May 1. Reports due
by midnight on Fri. May 3.
Syllabus
- Introduction and overview of course contents. Review of basic material. [1/4/02]
- Skin Color Modeling [1/7,9,11/02]
- Resources: CRL Tech Report
- Tools: Overview of graphical models, likelihood ratio test, ROC curve
- Histogram models from web images
- Separability of skin and nonskin distributions
- Adult image detection.
- Introduction to face analysis and applications [1/14/02]
- Modeling facial appearance
- Face recognition, tracking, and synthesis
- Speech recognition
- Resources: Tony Robinson's Speech Analysis page.
- Tools: Hidden Markov Model, Expectation-Maximization
- HMM introduction [2/13,15/02]
- Viterbi algorithm [2/18/02]
- EM algorithm [2/20-22/02]
- Speech recognition architecture [2/25/02]
- Language modeling [2/27/02]
- Acoustic modeling [3/01/02]
- Viterbi, beam, and tree search [3/11-13/02]
- Facial Expressions and Gesture Recognition
- Resources: EM (R. Neal, T. Minka); Thad ASL; Revert Speech
- Tools: HMM, EM
- EM and mixture density learning [3/15/02]
- Analysis and synthesis of facial expressions and speech reading [3/18/02]
- HMM-based recognition of ASL [3/20/02]
- EM as lower-bound maximization [3/22/02]
- Modeling Human Motion
- Resources: M. Brand (HMM, Puppetry, Style)
- Tools: ARMA models
- Introduction to stochastic processes and linear Gaussian models [3/25/02]
- ARMA models [3/27/02]
- Yule-Walker equations [3/29/02]
- Class presentations of final project proposals [4/1,3/02]
- Head and hand tracking
- Tools: State space models, Kalman filter
- Introduction to state space modeling [4/5/02]
- 3-D model-based tracking
- Tracking with morphable models
- Contour-based hand tracking
- Figure tracking
- Tools: Gaussian sum filter, particle filter
- Kinematic modeling
- Appearance-based tracking
- Visual singularities
- Monocular reconstruction
- Action recognition
- Tools: Stochastic CFG, Bayesian networks
- Connections between actions and language
- Modeling paradigms
- Action recognition
- Multi-modal sensing
- Tools: Bayesian networks, "Flexible" models, maximum entropy
- Speaker detection
- Speaker tracking
- Speech reading
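As a rough illustration of the tools listed under Skin Color Modeling (histogram models and the likelihood ratio test), here is a minimal sketch. It is written in Python rather than the Matlab used in the problem sets, and the names `BINS`, `bin_of`, `fit_histogram`, and `is_skin`, the 8-bins-per-channel setting, and the add-one smoothing are all assumptions made for this example, not part of the course materials.

```python
from collections import Counter

BINS = 8  # bins per color channel (an assumed setting for this sketch)


def bin_of(rgb):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse histogram bin."""
    step = 256 // BINS
    return tuple(c // step for c in rgb)


def fit_histogram(pixels):
    """Estimate P(color bin | class) from labeled pixels, with add-one smoothing."""
    counts = Counter(bin_of(p) for p in pixels)
    total = len(pixels) + BINS ** 3  # one pseudo-count per bin

    def prob(rgb):
        return (counts[bin_of(rgb)] + 1) / total

    return prob


def is_skin(rgb, p_skin, p_nonskin, threshold=1.0):
    """Likelihood ratio test: skin if P(rgb | skin) / P(rgb | nonskin) > threshold."""
    return p_skin(rgb) / p_nonskin(rgb) > threshold


# Toy training pixels (made up for illustration only).
skin_pixels = [(220, 170, 140), (200, 150, 120), (230, 180, 150)]
nonskin_pixels = [(20, 60, 200), (30, 200, 40), (10, 10, 10)]

p_skin = fit_histogram(skin_pixels)
p_nonskin = fit_histogram(nonskin_pixels)
print(is_skin((225, 175, 145), p_skin, p_nonskin))
```

Sweeping `threshold` over a range of values and recording the detection and false-alarm rates at each setting traces out an ROC curve for the classifier.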
Final Projects
Project Ideas
- Skin color
- Use a calibrated skin color dataset, which includes a reference color
chart for each example image, to conduct a study of color-based skin
detection. In particular, can compensating for the illuminant lead to
better detection rates?
- Reproduce Michael Tarr's recent work on gender classification using
red/green skin color ratios.
- Face analysis
- Learning a distance measure for "Separated at Birth" (TM).
Given a set of facial image pairings which combine facial similarity
with some text attributes, is it possible to learn a distance measure
which would produce similar pairings on a test dataset?
- Face detection or recognition using boosted decision trees.
- Human motion
- Gesture recognition (for example American Sign Language) using HMMs.
- Synthesizing human motion by learning dynamic models from motion
capture data.
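For the HMM-based gesture recognition idea, decoding the most likely hidden state sequence with the Viterbi algorithm is a common starting point. The sketch below is one possible pure-Python version for discrete observations, working in log probabilities; the function name `viterbi`, its argument layout, and the toy two-state model are assumptions for illustration, not code from the problem sets.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return (most likely state path, its log probability) for a discrete HMM.

    obs      -- list of observation symbol indices
    start_p  -- start_p[i]    = P(first state = i)
    trans_p  -- trans_p[i][j] = P(next state = j | current state = i)
    emit_p   -- emit_p[i][k]  = P(observation = k | state = i)
    """
    n_states = len(start_p)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # delta[i]: best log probability of any path that ends in state i.
    delta = [log(start_p[i]) + log(emit_p[i][obs[0]]) for i in range(n_states)]
    backptr = []

    for o in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at this time step.
            best_i = max(range(n_states),
                         key=lambda i: delta[i] + log(trans_p[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log(trans_p[best_i][j])
                             + log(emit_p[j][o]))
        delta = new_delta
        backptr.append(ptr)

    # Trace back from the best final state.
    last = max(range(n_states), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy two-state model with three observation symbols (made-up numbers).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, logp = viterbi([0, 1, 2], start, trans, emit)
print(path, math.exp(logp))
```

For recognition, one HMM is typically trained per gesture class, and a test sequence is assigned to the class whose model scores it highest.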