CS 4495/7495
Computer Vision

Fall 2003
College of Computing 102
MWF 11:00 - 12:00 noon

Organization        Syllabus        Problem Sets        Final Projects

Fall 2002 Class Web - This site contains slides and problem sets from the last time I taught this course. Please refer to this for examples of assignments.


This is the introductory computer vision course in the curriculum for Computational Perception and Robotics in the College of Computing. This course is designed to provide

Instructor

Jim Rehg
Email: rehg@cc.gatech.edu
Office: CoC Bldg (CCB) 253
Office hours: After class in my office
Phone: 404-894-9105 (email preferred)

Teaching Assistant

Jianxin Wu
Email: wujx@cc.gatech.edu
Office hours: TBD

Text

Computer Vision: A Modern Approach
by David Forsyth and Jean Ponce
Prentice Hall 2002

Prerequisites

Familiarity with linear algebra and statistics will be essential. Equally important is familiarity with C++ programming. A background in signal and image processing will be helpful. I will be providing some additional tutorial sessions throughout the semester. However, these tutorials will not be a substitute for adequate background in the prerequisite areas of study. Permission of the instructor in special cases only.


Organization

Grades will be assessed as follows:

Problem Sets 55%
Midterm Exam 15%
Final Project 25%
Participation 5%

There will be a take-home mid-term exam that will not require any programming.

Problem Sets

There will be approximately 6 problem sets, each one or two weeks in duration.

Collaboration on problem sets is encouraged at the "white board interaction" level. That is, share ideas and technical conversation, but write your own code. A few problem sets may require you to work in teams of 2-3. I plan to grade and return problem sets promptly. As a result, I will require all problem sets to be turned in on time.

No late submissions will be accepted without prior permission of the instructor.

For each problem set, there will be two "tracks" (separate versions of the problem set) which differ in what is required:

The purpose of these two tracks is to accommodate the variety of students who take this class, which includes graduate and undergraduate students who are interested in computer vision research, graduate students who need vision technology for their research in other areas such as HCI or systems, and undergraduates who are looking for an interesting technical elective. I expect the more research-oriented folks to do the Track 1 problem sets. My intent is to allow students to choose which track to work on and to do this on a per-problem-set basis. The plan is to publish both versions simultaneously.

A number of problem sets will require implementation of a vision algorithm or small vision system using the Microsoft Visual C++ programming environment. In particular, you will learn to use a toolkit known as Viper which is a standard environment for vision programming here in the College.

New VIPER2 download: Go to CPL Code Base and download. You will need the user name and password that I sent to you via email.

Instructions on downloading the Viper toolkit and linking to the appropriate libraries are available here. Students who want to run Viper without mounting CoC file systems can download all of the required libraries (33 MB). There are 20 PCs on the first floor of CCB which are available for class use. By the end of this month they will be located in our new Media Lab, which is not yet on-line. In the meantime, there should be around 10 machines in the States cluster and 10 in the UPL cluster right next door. Each machine will have a sign saying that students in 4495/7495 have priority use. This means that students who are not in the class should surrender the machine to you if you need it.

Instructions for turning in the problem sets:  * NEW *

  1. Make a zip file containing the required material. Name the file LastName_FirstName_PSN.zip (where you fill in your name and the problem set number "N" appropriately).
  2. Copy this file to /net/hi21/cs7495/PSN_2003/ (where N is the PS number)
  3. Execute 'chmod g+r <your file>' where <your file> is the path to your zip file. This will allow Jianxin to read your file. The directory permissions have been set to prevent listing so you won't be able to enter the directory or display its contents. This will prevent others from accessing your file. Note that you may not be able to do this under Solaris and will have to telnet to a Linux box or an SGI machine. If you do not have CoC access then email the file to Jianxin Wu.
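For those who prefer to script the three steps above, here is a minimal sketch in Python. It is only an illustration: the function names are hypothetical, the destination directory is passed in rather than hard-coded, and the permission change is the equivalent of 'chmod g+r' on a Unix-like system.

```python
import os
import shutil
import stat
import zipfile

def submission_name(last_name, first_name, ps_number):
    """Step 1 naming rule: LastName_FirstName_PSN.zip."""
    return f"{last_name}_{first_name}_PS{ps_number}.zip"

def build_submission(files, last_name, first_name, ps_number, out_dir="."):
    """Step 1: zip the required files (stored flat, without directories)."""
    zip_path = os.path.join(out_dir, submission_name(last_name, first_name, ps_number))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path

def submit(zip_path, dest_dir):
    """Steps 2 and 3: copy the zip file, then add group read permission."""
    target = shutil.copy(zip_path, dest_dir)
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IRGRP)  # same effect as chmod g+r
    return target
```

For problem set 2, dest_dir would be the PS2_2003 directory named in step 2. If the copy fails because of the platform issue noted in step 3, fall back to the manual commands on a Linux or SGI machine.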

Grading

Undergrads and grads will be graded on separate curves. I will expect more from a graduate project than an undergraduate project.

There are a number of ways to earn participation credit: Participate in class by asking questions or making comments, find bugs in the course text or my slides, etc.


Resources

The following supplemental texts may be helpful:


Syllabus

Viper tutorials: 9/21/03 (6 pm), 9/22/03 (12 noon) [slides]
Viper2 tutorials: 10/02/03 (6 pm), 10/03/03 (12 noon)

  1. Introduction
  2. Stereo Matching
  3. Photometry
  4. Color
  5. Image Representations: Filtering, Pyramids, Edge Detection, and Texture
  6. Camera Models
  7. Motion
  8. Statistical Methods
  9. Object Recognition
  10. Conclusions

Problem Sets

PS 1: Out Aug 22; Due Sept. 3 (midnight): (Intro to Viper)  (2% of grade)

PS 2: Track 1, Track 2. Out Sept. 8; Due Sept. 19: (Stereo Matching Part 1)  (10% of grade)

PS 3: Track 1, Track 2. Due Oct. 3 (Stereo Matching Part 2) (15% of grade)

PS 4: Due Oct 16 (Stereo Matching Part 3) (13% of grade)

Midterm: Track 1, Track 2. Due Oct 20 (Photometry and Color) (15% of grade)

PS 5: Track 1, Track 2. Due Nov 14 (Texture Synthesis) (15% of grade)


Final Projects

Swiki for final projects

Important Dates

Project Proposal

Please write 3 paragraphs describing your project. Your proposal should answer the following questions:

Project Presentations

Project presentations will take place on Friday, December 5, from 9:30 AM - 12:00 PM in the Instructional Center Room 115 (IC 115).

Each project presentation is limited to 8 MINUTES ONLY. This includes any time needed to change speakers, download the presentation, etc. Time limits will be strictly enforced. Your presentation will end after 8 minutes no matter where you are in your talk. We have 18 presentations to hear and we simply cannot afford to get off-schedule.

In preparing your talk, allow 5 minutes for the presentation itself, leaving 2 minutes for questions and 1 minute for set-up. It is recommended that each team select a single person to give the entire presentation. The presentation must have three parts:

All project presentations must be linked from the swiki. A sample presentation is included under the example entry. Follow this model in creating and linking your talk. I am going to group your presentations by topic on the swiki and assign each one to a specific time slot. We will be using the swiki during the presentation period to give the talks.

You may use either PowerPoint or HTML for your talk slides. If you plan to use HTML, 1) please create a separate page for your project presentation and link to it, and 2) use appropriate font choices and graphics for a presentation (i.e., do not plan to scroll through paragraphs of unreadable text; there is a big difference between a presentation and a standard web page). Since the swiki server does go down sometimes, I recommend hosting your presentation on the CoC web server and linking it from the swiki. In any event, you should bring a back-up copy of your presentation with you in case there is some problem with the network. The PC in the room will be used to host the presentations.

If you plan to use videos in your talk, you need to let Jianxin know in advance what format you will be using. You should arrive at IC 115 15-20 minutes before 9:00 to make sure your video plays properly on the PC in the room. (It is not safe to assume that a given PC will have the codecs that are required to play your video.)

I expect everyone to attend all of the presentations, with the exception of folks that have class conflicts (and you need to let me know about those by email). Attendance will count towards the "participation" part of the course credit. There will be no final exam in this class, so you can view your attendance during the presentation period as a substitute for the really hard final exam that I didn't give you :-) We have a lot of great projects this year and I think you will enjoy the presentations.

Project Reports

Project reports must be linked from the swiki and must be available for reading by midnight, Friday December 12. If your report is not available when I start reading them on Saturday, then you will lose that portion of your grade. To avoid any mishaps with the swiki, I suggest that you double-check that your report is available once you've linked it by clearing your browser cache and reloading the swiki page. A sample report along with the LaTeX source that produced it will be included under the example entry on the swiki. You may submit your report in either PDF or Word format.

Your project report should address the following:

I expect the reports to be between 5 and 10 pages. Shorter reports are better; just be sure to address all six of the points listed above. I am particularly interested in your analysis of your results. Please include example images, illustrations, etc. so that it is clear what you are describing.

If you created videos for your project and you link them on the swiki, then I will view them as part of reading your report. I am very interested in seeing your videos!

Some Final Project Choices

Here are some sample projects which I believe to be doable in the time available (some are perhaps more likely to be grad projects than undergrad ones). Of course it is also fine if you propose one of your own, perhaps based on your current research activity.

1. Color Constancy

Implement algorithm 6.1 from your text for determining the lightness of image patches. Exercises 6.8 and 6.9 are relevant here as well. Explore both the gray world and brightest patch approaches to establishing an absolute reference. Test your algorithm on Mondrian images as well as more complex scenes.
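As a rough illustration of the two reference choices, the sketch below works on a single scanline. It is a Retinex-style simplification and not a transcription of algorithm 6.1: it takes logs, zeroes small differences (assumed to come from slowly varying illumination), and sums what remains, which gives lightness up to an unknown scale. The function names, the threshold of 0.1, and the gray target of 0.5 are assumptions made for this example.

```python
import math

def relative_lightness(row, threshold=0.1):
    """Recover lightness along one scanline, up to an unknown scale factor."""
    logs = [math.log(v) for v in row]  # intensities must be positive
    total = 0.0
    recovered = [0.0]
    for prev, cur in zip(logs, logs[1:]):
        step = cur - prev
        if abs(step) < threshold:
            step = 0.0  # small change: attribute it to illumination and drop it
        total += step
        recovered.append(total)
    return [math.exp(v) for v in recovered]

def gray_world(lightness, gray=0.5):
    """Fix the scale by assuming the average lightness equals `gray`."""
    mean = sum(lightness) / len(lightness)
    return [v * gray / mean for v in lightness]

def brightest_patch(lightness, white=1.0):
    """Fix the scale by assuming the brightest patch is white."""
    peak = max(lightness)
    return [v * white / peak for v in lightness]

# A slow illumination ramp with one sharp reflectance edge in the middle.
row = [1.0, 1.02, 1.04, 2.08, 2.12]
rel = relative_lightness(row)
```

The two references generally assign different absolute values to the same patches, which is the comparison the project asks you to explore.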

2. Motion Matching

Implement camera calibration from point correspondences obtained from a calibration grid. Track the motion of the camera with respect to a static scene over a short image sequence (which we will provide). Using the known location of the calibration grid and camera pose, render images of a synthetic object from the correct viewpoint (rendering code will be provided). Graduate projects would be expected to address some photometric aspects of blending.

3. Real-Time Tracking

The Viper environment includes some infrastructure for capturing live video from a Sony Elura camera which can be checked out of the CPL lab (talk to me about it). This project would explore some tracking algorithms using this live video as input. This is a good way to quickly get intuition for what works (and doesn't work). Start with a standard patch-based SSD tracker, as we will describe in class. Modify this to use particle filtering to improve the robustness of the tracker to noise and background clutter. I can provide references to this material to anyone who is interested.
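To give a feel for the starting point, here is a minimal sketch of one step of a patch-based SSD tracker. It is not Viper code, and the function names are hypothetical. It assumes grayscale frames stored as lists of rows and performs an exhaustive search in a small window around the previous patch position.

```python
def ssd(template, frame, top, left):
    """Sum of squared differences between the template and a frame window."""
    return sum(
        (frame[top + r][left + c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def track_patch(template, frame, prev_top, prev_left, radius=2):
    """Return (top, left, score) of the best match near the previous position."""
    h, w = len(template), len(template[0])
    rows, cols = len(frame), len(frame[0])
    best = None
    # Clamp the search window so the template always stays inside the frame.
    for top in range(max(0, prev_top - radius), min(rows - h, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius), min(cols - w, prev_left + radius) + 1):
            score = ssd(template, frame, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    if best is None:
        raise ValueError("search window does not overlap the frame")
    return best[1], best[2], best[0]
```

A tracker calls track_patch once per frame, feeding the returned position back in as the next starting point. The particle filtering extension replaces this single best estimate with a set of weighted position hypotheses.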

4. Improved Stereo

This project would extend your stereo system developed in PS #4 to obtain better performance. Issues which could be explored include use of edge cues to obtain better results at occlusion boundaries, use of pyramid representations to speed up matching, consistency tests for occlusion, etc. Contact me for more details.
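One common form of the consistency test for occlusion is a left-right check, sketched below for a single scanline. This is an illustration under stated assumptions rather than a required design: the function name is hypothetical, disparities are integers, and a left-image pixel at column x with disparity d is assumed to match the right-image pixel at column x - d.

```python
def left_right_check(disp_left, disp_right, tol=0):
    """Flag each left-image pixel whose match agrees with the right-image map.

    Pixels that fail the check are candidates for occlusions or mismatches.
    """
    width = len(disp_left)
    valid = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right image
        ok = 0 <= xr < width and abs(disp_right[xr] - d) <= tol
        valid.append(ok)
    return valid
```

Pixels that fail the check can be marked as occluded, or filled in from neighboring disparities that passed.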

5. Object Recognition

Develop a program for detecting coins in images using the Hough transform. Follow the instructions. Extend your program to recognize dollar bills as well as coins (using the same edge-based technique). Extend your program further to reason about possible occlusions of coins by dollar bills.
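As a hedged sketch of the voting idea behind coin detection, the code below accumulates Hough votes for circle centers. It assumes that edge points have already been extracted and that the coin radius is known, both of which are simplifications. The function names and the number of sampled angles are choices made for this example.

```python
import math
from collections import Counter

def hough_circle_votes(edge_points, radius, n_angles=360):
    """Each edge point votes once for every cell that could be its circle center."""
    votes = Counter()
    for x, y in edge_points:
        cells = set()  # avoid counting the same cell twice for one edge point
        for k in range(n_angles):
            theta = 2 * math.pi * k / n_angles
            cx = round(x - radius * math.cos(theta))
            cy = round(y - radius * math.sin(theta))
            cells.add((cx, cy))
        for cell in cells:
            votes[cell] += 1
    return votes

def best_center(votes):
    """Return the accumulator cell with the most votes."""
    return max(votes, key=votes.get)
```

Detecting coins of unknown size means repeating the vote over a range of radii, which turns the accumulator into a three-dimensional array indexed by center and radius.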

6. Improved Texture Synthesis

This project would extend your texture synthesis algorithm developed in PS #5 to better handle the synthesis of long image structures. Issues which could be explored include the De Bonet synthesis framework or nonparametric synthesis methods like Efros and Leung. Contact me for details.