Final Project Progress Report for
Computer Vision Part II
By Alan Daniels

Introduction

As stated previously, the goal of my project is to track the motions of the head and hands of a computer user in real time, and to interpret those motions in order to control the computer's GUI. For the GUI, I've chosen the X Window System running on Linux, and I will be writing the software using the GTK toolkit. This web page is a report on where I am so far.

State Of The Hardware

After originally considering the color QuickCam, I've decided to go with the U.S. Robotics BigPicture video camera for the video capture, for two main reasons. First, the BigPicture comes with its own PCI video capture board, which I need in order to get a decent frame rate. The color QuickCam uses a standard parallel port, which makes its frame rate too slow to be of any use in tracking motion. Second, Linux drivers are available for the BigPicture, since its capture board uses a standard BTTV chip set. Although the drivers are still in beta, there are enough reports of successful use of the board to make me confident of getting proper results.

In preliminary tests of the camera using its supplied software, the apparent frame rate is about fifteen to twenty frames per second, which should be sufficient for tracking a user's normal hand motion. The camera's color quality is also good, so each frame should need little pre-processing to yield decent motion tracking data. So my original goal still looks feasible: tracking the user in real time.
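To put the frame rate in perspective, the short calculation below shows how much time is available to process each frame if tracking is to keep up with the camera. It is only a back-of-the-envelope sketch; the function name frame_budget_ms is made up for illustration and is not part of the project's code.

```python
def frame_budget_ms(frames_per_second: float) -> float:
    """Return the time available to process one frame, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / frames_per_second


# The two ends of the frame-rate range observed in the preliminary tests.
for fps in (15, 20):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

At fifteen to twenty frames per second, all capture, analysis and display work for a frame has to fit in roughly 50 to 67 milliseconds.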

State Of The Software

So far, I've laid the groundwork for the application itself. The screen is split into two sides. The left side shows feedback of what the video camera sees, along with a color overlay of where it sees the head and hands of the user. The right side shows a scrolling list of the "primitives" that the computer has recognized (such as "Head Tilt Left", "Left Hand Move Upward", etc). A large status area at the bottom shows the "major event" that just occurred, such as "minimize a window", "close a window", etc. Most of the groundwork I have so far is in the visual look of the application. Although I do not yet have working video from the camera to analyze, I believe that I've solidified the concepts of what information the system will need to work correctly.

After some review, here are the events I've decided to support, along with the gesture the user makes to trigger each one. I will add to this list if time permits towards the end of the project:

  • Minimize
    A "lowering" gesture with either hand.
  • Maximize
    A "raising" gesture with either hand.
  • Close
    A "dismiss" gesture done with either hand.
  • Terminate a program
    The user makes a "throat slicing" gesture.
  • Confirm a dialog
    The user nods their head.
  • Cancel a dialog
    The user shakes their head.
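The list above amounts to a lookup table from recognized gestures to window events. A minimal sketch of such a table follows, assuming each gesture has already been recognized and given a label. The Event enum, the gesture label strings and the function event_for_gesture are all hypothetical names chosen for illustration; none of them come from the project's code.

```python
from enum import Enum
from typing import Optional


class Event(Enum):
    """The six "major events" listed above."""
    MINIMIZE = "minimize a window"
    MAXIMIZE = "maximize a window"
    CLOSE = "close a window"
    TERMINATE = "terminate a program"
    CONFIRM = "confirm a dialog"
    CANCEL = "cancel a dialog"


# Hypothetical gesture labels. The hand gestures may come from either hand,
# so the label does not record which hand made them.
GESTURE_TO_EVENT = {
    "hand_lower": Event.MINIMIZE,
    "hand_raise": Event.MAXIMIZE,
    "hand_dismiss": Event.CLOSE,
    "throat_slice": Event.TERMINATE,
    "head_nod": Event.CONFIRM,
    "head_shake": Event.CANCEL,
}


def event_for_gesture(gesture: str) -> Optional[Event]:
    """Return the event for a gesture label, or None if it is not supported."""
    return GESTURE_TO_EVENT.get(gesture)


print(event_for_gesture("head_nod"))
```

Keeping the mapping in one table would make it easy to add further events later without touching the recognition code.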

To accommodate these events, here are the primitives that the system needs to be able to recognize:

  • Head location.
  • Head movement (derived from the change in head location between frames).
  • Head tilt (up, down, left and right).
  • Hand location (which hand is the left or the right will be assumed from this).
  • Hand movement (derived from the change in hand location between frames).
  • Orientation of either hand (palm out, palm flat, etc).
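Two of these primitives follow directly from locations: movement comes from comparing a location in consecutive frames, and the left/right labels come from comparing the two hand locations. The sketch below illustrates both, under several assumptions that are not from the project itself: locations are (x, y) pixel coordinates with y growing downward, a 5-pixel threshold separates real motion from jitter, and "left" means left in the image (whether the camera image is mirrored relative to the user is left open). The function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; y grows downward


def movement(prev: Point, curr: Point, threshold: float = 5.0) -> Optional[str]:
    """Classify the dominant direction of motion between two frames.

    Returns None when the displacement on both axes is below the threshold.
    """
    dx = curr[0] - prev[0]
    dy = curr[1] - prev[1]
    if max(abs(dx), abs(dy)) < threshold:
        return None
    if abs(dx) >= abs(dy):
        return "Right" if dx > 0 else "Left"
    return "Downward" if dy > 0 else "Upward"


def label_hands(a: Point, b: Point) -> Dict[str, Point]:
    """Label two hand locations by image position (smaller x is "left")."""
    left, right = sorted((a, b), key=lambda p: p[0])
    return {"left": left, "right": right}


hands = label_hands((200.0, 50.0), (40.0, 60.0))
print(hands["left"], movement((100.0, 100.0), (100.0, 80.0)))
```

A direction label combined with a hand label would give a primitive in the style of "Left Hand Move Upward".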

Where To Go Next

The next step is to get the application reading accurate data from the video capture board. I am confident that I can do this within the week. Once this is done, the system needs to be able to track the position of the head and hands. One area that I still need to research is the best way to track the orientation of the user's hands. This research will become much more feasible once I have data from the video capture board to experiment with.

About This Web Page

This web page was written on May 17th, 1998 by Alan Daniels. If you have any questions or comments, please let me know.