Tracking Head Yaw by Interpolation of Template Responses

Mario Romero and Aaron Bobick

Computational Perception Lab, GVU Center, College of Computing, Georgia Institute of Technology

 

Introduction

The goal of this project is to track the horizontal rotation of the head, which is called yaw. More formally, yaw is the rotation of the head about its vertical axis. We measure it as the angle between the optical axis of the camera and the normal to the tip of the nose. We report results for the full range between the two profiles. We chose yaw because, in human-computer interaction, the most common interactive environments unfold horizontally.

 
paper
 
video
 

Architecture

 

Our method is appearance based. We build template detectors whose response varies with the input image, filter that response to remove noise, and interpolate it into a single scalar output: the angle of yaw. The templates are five mean face templates sampled every 45 degrees across the 180-degree range between the full profiles of the head. The response is computed as the normalized correlation (NC) between the input image and each of the five templates. The response vector is therefore five-dimensional and can be understood as a point in 5-D space. We use a Kalman filter to track the position and velocity of this point. We then normalize the response to unit length, which places the responses to different input images on the same scale. We use a Radial Basis Function Network (RBFN) to interpolate the 5-D response vector and estimate the angle of yaw. The basis functions of the RBFN are Gaussians. They amplify the noise in the response vector unevenly, so a second filter is applied at the output of the RBFN to deal with the unevenly amplified noise. This second filter is a first-order scalar running average.
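The stages described above can be illustrated with a minimal sketch. It is not our implementation, and several details are assumptions made only for illustration: the correlation is taken as zero-mean NC over equal-size grayscale arrays, the yaw labels run from -90 to 90 degrees with 0 as the frontal view, the RBFN weights are fit by least squares, the running average is taken to be exponential smoothing, and the names `normalized_correlation`, `response_vector`, `GaussianRBFN`, `running_average`, `sigma`, and `alpha` are hypothetical. The Kalman step that sits between the NC and the unit-length normalization is omitted here.

```python
import numpy as np

def normalized_correlation(image, template):
    """Zero-mean normalized correlation of two equal-size grayscale arrays (assumed form)."""
    a = np.asarray(image, float).ravel()
    b = np.asarray(template, float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def response_vector(image, templates):
    """Response of every template detector, scaled to unit length."""
    r = np.array([normalized_correlation(image, t) for t in templates])
    norm = np.linalg.norm(r)
    return r / norm if norm > 0 else r

class GaussianRBFN:
    """Gaussian radial basis function network with a scalar output (yaw in degrees)."""

    def __init__(self, centers, sigma):
        self.centers = np.asarray(centers, float)  # one basis center per row
        self.sigma = float(sigma)                  # shared Gaussian width (assumption)
        self.weights = None

    def _phi(self, X):
        # Squared distance from every input row to every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def fit(self, X, y):
        # Least-squares fit of the output weights (assumed training rule).
        phi = self._phi(np.asarray(X, float))
        self.weights, *_ = np.linalg.lstsq(phi, np.asarray(y, float), rcond=None)
        return self

    def predict(self, X):
        return self._phi(np.atleast_2d(np.asarray(X, float))) @ self.weights

def running_average(values, alpha=0.3):
    """First-order scalar smoothing: y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    smoothed, state = [], None
    for v in values:
        state = v if state is None else alpha * v + (1 - alpha) * state
        smoothed.append(state)
    return smoothed
```

In use, the network would be fit on unit-length response vectors with known yaw, then `predict` would be called on each new frame's response vector and the resulting angle sequence passed through `running_average`.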

 
 

Detector Responses

The response of the five template detectors must have two characteristics for the RBFN to be able to interpolate it. First, the response must change smoothly with continuous changes in the input image. Second, the responses to images from similar views must have low variation. Here we present the mean response of the five template detectors to 200 images from five discrete views; the error bars represent one standard deviation.
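The per-view statistics described above could be computed along the following lines. This is a sketch under stated assumptions: the responses are stored one row per image, each image carries a view label in degrees, and the sample standard deviation (`ddof=1`) is used. The function name `response_statistics` and its arguments are hypothetical.

```python
import numpy as np

def response_statistics(responses, view_angles):
    """Mean and sample standard deviation of detector responses, grouped by view.

    responses   : array of shape (n_images, n_detectors)
    view_angles : array of shape (n_images,), the discrete view of each image
    Returns a dict mapping view angle -> (mean vector, std vector).
    """
    responses = np.asarray(responses, float)
    view_angles = np.asarray(view_angles)
    stats = {}
    for angle in np.unique(view_angles):
        group = responses[view_angles == angle]
        stats[float(angle)] = (group.mean(axis=0), group.std(axis=0, ddof=1))
    return stats
```

A small standard deviation within each view, together with means that change gradually from one view to the next, is what the two requirements above ask for.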
 
 
 

Response Filtering
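As described under Architecture, a Kalman filter tracks the position and velocity of the 5-D response point. The sketch below shows one way such a filter might look. It assumes a constant-velocity motion model with a unit time step and isotropic noise; the class name `ConstantVelocityKalman` and the noise levels `q` and `r` are hypothetical illustration values, not the settings we used.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter over a state of [position, velocity] for a dim-D point."""

    def __init__(self, dim=5, dt=1.0, q=1e-4, r=1e-2):
        I, Z = np.eye(dim), np.zeros((dim, dim))
        self.dim = dim
        self.F = np.block([[I, dt * I], [Z, I]])  # position advances by velocity * dt
        self.H = np.hstack([I, Z])                # only the position is observed
        self.Q = q * np.eye(2 * dim)              # process noise (assumed isotropic)
        self.R = r * np.eye(dim)                  # measurement noise (assumed isotropic)
        self.x = np.zeros(2 * dim)                # initial state estimate
        self.P = np.eye(2 * dim)                  # initial state covariance

    def step(self, z):
        """Run one predict/update cycle on measurement z; return the filtered position."""
        # Predict the next state and its covariance.
        x_pred = self.F @ self.x
        P_pred = self.F @ self.P @ self.F.T + self.Q
        # Correct the prediction with the new measurement.
        S = self.H @ P_pred @ self.H.T + self.R
        K = P_pred @ self.H.T @ np.linalg.inv(S)
        self.x = x_pred + K @ (np.asarray(z, float) - self.H @ x_pred)
        self.P = (np.eye(2 * self.dim) - K @ self.H) @ P_pred
        return self.x[:self.dim].copy()
```

Each frame's raw NC response vector would be passed to `step`, and the returned position would then be normalized to unit length before interpolation.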

 
 

Results and Error

 
 

Error as a function of angle

 
 

Relaxing our assumptions