Project 5 / Face Detection with a Sliding Window

Algorithm

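This project uses a sliding window object detector to find faces in images. Specifically, it follows the Dalal-Triggs detector: a linear classifier is evaluated on a fixed-size window at every position in the image, and windows that score highly are reported as detections.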
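The core sliding-window step can be sketched as follows. This is a minimal illustration, not the project's code. It assumes features have already been computed as a grid of per-cell vectors (as HoG provides), that the window spans 6x6 cells and steps one cell at a time, and that the weights `w` are flattened in the same row-major order as each window. The function `score_windows` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def score_windows(feature_grid, w, b, window_cells=6, threshold=0.0):
    """Score every window position on a grid of per-cell feature vectors.

    feature_grid: (rows, cols, d) array, one feature vector per cell.
    w: linear weights of length window_cells * window_cells * d, flattened
       in the same (row-major) order as each window below.
    Returns (row, col, score) for every window scoring above threshold,
    where (row, col) is the window's top-left cell.
    """
    rows, cols, _ = feature_grid.shape
    detections = []
    # Slide one cell at a time; each window covers window_cells x window_cells cells.
    for r in range(rows - window_cells + 1):
        for c in range(cols - window_cells + 1):
            window = feature_grid[r:r + window_cells, c:c + window_cells].ravel()
            score = float(window @ w + b)  # linear classifier response
            if score > threshold:
                detections.append((r, c, score))
    return detections
```

In the full detector this runs once per image scale, and each (row, col) is converted back to pixel coordinates by multiplying by the cell size.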

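The Dalal-Triggs detector uses a histogram of oriented gradients (HoG) feature representation. HoG is a fairly robust descriptor: it records the distribution of gradient orientations within small cells of the image and normalizes those histograms locally, which makes it relatively insensitive to changes in brightness, contrast, and lighting. It is not scale invariant, however, which is why the detector must search over multiple image scales.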
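A simplified version of the per-cell histogram computation is sketched below. It is an assumption-laden illustration rather than the Dalal-Triggs descriptor itself: it uses unsigned orientations in 9 bins, hard bin assignment, and per-cell L2 normalization in place of the overlapping block normalization Dalal and Triggs describe. The function name `cell_orientation_histograms` is hypothetical.

```python
import numpy as np

def cell_orientation_histograms(image, cell_size=6, n_bins=9):
    """Simplified HoG-style features for a 2-D grayscale image.

    Assumptions (not the project's exact descriptor): unsigned orientations
    in [0, 180), hard assignment to bins, magnitude-weighted votes, and
    per-cell L2 normalization instead of block normalization.
    """
    img = np.asarray(image, dtype=np.float64)
    # Centered [-1, 0, 1] finite differences; gradients are left at zero on the border.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in degrees, folded into [0, 180).
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((orientation / (180.0 / n_bins)).astype(int), n_bins - 1)

    n_rows = img.shape[0] // cell_size
    n_cols = img.shape[1] // cell_size
    hist = np.zeros((n_rows, n_cols, n_bins))
    for r in range(n_rows):
        for c in range(n_cols):
            rs = slice(r * cell_size, (r + 1) * cell_size)
            cs = slice(c * cell_size, (c + 1) * cell_size)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[r, c] = np.bincount(bins[rs, cs].ravel(),
                                     weights=magnitude[rs, cs].ravel(),
                                     minlength=n_bins)
    # Normalizing each cell's histogram cancels uniform contrast scaling.
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / (norms + 1e-8)
```

A 36x36 input yields a 6x6x9 array. Because the gradients ignore a constant brightness offset and the normalization cancels a uniform contrast scaling, `image + 50` and `2 * image` produce the same histograms as `image` in this sketch.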

To identify faces, a linear support vector machine (SVM) was trained on HoG features extracted from positive (face) and negative (non-face) examples. Each positive example is a 36x36 crop of a face, while the negative examples are far more varied in content.
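The training objective can be illustrated with a small hinge-loss solver. This is a minimal sketch, assuming the usual regularized objective lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))) with an unregularized bias. It uses plain full-batch subgradient descent with a fixed learning rate; the solver actually used in the project is not described here, and `train_linear_svm`, `lr`, and `n_iters` are hypothetical. The toy 2-D data stands in for flattened HoG feature vectors, and lam = 0.001 matches the value reported under Results.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.001, lr=0.1, n_iters=1000):
    """Full-batch subgradient descent on the regularized hinge loss.

    X: (n, d) feature matrix; y: labels in {-1, +1}.
    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + b)
        active = margins < 1  # examples inside the margin or misclassified
        # Subgradient: regularizer term plus hinge terms from active examples only.
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy stand-in for face (+1) and non-face (-1) feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 0.5, size=(100, 2)),
               rng.normal(-2.0, 0.5, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])
w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

At detection time the learned `w` and `b` are exactly the linear classifier evaluated on each window.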

The positive examples are 36x36 pixels because the HoG representation divides an image into cells of 6x6 pixels, and 36 is a multiple of 6: each example divides evenly into a 6x6 grid of cells. A smaller cell size would capture finer detail, but computation time increases as the cell size decreases. Unlike the preselected positive examples, the negative examples were randomly sampled from images that contain no faces.
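The sizing arithmetic and the random sampling of negatives can be sketched as follows. The 9 values per cell and the helper `sample_negative_patches` are hypothetical: the number of values per cell depends on the HoG variant, and the sketch assumes grayscale non-face images at least 36 pixels on each side.

```python
import numpy as np

TEMPLATE_SIZE = 36  # pixels per side of a training example (from the text)
CELL_SIZE = 6       # pixels per side of a HoG cell (from the text)
N_VALUES = 9        # hypothetical: feature values per cell, depends on the HoG variant

cells_per_side = TEMPLATE_SIZE // CELL_SIZE        # 36 / 6 = 6 cells
feature_dim = cells_per_side ** 2 * N_VALUES       # 6 * 6 * 9 = 324 values per example

def sample_negative_patches(images, n_samples, size=TEMPLATE_SIZE, seed=0):
    """Randomly crop size x size patches from grayscale images with no faces.

    Assumes every image is at least `size` pixels in each dimension.
    """
    rng = np.random.default_rng(seed)
    patches = []
    for _ in range(n_samples):
        img = images[rng.integers(len(images))]           # pick a random image
        top = rng.integers(0, img.shape[0] - size + 1)    # random top-left corner
        left = rng.integers(0, img.shape[1] - size + 1)
        patches.append(img[top:top + size, left:left + size])
    return np.stack(patches)
```

Each sampled patch would then pass through the same HoG extraction as the positives, so both classes share the same feature dimension.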

Although the positive examples are 36x36, faces in a test image can appear at any size. To detect faces of varying sizes, the detector runs the sliding window over many rescaled copies of the image; this is called multi-scale detection. Because the same face can then be detected at several nearby positions and scales, overlapping detections are resolved by keeping only the most confident one and discarding the rest. This step is called non-maximum suppression.
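Greedy non-maximum suppression can be sketched as below. This is one common formulation, assuming boxes given as (x_min, y_min, x_max, y_max) and an intersection-over-union threshold of 0.3; the threshold and the function names are hypothetical, not values taken from the project.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, overlap_threshold=0.3):
    """Greedy NMS: visit boxes from most to least confident and keep a box
    only if it does not overlap any already-kept box by more than the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_threshold for j in keep):
            keep.append(int(i))
    return keep

boxes = [(0, 0, 36, 36), (3, 3, 39, 39), (100, 100, 136, 136)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

Here the first two 36x36 boxes overlap with an IoU of about 0.72, so the less confident one is suppressed, while the distant third box survives.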

Results

Using 17 scales with a scale factor of 0.80 between successive scales and 6x6-pixel HoG cells, the sliding window face detector achieved a precision of roughly 82-83%. The SVM was trained with a regularization parameter (lambda) of 0.001.
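The scale settings above can be turned into concrete window sizes. This sketch assumes the pyramid starts at the original resolution and shrinks the image by 0.80 at each of the 17 steps; under that assumption a 36x36 window covers faces from 36 pixels up to 36 / 0.8^16, roughly 1,279 pixels, on a side.

```python
SCALE_FACTOR = 0.80   # from the text
N_SCALES = 17         # from the text
TEMPLATE_SIZE = 36    # pixels per side of the detection window

# Assumption: scale 1.0 is the original image, each step multiplies by SCALE_FACTOR.
scales = [SCALE_FACTOR ** k for k in range(N_SCALES)]

# A 36x36 window on an image resized by s covers 36 / s pixels of the original image.
face_sizes = [TEMPLATE_SIZE / s for s in scales]

for s, f in zip(scales, face_sizes):
    print(f"scale {s:.3f} -> faces about {f:.0f} px wide")
```

A factor closer to 1 samples face sizes more finely at the cost of more scales to evaluate.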

Face template HoG visualization. The outlines of the head, mouth, and eyes are visible in the light-colored gradient directions.

Precision-recall curve for face detection. Note that as recall approaches 0.6, precision begins to drop sharply.
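A curve like this is built by sorting detections by confidence and accumulating true and false positives. A minimal sketch, assuming each detection has already been labeled as a true or false positive against the ground truth (the matching rule is not shown) and that `precision_recall` is a hypothetical helper:

```python
import numpy as np

def precision_recall(scores, is_true_positive, n_ground_truth):
    """Precision and recall after each detection, taken in order of decreasing score.

    is_true_positive[i] is True if detection i was matched to a ground-truth face.
    n_ground_truth is the total number of faces in the test set.
    """
    order = np.argsort(scores)[::-1]                 # most confident first
    tp = np.asarray(is_true_positive, dtype=bool)[order]
    tp_cum = np.cumsum(tp)                           # true positives so far
    fp_cum = np.cumsum(~tp)                          # false positives so far
    precision = tp_cum / (tp_cum + fp_cum)
    recall = tp_cum / n_ground_truth
    return precision, recall

precision, recall = precision_recall([0.9, 0.8, 0.7, 0.6],
                                     [True, True, False, True],
                                     n_ground_truth=5)
```

Each point corresponds to one confidence cutoff: lowering the cutoff admits more detections, which raises recall but eventually lets in enough false positives that precision falls.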

Examples of detections on selected images from the test set.