Project 5 / Face Detection with a Sliding Window

Algorithm Description

Getting Positive Features

To get the positive features from the training face images, the program takes each image, converts it to a HOG feature, and reshapes that feature into a 1 x n vector, where n = (template_size / hog_cell_size)^2 * 31. Finer-grained HOG cell sizes produce much better detection results at the cost of computation time.
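The extraction step can be sketched as follows, assuming for illustration a 36 x 36 template and a cell size of 6; `fake_hog` is a hypothetical stand-in for the real HOG implementation (e.g. vl_hog), included only so the reshape arithmetic is concrete:

```python
import numpy as np

def fake_hog(img, cell_size):
    # Hypothetical stand-in for a real HOG implementation (e.g. vl_hog),
    # which returns a (template/cell) x (template/cell) x 31 feature array.
    g = img.shape[0] // cell_size
    return np.zeros((g, g, 31))

def positive_features(images, template_size=36, cell_size=6):
    # Each training face image becomes one 1 x n row, n = (t/c)^2 * 31.
    n = (template_size // cell_size) ** 2 * 31
    feats = np.zeros((len(images), n))
    for i, img in enumerate(images):
        feats[i] = fake_hog(img, cell_size).reshape(-1)
    return feats
```

With these example sizes, each face image contributes a single row of dimension (36/6)^2 * 31 = 1116.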

Getting Negative Features

To build the negative feature set, each non-face training image is cropped to template_size x template_size at a set of random coordinates. Each cropped image is converted to a HOG feature and then reshaped into a 1 x n vector, where n = (template_size / hog_cell_size)^2 * 31. This is repeated (num_samples / num_images) times for each training image, until the target number of samples is reached.
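A minimal sketch of this random-crop sampling, under the same illustrative sizes as above (`fake_hog` is again a hypothetical stand-in for the real HOG implementation):

```python
import numpy as np

def fake_hog(img, cell_size):
    # Hypothetical stand-in for the real HOG implementation (e.g. vl_hog).
    g = img.shape[0] // cell_size
    return np.zeros((g, g, 31))

def negative_features(images, num_samples, template_size=36, cell_size=6, seed=0):
    rng = np.random.default_rng(seed)
    per_image = num_samples // len(images)  # (num_samples / num_images) crops each
    feats = []
    for img in images:
        h, w = img.shape[:2]
        for _ in range(per_image):
            # Random top-left corner for a template_size x template_size crop.
            y = rng.integers(0, h - template_size + 1)
            x = rng.integers(0, w - template_size + 1)
            crop = img[y:y + template_size, x:x + template_size]
            feats.append(fake_hog(crop, cell_size).reshape(-1))
    return np.vstack(feats)
```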

Training the Classifier

To create the training set, the positive and negative features were concatenated into one large matrix of training data. To create the labels, a vector of num_pos_features 1's was concatenated with a vector of num_neg_features -1's. The W vector and B value were generated by running an SVM trainer with a lambda of 0.0001. After testing different lambda values, the recommended value appeared to provide the best recall/precision tradeoff when training on the data.
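The writeup does not name the SVM solver, so as an illustration only, here is a minimal Pegasos-style subgradient trainer for a linear SVM; the bias B is folded into W by appending a constant 1 to each feature vector:

```python
import numpy as np

def train_svm(X, y, lam=1e-4, epochs=500):
    # Minimal Pegasos-style subgradient trainer for a linear SVM (a sketch,
    # not the solver used in the project). lam matches the writeup's lambda.
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append 1 to fold in the bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            t += 1
            eta = 1.0 / (lam * t)
            if yi * (w @ xi) < 1:  # hinge-loss margin violated
                w = (1 - eta * lam) * w + eta * yi * xi
            else:
                w = (1 - eta * lam) * w
    return w[:-1], w[-1]  # W vector, B value
```

A feature F is then scored exactly as in the detection step below: W @ F + B.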

Detecting Faces

To detect faces, a sliding-window approach was used. First the image is converted to a HOG feature image. Each (template_size / hog_cell_size) x (template_size / hog_cell_size) window of HOG cells is then scored against the SVM training results using the formula W*F + B, where F is the window's HOG feature. If the result is above the threshold of -0.25, the window is added to the list of candidate faces for the current image; the window then shifts by one HOG cell and the test repeats. After testing different threshold values, it became apparent that a slightly negative value produced the best results, especially considering that the SVM was trained on data biased towards negative examples.

To better detect faces of different sizes, the sliding-window detector was run on multiple scales of the image. After tinkering with different scales, using scales of 1, 0.8^n (n = 1...5), and 0.6^n (n = 1...5) appeared to produce the best precision and recall for the speed of detection. After all potential faces have been detected, non-maximum suppression is performed to suppress duplicate detections.
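The single-scale scan can be sketched as follows, assuming the HOG image is an array of cells with 31 values each and that w, b come from the trained SVM; multi-scale detection repeats this on resized images and maps the boxes back by dividing by the scale (sizes and names here are illustrative, not the project's actual code):

```python
import numpy as np

def sliding_window_detect(hog_img, w, b, cell_size=6, template_cells=6,
                          thresh=-0.25):
    # Score every template-sized window of HOG cells with W*F + B and keep
    # those above the (slightly negative) threshold from the writeup.
    n_rows, n_cols, _ = hog_img.shape
    detections = []
    for y in range(n_rows - template_cells + 1):  # shift by one HOG cell
        for x in range(n_cols - template_cells + 1):
            F = hog_img[y:y + template_cells, x:x + template_cells].reshape(-1)
            score = F @ w + b
            if score > thresh:
                # Convert cell coordinates back to pixel coordinates:
                # (x1, y1, x2, y2, confidence).
                detections.append((x * cell_size, y * cell_size,
                                   (x + template_cells) * cell_size,
                                   (y + template_cells) * cell_size, score))
    return detections
```

The surviving boxes from all scales would then be passed to non-maximum suppression.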

Results with Different Cell Sizes

HOG Cell Size 6

Face template HOG visualization.

Precision Recall curve.

Example of detection on the test set.

Example of detection on the class image.


HOG Cell Size 4

Face template HOG visualization.

Precision Recall curve.

Example of detection on the test set.

Example of detection on the class image.


HOG Cell Size 3

Face template HOG visualization.

Precision Recall curve.

Example of detection on the test set.