Project 5 / Face Detection with a Sliding Window

Face detection is an important application of computer vision in which the computer locates human faces in images. It is used in many applications, such as facial recognition in video surveillance, image database management, human-computer interfaces, photography and marketing. We use the sliding window model for detecting faces, which independently classifies every image patch as object or non-object. The pipeline is as follows:

  1. Obtain HOG Features from training set
  2. Train Linear SVM classifier
  3. Use a sliding window to detect faces in test images

Stage 1: Obtain HOG features from training set

HoG features are extracted from the positive (face images) and negative (scenes containing no faces) training image sets. The template size is fixed at 36 pixels. The cell size (step size) of the HoG feature determines the size of the block of pixels over which each histogram is computed. This value is set to 3 to achieve maximum precision, though compute time is higher. We tried cell sizes of 6, 4 and 3 and found that as the value decreases, average precision increases, but at the cost of more computing time. The number of orientations used is 9, resulting in a histogram of length (3*number of orientations + 4) = 31. The histograms obtained from a template of size 36*36 (in this case a 12*12*31 block in HoG space) are reshaped into a single feature vector. Thus, each face image contributes one feature. Features are sampled at random from the non-face images such that around 10,000 negative features are obtained in total. The more features are used, the richer the training data, which improves detection accuracy.
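The feature-size arithmetic above can be checked with a short Python sketch. The function name `hog_template_shape` is hypothetical and only illustrates the calculation; it does not extract any features.

```python
def hog_template_shape(template_size=36, cell_size=3, num_orientations=9):
    """Return (cells_per_side, histogram_length, feature_length) for one template.

    Hypothetical helper: it only reproduces the arithmetic in the text.
    """
    if template_size % cell_size != 0:
        raise ValueError("template_size must be a multiple of cell_size")
    cells_per_side = template_size // cell_size
    # Histogram length quoted in the text: 3 * orientations + 4 (31 for 9 orientations).
    histogram_length = 3 * num_orientations + 4
    # One flattened feature vector per template.
    feature_length = cells_per_side * cells_per_side * histogram_length
    return cells_per_side, histogram_length, feature_length


# The three cell sizes tried in this project.
for cell_size in (6, 4, 3):
    print(cell_size, hog_template_shape(cell_size=cell_size))
```

For a cell size of 3 this gives a 12*12*31 block, i.e. a feature vector of length 4464. The vector length grows quickly as the cell size shrinks, which is consistent with the longer compute time noted above.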

Stage 2: Train Linear SVM classifier

The SVM is trained with the positive face features labelled +1 and the non-face features labelled -1. The lambda parameter is kept at 0.0001. Not much variation in accuracy (~2%) was observed when changing lambda in the range 10^-3 to 10^-5.
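As an illustration of this stage, here is a minimal pure-Python sketch of a linear SVM trained on the regularized hinge loss. This write-up does not state which solver was used, so the batch subgradient descent, the `learning_rate` and `epochs` values, and the toy 2-D data are all assumptions. Only the +1/-1 labels and lambda = 0.0001 come from the text.

```python
def train_linear_svm(features, labels, lam=1e-4, learning_rate=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent.

    Assumed solver, for illustration only; labels must be +1 or -1.
    """
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


def confidence(w, b, x):
    """Signed score w.x + b; positive values mean 'face' in this setup."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy stand-ins for HoG features: three "faces" (+1) and three "non-faces" (-1).
toy_features = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -1], [-1, -3]]
toy_labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(toy_features, toy_labels)
print([confidence(w, b, x) > 0 for x in toy_features])
```

In the real pipeline each row of `features` would be a flattened HoG template rather than a 2-D point, and a dedicated SVM library would be a more practical choice than this loop.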

Stage 3: Use a sliding window to detect faces in test images

The image being tested is first converted into HoG features. A window covering one template in HoG space, i.e. (template_size / cell_size) * (template_size / cell_size) * histogram_length, is then moved over the HoG array, and for each window position the SVM is used to calculate a confidence score. The threshold is set at 0, and windows with a confidence score above the threshold are selected as faces. This process is performed at multiple scales, with each scale being 0.75^j of the original dimensions of the image. 20 values of j in the range [0, 10] are taken. The coordinates of the extreme corners of a window that is selected as containing a face are used to construct a bounding box around the face. Since a face could be detected at multiple scales, non-maximum suppression is performed using the bounding box coordinates so that each face is detected only once.
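The scale schedule, the mapping from a HoG-space window back to image coordinates, and the suppression step can be sketched in Python as follows. Several details are assumptions not stated in this write-up: the 20 values of j are taken as evenly spaced, boxes are (x_min, y_min, x_max, y_max) in original-image pixels, and suppression is a simple greedy scheme with an intersection-over-union threshold of 0.3. All function names are hypothetical.

```python
def scale_schedule(base=0.75, j_min=0.0, j_max=10.0, count=20):
    """Scales base**j for `count` values of j in [j_min, j_max] (even spacing assumed)."""
    step = (j_max - j_min) / (count - 1)
    return [base ** (j_min + k * step) for k in range(count)]


def window_to_box(row, col, scale, cell_size=3, template_size=36):
    """Map a window's top-left HoG cell (row, col) at `scale` to original-image pixels."""
    x_min = col * cell_size / scale
    y_min = row * cell_size / scale
    side = template_size / scale
    return (x_min, y_min, x_min + side, y_min + side)


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: visit boxes by descending score, drop those overlapping a kept box."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Two overlapping detections of one face and one separate detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

Dividing by the scale in `window_to_box` puts detections found at different scales into one coordinate frame, which is what allows the suppression step to compare them.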

Face template HoG visualization for cell size 6

Precision Recall curve for cell size 6

Face template HoG visualization for cell size 4

Precision Recall curve for cell size 4

Face template HoG visualization for cell size 3

Precision Recall curve for cell size 3

Examples of detections

Overall, it was observed that the face detector very often reports people's knees as false positives. Another shortcoming is that the detector fails to detect the faces of people with a dark complexion. In theory, this should not be a problem, because HoG features rely on gradient values and are therefore largely invariant to illumination. In practice, however, this is not the result obtained.