Project 5 / Face Detection with a Sliding Window

The aim of this project is to detect faces in an image. Histogram of Oriented Gradients (HoG) features are used to characterise image patches, and these features are used to train a linear Support Vector Machine (SVM), which is then used to classify image patches as either face or non-face. The project can be divided into the following stages:

  1. Extracting HoG features from the training data
  2. Training a linear SVM classifier
  3. Using a sliding window to detect faces in test images

HoG Features

HoG features are extracted from the positive and negative training image sets: the positive set contains cropped faces and the negative set contains scenes with no faces. The template size is fixed at 36 pixels. The step size (cell size) of the HoG feature determines the size of the block of pixels over which each histogram is computed. This value was varied (6, 4, 3); a value of 3 gives the highest precision but takes the most time. The number of orientations used is 9, resulting in a histogram of length 31 per cell. The histograms obtained from a 36×36 template (a 12×12×31 block in HoG space when the cell size is 3) are reshaped into a single feature vector, so each face image contributes one feature vector. Features are sampled at random from the non-face images such that around 10,000 negative features are obtained in total.
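
The relation between cell size and feature length can be checked with a few lines of arithmetic. The sketch below only uses the numbers quoted above (template size 36, histogram length 31, cell sizes 6, 4 and 3); the function name `feature_length` is illustrative and not taken from the project code.

```python
TEMPLATE_SIZE = 36   # template side in pixels (from the text)
HIST_LENGTH = 31     # histogram length per HoG cell (from the text)


def feature_length(template_size, cell_size, hist_length=HIST_LENGTH):
    """Return (cells per side, flattened feature length) for one template."""
    if template_size % cell_size != 0:
        raise ValueError("cell size must divide the template size")
    cells_per_side = template_size // cell_size
    return cells_per_side, cells_per_side * cells_per_side * hist_length


for cell_size in (6, 4, 3):
    cells, length = feature_length(TEMPLATE_SIZE, cell_size)
    print(f"cell size {cell_size}: {cells}x{cells}x{HIST_LENGTH} -> {length} values")
```

A cell size of 3 gives a feature vector four times longer than a cell size of 6 (4464 versus 1116 values), which is consistent with the longer running time noted above.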

Linear SVM classifier

The SVM is trained with the positive (face) features labelled +1 and the negative (non-face) features labelled -1. The lambda (regularisation) parameter is kept at 0.00005.
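
To make the role of the labels and of lambda concrete, here is a minimal sketch of a linear SVM trained by batch sub-gradient descent on the regularised hinge loss. This is not necessarily the solver used in the project; the function name `train_linear_svm`, the learning rate, the number of epochs and the toy data are all assumptions made for illustration. Only the +1/-1 labelling and lambda = 0.00005 come from the text.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.00005, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0                      # samples violating the margin
        y_act = y[active][:, None]
        grad_w = lam * w - (y_act * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy stand-in for HoG features: two well-separated clusters (illustrative only).
rng = np.random.default_rng(0)
pos = rng.normal(loc=2.0, scale=0.5, size=(50, 5))    # "face" features, label +1
neg = rng.normal(loc=-2.0, scale=0.5, size=(50, 5))   # "non-face" features, label -1
X = np.vstack([pos, neg])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = train_linear_svm(X, y)
scores = X @ w + b   # the confidence is the signed distance-like score w.x + b
```

The confidence score used later by the detector is the value of `X @ w + b` for a window's feature vector; a positive score corresponds to the face class.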

Sliding Window to detect faces

Each test image is converted into HoG features, and a window the size of the template in HoG space (template_size / cell_size cells on each side, by histogram_length) is moved over the HoG array. For each window position, the SVM is used to calculate a confidence score. The threshold is set at 0, and windows with a confidence score above the threshold are selected as faces. This process is performed at multiple scales, with the image at scale j resized to 0.75^j of its original dimensions, for j in the range 0 to 10. The corner coordinates of each selected window are used to construct a bounding box around the face. Since a face could be detected at multiple scales, non-maximum suppression is performed on the bounding boxes so that each face is detected only once.
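
The detection steps described above can be sketched as follows. The code assumes the HoG array for one scale has already been computed (HoG extraction and image resizing are left out), treats the range of j as inclusive, and uses a greedy intersection-over-union rule for non-maximum suppression. The suppression criterion, the 0.3 overlap threshold and all function names are assumptions for illustration, since the text does not specify them.

```python
import numpy as np

SCALES = [0.75 ** j for j in range(0, 11)]   # j = 0..10, assumed inclusive


def score_windows(hog, w, b, cells_per_side, threshold=0.0):
    """Slide a square window over a HoG array of shape (rows, cols, hist_length)."""
    rows, cols, _ = hog.shape
    hits = []
    for r in range(rows - cells_per_side + 1):
        for c in range(cols - cells_per_side + 1):
            feat = hog[r:r + cells_per_side, c:c + cells_per_side, :].ravel()
            conf = float(feat @ w + b)
            if conf > threshold:
                hits.append((r, c, conf))
    return hits


def to_image_box(r, c, cell_size, cells_per_side, scale):
    """Map a window's top-left cell to (x_min, y_min, x_max, y_max) in the original image."""
    x_min = c * cell_size / scale
    y_min = r * cell_size / scale
    side = cells_per_side * cell_size / scale
    return (x_min, y_min, x_min + side, y_min + side)


def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Keep the highest-scoring boxes, dropping any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Tiny synthetic example: a 5x5 HoG array with one 2x2 "face" pattern.
hog = np.zeros((5, 5, 2))
hog[1:3, 1:3, :] = 1.0
hits = score_windows(hog, w=np.ones(8), b=-7.5, cells_per_side=2)
box = to_image_box(1, 1, cell_size=3, cells_per_side=2, scale=0.75)
kept = non_max_suppression(
    [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7]
)
```

Dividing by the scale in `to_image_box` maps a detection found in a shrunken image back to original-image coordinates, so boxes from all scales can be compared in one suppression pass.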

Results

[Figures: detection results for cell sizes 6, 4 and 3]

Graduate Credit

Hard Mining for Negatives

Non-face regions are sometimes detected as faces. To reduce this, the SVM classifier is re-trained with hard negatives, that is, features that are detected as faces but are not. For this, the sliding window detector is run on images known to contain no faces. Feature windows that are classified as faces are collected, added to the negative training features, and used to train the SVM again. This gave a small gain in precision, from 0.80 to 0.83.
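
A minimal sketch of the mining step is given below. It assumes the candidate window features from face-free images are already stacked in an array and that a trained weight vector and bias are available; the function name `mine_hard_negatives` and the toy values are hypothetical. Re-training is left to whichever SVM routine was used for the first round.

```python
import numpy as np


def mine_hard_negatives(candidate_feats, w, b, threshold=0.0):
    """Return the candidate features that the current SVM wrongly scores as faces."""
    scores = candidate_feats @ w + b
    return candidate_feats[scores > threshold]


# Toy values (illustrative only): a 2-D "feature space" and a trained classifier.
w = np.array([1.0, 0.0])
b = 0.0
neg_feats = np.array([[-1.0, 0.0], [-2.0, 1.0]])                  # original negatives
candidates = np.array([[1.0, 0.0], [-1.0, 0.0], [2.0, 5.0]])      # windows from face-free images

hard = mine_hard_negatives(candidates, w, b)       # false positives only
augmented_neg = np.vstack([neg_feats, hard])       # negatives for the second training round
```

Because every candidate comes from an image with no faces, any window scoring above the threshold is a false positive by construction, so no manual labelling is needed.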