Project 5 - Face Detection with a Sliding Window

The goal of this project is to implement a face detector using the sliding-window approach described by Dalal and Triggs (2005). The pipeline has four steps:

  1. Obtain positive features from 36 by 36 images of faces.
  2. Obtain negative features from random images that do not contain human faces.
  3. Train an SVM classifier on the provided training data.
  4. Run the sliding window detector on the test images.

Extract positive features

The positive training images are 36 by 36 face crops, so the features also cover a 36 by 36 window. I called vl_hog from the VLFeat library to compute the HOG descriptors. I also varied the HOG cell size, which changed the accuracy: smaller cell sizes gave higher accuracy but a longer running time.
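To show why the cell size affects both accuracy and running time, here is a minimal simplified HOG sketch in Python with NumPy. It is not vl_hog: the name `simple_hog` is hypothetical, and the sketch assumes 9 unsigned orientation bins with each cell L2-normalised on its own, skipping the block normalisation and vote interpolation that real HOG implementations use.

```python
import numpy as np


def simple_hog(patch, cell_size=6, n_bins=9):
    """Simplified HOG sketch (hypothetical helper, not VLFeat's vl_hog).

    Computes magnitude-weighted histograms of unsigned gradient orientation
    for each cell_size x cell_size cell, then L2-normalises each cell on its
    own (no block normalisation, no vote interpolation).
    """
    patch = np.asarray(patch, dtype=np.float64)
    h, w = patch.shape
    # np.gradient returns the derivative along axis 0 (rows, y), then axis 1 (cols, x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), quantised into n_bins bins.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    bins = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    n_cy, n_cx = h // cell_size, w // cell_size
    hist = np.zeros((n_cy, n_cx, n_bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            ys = slice(cy * cell_size, (cy + 1) * cell_size)
            xs = slice(cx * cell_size, (cx + 1) * cell_size)
            # Each pixel votes into its orientation bin, weighted by gradient magnitude.
            hist[cy, cx] = np.bincount(bins[ys, xs].ravel(),
                                       weights=magnitude[ys, xs].ravel(),
                                       minlength=n_bins)
    norms = np.linalg.norm(hist, axis=2, keepdims=True)
    return hist / (norms + 1e-6)


# A 36 x 36 patch gives (36 // cell_size)^2 cells of n_bins values each.
patch = np.random.default_rng(0).random((36, 36))
for cell_size in (3, 4, 6):
    print(cell_size, simple_hog(patch, cell_size).shape)
```

For cell sizes 3, 4 and 6 this yields 12 x 12, 9 x 9 and 6 x 6 cells, that is 1296, 729 and 324 values per window. VLFeat's vl_hog stores 31 values per cell in its default variant, so its descriptors are larger, but the feature length still grows with the square of 36 / cell size, which is why smaller cells are slower to compute and to classify.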

Extract negative features

Because the non-face images vary in size, I chose a random starting point (top-left corner) within an image for each negative feature. Then, I called vl_hog on each sampled patch.
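The random-start-point idea can be sketched as follows, assuming grayscale images stored as 2-D NumPy arrays. The function name `sample_negative_patches` is hypothetical, and skipping images smaller than the patch is an assumption of this sketch; the resulting patches would then be passed to a HOG extractor.

```python
import numpy as np


def sample_negative_patches(images, n_samples, size=36, rng=None):
    """Crop n_samples random size x size patches from face-free images.

    Hypothetical helper: each crop picks a random image, then a random
    top-left corner that keeps the whole patch inside that image, since
    the images have different sizes.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumption: images smaller than the patch are skipped.
    usable = [img for img in images if img.shape[0] >= size and img.shape[1] >= size]
    if not usable:
        raise ValueError("no image is large enough for a patch")
    patches = np.empty((n_samples, size, size), dtype=np.float64)
    for i in range(n_samples):
        img = usable[rng.integers(len(usable))]
        # integers(k) draws from [0, k), so the patch never runs off the edge.
        top = rng.integers(img.shape[0] - size + 1)
        left = rng.integers(img.shape[1] - size + 1)
        patches[i] = img[top:top + size, left:left + size]
    return patches
```

Sampling the corner per patch, rather than resizing each image to 36 by 36, keeps the negatives at the same scale as the positive face crops.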

The following images show visualisations of the HOG descriptors at different cell sizes:

Cell size = 3

Cell size = 4

Cell size = 6

Train SVM classifier

The classifier is trained with the linear SVM function from VLFeat.
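As a stand-in for VLFeat's solver, here is a minimal linear SVM sketch in NumPy. It minimises the regularised hinge loss lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))) by full-batch subgradient descent; the function name `train_linear_svm` and the values of `lam`, `epochs` and `lr` are assumptions, not the project's settings.

```python
import numpy as np


def train_linear_svm(X, y, lam=1e-4, epochs=200, lr=0.1):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))).

    Hypothetical helper: full-batch subgradient descent with a step size
    that decays as lr / sqrt(t). X is (n, d); y holds labels in {-1, +1}.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, epochs + 1):
        margins = y * (X @ w + b)
        active = margins < 1  # only margin violators contribute to the hinge subgradient
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        step = lr / np.sqrt(t)
        w -= step * grad_w
        b -= step * grad_b
    return w, b
```

In this setting the positive (face) features get label +1 and the negative features -1, and the score `x @ w + b` of a window's feature vector `x` is the confidence that the detector later compares with its threshold.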

Cell size = 3

Cell size = 4

Cell size = 6

Run sliding window detector

This is the last part of the project, where the learned model is applied to the test set. If the classifier's confidence for a window of the image is larger than a threshold, that window is kept as a detected face. Non-maximum suppression is then applied to remove duplicate detections of the same face. The results are shown below.
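The detection step can be sketched at a single scale as below. This is a minimal illustration under stated assumptions: `detect` and `non_max_suppression` are hypothetical names, the window slides over a precomputed HOG feature map one step of HOG cells at a time, and greedy IoU-based suppression with a 0.3 overlap limit is one common form of non-maximum suppression, not necessarily the one used in the project. A full detector would also rescale the image to find faces of different sizes.

```python
import numpy as np


def detect(feature_map, w, b, win_cells=6, step=1, cell_size=6, threshold=0.75):
    """Score every win_cells x win_cells window of a HOG feature map.

    feature_map: (H_cells, W_cells, D); w: template weights of length
    win_cells * win_cells * D. Windows with score > threshold are kept.
    Returns (boxes, scores); boxes are [x1, y1, x2, y2] in pixels.
    """
    H, W, _ = feature_map.shape
    boxes, scores = [], []
    for r in range(0, H - win_cells + 1, step):
        for c in range(0, W - win_cells + 1, step):
            window = feature_map[r:r + win_cells, c:c + win_cells].ravel()
            score = float(window @ w + b)  # linear SVM confidence
            if score > threshold:
                boxes.append([c * cell_size, r * cell_size,
                              (c + win_cells) * cell_size,
                              (r + win_cells) * cell_size])
                scores.append(score)
    return np.array(boxes, dtype=float).reshape(-1, 4), np.array(scores)


def _area(box):
    return (box[..., 2] - box[..., 0]) * (box[..., 3] - box[..., 1])


def non_max_suppression(boxes, scores, iou_thresh=0.3):
    """Greedy NMS: keep the best box, drop others overlapping it by > iou_thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        # Intersection of the best box with each remaining box.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (_area(boxes[best]) + _area(boxes[rest]) - inter)
        order = rest[iou <= iou_thresh]
    return np.array(keep, dtype=int)
```

On a toy 20 x 20 feature map containing one 6 x 6 block of ones, with a template of 36 weights equal to 1/36 and b = 0, five overlapping windows pass the 0.75 threshold (the exact match and its four one-cell neighbours), and suppression reduces them to the single box [42, 30, 78, 66].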

Results

Precision and recall plots were produced for each of the following settings:

  1. step size = 6; threshold = 0.6
  2. step size = 6; threshold = 0.75
  3. step size = 6; threshold = 0.8
  4. step size = 4; threshold = 0.75
  5. step size = 3; threshold = 0.75

As seen in the graphs above, precision seems to increase as the step size decreases, while the number of false positives seems to decrease as the step size increases, since fewer windows are evaluated.
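One reason the step size matters is how many windows get scored at all. The arithmetic can be sketched as below, assuming the step is measured in pixels, a single 36 by 36 window at one scale, and an arbitrary 480 by 640 test image; `count_windows` is a hypothetical helper.

```python
def count_windows(img_h, img_w, win=36, step=6):
    """Number of win x win window positions when sliding by `step` pixels."""
    if img_h < win or img_w < win:
        return 0
    rows = (img_h - win) // step + 1
    cols = (img_w - win) // step + 1
    return rows * cols


for step in (6, 4, 3):
    print(step, count_windows(480, 640, step=step))
# 6 7575
# 4 17024
# 3 30098
```

Halving the step from 6 to 3 roughly quadruples the number of windows, which gives the detector more chances to land squarely on a face but also more chances to fire on background.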

As seen in the graphs above, decreasing the threshold seems to increase the number of false positives. This is expected: with a lower threshold, windows with lower confidence are also classified as faces, which lets through windows that are not faces.
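The threshold effect can be checked with a small counting sketch. The scores and labels below are made up for illustration, and `count_detections` is a hypothetical helper; because every window kept at a higher threshold is also kept at a lower one, the false-positive count can only stay the same or grow as the threshold drops.

```python
import numpy as np


def count_detections(scores, is_face, threshold):
    """Return (true positives, false positives) among windows scoring above threshold."""
    scores = np.asarray(scores, dtype=float)
    is_face = np.asarray(is_face, dtype=bool)
    kept = scores > threshold
    tp = int(np.sum(kept & is_face))
    fp = int(np.sum(kept & ~is_face))
    return tp, fp


# Made-up window scores and ground-truth labels, for illustration only.
scores = [1.2, 0.9, 0.78, 0.7, 0.65, 0.3]
is_face = [True, True, False, True, False, False]
for threshold in (0.8, 0.75, 0.6):
    print(threshold, count_detections(scores, is_face, threshold))
# 0.8 (2, 0)
# 0.75 (2, 1)
# 0.6 (3, 2)
```

Lowering the threshold here recovers one more face but admits two non-faces, so recall rises while precision falls.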

Lastly, after experimenting with many other parameters, the best-performing classifier used a step size of 3 and a threshold of 0.75. Its face detections are shown in two images: the first is the class photo, and the second is of Arsenal. Both show very high precision.