Project 5 / Face Detection with a Sliding Window

Running the detector on our class photo

In this project, I use the sliding window model for face detection. More specifically, I implemented the sliding window detector of Dalal and Triggs (2005). My implementation consists of the following parts:

Part 1: Get the features

I used Histogram of Oriented Gradients (HoG) features to detect faces. Each feature counts the gradients whose orientation falls into a specific angle-range bin. I used the vl_hog package because the HoG extractor I wrote in a previous project is inefficient and slow. There are positive and negative features to be extracted:
  1. Positive features: Using the positive examples as the training set, I extracted the HoG feature from each image and reshaped it into a one-dimensional array.
  2. Negative features: Using the negative examples as the training set, for each image I first decided how many patches to extract. Then I selected random patches (whose size was defined by the template and cell size) at different scales and extracted their features. I used 10,000 negative features.
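The feature-extraction step above can be sketched roughly as follows. This is a Python toy version for illustration only (the actual implementation uses vl_hog in MATLAB): it bins gradient orientations per cell instead of computing true UOCTTI descriptors, and it samples negative patches at a single scale rather than several.

```python
import numpy as np

def hog_like_feature(patch, cell=6, bins=9):
    """Toy orientation-histogram feature: sums gradient magnitude per
    orientation bin within each cell. A simplified stand-in for vl_hog."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation
    bin_idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    h, w = patch.shape
    feat = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            cell_bins = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            cell_mag = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for b in range(bins):
                feat[i, j, b] = cell_mag[cell_bins == b].sum()
    return feat.ravel()                                 # 1-D feature vector

def sample_negative_patches(image, template=36, n_patches=10, rng=None):
    """Random negative patches at the template size (one scale, for brevity)."""
    rng = rng or np.random.default_rng(0)
    h, w = image.shape
    patches = []
    for _ in range(n_patches):
        y = rng.integers(0, h - template + 1)
        x = rng.integers(0, w - template + 1)
        patches.append(image[y:y+template, x:x+template])
    return patches
```

With a 36-pixel template and a cell size of 6, each feature vector has 6 x 6 x 9 entries, mirroring the flattening of the HoG array into one dimension.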

Part 2: Classification

I wrote a simple linear SVM and trained it on the training data. Then I tested it on the training data to check if it was working correctly. As expected, it achieved almost perfect accuracy.

csize = 6
Initial classifier performance on train data:
  accuracy:   1.000
  true  positive rate: 0.398
  false positive rate: 0.000
  true  negative rate: 0.602
  false negative rate: 0.000
vl_hog: descriptor: [6 x 6 x 31]
vl_hog: glyph image: [126 x 126]
vl_hog: number of orientations: 9
vl_hog: variant: UOCTTI
I used Lambda = 0.0001 for this. At this point, the template started looking like this:
If you squint, you can make out a human face shape.
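The training step can be sketched as subgradient descent on the regularized hinge loss. This is a minimal Python illustration, not my actual MATLAB implementation; `lam` plays the role of the Lambda value above.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, lr=0.1, epochs=500):
    """Full-batch subgradient descent on the objective
    lam * ||w||^2 / 2 + mean(max(0, 1 - y * (X w + b)))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                     # points violating the margin
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w -= lr * gw
        b -= lr * gb
    return w, b
```

Reshaping the learned weight vector back into the HoG template shape is what produces the face-like visualization.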

Part 3: Sliding Window

For each image, I started at the smallest scale the image would allow and iterated up to the full image (scale = 1) in increments of 0.05. At each scale, I slid a window of the template size across the image and extracted HoG features. Then, using the SVM, I calculated a score for each feature and discarded any that did not meet a predefined threshold. The accepted detections then went through non-maximum suppression. For the MIT Caltech set, the results were checked against the ground-truth correspondences.
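The scanning and suppression steps can be sketched as below. This is a simplified Python stand-in for the MATLAB pipeline: the helper names are hypothetical, the resize is naive nearest-neighbor, and `feature_fn` stands in for the HoG extraction.

```python
import numpy as np

def sliding_window_detect(image, w, b, feature_fn, template=36, step=6,
                          scales=np.arange(0.3, 1.01, 0.05), thresh=0.75):
    """Scan the image at multiple scales; keep windows whose SVM score
    w.x + b meets the threshold. Boxes are mapped back to image coords."""
    detections = []                            # (x1, y1, x2, y2, score)
    for s in scales:
        h, wd = int(image.shape[0] * s), int(image.shape[1] * s)
        if h < template or wd < template:
            continue
        # naive nearest-neighbor resize (a real pipeline would use imresize)
        ys = (np.arange(h) / s).astype(int).clip(0, image.shape[0] - 1)
        xs = (np.arange(wd) / s).astype(int).clip(0, image.shape[1] - 1)
        scaled = image[np.ix_(ys, xs)]
        for y in range(0, h - template + 1, step):
            for x in range(0, wd - template + 1, step):
                feat = feature_fn(scaled[y:y+template, x:x+template])
                score = feat @ w + b
                if score >= thresh:
                    detections.append((x/s, y/s, (x+template)/s,
                                       (y+template)/s, score))
    return detections

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(dets, iou_thresh=0.3):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop anything overlapping it too much, repeat."""
    dets = sorted(dets, key=lambda d: -d[4])
    keep = []
    for d in dets:
        if all(iou(d, k) < iou_thresh for k in keep):
            keep.append(d)
    return keep
```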

Experiments and results

Initially, with just the starter code, the results were:
Template with a uniform gradient distribution

Precision Recall curve for the starter code.

Argentina team

After implementing the whole pipeline, I started experimenting with the threshold in step 3. The threshold values I tried were 0.0, 0.5, 0.70, 0.75, 0.90, and 1.2.
The corresponding precision recall curves are:
After some experimenting, I settled on 0.75 because it offered a reasonable trade-off of precision at higher recalls. The results at this value are:
Template

Precision Recall curve for 0.75 threshold. The precision is 0.827

Argentina team

As you can see, this is clearly better than the random results. Some other examples:

Da Vinci Man

My detector gets confused by text and gives out false positives.

Class photo easy:

Class photo difficult:

As you can see, there are plenty of false positives in both.

Other experiments

I also experimented with mirrored images for the positive features. The reasoning is that since the human face is roughly symmetric, also training the classifier on horizontally flipped images should give better results. This is implemented in get_positive_features_mirror.m in my code. I got a small boost, to 0.84 precision.
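The mirroring idea amounts to a one-line augmentation: for every positive training patch, also keep its left-right flip, doubling the positive set. A Python sketch (my actual version lives in get_positive_features_mirror.m):

```python
import numpy as np

def add_mirrored_positives(patches):
    """Augment positive training patches with their horizontal mirrors,
    exploiting the left-right symmetry of faces."""
    return patches + [np.fliplr(p) for p in patches]
```

Flipping the raw patches before feature extraction is simpler than flipping HoG descriptors directly, since the latter requires permuting the orientation bins.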