Project 5 / Face Detection with a Sliding Window

Highest average precision with cell size=3

The goal of this project is to implement face detection using the sliding window detector of Dalal and Triggs (2005). We represent image patches as SIFT-like Histogram of Oriented Gradients (HOG) features, train a linear classifier on them, and use that classifier to classify sliding windows at multiple scales. I achieved the highest average precision, 0.921, with cell size 3. The project is divided into the following four parts:

  1. Get positive features
  2. Get negative features
  3. Train linear classifier
  4. Run detector

Get positive features

For each cropped face image, we first convert it to grayscale and then use vl_hog to extract HOG features. A smaller HOG cell size tends to produce better results.
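To make the effect of the cell size on the descriptor concrete, here is a minimal sketch that computes the size of the HOG template. It assumes a 36-pixel square face crop and 31 channels per cell (what vl_hog returns with its default variant); the helper name hog_template_dims is hypothetical and not part of the project code.

```python
def hog_template_dims(template_size, cell_size, channels_per_cell=31):
    """Return (cells per side, total feature length) for a square template.

    Assumes the template side is a multiple of the cell size and that each
    cell contributes `channels_per_cell` values (31 is an assumption based
    on vl_hog's default variant).
    """
    cells_per_side = template_size // cell_size
    feature_length = cells_per_side * cells_per_side * channels_per_cell
    return cells_per_side, feature_length


if __name__ == "__main__":
    for cell_size in (3, 6):
        cells, length = hog_template_dims(36, cell_size)
        print(f"cell size {cell_size}: {cells}x{cells} cells, {length} features")
```

Halving the cell size quadruples the number of cells, so the cell size 3 template is four times longer than the cell size 6 template. That makes feature extraction and detection slower, which is the cost of the finer spatial detail.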
 

Get negative features

In this project, I collected around 10,000 negative features. Since we only have 276 non-face images and each image is much larger than 36x36 pixels, we can extract multiple features from each image. For each image, I picked about (10000/num_images) random locations and extracted a negative feature at each one. I also sampled the images at multiple scales ranging from 1.0 down to 0.8.
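The sampling step can be sketched as follows. This is only an illustration of the idea, not the project code: the function name sample_negative_windows, the intermediate scale 0.9, and the fixed random seed are assumptions, and the sketch only picks crop locations rather than extracting HOG features.

```python
import random


def sample_negative_windows(image_sizes, total_samples=10000, template_size=36,
                            scales=(1.0, 0.9, 0.8), seed=0):
    """Pick random template-sized crops from non-face images.

    image_sizes: list of (height, width) tuples, one per non-face image.
    Returns a list of (image_index, scale, row, col) tuples, where (row, col)
    is the top-left corner of the crop in the rescaled image.
    """
    rng = random.Random(seed)
    # Per-image quota, rounded up so the total is at least `total_samples`.
    per_image = -(-total_samples // len(image_sizes))
    windows = []
    for idx, (height, width) in enumerate(image_sizes):
        for _ in range(per_image):
            scale = rng.choice(scales)  # assumed set of scales in [0.8, 1.0]
            scaled_h, scaled_w = int(height * scale), int(width * scale)
            if scaled_h < template_size or scaled_w < template_size:
                continue  # rescaled image cannot hold a full template
            row = rng.randint(0, scaled_h - template_size)
            col = rng.randint(0, scaled_w - template_size)
            windows.append((idx, scale, row, col))
    return windows
```

With 276 images, the quota works out to 37 crops per image, which gives slightly more than 10,000 negatives in total.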
 

Train a linear classifier

I used vl_trainsvm to train a linear classifier from the positive and negative examples. Positive examples have a label (y value) of 1 and negative examples have a label of -1. The following two images show the learned detectors for different HOG cell sizes: the left image shows the HOG template with cell size 3, and the right image shows the HOG template with cell size 6.
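To illustrate what the trained classifier looks like, here is a minimal sketch of a linear SVM trained with plain stochastic subgradient descent on the hinge loss. It is a stand-in for illustration only and is not the solver vl_trainsvm uses; the function names, the fixed learning rate, and the number of epochs are assumptions.

```python
import random


def score(w, b, x):
    """Confidence of the linear classifier: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(features, labels, lam=0.0001, learning_rate=0.1,
                     epochs=100, seed=0):
    """Train a linear SVM with hinge loss and L2 regularisation.

    features: list of equal-length feature vectors.
    labels: +1 for face examples, -1 for non-face examples.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], labels[i]
            margin = y * score(w, b, x)
            # The L2 penalty shrinks the weights on every step.
            w = [wi * (1.0 - learning_rate * lam) for wi in w]
            if margin < 1.0:
                # Example is inside the margin: move the hyperplane toward it.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b


if __name__ == "__main__":
    positives = [[2.0, 2.0], [3.0, 3.0], [2.5, 3.0]]
    negatives = [[-2.0, -2.0], [-3.0, -1.0], [-1.0, -2.5]]
    X = positives + negatives
    Y = [1] * len(positives) + [-1] * len(negatives)
    w, b = train_linear_svm(X, Y)
    print([score(w, b, x) > 0 for x in X])
```

The learned weight vector w has one entry per HOG feature, which is why it can be reshaped and visualised as a HOG template.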
 

Run detector

I converted each test image to HOG feature space at multiple scales, with each scale 90% of the previous one, and then classified every window of HOG cells that has the same size as my learned template. If the confidence value is higher than the threshold, the detection is kept. Finally, all the detections for an image are passed to non-maximum suppression. A higher threshold value tends to produce fewer false positives, but may also reduce the average precision. Below are the results for different threshold values with cell size 3.
threshold=0.5
threshold=0.8
threshold=1.1
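The thresholding and non-maximum suppression steps described above can be sketched as below. This is a generic greedy suppression based on intersection over union, which may differ from the suppression routine used in the project; the function names and the max_overlap value of 0.3 are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, confidences, threshold=0.5, max_overlap=0.3):
    """Keep confident detections, then greedily suppress overlapping ones.

    Returns a list of (confidence, box) pairs sorted by decreasing confidence.
    """
    # Step 1: drop every window whose confidence is not above the threshold.
    candidates = [(c, b) for b, c in zip(boxes, confidences) if c > threshold]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    # Step 2: keep a box only if it does not overlap a stronger kept box.
    kept = []
    for conf, box in candidates:
        if all(iou(box, kept_box) <= max_overlap for _, kept_box in kept):
            kept.append((conf, box))
    return kept
```

Raising the threshold shrinks the candidate list before suppression, which removes weak false positives but can also discard weakly scored true faces.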
 
 
Multiple scales tend to produce better results. If the detector is run on each test image at a single scale only, the average precision is just 0.399. Running it at multiple scales increases the average precision to 0.921. Below are the results for a single scale and for multiple scales with cell size 3.
single scale
multiple scales
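A minimal sketch of the scale pyramid, assuming each level is 0.9 times the previous one and the pyramid stops once the image can no longer hold a 36-pixel template. The stopping rule, the zero-based cell indices, and both function names are assumptions rather than details taken from the project code.

```python
def pyramid_scales(height, width, template_size=36, scale_step=0.9):
    """List the scales at which the template still fits inside the image."""
    scales = []
    scale = 1.0
    while min(height, width) * scale >= template_size:
        scales.append(scale)
        scale *= scale_step
    return scales


def window_to_original(cell_row, cell_col, cell_size, template_size, scale):
    """Map a window found at `scale` back to original image coordinates.

    (cell_row, cell_col) is the zero-based HOG cell at the window's top-left
    corner. Returns (x_min, y_min, x_max, y_max) in original pixels.
    """
    x_min = cell_col * cell_size / scale
    y_min = cell_row * cell_size / scale
    side = template_size / scale
    return (x_min, y_min, x_min + side, y_min + side)
```

A window found at scale 0.5 covers a 72-pixel region of the original image, so the smaller pyramid levels are what allow a fixed-size template to match faces larger than 36 pixels.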
 
 
A smaller HOG cell size tends to produce better results. Below are the results for different cell sizes.
cell size=3
cell size=6
 
 

Results

I achieved the highest average precision with cell size 3, lambda=0.0001, threshold=0.5, and scale=0.9.

Precision-recall curve for my final classifier.
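For reference, average precision summarises the precision-recall curve in a single number. The sketch below computes a simple non-interpolated version from ranked detections. The evaluation script used for this project may interpolate the curve differently, so this is an assumption-laden illustration and will not necessarily reproduce the reported values; the function name and argument layout are hypothetical.

```python
def average_precision(confidences, is_true_positive, num_ground_truth):
    """Non-interpolated average precision over ranked detections.

    confidences: detection scores.
    is_true_positive: one boolean per detection (matched a ground-truth face).
    num_ground_truth: total number of ground-truth faces in the test set.
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_true_positive[i]:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth if num_ground_truth else 0.0
```

Missed faces lower the score because the sum is divided by the number of ground-truth faces, not by the number of detections.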

 
 

Examples of detection on the test set.