Project 5 / Face Detection with a Sliding Window

In this project I experimented with the sliding-window model from Dalal and Triggs (2005) for face detection. This report is divided into two parts:

  1. Face detection pipeline
  2. Extra credit

Let's go through each of them:


1. Generate HoG features

  1. I used the vl_hog API from the VLFeat library directly to compute the HoG features; a minimal sketch of this step appears after this list.

  2. First, to generate features for the positive examples (faces), I used a positive training database of 6,713 cropped 36x36 faces from the Caltech Web Faces project. I also generated mirrored copies of these HoG features (via the feature permutation exposed by vl_hog), which lets the detector handle horizontally flipped faces and doubles the number of positive training examples.

  3. For negative examples I used a small database of face-free scenes from Wu et al. and the SUN scene database. From each image I randomly sampled sub-images of the same size as the template (36x36), capping the number of samples at 80 per image.

  4. I experimented with both the UoCTTI and Dalal-Triggs variants provided by the vl_hog API.
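
Below is a minimal MATLAB sketch of this feature extraction step, assuming VLFeat is on the path; the file names are placeholders, and the cell and template sizes match the settings used later in this report.

    % Settings used throughout this report.
    cell_size     = 3;
    template_size = 36;

    % Positive example: one cropped 36x36 face (placeholder file name).
    face = im2single(imread('caltech_faces/face_0001.jpg'));
    hog  = vl_hog(face, cell_size);   % UoCTTI variant by default

    % Mirrored copy: vl_hog exposes the channel permutation that
    % corresponds to flipping the image left-right, so the flipped
    % feature can be derived without recomputing HoG.
    perm     = vl_hog('permutation');
    hog_flip = hog(:, end:-1:1, perm);

    % Negative examples: random template-sized crops from a face-free
    % scene (placeholder file name), capped at 80 crops per image.
    scene = im2single(imread('non_faces/scene_0001.jpg'));
    if size(scene, 3) > 1, scene = rgb2gray(scene); end
    for i = 1:80
        r = randi(size(scene, 1) - template_size + 1);
        c = randi(size(scene, 2) - template_size + 1);
        crop    = scene(r:r+template_size-1, c:c+template_size-1);
        neg_hog = vl_hog(crop, cell_size);
    end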


2. Train SVM classifier

Look for the svm_train.m file in the code directory.
  1. I then trained a linear SVM on the features extracted above. Linear models are fast to train and test with, which makes them appropriate for real-world face detection.

  2. I used vl_svmtrain from the VLFeat library; a minimal training sketch appears after this list. Here are the training results:

  3. cell size = 3, template size = 36, lambda = 0.001
    Metric               Value
    -------------------  -----
    Accuracy             1.0
    True positive rate   0.390
    False positive rate  0.000
    True negative rate   0.609
    False negative rate  0.000


  4. A look at the separation between the positive and negative features.


  5. Visualization of the learned detector's HoG template.
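
A minimal sketch of the training and visualization steps (svm_train.m follows this pattern; the feature matrices are assumed to hold one flattened HoG descriptor per row):

    % features_pos / features_neg: N_pos x D and N_neg x D matrices of
    % flattened HoG descriptors, built as in the previous section.
    lambda = 0.001;
    X = single([features_pos; features_neg]');   % D x N, as vl_svmtrain expects
    Y = [ones(size(features_pos, 1), 1); ...
         -ones(size(features_neg, 1), 1)];
    [w, b] = vl_svmtrain(X, Y, lambda);

    % Visualize the learned detector: reshape the weight vector back to
    % the HoG grid and use vl_hog's built-in renderer.
    n_cells  = template_size / cell_size;           % 36 / 3 = 12 cells per side
    template = reshape(w, [n_cells, n_cells, 31]);  % 31 = UoCTTI dimension
    imagesc(vl_hog('render', single(template)));
    colormap gray;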


3. Multi-scale detector

Look for the run_detector.m file in the code directory.
  1. In this final step, I first generate multiple scaled versions of the given test image.

  2. Then for each scaled image I compute a HoG descriptor and crop out every template-sized sub-descriptor, stepping one cell at a time across the HoG grid. For each sub-descriptor I compute a confidence value from the SVM parameters and classify it as positive or negative against a tuned threshold; see the sketch after this list.

  3. I use dynamic scaling: the image is repeatedly downscaled until either an upper limit I placed on the number of scales is reached or the image becomes smaller than the template.

  4. Here are results for different parameter settings:

  5. template size = 36

    Cell size  Lambda    Recursive scale factor  Recursion max depth  Threshold  Avg precision (%)
    ---------  --------  ----------------------  -------------------  ---------  -----------------
    3          0.001     0.95                    40                   0.55       94.0
    3          0.001     0.95                    30                   0.55       93.8
    3          0.001     0.92                    30                   0.55       93.5
    6          0.000001  0.95                    40                   0.60       87.4
    6          0.000001  0.90                    40                   0.60       87.0


  6. Here are the best precision vs. recall and recall vs. false positives curves obtained:



  7. Visualization of true positives, ground truth, and false positives on some of the test images:
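
A sketch of the multi-scale detection loop described above (run_detector.m follows this structure; img, w, b, cell_size, and template_size are assumed to be defined as in the earlier sections):

    scale_factor = 0.95;                 % recursive scale factor
    max_depth    = 40;                   % recursion max depth
    threshold    = 0.55;
    n_cells      = template_size / cell_size;

    scale = 1.0;
    boxes = []; confidences = [];
    for depth = 1:max_depth
        scaled = imresize(img, scale);
        if min(size(scaled, 1), size(scaled, 2)) < template_size
            break;                       % downscaling limited by the template size
        end
        hog = vl_hog(scaled, cell_size);
        % Slide the template over the HoG grid, one cell at a time.
        for r = 1:(size(hog, 1) - n_cells + 1)
            for c = 1:(size(hog, 2) - n_cells + 1)
                window = hog(r:r+n_cells-1, c:c+n_cells-1, :);
                conf = w' * window(:) + b;         % SVM confidence
                if conf > threshold
                    % Map cell coordinates back to the original image.
                    x1 = ((c - 1) * cell_size + 1) / scale;
                    y1 = ((r - 1) * cell_size + 1) / scale;
                    x2 = ((c - 1 + n_cells) * cell_size) / scale;
                    y2 = ((r - 1 + n_cells) * cell_size) / scale;
                    boxes       = [boxes; x1, y1, x2, y2];
                    confidences = [confidences; conf];
                end
            end
        end
        scale = scale * scale_factor;
    end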








Extra Credit


1. Experiment with HoG variants

Until now I used the UoCTTI variant of the HoG descriptor, which vl_hog provides by default. I experimented with how well the other variant, Dalal-Triggs, would do. As the comparison below shows, it performed somewhat worse than the default; a snippet showing how the variant is selected follows the table.


template size = 36, lambda = 0.000001, recursive scale factor = 0.95, recursion max depth = 40
Variant       Threshold  Avg precision (%)
------------  ---------  -----------------
Dalal-Triggs  0.85       78.8
UoCTTI        0.60       87.4
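
Switching variants is a single option on the vl_hog call (note that the descriptor dimensionality differs between the two variants, so the template must be reshaped accordingly):

    hog_uoctti = vl_hog(face, cell_size);                            % default: UoCTTI
    hog_dt     = vl_hog(face, cell_size, 'variant', 'dalaltriggs');  % Dalal-Triggs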




2. Mining hard negatives
Look for the get_hard_negative_features.m file in the code directory.

I experimented with mining hard negatives: I ran my detector on the negative training examples again and collected the resulting false positives as hard negatives. I then used both the original random negatives and these hard negatives as an augmented negative set for the SVM and obtained new model parameters (sketched after the table below). As expected, average precision did not really improve, but I observed a decrease in the number of false positives.


Budget = 10000, template size = 36, recursive scale factor = 0.95, recursion max depth = 40
Hard mining  Lambda   Threshold  Avg precision (%)
-----------  -------  ---------  -----------------
Yes          0.00001  0.00       89.2
No           0.001    0.55       94.0
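
A sketch of the mining pass (get_hard_negative_features.m follows this idea; negative_scenes and run_detector_on_image are illustrative names, the latter standing in for a single-image version of the detection loop shown earlier). Since the scenes contain no faces, every detection is a false positive by construction:

    budget    = 10000;
    hard_negs = [];
    for i = 1:length(negative_scenes)
        img = im2single(imread(negative_scenes{i}));
        if size(img, 3) > 1, img = rgb2gray(img); end
        % Hypothetical helper: returns the confidence and flattened HoG
        % feature of every window the current model scores above zero.
        [confs, feats] = run_detector_on_image(img, w, b);
        hard_negs = [hard_negs; feats(confs > 0, :)];
        if size(hard_negs, 1) >= budget
            hard_negs = hard_negs(1:budget, :);   % respect the mining budget
            break;
        end
    end

    % Retrain the SVM on the augmented negative set.
    X = single([features_pos; features_neg; hard_negs]');
    Y = [ones(size(features_pos, 1), 1); ...
         -ones(size(features_neg, 1) + size(hard_negs, 1), 1)];
    [w_hard, b_hard] = vl_svmtrain(X, Y, 0.00001);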







3. A different model than linear - cascaded classifiers
Look for the run_detector_cascaded.m file in the code directory.

Here I model a cascade of two linear SVMs, which effectively makes the overall classifier non-linear. First I train one SVM on the positive and random negative features. Then I train a separate model on the same positive features but with only the hard negative features found through hard mining. The detector then applies a tuned threshold for each SVM (sketched after the table below). Based on the results, false positives were reduced substantially while average precision stayed almost the same as in the normal pipeline.


template size = 36, recursive scale factor = 0.95, recursion max depth = 40
Classifier                  Lambda         Threshold   Avg precision (%)
--------------------------  -------------  ----------  -----------------
Cascaded SVMs (non-linear)  0.001, 0.0001  0.0, -0.70  93.5
SVM (linear)                0.001          0.55        94.0
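
A minimal sketch of the per-window cascade decision (run_detector_cascaded.m follows this idea; the thresholds match the table above, and the function name is illustrative):

    function is_face = cascade_classify(window, w1, b1, w2, b2)
    % Two-stage cascade: the first SVM (trained on random negatives)
    % prunes most windows cheaply; the second SVM (trained on hard
    % negatives) confirms the survivors.
        t1 = 0.0;  t2 = -0.70;           % tuned per-stage thresholds
        x = window(:);                   % flattened HoG sub-descriptor
        is_face = false;
        if (w1' * x + b1) > t1           % stage 1 gate
            if (w2' * x + b2) > t2       % stage 2 confirmation
                is_face = true;
            end
        end
    end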





4. Hard-mined detector on the bonus scenes

Here I ran my hard-negative-mined detector on the easy and hard images of the same scene.