Project 5 / Face Detection with a Sliding Window
In this project I experimented with the sliding window model from the paper by Dalal and Triggs (2005) for face detection. This report is divided into two parts:
- Face detection pipeline:
- Generate HoG features from a given set of positive and negative image examples of faces
- Train an SVM classifier on these generated HoG features
- Run this detector at multiple scales on a given test set of images
- Extra Credits
Let's go through each of them:
1. Generate HoG features
- I used the vl_hog API from the vlfeat library directly to compute the HoG features.
- First, to generate features for the positive examples (faces), I used a positive training database of 6,713 cropped 36x36 faces from the Caltech Web Faces project. I also generated horizontally flipped (mirrored) versions of these HoG features, both to detect mirror-symmetric faces and to double the number of positive training features.
- For negative examples I used a small database of face-free scenes from Wu et al. and the SUN scene database. From each image I randomly sampled sub-images of the same size as the training template (36x36), limiting the number of such samples to 80 per image.
- I experimented with both the UoCTTI and Dalal-Triggs variants provided by the vl_hog API.
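The data-preparation steps above can be sketched in Python/NumPy. This is an illustration only: the actual implementation is MATLAB using vl_hog, the function names here are hypothetical, and the HoG computation itself is omitted.

```python
import numpy as np

def sample_negative_crops(image, template_size=36, max_crops=80, rng=None):
    """Randomly sample up to `max_crops` template-sized sub-windows from a
    (grayscale) scene image that is assumed to contain no faces."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = image.shape[:2]
    if h < template_size or w < template_size:
        return []  # image too small to hold a full template
    crops = []
    for _ in range(max_crops):
        y = rng.integers(0, h - template_size + 1)
        x = rng.integers(0, w - template_size + 1)
        crops.append(image[y:y + template_size, x:x + template_size])
    return crops

def augment_with_flips(faces):
    """Double the positive set by adding horizontally flipped copies,
    mirroring what flipping the HoG features achieves."""
    return faces + [np.fliplr(f) for f in faces]
```

In the real pipeline the flip is applied to the HoG features themselves (by permuting the orientation bins), which avoids recomputing HoG on the mirrored images.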
2. Train SVM classifier
Look for the svm_train.m file in the code directory.
- I then trained a linear SVM on the extracted features. Linear models are fast to train and to evaluate, which makes them appropriate for real-world face detection.
- I used vl_svmtrain from the vlfeat library. Here are the training results:
cell size = 3, template size = 36, lambda = 0.001

| Metric              | Value |
| ------------------- | ----- |
| Accuracy            | 1.000 |
| True positive rate  | 0.390 |
| False positive rate | 0.000 |
| True negative rate  | 0.609 |
| False negative rate | 0.000 |

Note that the true positive and true negative rates sum to (almost) one: they are measured as fractions of all training examples, not of the positive and negative subsets separately.
- A look at the separation between the positive and negative features.
- Visualization of the learned detector's HoG template.
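As a sketch of how the learned linear SVM parameters are used at test time (a Python/NumPy stand-in for the MATLAB code; `w` and `b` denote the weight vector and bias returned by training):

```python
import numpy as np

def svm_confidences(features, w, b):
    """Confidence of each feature vector under a linear SVM:
    positive values lean toward 'face', negative toward 'non-face'."""
    return features @ w + b

def classify(features, w, b, threshold=0.0):
    """Label each window as face (True) / non-face (False)
    by thresholding the SVM confidence."""
    return svm_confidences(features, w, b) > threshold
```

Because scoring is a single dot product per window, a linear model keeps detection cheap even when millions of sliding windows must be evaluated.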
3. Multi-scale detector
Look for the run_detector.m file in the code directory.
- In this final step, I first generate multiple scaled versions of the given test image.
- For each scaled image I compute a HoG descriptor and crop out every template-sized sub-descriptor, stepping one cell at a time. Each sub-descriptor gets a confidence value from the SVM parameters and is classified as positive or negative based on a tuned threshold.
- I used dynamic scaling: scaled images are generated recursively until either an upper limit I placed on the recursion depth is reached or the downscaled image becomes smaller than the template.
- Here are the results for different parameter settings:
template size = 36

| Cell size | Lambda   | Recursive scale factor | Recursion max depth | Threshold | Avg precision (%) |
| --------- | -------- | ---------------------- | ------------------- | --------- | ----------------- |
| 3         | 0.001    | 0.95                   | 40                  | 0.55      | 94.0              |
| 3         | 0.001    | 0.95                   | 30                  | 0.55      | 93.8              |
| 3         | 0.001    | 0.92                   | 30                  | 0.55      | 93.5              |
| 6         | 0.000001 | 0.95                   | 40                  | 0.60      | 87.4              |
| 6         | 0.000001 | 0.90                   | 40                  | 0.60      | 87.0              |
- Here are the best average precision vs. recall and recall vs. false positives curves obtained:
- Visualization of true positives, ground truth, and false positives on some of the test images:
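The multi-scale detection loop described above can be sketched as follows. This is a Python/NumPy illustration of the logic in run_detector.m; the function names and exact bookkeeping are assumptions, and HoG extraction and SVM scoring are left out.

```python
import numpy as np

def pyramid_scales(im_h, im_w, template=36, factor=0.95, max_depth=40):
    """Scales to search: shrink by `factor` each step, stopping after
    `max_depth` steps or once the image would be smaller than the template."""
    scales, s = [], 1.0
    for _ in range(max_depth):
        if min(im_h, im_w) * s < template:
            break
        scales.append(s)
        s *= factor
    return scales

def sliding_windows(hog_rows, hog_cols, template_cells):
    """Top-left (row, col) of every template-sized sub-grid of a HoG
    descriptor, stepping one cell at a time.  With cell size 3 and a
    36-pixel template, template_cells = 36 // 3 = 12."""
    return [(r, c)
            for r in range(hog_rows - template_cells + 1)
            for c in range(hog_cols - template_cells + 1)]

def to_image_box(r, c, cell_size, template, scale):
    """Map a detection at HoG cell (r, c) in a scaled image back to a
    (x1, y1, x2, y2) bounding box in the original image's coordinates."""
    y1 = r * cell_size / scale
    x1 = c * cell_size / scale
    return (x1, y1, x1 + template / scale, y1 + template / scale)
```

Each window's sub-descriptor would then be scored against the SVM parameters, and boxes passing the threshold at every scale are pooled (with non-maximum suppression) to produce the final detections.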
Extra Credits
1. Experiment with HoG variants

Until now I had been using the UoCTTI variant of the HoG descriptor, which is vl_hog's default. I experimented with how well the other variant, DalalTriggs, would do. As the comparison below shows, it performed slightly worse than the default.
template size = 36, lambda = 0.000001, recursive scale factor = 0.95, recursion max depth = 40

| Variant     | Threshold | Avg precision (%) |
| ----------- | --------- | ----------------- |
| DalalTriggs | 0.85      | 78.8              |
| UoCTTI      | 0.60      | 87.4              |
2. Mining hard negatives
Look for the get_hard_negative_features.m file in the code directory.
I experimented with mining hard negatives: I ran my detector on the negative training examples again and collected the false positives as hard negative examples. I then used the previous negatives together with these hard negatives as an augmented negative set for my SVM and obtained new model parameters. As expected, average precision did not really improve, but I observed a decrease in the number of false positives.
Budget = 10000, template size = 36, recursive scale factor = 0.95, recursion max depth = 40

| Hard-Mining | Lambda  | Threshold | Avg precision (%) |
| ----------- | ------- | --------- | ----------------- |
| Yes         | 0.00001 | 0.00      | 89.2              |
| No          | 0.001   | 0.55      | 94.0              |
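The mining step itself can be sketched like this (a Python/NumPy illustration; `neg_features` is assumed to hold the HoG features of windows cropped from the face-free scenes, and `budget` caps how many hard negatives are kept, as in the table above):

```python
import numpy as np

def mine_hard_negatives(neg_features, w, b, budget=10000, threshold=0.0):
    """Hard negatives are windows from face-free images that the current
    detector still scores above `threshold`, i.e. its false positives.
    Keep at most `budget` of them, highest-confidence first."""
    conf = neg_features @ w + b
    order = np.argsort(-conf)              # indices, most confident first
    hard = order[conf[order] > threshold]  # keep only actual false positives
    return neg_features[hard[:budget]]
```

The returned features are appended to the original random negatives and the SVM is retrained on the augmented set.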
3. A different model than linear: cascaded classifiers
Look for the run_detector_cascaded.m file in the code directory.
Here I model a cascade of two linear SVMs, which effectively makes the overall classifier non-linear. First I train one SVM on the positive and random negative features. Then I train a separate model on the same positive features but only the hard negative features found through hard mining. The detector then uses a separately tuned threshold for each SVM. Based on the results, false positives are substantially reduced while average precision stays almost the same as in the normal pipeline.
template size = 36, recursive scale factor = 0.95, recursion max depth = 40

| Classifier                 | Lambda        | Threshold  | Avg precision (%) |
| -------------------------- | ------------- | ---------- | ----------------- |
| Cascaded SVMs (non-linear) | 0.001, 0.0001 | 0.0, -0.70 | 93.5              |
| SVM (linear)               | 0.001         | 0.55       | 94.0              |
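A minimal sketch of the two-stage cascade decision (a Python illustration of the logic in run_detector_cascaded.m; each stage is assumed to be a `(w, b, threshold)` triple):

```python
import numpy as np

def cascade_detect(x, stage1, stage2):
    """A window counts as a face only if it clears both linear SVMs'
    thresholds; each stage is a (weights, bias, threshold) triple."""
    for w, b, thr in (stage1, stage2):
        if x @ w + b <= thr:
            return False  # rejected early; later stages never run
    return True
```

The early rejection is what cuts false positives: a window must fool both the random-negative model and the hard-negative model to survive.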
4. Hard-mined detector on the bonus scenes

Here I have run my hard-mined detector on the easy and hard images of the same scene.