Project 5 / Face Detection with a Sliding Window
Running the detector on our class photo
In this project, I use the sliding window model for face detection. More specifically, I implemented the sliding window detector of Dalal and Triggs (2005).
My implementation consists of the following parts:
Part 1: Get the features
I used Histogram of Oriented Gradients (HoG) features to detect faces.
Each feature bins gradient magnitudes by orientation into a fixed set of angle ranges.
I used the vl_hog package because the HoG extractor I wrote in a previous project is inefficient and slow. Both positive and negative features need to be extracted.
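To make the binning idea concrete, here is a toy sketch (in Python/NumPy, purely illustrative; the project itself uses vl_hog) of a single orientation histogram over one cell, without the normalization and block structure that full HoG adds:

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Toy version of one HoG cell: sum gradient magnitudes into
    orientation bins. `patch` is a 2-D grayscale array."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), as in the Dalal-Triggs variant.
    angle = np.mod(np.arctan2(gy, gx), np.pi)
    bin_idx = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b in range(n_bins):
        hist[b] = magnitude[bin_idx == b].sum()
    return hist
```

A horizontal intensity ramp, for example, puts all of its gradient energy into the first (0 radian) bin.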
- Positive features: Using the positive examples as the training set, I extracted the HoG feature from each image and reshaped it into a one-dimensional array.
- Negative features: Using the negative examples as the training set, for each image I first decided how many patches to extract. Then I selected random patches (whose size was defined by the template and cell size) at different scales and extracted their features. I used 10,000 negative features in total.
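The random-patch selection can be sketched as follows (an illustrative Python helper, not the project's MATLAB code; `sample_negative_patches` and its scale list are assumptions). It returns the scale and patch coordinates; in the real pipeline the image would be resized to that scale and a HoG feature extracted from each patch:

```python
import numpy as np

def sample_negative_patches(image, patch_size, n_patches,
                            scales=(1.0, 0.7, 0.5), rng=None):
    """Pick random square patch locations at random scales.
    Patches smaller than the template at a given scale are skipped."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    patches = []
    for _ in range(n_patches):
        s = scales[rng.integers(len(scales))]
        sh, sw = int(h * s), int(w * s)  # size of the scaled image
        if sh < patch_size or sw < patch_size:
            continue
        y = rng.integers(sh - patch_size + 1)
        x = rng.integers(sw - patch_size + 1)
        # Real pipeline: resize image to (sh, sw), then extract the HoG
        # feature of the patch_size x patch_size window at (y, x).
        patches.append((s, y, x))
    return patches
```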
Part 2: Classification
I wrote a simple linear SVM and trained it on the training data. Then I tested it on the training data to check that it was working correctly. It achieved almost perfect accuracy, as expected.
csize = 6
Initial classifier performance on train data:
accuracy: 1.000
true positive rate: 0.398
false positive rate: 0.000
true negative rate: 0.602
false negative rate: 0.000
vl_hog: descriptor: [6 x 6 x 31]
vl_hog: glyph image: [126 x 126]
vl_hog: number of orientations: 9
vl_hog: variant: UOCTTI
I used Lambda = 0.0001 for this.
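The objective being minimized is the regularized hinge loss. A minimal sketch of such a linear SVM (illustrative Python using plain subgradient descent; the actual project uses a dedicated solver, but the objective, hinge loss plus lam * ||w||^2, is the same):

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, epochs=50, lr=0.1):
    """Linear SVM via subgradient descent on the hinge loss.
    X: (n, d) feature matrix; y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # examples violating the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

At test time, a feature's score is simply `X @ w + b`, which is what the sliding window step below thresholds.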
At this point, the template started looking like this:
If you squint, you can make out a human face shape.
Part 3: Sliding Window
For each image, I would start at the smallest scale the image allowed and iterate up to the full image (scale = 1) in increments of 0.05. At each scale, I slid a window of the template size and extracted HoG features. Then, using the SVM, I calculated the score of each window and discarded any that did not meet a predefined threshold. The accepted detections then went through non-maximum suppression.
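The inner loop at one scale can be sketched like this (illustrative Python; `detect` and its parameters are assumptions, not the project's code). It slides the template over a precomputed HoG feature map, scores each window with the SVM, and keeps those above the threshold:

```python
import numpy as np

def detect(image_feats, w, b, template_cells, step, threshold):
    """Slide a template over an (H, W, D) HoG feature map at one scale.
    Windows scoring below `threshold` are discarded; survivors would
    go on to non-maximum suppression."""
    H, W, D = image_feats.shape
    t = template_cells
    detections = []
    for y in range(0, H - t + 1, step):
        for x in range(0, W - t + 1, step):
            feat = image_feats[y:y + t, x:x + t, :].ravel()
            score = feat @ w + b
            if score >= threshold:
                detections.append((y, x, score))
    return detections
```

The full detector repeats this for every scale from the smallest up to 1.0 and maps the cell coordinates back to pixel coordinates in the original image.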
For the MIT Caltech set, the results were checked against the ground truth correspondence.
Experiments and results
Initially, for the starter code, the results were
Template with a uniform gradient distribution
Precision Recall curve for the starter code.
Argentina team
After implementing the whole pipeline, I started playing with the threshold in step 3. Some of the results are:
The threshold values are 0.0, 0.5, 0.70, 0.75, 0.90, and 1.2.
The corresponding precision recall curves are:
After some experimenting, I settled on 0.75 because I felt it was a reasonable trade-off of precision at higher recalls.
The results at this value are:
Template
Precision Recall curve for 0.75 threshold. The precision is 0.827
Argentina team
As you can see, this is clearly better than random results.
Some other examples:
Da Vinci Man
My detector gets confused by text and produces false positives.
Class photo easy:
Class photo difficult:
As you can see, there are plenty of false positives in both.
Other experiments
I also experimented with mirror images for the positive features. The reasoning is that since the human face is roughly symmetric, training the classifier with horizontally flipped copies of the positive examples should give better results.
It is implemented in the get_positive_features_mirror.m of my code.
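The idea is simply to double the positive set with flipped copies before feature extraction, which can be sketched as (illustrative Python; the actual implementation is the MATLAB file named above):

```python
import numpy as np

def add_mirrored_positives(images):
    """Double the positive set by horizontally flipping each face crop.
    Faces are roughly left-right symmetric, so the flipped copies are
    valid extra training examples."""
    mirrored = [np.fliplr(img) for img in images]
    return images + mirrored
```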
I got a small boost to 0.84 precision.