The goal of this project is to implement a face detection algorithm similar to the sliding window detector proposed by Dalal and Triggs (2005). It uses a SIFT-like Histogram of Gradients (HoG) feature representation: the HoG template contains cells of 31 dimensions, with histograms counting the number of pixels with gradients in 9 directions. The template is compared against small windows (subsections) of each image at various scales, and non-maximum suppression is then applied to handle overlapping boxes before outputting the final face detections.
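The non-maximum suppression step above can be sketched as a greedy procedure: repeatedly keep the highest-scoring box and discard any remaining box that overlaps it too much. This is a minimal illustrative sketch (not the project's actual code, which is MATLAB/VLFeat); the `iou_thresh` value is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression.

    boxes: list of [x1, y1, x2, y2]; scores: detector confidences.
    Keeps the highest-scoring box, drops any remaining box whose
    intersection-over-union (IoU) with it exceeds iou_thresh, repeats.
    Returns the indices of the surviving boxes.
    """
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]  # suppress heavy overlaps
    return keep
```

For example, two near-identical boxes around the same face collapse to the single higher-scoring one, while a distant box survives.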
The classifier is a linear SVM (trained with vl_svmtrain()) that uses positive features from the 36x36 Caltech Web Faces project and negative features from images containing no faces. For each image in the test set, scaled copies are created and converted to a HoG representation, and each portion of the feature map is evaluated with the classifier. The portions with a high-enough confidence are kept as detections.
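The per-scale scoring step can be sketched as sliding the learned template over the HoG feature map and keeping windows that the linear SVM scores above threshold. This is a hedged sketch, not the project's MATLAB code: `hog_map` (an H x W x 31 feature map), the weights `w`, bias `b`, and the coordinate bookkeeping are all illustrative assumptions.

```python
import numpy as np

def detect_at_scale(hog_map, w, b, template_cells, cell_size, scale, thresh=0.3):
    """Score every template-sized window of one scaled image's HoG map.

    hog_map: H x W x 31 HoG features of a scaled copy of the image.
    w, b: linear SVM weights and bias; score = w . feat + b.
    Returns boxes (mapped back to original-image coordinates) and scores
    for windows whose confidence exceeds thresh.
    """
    H, W, _ = hog_map.shape
    t = template_cells
    boxes, scores = [], []
    for y in range(H - t + 1):
        for x in range(W - t + 1):
            feat = hog_map[y:y + t, x:x + t, :].ravel()
            score = float(feat @ w + b)  # linear SVM confidence
            if score > thresh:
                # cell coordinates -> pixels, then undo the image scaling
                x1 = x * cell_size / scale
                y1 = y * cell_size / scale
                size = t * cell_size / scale
                boxes.append([x1, y1, x1 + size, y1 + size])
                scores.append(score)
    return boxes, scores
```

Running this over every scale and pooling the results gives the candidate boxes that non-maximum suppression then prunes.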
HoG Cell Size | 6 | 4 | 3 |
---|---|---|---|
Average Precision (%) | 82.9 | 88.9 | 90.4 |
Accuracy | 0.998 | 0.999 | 1.000 |
True positive rate | 0.397 | 0.397 | 0.398 |
False positive rate | 0.000 | 0.001 | 0.000 |
True negative rate | 0.601 | 0.601 | 0.602 |
False negative rate | 0.002 | 0.000 | 0.000 |
HoG Image | | | |
Average Precision | | | |
Recall vs. False Positives | | | |
Classifier | | | |
Sample Result | | | |
Smaller HoG cell sizes seemed to improve performance. I used a lambda value of 0.0001 for the SVM classifier and a threshold of 0.3 for the confidence values. We can see that smaller HoG cells result in more face-like HoG templates.
I flipped each image in the positive examples database horizontally to produce twice the number of positive examples (6,713 x 2 = 13,426). This resulted in a more symmetric template and increased precision. Here are the comparisons for HoG cell size = 4.
Configuration | Original | Doubled |
---|---|---|
HoG Template | | |
Classifier | | |
Average Precision (%) | 88.9 | 91.8 |
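The mirroring step above is simple to implement: because faces are roughly left-right symmetric, a horizontal flip of each positive crop is itself a valid positive example. A minimal sketch (illustrative, not the project's MATLAB code):

```python
import numpy as np

def mirror_positives(images):
    """Double a positive training set by appending a horizontal mirror
    of each crop. For the 6,713 face crops used here this yields 13,426
    positives, and averaging over both orientations pushes the learned
    HoG template toward left-right symmetry.
    """
    flipped = [np.fliplr(img) for img in images]
    return images + flipped
```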
Here are some results with HoG Cell Size = 3 and double the number of positive training examples.