The goal of this project is to implement a Dalal-Triggs style face detector. The sample faces used for training are from the Caltech Web Faces project, and the sample non-faces are from Wu et al. and the SUN scene database. The testing images are provided by the CMU+MIT face detection test set. For the project, I implemented the following sections.
Get positive features
"get_positive_features" converts a database of faces into histograms of gradients (HoG). The input faces are all greyscale and of size 36x36. I performed the following steps to accomplish this.
- Load each image in a file directory
- For each image, use VLFeat's vl_hog method to convert the image into a histogram of gradients. VLFeat's vl_hog method takes as input an image and cell size. I used a cell size of 6 when implementing this project.
- vl_hog returns one 31-dimensional histogram per cell, so a 36x36 face with a cell size of 6 yields a 6x6x31 array (1116 values).
- Store all of the generated histograms in a matrix and return it.
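The steps above can be sketched in MATLAB roughly as follows. This is a minimal sketch rather than the code used for the project: the function name `get_positive_features_sketch`, the `*.jpg` file pattern, the 8-bit image assumption, and the one-row-per-face layout are all assumptions, and VLFeat must be on the MATLAB path.

```matlab
function features_pos = get_positive_features_sketch(train_path_pos, cell_size)
% Sketch only: the function name, the '*.jpg' pattern, and the N x D layout
% are assumptions, not taken from the project code.
template_size  = 36;                          % face crops are 36x36 pixels
cells_per_side = template_size / cell_size;   % 6 cells per side when cell_size = 6
D = cells_per_side^2 * 31;                    % 31 HoG values per cell -> 1116

image_files  = dir(fullfile(train_path_pos, '*.jpg'));
features_pos = zeros(numel(image_files), D, 'single');

for i = 1:numel(image_files)
    im  = imread(fullfile(train_path_pos, image_files(i).name));
    im  = single(im) / 255;                   % assumes 8-bit greyscale; vl_hog needs single
    hog = vl_hog(im, cell_size);              % 6 x 6 x 31 array for a 36x36 input
    features_pos(i, :) = hog(:)';             % flatten to one row per face
end
end
```

Storing one flattened histogram per row keeps the positive and negative feature matrices easy to stack later when building the training set.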
Get random negative features
Get random negative features takes patches from non-face images at different scales and converts them to histograms of gradients of the same size as in "get_positive_features". This required performing the following steps.
- Load a random image in the directory and convert it to greyscale.
- For scales 1, .8, and .64 of the image, convert the image into a histogram of gradients with VLFeat's vl_hog method using the same cell size of 6.
- Slide through the histograms taking non-overlapping patches, so that each patch of histograms matches the size of the histograms obtained from the templates in "get_positive_features".
- Count the obtained histogram patches and stop once exactly "num_samples" patches have been cut out. "num_samples" is an input to the "get_random_negative_features" function.
- Store all of the generated histograms in a matrix and return it.
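The non-overlapping patch extraction can be illustrated as below. This is a sketch under stated assumptions: a random array stands in for the output of vl_hog on one scaled non-face image, and the variable names are hypothetical. It only shows how a grid of HoG cells is cut into template-sized patches.

```matlab
% Sketch only: a random array stands in for vl_hog(im, cell_size) output.
cell_size      = 6;
template_cells = 36 / cell_size;              % 6 x 6 cells per template
hog = rand(20, 27, 31, 'single');             % placeholder HoG grid (rows x cols x 31)

D      = template_cells^2 * 31;               % 1116, same as the positive features
n_rows = floor(size(hog, 1) / template_cells);   % patches that fit vertically
n_cols = floor(size(hog, 2) / template_cells);   % patches that fit horizontally
patches = zeros(n_rows * n_cols, D, 'single');

k = 0;
for r = 1:template_cells:(n_rows * template_cells)      % step of a full template
    for c = 1:template_cells:(n_cols * template_cells)  % so patches never overlap
        patch = hog(r:r+template_cells-1, c:c+template_cells-1, :);
        k = k + 1;
        patches(k, :) = patch(:)';            % same flattening as the positive features
    end
end
```

In a full implementation this loop would run per image and per scale, and stop as soon as the running count reaches "num_samples".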
SVM classification
This step trains a model that estimates the confidence that an image patch contains a face. It uses a linear support vector machine (SVM). I performed the following steps to accomplish this.
- Use the results from "get_positive_features" and "get_random_negative_features" to create the training data.
- Select a lambda value for the SVM classification. I found that .0001 gave decent results and used that value for the implementation.
- Use VLFeat's vl_svmtrain to create the hyperplane model that will be used to evaluate images.
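A minimal sketch of the training step is shown below. Random features stand in for the real HoG features, and the one-row-per-sample layout and the +1/-1 label convention are assumptions about how the data is assembled. The sketch requires VLFeat for vl_svmtrain.

```matlab
% Sketch only: random placeholders replace the real features; requires VLFeat.
D = 1116;                                     % 6 * 6 * 31
features_pos = rand(50, D, 'single');         % stand-in for get_positive_features
features_neg = rand(80, D, 'single');         % stand-in for get_random_negative_features

X = [features_pos; features_neg]';            % vl_svmtrain expects D x N data
Y = [ ones(size(features_pos, 1), 1); ...
     -ones(size(features_neg, 1), 1)];        % +1 = face, -1 = non-face

lambda = 0.0001;                              % regularization value from the text
[w, b] = vl_svmtrain(X, Y, lambda);           % w is D x 1, b is a scalar bias

scores = w' * X + b;                          % confidence for every training sample
train_accuracy = mean(sign(scores') == Y);    % fraction classified correctly
```

Checking the training accuracy this way is a quick sanity test that the features and labels line up before running the detector.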
Run detector
- Load each test image and convert it to greyscale.
- Convert multiple scales of the image into cells of histograms of gradients using VLFeat's vl_hog method. I used the same cell size of 6 as before. The scales I used are 1, .8, .64, .512, .4096, .32768, .262144, .2097152.
- Slide through the histograms taking overlapping patches, so that each patch of histograms matches the size of the histograms obtained from the templates in "get_positive_features".
- Use the SVM model to compute a confidence score. This is shown below.
score = w'*hogPatch(:) + b;
- If the score is above some threshold, store the bounding box, the confidence, and the image into matrices. I tried several thresholds and obtained a good balance between precision and recall at a threshold of .5.
- Run non-maximum suppression on all the candidate faces in each image. The suppression is applied to the detections from all scales of a single image at the same time.
- Return the bounding boxes and confidences that have passed through non-maximum suppression.
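The sliding-window scoring for a single scale might look like the sketch below. Random values stand in for the vl_hog output and for the trained `w` and `b`, and the step of one HoG cell and the coordinate mapping back to the original image are assumptions rather than details from the project code. Non-maximum suppression is not included.

```matlab
% Sketch only: random placeholders replace the HoG grid and the trained model.
cell_size      = 6;
template_cells = 6;                           % 36-pixel template / 6-pixel cells
scale          = 0.8;                         % scale at which this HoG grid was computed
threshold      = 0.5;
hog = rand(15, 20, 31, 'single');             % stand-in for vl_hog on the scaled image
w   = randn(template_cells^2 * 31, 1, 'single');
b   = single(0);

bboxes      = zeros(0, 4);                    % rows of [x_min, y_min, x_max, y_max]
confidences = zeros(0, 1);
for r = 1:(size(hog, 1) - template_cells + 1)       % step of one cell (assumed),
    for c = 1:(size(hog, 2) - template_cells + 1)   % so neighbouring windows overlap
        hogPatch = hog(r:r+template_cells-1, c:c+template_cells-1, :);
        score = w' * hogPatch(:) + b;
        if score > threshold
            % Map cell indices to pixel coordinates in the original image.
            x_min = ((c - 1) * cell_size + 1) / scale;
            y_min = ((r - 1) * cell_size + 1) / scale;
            x_max = ((c + template_cells - 1) * cell_size) / scale;
            y_max = ((r + template_cells - 1) * cell_size) / scale;
            bboxes(end+1, :)      = round([x_min, y_min, x_max, y_max]);
            confidences(end+1, 1) = score;
        end
    end
end
```

Dividing by the scale is what lets detections from every scale share one coordinate frame, which is needed before suppressing overlapping boxes across scales.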
Results
In the end, I used the following parameters to obtain my results.
- Cell Size: 6
- Lambda: .0001
- Confidence threshold: .5
- Scales in run_detection: 1, .8, .64, .512, .4096, .32768, .262144, .2097152. These are the first 8 powers of .8, from .8^0 to .8^7.
- Scales in get_random_negative_features: 1, .8, .64. These are the first 3 powers of .8, from .8^0 to .8^2.
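As a small worked example of what these scales imply, the snippet below generates them and computes the face size in original-image pixels that the 36x36 template covers at each scale. The variable names are illustrative only.

```matlab
% Worked arithmetic: the scale list and the face sizes it can match.
scales        = 0.8 .^ (0:7);                 % 1, 0.8, 0.64, ..., 0.2097152
template_size = 36;                           % template side length in pixels
face_sizes    = template_size ./ scales;      % side length in original-image pixels
```

With these values the detector covers faces from 36 pixels up to roughly 172 pixels on a side. Larger faces would need additional, smaller scales.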
I obtained the following results.
Face template HoG visualization for the starter code. This is completely random, but it should actually look like a face once you train a reasonable classifier.
Precision-recall curve for the starter code.
Example of detection on the test set from the starter code.
Example of detection on the class test example.