The following writeup is organized as follows:
I. Extract features from positive examples
II. Extract features from negative examples
III. Classifier Training
IV. Run Face Detector
V. Hard Negative Mining
VI. Interesting Results
For positive examples, in order to extract the features, I simply iterate over all the images and extract the HOG feature vectors using the vl_hog function with a particular cell size.
One way to augment this dataset is to flip each positive example left to right: a mirrored face is still a face, so the flipped images can also be used as positive examples and features extracted from them.
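The extraction and flip augmentation above can be sketched as follows. This is a minimal Python/NumPy illustration (the report itself uses MATLAB and VLFeat); the per-cell orientation histogram here is a simplified stand-in for vl_hog, which additionally performs block normalization, and the cell size of 6 matches the one used in the report.

```python
import numpy as np

def hog_features(img, cell_size=6, n_bins=9):
    """Simplified HOG: per-cell histograms of unsigned gradient
    orientations, weighted by gradient magnitude. A rough stand-in
    for vl_hog (which also adds block normalization)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # unsigned orientation
    n_cy, n_cx = img.shape[0] // cell_size, img.shape[1] // cell_size
    feat = np.zeros((n_cy, n_cx, n_bins))
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    for cy in range(n_cy):
        for cx in range(n_cx):
            sl = (slice(cy * cell_size, (cy + 1) * cell_size),
                  slice(cx * cell_size, (cx + 1) * cell_size))
            np.add.at(feat[cy, cx], bins[sl].ravel(), mag[sl].ravel())
    return feat.ravel()

def positive_features(images, cell_size=6, augment_flips=True):
    """Extract one feature per image, plus one per left-right flip."""
    feats = []
    for img in images:
        feats.append(hog_features(img, cell_size))
        if augment_flips:
            feats.append(hog_features(np.fliplr(img), cell_size))
    return np.vstack(feats)
```

With flipping enabled, the number of positive examples doubles at essentially no cost, since the flipped crops are still valid faces.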
I have also shown visualizations of the HOG features of a few positive examples below.
For negative examples, in order to extract the features, I randomly sample patches from every example image and extract a HOG feature vector from each patch, keeping the total number of negative features obtained from the patches equal to num_samples.
I have also experimented with increasing the number of negative samples which is explained later.
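The random patch sampling can be sketched as below. This is a hypothetical Python/NumPy illustration of the sampling step only (the report uses MATLAB); the patch size of 36 is an assumed template size, and HOG features would be extracted from each returned patch as in the positive case.

```python
import numpy as np

def sample_negative_patches(images, num_samples, patch_size=36, rng=None):
    """Randomly sample fixed-size patches (assumed to contain no faces)
    across the images, capping the total at num_samples."""
    rng = np.random.default_rng(rng)
    per_image = int(np.ceil(num_samples / len(images)))
    patches = []
    for img in images:
        h, w = img.shape
        for _ in range(per_image):
            y = rng.integers(0, h - patch_size + 1)
            x = rng.integers(0, w - patch_size + 1)
            patches.append(img[y:y + patch_size, x:x + patch_size])
            if len(patches) == num_samples:
                return patches
    return patches
```

Spreading the budget evenly across images (per_image patches each) keeps any single scene from dominating the negative set.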
Below are visualizations of the HOG features of a few negative examples.
I have used VLFeat's built-in function (vl_svmtrain) for training a linear SVM classifier, and have tried various values of lambda. Results are described later in the report.
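For illustration, the linear SVM training step can be sketched with a minimal subgradient-descent solver. This is a rough NumPy stand-in for vl_svmtrain, not the report's actual training code; lam plays the role of the lambda regularizer discussed in the results, and the learning rate and epoch count are arbitrary assumptions.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, epochs=50, lr=0.01, rng=0):
    """Minimal linear SVM: subgradient descent on the L2-regularized
    hinge loss. Labels y must be in {-1, +1}."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                    # inside margin: hinge gradient
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                             # outside margin: only shrink w
                w -= lr * lam * w
    return w, b
```

A detection score for a HOG feature x is then simply x @ w + b, which is what gets compared against the threshold in the detector.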
For running the detector on the test images, I have used the following algorithm:
1. First, read each image and convert it to grayscale.
2. Then, for each scale, compute the HOG features of the image and slide a window through the HOG feature space such that the window size matches that of the template.
3. If the patch in the sliding window has a confidence above the threshold, map the coordinates of the patch from HOG space back to the corresponding image coordinates.
4. Repeat this for multiple scales, then run non-maximum suppression over all the detections together to obtain the final bounding boxes of the faces in that particular image.
5. Repeat the above steps for all the images to obtain the final face bounding boxes in every image.
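The non-maximum suppression in step 4 can be sketched as the standard greedy procedure below. This is a generic NumPy illustration, not the report's code; boxes are assumed to be [x1, y1, x2, y2] pooled across all scales, and the IoU threshold of 0.3 is an arbitrary example value.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes that overlap it too much."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = ((boxes[order[1:], 2] - boxes[order[1:], 0]) *
                  (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]
    return keep
```

Running NMS once over the detections from all scales together (rather than per scale) is what resolves the duplicate boxes that the multi-scale sweep inevitably produces around each face.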
Average precision for each combination of lambda (rows) and detection threshold (columns):

| lambda \ threshold | -1 | -0.5 | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|---|---|
| 0.00001 | 0.85 | 0.84 | 0.84 | 0.83 | 0.84 | 0.83 | 0.83 | 0.83 |
| 0.0001 | 0.84 | 0.85 | 0.86 | 0.83 | 0.83 | 0.83 | 0.85 | 0.81 |
| 0.001 | 0.84 | 0.85 | 0.85 | 0.86 | 0.81 | 0.87 | 0.82 | 0.85 |
| 0.01 | 0.82 | 0.83 | 0.82 | 0.81 | 0.78 | 0.81 | 0.82 | 0.80 |
| 0.1 | 0.76 | 0.75 | 0.75 | 0.75 | 0.75 | 0.73 | 0.72 | 0.70 |
| scale factor | 0.9 | 0.8 | 0.7 | 0.6 | 0.5 |
|---|---|---|---|---|---|
| average precision (%) | 81.5 | 85.5 | 82.9 | 81.3 | 73.1 |
[Figure: learned HOG template | average precision curve]
For hard negative mining, I run the trained detector on the negative example images (I always use a HOG cell size of 6), using the following algorithm:
Variant 1
1. First, read each image and convert it to grayscale.
2. Then, for each scale, compute the HOG features of the image and slide a window through the HOG feature space such that the window size matches that of the template.
3. If the patch in the sliding window has a confidence above the threshold, store its feature vector.
4. Repeat this for multiple scales and collect all feature vectors detected as faces, i.e. with confidence above the threshold.
5. Add all such feature vectors to the negative features in the training data and retrain the linear SVM classifier.
Variant 2
Alternatively, collect the IDs of the images on which the current classifier makes mistakes, densely sample patches from those images, and add them to the training dataset as negative examples.
I found the two variants to perform quite similarly, with the second slightly better; both gave a small improvement over the method without hard negative mining.
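The core of Variant 1 can be sketched as a filtering step on the candidate negative features. This is a hypothetical NumPy illustration: neg_features stands for the HOG vectors of all scanned negative windows, and (w, b) for the current classifier; any feature scored above the detection threshold is a false positive, i.e. a hard negative.

```python
import numpy as np

def mine_hard_negatives(neg_features, w, b, threshold=0.0):
    """Variant 1: keep the negative-window features that the current
    classifier scores above the detection threshold (false positives)."""
    scores = neg_features @ w + b
    return neg_features[scores > threshold]
```

The returned features are then stacked onto the original negative set (e.g. with np.vstack) and the linear SVM is retrained on the augmented data.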
[Figure: learned HOG template | average precision curve]
I have also run the above algorithm, including hard negative mining, to detect faces on the extra test scenes; the results follow.
Parameters used:
threshold = 0.6, lambda = 0.001, scale factor = 0.8, num_negative_samples = 10,000, followed by hard negative mining using Variant 1.
All the work implemented for the project has been presented and discussed above. Feel free to contact me with further queries.
Contact details:
Murali Raghu Babu Balusu
GaTech id: 903241955
Email: b.murali@gatech.edu
Phone: (470)-338-1473
Thank you!