Project 5 / Face Detection with a Sliding Window

Positive features extraction

The training dataset contains 36x36 images of faces. The default template size is 36 pixels and the default HOG cell size is 6 pixels. We use vl_hog to compute the HOG representation of each image and then flatten the result with a simple reshape to obtain our feature vector.
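The report uses vl_hog from VLFeat in MATLAB; as an illustrative sketch only, here is a simplified Python/numpy analogue (a bare-bones cell-histogram HOG, not VLFeat's exact descriptor) showing the extract-then-flatten step. The function names are hypothetical:

```python
import numpy as np

def simple_hog(img, cell=6, nbins=9):
    # Simplified HOG: per-cell histograms of unsigned gradient orientation,
    # weighted by gradient magnitude (no block normalization, unlike vl_hog)
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi          # unsigned orientation in [0, pi)
    h, w = img.shape
    cy, cx = h // cell, w // cell
    bins = np.minimum((ang / np.pi * nbins).astype(int), nbins - 1)
    feat = np.zeros((cy, cx, nbins))
    for i in range(cy):
        for j in range(cx):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bins[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for k in range(nbins):
                feat[i, j, k] = m[b == k].sum()
    return feat

def positive_features(faces, cell=6):
    # One flattened HOG vector per 36x36 face crop, stacked into a matrix
    return np.stack([simple_hog(f, cell).ravel() for f in faces])
```

With a 36x36 crop and 6-pixel cells this gives a 6x6 grid of cells, so each face becomes a single fixed-length row vector.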

Negative features extraction

We are given a small dataset of non-face images. To obtain around 10,000 negative features, I extracted about 40 random 36x36 patches from each image, sampled at 4 different scales. These patches are then converted to HOG features for the next stage of the pipeline.
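The multi-scale random cropping can be sketched as follows; this is a hedged numpy illustration (nearest-neighbour resize, hypothetical function name and scale list), not the report's actual MATLAB code:

```python
import numpy as np

def sample_negative_patches(img, n_per_scale=10, scales=(1.0, 0.75, 0.5, 0.35),
                            size=36, rng=None):
    # Randomly crop size x size patches from a non-face image at several scales.
    # Scales too small to contain a full window are skipped.
    rng = rng if rng is not None else np.random.default_rng(0)
    patches = []
    for s in scales:
        h, w = int(img.shape[0] * s), int(img.shape[1] * s)
        if h < size or w < size:
            continue
        # Nearest-neighbour downscale via index maps
        ys = (np.arange(h) / s).astype(int).clip(0, img.shape[0] - 1)
        xs = (np.arange(w) / s).astype(int).clip(0, img.shape[1] - 1)
        scaled = img[ys][:, xs]
        for _ in range(n_per_scale):
            y = rng.integers(0, h - size + 1)
            x = rng.integers(0, w - size + 1)
            patches.append(scaled[y:y+size, x:x+size])
    return patches
```

Each returned patch would then be passed through the same HOG extraction as the positives.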

Classifier Training

With both positive and negative features in hand, we use vl_svmtrain to learn our classifier parameters. I set the regularization parameter LAMBDA to 0.0001.
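vl_svmtrain solves a regularized hinge-loss (linear SVM) objective. As a rough stand-in, here is a minimal SGD trainer for the same objective in numpy; the hyperparameters other than lambda are my own illustrative choices:

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, epochs=50, lr=0.01, seed=0):
    # SGD on:  lam/2 * ||w||^2  +  mean_i max(0, 1 - y_i * (X_i . w + b))
    # with labels y in {-1, +1}
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            grad_w = lam * w                 # regularizer gradient
            grad_b = 0.0
            if margin < 1:                   # hinge is active: add data gradient
                grad_w -= y[i] * X[i]
                grad_b = -y[i]
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b
```

The learned weight vector w, reshaped back to the HOG grid, is the face template used by the detector.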

Detector

This is the major part of the project. For each test image, we first build its HOG representation so that the template learned by our classifier can be slid across it. We score every 36x36 window against the template and keep only those with confidence above a set threshold. I set the threshold to -0.25 to increase recall, at the cost of many false positives. My spatial step size was 6 pixels, the same as the HOG cell size, so the window moves one cell at a time. We repeat this over multiple scales, shrinking the image by a factor of 0.9 at each step. Finally, we apply non-maximum suppression to the initial set of bounding boxes to get the final detections for each test case. Using the above parameters, I was able to get an average precision of 0.885.
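The single-scale scoring pass and the non-maximum suppression step can be sketched as below (Python/numpy illustration under my earlier simplified-HOG assumptions; boxes are in HOG-cell coordinates, and the multi-scale loop around `detect` is omitted):

```python
import numpy as np

def detect(feat, w, b, cells=6, thresh=-0.25):
    # Slide a cells x cells template over a HOG feature grid (H, W, nbins),
    # keeping windows whose SVM confidence exceeds the (deliberately low) threshold
    H, W, _ = feat.shape
    boxes, scores = [], []
    for i in range(H - cells + 1):
        for j in range(W - cells + 1):
            s = feat[i:i+cells, j:j+cells].ravel() @ w + b
            if s > thresh:
                boxes.append((i, j, i + cells, j + cells))
                scores.append(s)
    return np.array(boxes), np.array(scores)

def nms(boxes, scores, iou_thresh=0.3):
    # Greedy non-maximum suppression on (y1, x1, y2, x2) boxes:
    # repeatedly keep the highest-scoring box and drop heavy overlaps
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        yy1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        xx1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        yy2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        xx2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0, yy2 - yy1) * np.maximum(0, xx2 - xx1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thresh]
    return keep
```

In the full pipeline, boxes found at a downscaled image are divided by the current scale before NMS so that all detections live in the original image's coordinates.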

Results

Here I show the HOG template (left) returned by the classifier and the precision-recall curve (right) for my detector, before the extra-credit additions.

Graduate credit

I implemented hard negative mining from Dalal and Triggs. Once we have a preliminary classifier from the earlier procedure, we add a step that exhaustively searches the non-face images at different scales for patches the classifier scores with high confidence. The process is similar to the detector step of the pipeline; I used a threshold of -0.25, a spatial step size of 6, and a scale factor of 0.5. This gave me about 4,000 hard negative features. I then retrain the classifier on both the hard negatives and the original random negatives. As explained on the project page, doing this gave only a small increase in average precision. The big difference I observed was on extra_test_scenes, where the detector produced far fewer false positives. I show the results below:
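The mining step above amounts to running the detector over non-face HOG grids and keeping every window the preliminary classifier wrongly scores above the threshold. A minimal numpy sketch, under the same simplified-HOG assumptions as earlier (function name hypothetical):

```python
import numpy as np

def mine_hard_negatives(nonface_feats, w, b, cells=6, thresh=-0.25):
    # Exhaustively score windows on non-face HOG grids; any window above the
    # threshold is a false positive, kept as an extra "hard" negative example
    hard = []
    for feat in nonface_feats:
        H, W, _ = feat.shape
        for i in range(H - cells + 1):
            for j in range(W - cells + 1):
                win = feat[i:i+cells, j:j+cells].ravel()
                if win @ w + b > thresh:
                    hard.append(win)
    return np.array(hard)
```

The classifier is then retrained on the concatenation of the original random negatives and these mined features, e.g. `np.vstack([neg_feats, hard_feats])`, with the positives unchanged.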