Project 5: Face Detection with a Sliding Window

Part 1: Get Positive Features

The first part of this project was loading the cropped positive training examples (faces) and converting them to Histogram of Oriented Gradients (HoG) features that we then use later in the project. To accomplish this I iterated through each training image, computed its HoG using vl_hog, and reshaped the result to match the dimension of our features. The code to do this is in my deliverables, and this was the simplest part of the project. Right away we have an important parameter, which is the cell size we want for our HoG feature. I will discuss this parameter in more depth in the results, but here I found that smaller HoG cells took longer to run but produced better results (which is expected!).
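As a minimal sketch of this loop (the names `train_path_pos`, `image_files`, and `feature_dim` are my own assumptions, not necessarily the starter code's):

```matlab
% Sketch of the positive-feature loop. Assumes 36x36 face crops, so
% feature_dim = (36 / hog_cell_size)^2 * 31 for VLFeat's default HoG variant.
features_pos = zeros(num_images, feature_dim);
for i = 1:num_images
    img = im2single(imread(fullfile(train_path_pos, image_files(i).name)));
    hog = vl_hog(img, hog_cell_size);          % HoG cells for this crop
    features_pos(i, :) = reshape(hog, 1, []);  % flatten to one row vector
end
```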


Part 2: Get Random Negative Features

For this portion of the project I sampled random patches from scenes without faces and made them into HoG features, similarly to the positive examples. To do this I iterated through the images and computed how many samples I needed from a single image in order to reach the specified total number of samples. Here we see another important parameter, which is the number of negative examples to collect. I used 10,000 for debugging and then tried other values. I found that the more examples I grabbed, the better my algorithm performed, but the slower it ran. I found 20,000 to be a good number of samples: good results while still not making my pipeline too painfully slow! Again this makes sense, as having more negative examples makes it easier to rule out false positives and differentiate between negative and positive examples when trying to classify a patch as a face. For each image I computed a random row and column from which to pull a feature patch. I then used vl_hog on this image patch to get a feature, and reshaped it to match our feature dimension. Note that this is the same method used for a positive image patch. Again, the code is relatively simple and found in my deliverables.
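The sampling loop can be sketched as follows (hypothetical variable names; I assume 36x36 patches so negatives match the positive crops):

```matlab
% Spread the requested number of negative samples evenly across the images.
samples_per_img = ceil(num_samples / num_images);
features_neg = zeros(num_images * samples_per_img, feature_dim);
idx = 1;
for i = 1:num_images
    img = im2single(imread(fullfile(train_path_neg, image_files(i).name)));
    [h, w] = size(img);
    for k = 1:samples_per_img
        r = randi(h - 35);              % random top-left corner such that
        c = randi(w - 35);              % a 36x36 patch still fits
        patch = img(r:r+35, c:c+35);
        features_neg(idx, :) = reshape(vl_hog(patch, hog_cell_size), 1, []);
        idx = idx + 1;
    end
end
```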


Part 3: Classifier Training

This portion of the project is where we train a linear classifier. As suggested, I used an SVM classifier, which returns parameters that we can use to run our classifier in the final step of the project. This works by combining the positive and negative features and then creating a corresponding label array which identifies which examples are positive and which are negative (with a 1 or -1 respectively). The single line to accomplish this is shown below:


[w, b] = vl_svmtrain([features_pos; features_neg]', ...
    [ones(size(features_pos, 1), 1); -1*ones(size(features_neg, 1), 1)], 0.0001);

Here we see another important parameter, which is the final argument to vl_svmtrain. This lambda parameter is crucial for getting good results, as it controls the strength of the regularization in our model. The larger the lambda, the more training misclassifications our model tolerates, which helps it generalize; if lambda is too large, however, we don't fit our training data well enough, while too small a lambda risks overfitting. I found that a value of 0.0001 did a very good job of avoiding both over- and under-fitting. Other values I tried didn't improve my performance, and a few runs of the pipeline with different cell sizes showed me that this lambda generalized well.
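One quick way to sanity-check a lambda choice is to score the training data with the learned w and b (a sketch, not part of the graded pipeline; it assumes the feature matrices from the earlier parts):

```matlab
% Training accuracy of the linear SVM: near 1.0 suggests the data is fit
% well, while a value that is too perfect can hint at overfitting.
confs_pos = features_pos * w + b;   % should be mostly positive
confs_neg = features_neg * w + b;   % should be mostly negative
train_acc = (sum(confs_pos > 0) + sum(confs_neg < 0)) / ...
            (size(features_pos, 1) + size(features_neg, 1));
fprintf('training accuracy: %.4f\n', train_acc);
```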


Part 4: Run Detector

This final part was definitely the most challenging portion of the project and contained the most design decisions. For each image we wished to detect faces in, I ran the classifier at multiple scales and then performed the provided non-maximum suppression to get rid of duplicate detections.

I began by iterating through the different scales, which I chose through much trial and error. I made sure to cover a full range of values so as to not miss detections, but I also made sure that none of my scales were too close to each other, as adding scales made the process take longer and would often increase the number of false positives. That said, the more scales I added, the higher my average precision typically got. For each scale I rescaled the image, ran vl_hog on it, extracted all the features using a sliding window, and ran the linear classifier from the previous part on the matrix of features.

Here we have another very important parameter: the confidence threshold for a match. I found that lower thresholds increased my average precision by allowing more matches, but also increased my false positives, as I was less confident in some of the matches. I was still able to achieve fairly high average precision (greater than 80% and close to 83% with a cell size of 6) using a threshold of 0.7. Once I found my matches, I used them to define boxes on my image (to do this I had to readjust them based on the scale at which they were found). Finally, once this was complete for all scales, I used the given function to perform non-maximum suppression and get rid of overlapping boxes/matches. The code below shows first the scales I used in one version of my code and then the way in which I used a threshold to find matches.


for s = [1, 0.9, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1]
    % code for each scale
end

function [matches, temp_confs] = run_lin_class(features, w, b, threshold)
    % one confidence score per window; keep windows above the threshold
    class_results = features * w + b;
    matches = find(class_results > threshold);
    temp_confs = class_results(matches);
end
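The box-rescaling step described above can be sketched like this (my own variable names; I assume each match indexes the top-left HoG cell of a window and a square face template of `template_size` pixels):

```matlab
% Convert matched HoG cells at scale s back into pixel boxes in the
% original, unscaled image by dividing the coordinates by s.
[rows, cols] = ind2sub([num_win_rows, num_win_cols], matches);
x_min = round(((cols - 1) * hog_cell_size + 1) / s);
y_min = round(((rows - 1) * hog_cell_size + 1) / s);
x_max = round(x_min + template_size / s - 1);
y_max = round(y_min + template_size / s - 1);
cur_bboxes = [x_min, y_min, x_max, y_max];   % one row per detection
```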

Notice that lower thresholds create more matches and thus cause my pipeline to run slower, but this is helpful for achieving higher average precision! To get an idea of how my parameters affected my results: by using a threshold of 0.1 instead of 0.7, my average precision increased from 88% to 90.6% when using a cell size of 3. By doubling my negative samples I also increased my average precision from the low to the high 80s, though my accuracy dropped from 0.999 (1.000 in some cases) to 0.996 as I had some false positives. Furthermore, by adding more scales I increased my average precision from 90% to 92.1%, and with more scales I was actually able to maintain an accuracy of 0.999 (the last bit lost through a false negative)!


Examining Results for Various Images and Parameters

I found that the results produced were actually very good overall! Below I will show the precision-recall curves generated by the different parameter settings I tried that still managed to produce good results.

Below we see the differences between using more or fewer negative samples. We also see the differences between a larger and a smaller HoG cell size.

[Precision-recall curve]
HoG Cell Size: 6, Threshold: 0.7, # Scales: 8, # Neg Samples: 10000

[Precision-recall curve]
HoG Cell Size: 6, Threshold: 0.7, # Scales: 8, # Neg Samples: 20000

[Precision-recall curve]
HoG Cell Size: 3, Threshold: 0.7, # Scales: 8, # Neg Samples: 10000

[Precision-recall curve]
HoG Cell Size: 3, Threshold: 0.7, # Scales: 8, # Neg Samples: 20000

Here we see some modest improvements when using more samples. The improvements aren't especially large, which makes sense given how many other parameters are in play, but the direction is expected: more training samples should always help with classification!

Now let's see what happens when we make our threshold much lower!

[Precision-recall curve]
HoG Cell Size: 3, Threshold: 0.1, # Scales: 8, # Neg Samples: 20000

[Precision-recall curve]
HoG Cell Size: 3, Threshold: 0.1, # Scales: 10, # Neg Samples: 20000

What we see here is that a lower threshold increases our average precision. Again this makes sense! That said, we do have more false positives, in this case bringing our accuracy down to 0.996 from 0.999. With more time I would like to test even smaller thresholds, as this increase in performance was not large and may partly reflect run-to-run variability, since this project has some randomness in it. What else do we see? We get a really strong average precision with lots and lots of scales: all the way to 92.1%, and in this case our accuracy was still a high 0.999! That said, this pipeline took quite some time to run. By running at more scales we had many more red boxes, but our precision was still stronger, as we considered more scales when finding confident matches. I found that I got very high precision when using the following scales: [1, 0.9, 0.83, 0.75, 0.6, 0.5, 0.4, 0.25, 0.1].

Now let's look at the performance on some of the images, including our class photo!

[Detection results]
HoG Cell Size: 3, Threshold: 0.1, # Scales: 10, # Neg Samples: 20000

[Detection results]
HoG Cell Size: 3, Threshold: 0.1, # Scales: 10, # Neg Samples: 20000

[Detection results]
HoG Cell Size: 3, Threshold: 0.7, # Scales: 8, # Neg Samples: 10000

[Detection results]
HoG Cell Size: 3, Threshold: 0.7, # Scales: 8, # Neg Samples: 10000

Notice how many more boxes we see in the second picture when using more scales and a lower threshold! That said, both pipelines have succeeded in detecting all of the ground truth faces. Using a smaller cell size, however, as we have seen, does on the whole give us significantly higher precision over the entire set of images we tested with ground truth! Now let's look at some other images where we don't have ground truth.

[Detection results]
HoG Cell Size: 6, Threshold: 0.7, # Scales: 5, # Neg Samples: 20000

[Detection results]
HoG Cell Size: 6, Threshold: 0.7, # Scales: 5, # Neg Samples: 20000

Here we see that even with a larger cell size, a high threshold, and not many scales, we did quite well on the class pictures! The reason the scales aren't as important here is that we know how large our image is, so I kept only the smaller scales, which shrink the image by a good amount. Interestingly, our detector even did a good job of catching faces on the projector screen! Seeing the detector do such a good job even with the faster settings was very helpful in assessing the strength of the pipeline. Of course, we also notice how much more difficult it is to detect faces when people are intentionally trying not to be detected, as in the second image! Yet we still find some detections. Below are some more examples of extra images I tested on:

[Detection results]
HoG Cell Size: 6, Threshold: 0.3, # Scales: 10, # Neg Samples: 20000

[Detection results]
HoG Cell Size: 6, Threshold: 0.3, # Scales: 10, # Neg Samples: 20000

[Detection results]
HoG Cell Size: 6, Threshold: 0.3, # Scales: 10, # Neg Samples: 20000

Here we see quite a few more bounding boxes. This is because I used many more scales and a lower threshold!