Project 5 / Face Detection with a Sliding Window

The key steps in this project were:

  1. Extracting HOG features from positive and negative examples
  2. Building a classifier over these features
  3. Detecting faces in test images by running a sliding-window detector with this classifier

HOG features from positive and negative examples

Extracting features from the positive images was straightforward: since they were already template-sized grayscale images, I only had to run vl_hog over them. I obtained 6713 positive features. Extracting features from the negative images required a more involved method. These images were not necessarily template sized and were RGB, unlike the positive images. So, I first converted them to grayscale and then randomly chose template-sized patches from each image to run vl_hog over. Randomly sampling roughly 20-40 patches per image was enough because there were far fewer positive training examples than negative source images, and sampling many more negative examples would bias the classifier toward the negative class. Hence, I chose about 10000 negative features at random.
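
Below is a minimal sketch of this feature extraction step, assuming VLFeat's vl_hog is on the path; the pos_files/neg_files variables are illustrative stand-ins for the actual file lists in my code, and the template size (36) and cell size (6) match the setup described at the end of this report.

```matlab
% Feature extraction sketch (pos_files / neg_files are illustrative names).
template_size = 36;
cell_size     = 6;
dim = (template_size / cell_size)^2 * 31;   % 6 x 6 x 31 = 1116

% Positive features: images are already template sized and grayscale.
pos_feats = zeros(numel(pos_files), dim);
for i = 1:numel(pos_files)
    img = im2single(imread(pos_files(i).name));
    hog = vl_hog(img, cell_size);
    pos_feats(i, :) = hog(:)';
end

% Negative features: convert to grayscale, then sample random
% template-sized patches (roughly 10000 in total).
samples_per_img = ceil(10000 / numel(neg_files));
neg_feats = zeros(numel(neg_files) * samples_per_img, dim);
for i = 1:numel(neg_files)
    img = imread(neg_files(i).name);
    if size(img, 3) == 3
        img = rgb2gray(img);
    end
    img = im2single(img);
    for j = 1:samples_per_img
        r = randi(size(img, 1) - template_size + 1);
        c = randi(size(img, 2) - template_size + 1);
        patch = img(r:r+template_size-1, c:c+template_size-1);
        hog = vl_hog(patch, cell_size);
        neg_feats((i-1)*samples_per_img + j, :) = hog(:)';
    end
end
```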

Building a classifier over these features

The next step was building a classifier. I chose a linear SVM with a regularization value of 0.0001.
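
A minimal sketch of the training step, assuming VLFeat's vl_svmtrain and the pos_feats/neg_feats matrices from the sketch above (one example per row):

```matlab
% Train a linear SVM; vl_svmtrain expects one example per column and
% labels in {-1, +1}.
lambda = 0.0001;
X = [pos_feats; neg_feats]';                     % D x N feature matrix
Y = [ ones(size(pos_feats, 1), 1); ...
     -ones(size(neg_feats, 1), 1)];
[w, b] = vl_svmtrain(X, Y, lambda);

% The confidence of a feature vector f is then w' * f + b.
```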

SVM model

Building a detector

I did the following steps in order to build a detector (sketched in code after the list):

  1. Converted the test images to grayscale
  2. Ran vl_hog over each image to obtain an x*y*31 HOG feature map
  3. Selected patches of size (template/cell_size)*(template/cell_size)*31 from this feature map; these are the HOG features for the image
  4. Fed these features to the SVM, which returned a confidence value for each feature, and kept the features and corresponding patches above a confidence threshold; these are the potential faces in the image
  5. Ran non-maximum suppression over these patches and confidence values to retain only the local maxima; these are the faces detected in the image.
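
The sketch below shows how steps 2-5 look for a single test image at a single scale, reusing w, b, cell_size and template_size from the earlier sketches; the threshold value is illustrative, and the final non-maximum suppression step is left as a comment.

```matlab
% Single-scale sliding-window detection for one test image (sketch).
img = im2single(rgb2gray(imread(test_files(i).name)));
threshold = 0.75;                                % illustrative value
cells_per_template = template_size / cell_size;  % 6 cells per side

hog = vl_hog(img, cell_size);                    % x-by-y-by-31 feature map
boxes = [];
confidences = [];
for r = 1:(size(hog, 1) - cells_per_template + 1)
    for c = 1:(size(hog, 2) - cells_per_template + 1)
        patch = hog(r:r+cells_per_template-1, c:c+cells_per_template-1, :);
        conf  = w' * patch(:) + b;
        if conf > threshold
            % Map cell coordinates back to pixel coordinates.
            x_min = (c - 1) * cell_size + 1;
            y_min = (r - 1) * cell_size + 1;
            boxes = [boxes; x_min, y_min, ...
                     x_min + template_size - 1, y_min + template_size - 1];
            confidences = [confidences; conf];
        end
    end
end
% Finally, run non-maximum suppression over boxes / confidences to keep
% only the local maxima.
```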

I ran it initially with only one scale and obtained the following results:

Results with a single scale

Following this, I ran my detector at multiple scales: [1, 0.8, 0.8^2, 0.8^3, 0.8^4, 0.8^5, 0.8^7, 0.8^9]. This led to a very large increase in the average precision values. Following are the results obtained with different thresholds in the range of 0.5-0.9.
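
A sketch of the multi-scale wrapper around the single-scale detection above; detect_single_scale is a hypothetical helper that packages the previous sketch, and the detected boxes are divided by the scale to map them back to original-image coordinates.

```matlab
% Multi-scale detection sketch: rescale the image, run the single-scale
% detector, and map the boxes back to original-image coordinates.
scales = [1, 0.8, 0.8^2, 0.8^3, 0.8^4, 0.8^5, 0.8^7, 0.8^9];
all_boxes = [];
all_confidences = [];
for s = scales
    scaled = imresize(img, s);
    if min(size(scaled)) < template_size
        continue;                     % too small for the template
    end
    % detect_single_scale is a hypothetical wrapper around the
    % single-scale sketch shown earlier.
    [boxes, confidences] = detect_single_scale(scaled, w, b, ...
        cell_size, template_size, threshold);
    all_boxes = [all_boxes; boxes ./ s];
    all_confidences = [all_confidences; confidences];
end
% Non-maximum suppression is then run once over all_boxes / all_confidences.
```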

Results with multiple scales

Here are some of the test images. I noticed that even though lowering the threshold seemed to increase average precision, it also produced many more red boxes (false detections) in the resulting images.

Following this, I varied the HOG cell size to 3 and 4 and obtained the following improved results.

Results with HOG cell size 3

Results with HOG cell size 4

Graduate credit: Hard negative mining

I implemented hard negative mining as follows:

  1. Train the SVM on the positive and negative image features
  2. Run the detector with this SVM on the negative training images. Since these images contain no faces, any patches detected as faces are false positives (hard negatives).
  3. Use the features from these patches, in addition to the original features, to retrain the SVM.

This is more useful in a setting where there are fewer negative examples to train the model on. But in our case, since there were already plenty of negative training examples available, not much difference was noted in the average precision values as a result of implementing hard negative mining.
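
A minimal sketch of the retraining step, assuming a hypothetical mine_hard_negatives helper that runs the detector over the negative training images and returns the HOG features of every window it labels a face:

```matlab
% Hard negative mining sketch: windows detected as faces in face-free
% images are false positives, so their features become extra negatives.
% mine_hard_negatives is a hypothetical helper built on the detector above.
hard_neg_feats = mine_hard_negatives(neg_files, w, b, cell_size, ...
                                     template_size, threshold);

% Retrain the SVM on the original features plus the hard negatives.
X = [pos_feats; neg_feats; hard_neg_feats]';
Y = [ ones(size(pos_feats, 1), 1); ...
     -ones(size(neg_feats, 1) + size(hard_neg_feats, 1), 1)];
[w, b] = vl_svmtrain(X, Y, lambda);
```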

Results with hard negative mining

I noticed that the detections were much cleaner, i.e. fewer false positives per image, but in order to achieve the same precision as before, I had to vary the threshold, and this led to more red boxes in the image compared to the initial threshold.

Here is an example of a test image where the number of false detections went down drastically with hard negative mining.

Results from class test scenes

With a detector of template size 36 and HOG cell size 6