Project 5 / Face Detection with a Sliding Window

Example of Face Detection with Many False Positives

Face detection is a very common vision problem. Humans care about faces; by default, they are what we focus on. There are many simple, real-time solutions to this problem. In this project we implement one such solution: a sliding-window detector. While this method is quick and detects true faces about 90% of the time, it also registers many false positives.

Get Positive/Negative Features

We first need feature descriptors for faces and for non-faces. Our face training data is a set of 36x36-pixel images, each cropped to show only a face. For each face image we compute a histogram of oriented gradients (HoG) with vl_hog and use it as the feature descriptor. For the non-face data we have several hundred images that contain no faces. From these we take on the order of 10,000 samples: each sample is a random 36x36 section cropped from a random non-face image, and a HoG is computed for that section.
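The negative-sampling step can be sketched as follows. This is a minimal Python illustration, not the project's MATLAB code: the function name `sample_negative_patches` is hypothetical, images are assumed to be grayscale lists of pixel rows, and images smaller than 36x36 are simply skipped. Each returned patch would then go through the same HoG extractor as the face crops.

```python
import random


def sample_negative_patches(images, num_samples, size=36, seed=0):
    """Crop num_samples random size x size patches from randomly chosen images.

    images: list of grayscale images, each a list of rows of pixel values.
    Images smaller than the patch in either dimension are skipped.
    """
    rng = random.Random(seed)
    usable = [img for img in images if len(img) >= size and len(img[0]) >= size]
    if not usable:
        raise ValueError("no image is large enough for a patch")
    patches = []
    for _ in range(num_samples):
        img = rng.choice(usable)
        # Choose a top-left corner so the whole patch lies inside the image
        top = rng.randint(0, len(img) - size)
        left = rng.randint(0, len(img[0]) - size)
        patches.append([row[left:left + size] for row in img[top:top + size]])
    return patches


# Usage: one 50x40 image whose pixel value encodes its (row, col) position,
# plus a 30x30 image that is too small and is therefore ignored
images = [[[r * 1000 + c for c in range(40)] for r in range(50)],
          [[0] * 30 for _ in range(30)]]
patches = sample_negative_patches(images, num_samples=5)
print(len(patches), len(patches[0]), len(patches[0][0]))  # 5 36 36
```

Seeding the generator makes the sampled negative set reproducible between training runs, which helps when comparing settings such as the number of negatives.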

Train Classifier and Analyze Test Set

Using the positive and negative features, we can then create a list of labels: each positive feature is labeled 1 and each negative feature is labeled -1. With the features and labels, we can quickly train a linear SVM using vl_svmtrain.
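As a sketch of what this training step optimizes, the Python below builds the ±1 labels and fits a linear SVM by full-batch subgradient descent on a regularized hinge loss, lam/2 * ||w||^2 + mean(max(0, 1 - y(w·x + b))). This is a stand-in, not vl_svmtrain itself: the solver, learning rate, and epoch count are assumptions, and all names are hypothetical. The `lam` argument plays a role similar to the SVM lambda reported in the results.

```python
def make_labels(num_pos, num_neg):
    """Face features are labeled +1 and non-face features -1."""
    return [1] * num_pos + [-1] * num_neg


def train_linear_svm(features, labels, lam=1e-4, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the regularized hinge loss
        lam/2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b))).
    The bias b is not regularized. Returns (w, b).
    """
    n, d = len(features), len(features[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wk for wk in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wk * xk for wk, xk in zip(w, x)) + b)
            if margin < 1:  # hinge term active: its subgradient is -y * x / n
                for k in range(d):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def confidence(w, b, x):
    """Linear SVM score; a positive value means 'face'."""
    return sum(wk * xk for wk, xk in zip(w, x)) + b


# Usage with toy 2-D "features" standing in for HoG vectors
pos = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0]]
neg = [[-2.0, -1.0], [-3.0, -2.5], [-1.5, -3.0]]
labels = make_labels(len(pos), len(neg))
w, b = train_linear_svm(pos + neg, labels)
print([1 if confidence(w, b, x) > 0 else -1 for x in pos + neg])  # [1, 1, 1, -1, -1, -1]
```

The score returned by `confidence` is the same quantity the detector later compares against a threshold: the signed distance-like value w·x + b.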

To detect faces in an image, we slide a window across it and analyze each position. We first compute a HoG of the whole image. We then walk over each cell in the HoG and take the block of cells that covers a 36x36-pixel window with the current cell in its top-left corner. Finally, we apply our linear SVM to this block to decide whether it contains a face and how confident that detection is.


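      % Number of HoG cells spanned by one 36x36-pixel template
      hogCellsPerTemplate = 36 / feature_params.hog_cell_size;

      % Get hog features
      imageHog = custom_hog(scaledImage, feature_params.hog_cell_size, '');

      % One confidence per window position (top-left cell at (y,x))
      confidences = zeros(size(imageHog,1) - hogCellsPerTemplate + 1, ...
                          size(imageHog,2) - hogCellsPerTemplate + 1);

      % Walk over all cells
      for y=1:size(confidences,1)
        for x=1:size(confidences,2)

          % Get the template-sized block of cells and reshape it to match w
          hogSegment = reshape(imageHog(y:y+hogCellsPerTemplate-1,x:x+hogCellsPerTemplate-1,:), size(w));

          % Evaluate with the linear classifier and keep the score
          confidences(y,x) = w(:)' * hogSegment(:) + b;
        end
      end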
      

If the confidence is above a certain threshold, we have detected a face. To handle faces of different sizes, we run this sliding-window algorithm on multiple scales of the image. This lets an SVM trained on low-resolution 36x36 faces detect faces that appear much larger in a high-resolution image. For each image, I repeatedly scale it down by a factor of 0.9 until it is too small to contain a 36x36 window. Once all scales of the image have been analyzed, we use non-maximal suppression to merge overlapping detections into a single detection.
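The multi-scale loop and the final merge step can be sketched in Python as well. The scale schedule follows the text (shrink by 0.9 until a 36x36 window no longer fits, ignoring rounding of the resized dimensions); the box mapping assumes 0-based cell indices; the suppression is a common greedy intersection-over-union variant with an assumed overlap cutoff of 0.3, which may differ from the suppression routine the project actually used. All function names are hypothetical.

```python
def pyramid_scales(height, width, template=36, factor=0.9):
    """Scales 1, 0.9, 0.81, ... for as long as a template-sized window still fits."""
    scales = []
    scale = 1.0
    while min(height, width) * scale >= template:
        scales.append(scale)
        scale *= factor
    return scales


def window_to_box(cell_x, cell_y, cell_size, scale, template=36):
    """Map a window whose top-left HoG cell is (cell_x, cell_y), 0-based, in an
    image resized by `scale` back to an (x_min, y_min, x_max, y_max) box in
    original-image pixels."""
    x_min = cell_x * cell_size / scale
    y_min = cell_y * cell_size / scale
    side = template / scale
    return (x_min, y_min, x_min + side, y_min + side)


def iou(a, c):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], c[2]) - max(a[0], c[0]))
    iy = max(0.0, min(a[3], c[3]) - max(a[1], c[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (c[2] - c[0]) * (c[3] - c[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, max_overlap=0.3):
    """Greedy NMS: visit boxes from highest to lowest score and keep a box
    only if it overlaps every already-kept box by at most max_overlap."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= max_overlap for j in keep):
            keep.append(i)
    return keep


# Usage: two overlapping detections of one face plus a separate detection
boxes = [(0, 0, 36, 36), (4, 4, 40, 40), (100, 100, 136, 136)]
scores = [0.9, 0.8, 0.5]
print(len(pyramid_scales(100, 100)))       # 10
print(window_to_box(2, 1, 6, 0.5))         # (24.0, 12.0, 96.0, 84.0)
print(non_max_suppression(boxes, scores))  # [0, 2]
```

Mapping every window back to original-image coordinates before suppression is what lets detections found at different scales compete with each other.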

HoG Templates

We can visualize our learned classifier using vl_hog. Below are the trained classifiers for three different cell sizes. The smaller the cell size, the more the classifier resembles a human face.

Cell Size: 3
Cell Size: 4
Cell Size: 6

Results

My results were very good: my highest average accuracy was 0.929, achieved with a cell size of 3, an SVM lambda of 0.0001, and 20,000 sampled negative features. Below are the resulting graphs for a cell size of 3, as well as results for cell sizes of 4 and 6. I have also included the results of my own HoG implementation, a very simple 8-orientation HoG like the one we built in project 2, run with a cell size of 6. Its accuracy is much worse than what vl_hog achieves, but it still reached an accuracy of 0.647.
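For reference, a very simple 8-orientation HoG of the kind described above might look like the Python sketch below. It is an illustration under stated assumptions, not the author's implementation: it uses central-difference gradients, splits the full 0 to 2π range of signed orientations into 8 equal bins, weights each vote by gradient magnitude, ignores border pixels and partial cells, and skips any normalization. The name `simple_hog` is hypothetical.

```python
import math


def simple_hog(image, cell_size, num_bins=8):
    """Per-cell histograms of gradient orientation (num_bins bins over 0..2*pi).

    image: list of rows of floats. Returns grid[cy][cx] -> list of num_bins
    floats, each vote weighted by gradient magnitude. Border pixels have no
    central difference and are skipped; no normalization is applied.
    """
    h, w = len(image), len(image[0])
    rows, cols = h // cell_size, w // cell_size
    grid = [[[0.0] * num_bins for _ in range(cols)] for _ in range(rows)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            cy, cx = y // cell_size, x // cell_size
            if cy >= rows or cx >= cols:
                continue  # pixel falls in a partial cell at the image edge
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % (2 * math.pi)
            # Equal-width orientation bins; the modulo guards against angle == 2*pi
            b = int(angle / (2 * math.pi) * num_bins) % num_bins
            grid[cy][cx][b] += mag
    return grid


# Usage: a 12x12 horizontal ramp (brightness increases to the right)
ramp = [[float(x) for x in range(12)] for _ in range(12)]
hog = simple_hog(ramp, cell_size=6)
print(hog[0][0])  # [50.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Concatenating the per-cell histograms over a 36x36 window gives the feature vector; with 8 bins per cell this is far lower-dimensional than vl_hog's descriptor, which is one plausible reason for the accuracy gap.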

Average Accuracies

Cell Size: 3
Cell Size: 4
Cell Size: 6
Custom HoG, Cell Size: 6

False Positives vs Accuracy

Cell Size: 3
Cell Size: 4
Cell Size: 6
Custom HoG, Cell Size: 6

Various Face Detection Results