Project 5 / Face Detection with a Sliding Window

In this project, the goal was to implement a Dalal-Triggs sliding-window face detector.

  1. Get Positive Features
  2. Get Negative Features
  3. Classifier Training
  4. Detector Implementation

Get Positive Features

36x36 images were read one-by-one from the training set, and the Histogram of Oriented Gradients (HOG) of each was computed using vl_hog. Each image's HOG descriptor was then vectorized into a 1xD row and placed into an NxD matrix of HOG descriptors for the training set (where N is the number of images in the training set and D is the dimensionality, equal to 6x6x31 = 1116).
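A minimal sketch of this loop might look like the following (variable names such as image_files, train_path_pos, and feature_params are assumptions here, following the usual starter-code conventions):

    % sketch of positive feature extraction (names are assumptions, not the
    % exact starter code)
    N = length(image_files);
    D = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
    features_pos = zeros(N, D);
    for i = 1 : N
        % read and convert to single, as vl_hog expects
        img = im2single(imread(fullfile(train_path_pos, image_files(i).name)));
        hog = vl_hog(img, feature_params.hog_cell_size);
        % vectorize the 6x6x31 HOG descriptor into one 1xD row
        features_pos(i, :) = reshape(hog, 1, D);
    end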

Get Negative Features

This process was similar to obtaining the positive features. The difference was that each negative image was sampled at random locations, and each sample had its HOG computed. The HOGs of the samples were used to populate the negative-features descriptor matrix. The number of samples per image (SPI) was the desired number of negative examples divided by the number of images, so negatives were drawn evenly across the set.

Here are the details of computing the HOG features for the samples:


    % compute HOG for each random sample in image i
    for j = 1 : SPI
        % random x coordinate of the sample's top-left corner
        sx = ceil(rand() * (imx - template_size));
        % random y coordinate of the sample's top-left corner
        sy = ceil(rand() * (imy - template_size));
        neg_sample = img(sy : sy + template_size - 1, sx : sx + template_size - 1);
        hog = vl_hog(neg_sample, feature_params.hog_cell_size);

        % linear index of this sample in the full negative set
        k = ((i - 1) * SPI) + j;
        features_neg(k,:) = reshape(hog, 1, D);
    end

Classifier Training

To train the classifier I applied vl_svmtrain to the newly found positive and negative feature sets. I started with a lambda of 0.001, then changed to 0.0001, which increased my accuracy from around 0.9997 to 0.9998 or higher. To create the data for the trainer, I combined and transposed the feature sets. The label arrays for the positive and negative examples were filled with +1 and -1, respectively.
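The training call itself is short. A sketch of the setup described above (assuming features_pos and features_neg are the NxD matrices from the previous steps; exact variable names are mine):

    % vl_svmtrain expects data as a D x N matrix, so combine and transpose
    X = [features_pos; features_neg]';
    % labels: +1 for positives, -1 for negatives
    Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)];
    lambda = 0.0001;
    [w, b] = vl_svmtrain(X, Y, lambda);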

Results:

Here's what my results looked like from the initial classification:

 
Initial classifier performance on train data:
  accuracy:   1.000
  true  positive rate: 0.398
  false positive rate: 0.000
  true  negative rate: 0.602
  false negative rate: 0.000
vl_hog: descriptor: [6 x 6 x 31]
vl_hog: glyph image: [126 x 126]
vl_hog: number of orientations: 9
vl_hog: variant: UOCTTI
Detecting faces in Argentina.jpg
 non-max suppression: 619 detections to 70 final bounding boxes
 

Detector Implementation

To implement the sliding-window detector, I scaled each normalized, grayscale image and computed the HOG features of the scaled image using vl_hog. I then slid a window equal to the template size (6x6 HOG cells) across the HOG image; at each position, the windowed HOG features were vectorized and, as in the get_features functions, collected into a feature array. Once the descriptors were computed, each window's confidence was scored using the w and b values from the trained classifier. Scores above the threshold were kept, and the corresponding windows were used to compute the bounding boxes.

Here is a code snippet of the sliding window algorithm.

 
    % sliding window over the HOG image
    for n = 1 : winx
        for m = 1 : winy
            % linear index of this window's feature row
            k = (winy * (n-1) + m);
            % crop the HOG image to the current window (num_hog_cells cells square)
            winfeat = hog(m : (m + num_hog_cells - 1), n : (n + num_hog_cells - 1), :);
            % vectorize to 1 x (dimensionality of HOG window) and store
            window(k, :) = reshape(winfeat, 1, (num_hog_cells^2 * 31));
        end
    end

Confidence scoring:

    confidence_scores = window * w + b;
    best = find(confidence_scores > threshold);
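To turn the surviving window indices into bounding boxes, the linear index k must be unpacked back into its (row, column) window position and converted from HOG-cell coordinates to pixel coordinates in the original image. A sketch, assuming the k = winy*(n-1) + m layout from the sliding-window loop and a current image scale named scale (the helper variables here are my own, not the starter code's):

    % map each kept window back to pixel coordinates in the original image
    cell_size = feature_params.hog_cell_size;
    for idx = 1 : length(best)
        k = best(idx);
        n = floor((k - 1) / winy) + 1;   % window column (in HOG cells)
        m = mod(k - 1, winy) + 1;        % window row (in HOG cells)
        % undo the scaling to get coordinates in the original image
        x_min = round(((n - 1) * cell_size + 1) / scale);
        y_min = round(((m - 1) * cell_size + 1) / scale);
        box_size = round(feature_params.template_size / scale);
        bboxes(idx, :) = [x_min, y_min, x_min + box_size - 1, y_min + box_size - 1];
    end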

Results

Using scales from 1 down to 0.1 in steps of 0.1, plus an extra scale of 0.05, and a threshold of -0.5, I was able to get an AP of about 0.859. I did receive an error during this run on one of the images, caused by too small a scale; I fixed this by removing the 0.05 scale from my list. These were the classification results:


Initial classifier performance on train data:
  accuracy:   0.999
  true  positive rate: 0.398
  false positive rate: 0.000
  true  negative rate: 0.602
  false negative rate: 0.001

After removing 0.05 from the scale list and raising the threshold to 0.5, my results changed to AP = 0.847, with the following classification results:

Initial classifier performance on train data:
  accuracy:   0.999
  true  positive rate: 0.398
  false positive rate: 0.000
  true  negative rate: 0.601
  false negative rate: 0.001 

I also tried taking out scales at random, using scale = [1, 0.9, 0.6, 0.5, 0.4, 0.1] and threshold = 0.5; the smaller number of scales dropped the average precision to 0.828 and increased the false positive rate to 0.003, with these results:

Initial classifier performance on train data:
  accuracy:   0.997
  true  positive rate: 0.398
  false positive rate: 0.003
  true  negative rate: 0.599
  false negative rate: 0.000