Project 5 / Face Detection with a Sliding Window

Example of a face detection.

The aim of this project is to detect faces in various scenes using a sliding window. We build a Histogram of Oriented Gradients (HOG) representation of the entire image, and then classify windows of it with a trained classifier. The HOG features serve as the feature template representation, and they are used to train a linear Support Vector Machine (SVM) that locates faces in test images.


The major steps involved are as follows:

  1. Extract HOG from positive training data
  2. Extract HOG from negative training data
  3. Train a Linear SVM classifier
  4. Run a sliding window detector over test images for face detection

The faces are 36×36 pixels, and the HOG cell size is set to 6 (values of 4 and 3 were also tried). The number of orientations was set to 9; changing it did not affect the precision by a great deal. The lambda value is fixed throughout at 0.0001. A sketch of this parameter setup is shown below.
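This is a minimal sketch of the setup, assuming the feature_params struct and field names used by the extraction code below; the exact starter-code values may differ.

% Parameter setup (sketch; struct/field names match the extraction code below)
feature_params = struct('template_size', 36, ...  % faces are 36x36 pixels
                        'hog_cell_size', 6);      % 4 or 3 raises precision, but is slower
lambda = 0.0001;                                  % SVM regularization weight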

Extract HOG from positive training images

HOG features are extracted from both the positive and negative image sets. The template size is set to 36 and the cell size to 6 for faster computation; a cell size of 4 or 3 gives higher precision but takes more compute time. The HOG is calculated with the vl_hog function, then reshaped into a flat row vector and stored. Extraction for the negative image set is similar to the positive set, with one change: hard negative mining is also performed (see the Graduate Credit section).


image_files = dir( fullfile( train_path_pos, '*.jpg') ); %Caltech Faces stored as .jpg
num_images = length(image_files);
% 31 values per HOG cell with the default vl_hog variant
feat_dim = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
features_pos = zeros(num_images, feat_dim); % preallocate instead of growing

for i = 1:num_images
    im = im2single(imread(fullfile(train_path_pos, image_files(i).name)));
    hog_f = vl_hog(im, feature_params.hog_cell_size);
    features_pos(i, :) = reshape(hog_f, 1, feat_dim); % flatten to a row vector
end
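Extraction for the negative set can be sketched in the same way, assuming random 36×36 patches are sampled from the non-face scenes; num_samples, non_face_scn_path, and the sampling strategy here are illustrative assumptions, not the exact starter code.

% Sketch: sample random 36x36 patches from non-face scenes (illustrative)
image_files = dir(fullfile(non_face_scn_path, '*.jpg')); % assumed path to negative scenes
num_samples = 10000;                                     % assumed negative sample count
samples_per_image = ceil(num_samples / length(image_files));
t = feature_params.template_size;
features_neg = zeros(0, feat_dim);
for i = 1:length(image_files)
    im = imread(fullfile(non_face_scn_path, image_files(i).name));
    if size(im, 3) > 1, im = rgb2gray(im); end
    im = im2single(im);
    for s = 1:samples_per_image
        r = randi(size(im,1) - t + 1);
        c = randi(size(im,2) - t + 1);
        hog_f = vl_hog(im(r:r+t-1, c:c+t-1), feature_params.hog_cell_size);
        features_neg(end+1, :) = reshape(hog_f, 1, feat_dim); %#ok<AGROW>
    end
end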

Training a Linear SVM classifier

Setting lambda to 0.0001, vl_svmtrain is called with the stacked feature data and its labels: +1 represents the positive (face) features and -1 the negative (non-face) features. The average precision changed only by a very minor factor when the amount of negative training samples was increased.


%example code
% Stack positive and negative features and build the matching label vector
svm_data = [features_pos; features_neg];
svm_labels = [ones(size(features_pos,1),1); -ones(size(features_neg,1),1)];
lambda = 0.0001;
% vl_svmtrain expects one example per column, hence the transpose
[w, b] = vl_svmtrain(svm_data', svm_labels, lambda);
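As a quick sanity check (a sketch, not part of the required pipeline), the learned w and b can be used to score the training data directly:

% Score the training set with the learned hyperplane
confidences = svm_data * w + b;                 % one confidence per example
train_accuracy = mean(sign(confidences) == svm_labels);
fprintf('Training accuracy: %.4f\n', train_accuracy);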

Sliding Window Detection

The major part of this project is running the detector over the test images. A sliding window method is used: a template-sized window is moved over the HOG feature map of the image, and a confidence score is computed at each position from the learned SVM. A window is kept if its confidence score is greater than the threshold of 0.75; for these positive results, the bounding box is computed and added to the positive detections. After the sliding window finishes running over the whole image, non-maximum suppression is performed to avoid duplicate detections of the same face at multiple scales.
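A minimal single-scale sketch of this step, assuming img, w, b, and feature_params from the earlier stages (the full detector also loops over image scales, and names like cur_bboxes are assumptions):

% Sketch: single-scale sliding window over the HOG map (illustrative)
cell_size = feature_params.hog_cell_size;
n_cells = feature_params.template_size / cell_size;  % cells per template side
hog = vl_hog(im2single(img), cell_size);             % img: one test image
cur_bboxes = zeros(0, 4); cur_confidences = zeros(0, 1);
for r = 1:(size(hog,1) - n_cells + 1)
    for c = 1:(size(hog,2) - n_cells + 1)
        window = hog(r:r+n_cells-1, c:c+n_cells-1, :);
        score = reshape(window, 1, []) * w + b;      % SVM confidence
        if score > 0.75                              % detection threshold
            x = (c-1)*cell_size + 1; y = (r-1)*cell_size + 1;
            cur_bboxes(end+1, :) = [x, y, x+feature_params.template_size-1, ...
                                    y+feature_params.template_size-1]; %#ok<AGROW>
            cur_confidences(end+1, 1) = score; %#ok<AGROW>
        end
    end
end
% non-maximum suppression is then applied to cur_bboxes / cur_confidences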

Results

HOG Face Template

Precision-recall curve for cell size 3

Sample detections

Graduate Credit

For graduate credit, I implemented hard negative mining for the negative training set. The step size used is 6.

Hard Negative Mining

Hard negative mining improves the classifier by adding more informative negative examples. The face detector is run on a set of non-face negative training images; whenever it finds a "face" there, that window is a false positive and is added to the negative data set. I ran this both with and without non-maximum suppression. The SVM classifier is then trained again with this newly augmented data set.
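A sketch of this mining loop, assuming a run_detector_single helper that scores one image and returns [x_min y_min x_max y_max] boxes with their confidences (the helper name and box format are assumptions; the real code reuses the run detector):

% Sketch: mine false positives from non-face scenes (illustrative names)
hard_negs = zeros(0, feat_dim);
t = feature_params.template_size;
for i = 1:length(image_files)
    im = im2single(imread(fullfile(non_face_scn_path, image_files(i).name)));
    [bboxes, ~] = run_detector_single(im, w, b, feature_params); % assumed helper
    for j = 1:size(bboxes, 1)  % every detection here is a false positive
        patch = imresize(im(bboxes(j,2):bboxes(j,4), bboxes(j,1):bboxes(j,3)), [t t]);
        hog_f = vl_hog(patch, feature_params.hog_cell_size);
        hard_negs(end+1, :) = reshape(hog_f, 1, feat_dim); %#ok<AGROW>
    end
end
features_neg = [features_neg; hard_negs]; % retrain the SVM on the augmented set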

HOG

Precision-recall curve

The amount of "red" is reduced here.

Performing hard negative mining reduces the number of false positives, and hence the number of red boxes in the result images.