Project 5 / Face Detection with a Sliding Window

The purpose of this project was to create a face detector based on the sliding window model of Dalal and Triggs (2005). The detector should work at multiple scales and be able to identify both human and illustrated faces.

Positive and Negative Feature Sampling

All positive features were of size 36 x 36 pixels and contained a single human face. The HoG (histogram of oriented gradients) features were computed by passing each image and the cell size to the provided MATLAB function vl_hog. 10,000 negative features were selected using 36 x 36 pixel windows taken from sample images at 100%, 50%, and 25% scale, when the image was large enough to support the resizing (the window size stayed the same, for a zoomed-in effect). These scales were used to provide different types of negative examples. A Support Vector Machine (SVM) was then trained on the positive and negative features. I used 0.0001 for the lambda value since I had success with that value in previous projects. Pictured below is the learned template from the training set at each HoG cell size.

Learned template at HoG cell sizes 6, 4, and 3, respectively.
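For reference, this is roughly how the multi-scale negative sampling and SVM training could be implemented. It assumes the conventions used elsewhere in this report (a feature_params struct with template_size and hog_cell_size fields, and the VLFeat functions vl_hog and vl_svmtrain); the variable names neg_image_dir, image_files, num_samples, and features_pos are illustrative, not my exact code.

    % Multi-scale negative sampling (sketch; names are illustrative).
    t_size = feature_params.template_size;                 % 36 pixels
    dim = (t_size / feature_params.hog_cell_size)^2 * 31;  % HoG feature length
    features_neg = zeros(num_samples, dim);
    scales = [1.0, 0.5, 0.25];
    for i = 1:num_samples
        idx = randi(length(image_files));
        img = imread(fullfile(neg_image_dir, image_files(idx).name));
        if size(img, 3) == 3
            img = rgb2gray(img);
        end
        img = im2single(img);
        % Pick a random scale, but fall back to full size if a 36 x 36
        % window no longer fits in the resized image.
        scaled = imresize(img, scales(randi(3)));
        if min(size(scaled)) < t_size
            scaled = img;
        end
        r = randi(size(scaled, 1) - t_size + 1);
        c = randi(size(scaled, 2) - t_size + 1);
        window = scaled(r:r + t_size - 1, c:c + t_size - 1);
        hog = vl_hog(window, feature_params.hog_cell_size);
        features_neg(i, :) = hog(:)';
    end

    % Train a linear SVM on the stacked positive and negative features.
    X = [features_pos; features_neg]';                      % dim x N
    Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)];
    [w, b] = vl_svmtrain(X, Y, 0.0001);                     % lambda = 0.0001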

Detecting Faces

To detect faces in the testing images, I stepped through each test image at a variety of scales, resizing the image and computing its HoG at each scale while keeping the 36 x 36 window size fixed. For the cell size 6 trial, I used a threshold of 1.1 and scales of 0.23, 0.36, 0.49, 0.62, 0.75, 0.88, and 1.01; see below for an example of how these scales were calculated and used. I found that threshold values between 0.9 and 1.1 worked best, varying slightly with the negative training data and the HoG cell size. Once I had the HoG feature for a scaled window of the image, I used the weight vector w and offset b from the SVM to score it. If the window's confidence exceeded the threshold, I added it to the set of bounding boxes to be assessed and pruned with non-maximum suppression.

How scales were implemented in run_detector.m.

    % Width of the detection window in HoG cells (template_size / cell size).
    h_size = feature_params.template_size / feature_params.hog_cell_size;
    resize_factor = 0.1; % starting offset: 0.1 for cell sizes 6 and 3, 0 for cell size 4
    resize_count = 7;    % number of scales per image
    for f_i = 1:resize_count

        % Scale increment: 0.13 for cell size 6, 0.12 for cell size 4, 0.14 for
        % cell size 3 (giving the scales 0.23, 0.36, ..., 1.01 for cell size 6).
        resize_factor = resize_factor + 0.13;
        curr_img = imresize(img, resize_factor);

        % HoG of the resized image; the window size in cells stays fixed.
        hog = vl_hog(curr_img, feature_params.hog_cell_size);
        [h_height, h_width, ~] = size(hog);

        for row = 1:h_height - h_size
            for col = 1:h_width - h_size

                % h_size x h_size x 31 block of HoG cells, flattened to a vector.
                hog_feature = hog(row:row + h_size - 1, col:col + h_size - 1, :);
                hog_feature_vector = hog_feature(:);

                % Linear SVM confidence for this window.
                score = w' * hog_feature_vector + b;
                ...
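Continuing the loop above, the fragment below sketches how an above-threshold window could be converted back to pixel coordinates in the original image and collected for non-maximum suppression. The box and confidence accumulators are illustrative names, and non_max_supr_bbox refers to the helper provided with the starter code, which I assume is on the path.

    % Sketch only: keep windows that beat the confidence threshold.
    if score > threshold
        cell_size = feature_params.hog_cell_size;
        % Pixel extent of the window in the resized image, mapped back to
        % the original image by dividing by the current resize factor.
        x_min = ((col - 1) * cell_size + 1) / resize_factor;
        y_min = ((row - 1) * cell_size + 1) / resize_factor;
        x_max = ((col + h_size - 1) * cell_size) / resize_factor;
        y_max = ((row + h_size - 1) * cell_size) / resize_factor;
        cur_bboxes      = [cur_bboxes; round([x_min, y_min, x_max, y_max])];
        cur_confidences = [cur_confidences; score];
    end

    % After looping over every scale and window position, prune overlapping
    % detections with non-maximum suppression and keep only the survivors.
    is_max = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img));
    cur_bboxes      = cur_bboxes(is_max, :);
    cur_confidences = cur_confidences(is_max);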

Results

I ran this algorithm at HoG cell sizes of 6, 4, and 3. The precision, Viola Jones figure, and some example images for each size are shown below. The average precisions were 0.834 at cell size 6, 0.885 at cell size 4, and 0.891 at cell size 3.

Precision at HoG cell sizes 6, 4, and 3, respectively.

Viola Jones figure at HoG cell sizes 6, 4, and 3, respectively.

Example detections at HoG cell sizes 6, 4, and 3, respectively. The image with cell size 6 was run with a threshold of 1.1, and the other two were run with a threshold of 1. Although they appear less "good" than the cell size 6 result, the higher recall contributed to a higher average precision overall.


Hard Negative Mining

I completed the hard negative mining extra credit portion of the project. I ran the negative training images through my run_detector code and added any false positives back into the set of negative examples (to increase their relative weight). I then re-ran the SVM training and used the resulting w and b values. I did not find this to be particularly helpful. Repeating the process more than once caused the SVM parameters to overfit the data: the accuracy was very high, but the recall was lower. A sketch of the mining step is shown below, followed by some results from the hard-negative-mined samples at cell size 4.
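Roughly, one round of mining looks like the following. The run_detector call is sketched with the inputs described earlier (an image directory, the SVM parameters w and b, and feature_params); get_hard_negatives is a hypothetical helper name for cropping each false-positive window and computing its HoG, not my exact code.

    % One round of hard negative mining (sketch; helper names illustrative).
    % Every detection in the face-free training scenes is a false positive.
    [bboxes, confidences, image_ids] = run_detector(neg_train_path, w, b, feature_params);

    % Hypothetical helper: crop each false-positive box from its source
    % image and return the flattened HoG feature of the 36 x 36 window.
    hard_negatives = get_hard_negatives(neg_train_path, bboxes, image_ids, feature_params);
    features_neg = [features_neg; hard_negatives];

    % Re-train the SVM on the augmented negative set.
    X = [features_pos; features_neg]';
    Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)];
    [w, b] = vl_svmtrain(X, Y, 0.0001);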