Project 5 / Face Detection with a Sliding Window

In this project, a sliding window model is implemented to detect faces in a data set. The project involves handling heterogeneous training and testing data, training a linear classifier (a HoG template), and classifying sliding windows at multiple scales. The key steps are listed below, followed by a high-level sketch of the pipeline:

  1. Load positive training examples and convert them to HoG features
  2. Sample negative examples and convert them to HoG features
  3. Train a linear classifier from the positive and negative examples
  4. Run the classifier at multiple scales on the test set
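
The sketch below gives the overall pipeline. The function names follow the starter code, but the paths and feature_params values are assumptions based on the default setup, so this is an outline rather than a definitive top-level script.

% Outline of the full pipeline (paths and parameter values are assumed defaults)
feature_params = struct('template_size', 36, 'hog_cell_size', 6);
features_pos = get_positive_features(train_path_pos, feature_params);
features_neg = get_random_negative_features(non_face_scn_path, feature_params, 10000);
% ... train the linear SVM as in Part 3 to obtain w and b, then:
[bboxes, confidences, image_ids] = run_detector(test_scn_path, w, b, feature_params);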

Example Face Detection Result.

Code and Algorithms

Part 1: Load positive training examples and convert them to HoG features

In this part, the cropped positive training examples are loaded and converted into HoG features using vl_hog.

Some code for get_positive_features.m is shown below:


% Dimensionality of one flattened HoG template (31 orientations per cell in VLFeat)
temp_dim = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
for i = 1:num_images
    img = imread(fullfile(train_path_pos, image_files(i).name));
    img = single(img) / 255;                          % normalize to [0, 1]
    hog = vl_hog(img, feature_params.hog_cell_size);  % compute HoG cells
    hog = reshape(hog, [1, temp_dim]);                % flatten to a row vector
    features_pos = [features_pos; hog];               % append as one example
end

Part 2: Sample negative examples and convert them to HoG features

In this part, similar to get_positive_features.m, negative examples were sampled from scene images that contain no faces and converted into HoG features.

Some code for get_random_negative_features.m is shown below:


% Randomly keep exactly num_samples of the collected negative features
rand_sample = randsample(size(features_neg, 1), num_samples);
features_neg = features_neg(rand_sample, :);
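
The snippet above shows only the final subsampling step. For completeness, a minimal sketch of how the negative patches themselves might be cropped and converted is given below; the loop structure and the samples_per_image variable are assumptions, and the actual get_random_negative_features.m may differ.

% Sketch: crop random template-sized patches from the non-face scenes
tsize = feature_params.template_size;
for i = 1:num_images
    img = imread(fullfile(non_face_scn_path, image_files(i).name));
    if size(img, 3) > 1
        img = rgb2gray(img);                      % some scenes may be RGB
    end
    img = single(img) / 255;
    for s = 1:samples_per_image                   % assumed per-image sample count
        r = randi(size(img, 1) - tsize + 1);      % random top-left corner
        c = randi(size(img, 2) - tsize + 1);
        patch = img(r:r+tsize-1, c:c+tsize-1);
        hog = vl_hog(patch, feature_params.hog_cell_size);
        features_neg = [features_neg; reshape(hog, 1, [])];
    end
end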

Part 3: Train a linear classifier from the positive and negative examples

In this part, vl_svmtrain was used to train a linear classifier on the positive and negative features. The regularization parameter lambda was set to 0.0001 here to increase accuracy.

Some code for training the classifier is shown below:


lambda = 0.0001;                                   % regularization strength
X = [features_pos', features_neg'];                % one feature vector per column
Y = ones(size(features_pos, 1) + size(features_neg, 1), 1);
Y(size(features_pos, 1)+1:end) = -1;               % +1 for faces, -1 for non-faces
[w, b] = vl_svmtrain(X, Y, lambda);                % linear SVM weights and bias
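
As a quick sanity check (not part of the original snippet), the learned w and b can be used to score the training examples themselves; a reasonably separable training set should give an accuracy close to 1:

% Sanity check: classify the training set with the learned hyperplane
confidences = X' * w + b;                          % signed distance to the hyperplane
train_accuracy = mean(sign(confidences) == Y);
fprintf('Training accuracy: %.4f\n', train_accuracy);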

Part 4: Run the classifier at multiple scales on the test set

In this part, the trained classifier was run on the test set. Each test image was scanned at multiple scales, and non-maximum suppression was applied to remove duplicate detections and enhance performance.

Some code for run_detector.m is shown below:


for t = 1:length(scale_value)
    % Resize the image and compute its HoG representation once per scale
    cur_scale = scale^scale_value(t);
    resized_img = imresize(img, cur_scale);
    img_hog = vl_hog(resized_img, feature_params.hog_cell_size);
    % Slide a num_cell x num_cell template over the HoG cell grid
    for c = 1:size(img_hog, 1) - num_cell + 1
        for r = 1:size(img_hog, 2) - num_cell + 1
            feature = img_hog(c:c+num_cell-1, r:r+num_cell-1, :);
            score = reshape(feature, 1, []) * w + b;   % SVM confidence
            if score > threshold
                % Map the window back to original-image pixel coordinates
                x_min = round(((r-1) * feature_params.hog_cell_size + 1) / cur_scale);
                y_min = round(((c-1) * feature_params.hog_cell_size + 1) / cur_scale);
                x_max = round((r+num_cell-1) * feature_params.hog_cell_size / cur_scale);
                y_max = round((c+num_cell-1) * feature_params.hog_cell_size / cur_scale);
                cur_bboxes = [cur_bboxes; x_min, y_min, x_max, y_max];
                cur_confidences = [cur_confidences; score];
                cur_image_ids = [cur_image_ids; {test_scenes(i).name}];
            end
        end
    end
end
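
After every scale of an image has been scanned, non-maximum suppression is applied to the collected detections before moving to the next test image. A minimal sketch is shown below, assuming the starter code's non_max_supr_bbox helper; the exact call may differ in other versions.

% Keep only locally-maximal detections for the current image
is_maximum = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img));
cur_bboxes = cur_bboxes(is_maximum, :);
cur_confidences = cur_confidences(is_maximum, :);
cur_image_ids = cur_image_ids(is_maximum, :);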

Results and Analysis

In order to find the best performance, three important parameters were compared: the scale value, the number of negative examples, and the HoG cell size.

First of all, the scale value was compared. Three scale values, 0.7, 0.8, and 0.9, were chosen for the comparison experiment. The HoG cell size was set to 6, the number of negative examples to 10000, and the step size to 50.

Scale = 0.7

Scale = 0.8

Scale = 0.9

It can be observed that when the scale value equals 0.8, the accuracy reaches its maximum of 88%. Note that the measured accuracy varies slightly from run to run, likely because the negative examples are sampled randomly.

Secondly, the number of negative examples was compared. Three values, 10000, 20000, and 30000, were chosen for the comparison experiment. The HoG cell size was set to 6, the step size to 40 (30 for the 30000 case), and the scale value to 0.8.

Number of negative examples = 10000

Number of negative examples = 20000

Number of negative examples = 30000

It can be observed that when the number of negative examples equals 20000, the accuracy reaches its maximum of 89.8%. In theory, the highest accuracy should be reached with 30000 negative examples. A possible explanation is that with 30000 negative examples, the step size had to be decreased to complete the sampling task, and this change of step size may have caused the drop in accuracy.

Finally, the HoG cell size was compared. Three cell sizes, 6, 4, and 2, were chosen for the comparison experiment. The number of negative examples was set to 10000, the step size to 40, and the scale value to 0.8.

Hog cell size = 6

Hog cell size = 4

Hog cell size = 2

It can be observed that when the HoG cell size equals 2, the accuracy reaches 92.5%, the highest of the three. The learned HoG template also looks much more face-like when the cell size is 2. However, the running time with cell size 2 is extremely long, as the calculation below illustrates.
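
To see why smaller cells are so much more expensive, consider how the template dimensionality grows as the cell size shrinks. The numbers below assume the default 36 x 36 pixel template and VLFeat's 31-dimensional HoG cells:

% Feature dimensionality as a function of HoG cell size (36x36 template assumed)
for cell_size = [6, 4, 2]
    dim = (36 / cell_size)^2 * 31;
    fprintf('cell size %d -> %5d dimensions\n', cell_size, dim);
end
% Prints 1116, 2511, and 10044 dimensions respectively; the number of
% sliding-window positions per image grows in the same way.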

Some Extra Work

In this project, the following extra work was implemented:

  1. Showing the face detections on the class photo in the data/extra_test_scenes directory
  2. Implementing hard negative mining
  3. Parameter tuning for the scale value, the number of negative examples, and the HoG cell size

Showing the face detections on the class photo in the data/extra_test_scenes directory

Class Photo Example 1

Class Photo Example 2

Class Photo Example 3

Implementing hard negative mining

In this part, hard negative mining is implemented. Its advantage is that the linear SVM detector can be improved and refined: the detector is run on each non-face scene image, where every detection is by definition a false positive; these detections are sorted by confidence, and only the top-scoring ones are kept as hard negatives and used to retrain the SVM.
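
A minimal sketch of this mining-and-retraining loop is given below. The run_detector call follows the starter interface, while extract_hog_from_bboxes is a hypothetical helper that crops each mined box and recomputes its HoG feature, and num_hard_neg is an assumed parameter:

% Hard negative mining: every detection on the non-face scenes is a false
% positive, so the highest-confidence ones are the "hardest" negatives
[bboxes, confidences, image_ids] = run_detector(non_face_scn_path, w, b, feature_params);
[~, order] = sort(confidences, 'descend');
top = order(1:min(num_hard_neg, numel(order)));   % num_hard_neg is assumed
hard_negs = extract_hog_from_bboxes(non_face_scn_path, ...  % hypothetical helper
    image_ids(top), bboxes(top, :), feature_params);
X = [X, hard_negs'];                              % append as new columns
Y = [Y; -ones(size(hard_negs, 1), 1)];            % all labeled non-face
[w, b] = vl_svmtrain(X, Y, lambda);               % retrain the SVM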

The results of implementing hard negative mining are shown below:

Without Hard Negative Mining

With Hard Negative Mining

Without Hard Negative Mining

With Hard Negative Mining

Without Hard Negative Mining

With Hard Negative Mining

For this part, if accuracy is the only factor considered, no clear improvement from hard negative mining can be concluded. However, comparing any image pair above, with and without hard negative mining, it is obvious that the number of false positives is tremendously decreased, which enhances the practical performance of the detector.

Parameter tuning for the scale value, the number of negative examples, and the HoG cell size

In order to find the best performance, three important parameters, the scale value, the number of negative examples, and the HoG cell size, were compared in the section "Results and Analysis" above.

Best Result

The best result was achieved with a HoG cell size of 2, 10000 negative examples, a step size of 40, and a scale value of 0.8.

The results for best performance are shown below:

Precision-recall curve for the best configuration.

Example detections on the test set for the best configuration.

It can be observed that the above results show very good performance: the accuracy reaches 92.5%. However, since the HoG cell size is very small (2), the running time for this best result is extremely long.