Project 5 / Face Detection with a Sliding Window

The output on the friends image after running the algorithm. The algorithm even recognizes all of them perfectly :)

The project consists of the following implementations:

  1. get_positive_features.m
  2. get_random_negative_features.m
  3. classifier training
  4. run_detector.m
  5. hog_descriptor.m
  6. get_hard_negative_features.m

The following description explains the implementation of each of these components. For running the hog_descriptor and the hard negative mining (extra credit), I have listed the lines to be commented and uncommented (in different files) in the README.txt.

Getting positive features

In this implementation I compute the HoG features of every image in the training set, using a template_size of 36 and a cell_size of 3.

In my default implementation I have used the vl_hog function for the computations, but I have mentioned the instructions in README.txt for switching to the hog function that I have implemented.

The code in this file:


% Compute a HoG feature vector for every positive (face) training image
image_files = dir( fullfile( train_path_pos, '*.jpg') );
num_images = length(image_files);
features_pos = zeros(num_images,(feature_params.template_size / feature_params.hog_cell_size)^2 * 31);
for i = 1:num_images
    % fullfile builds the path portably (strcat with '\' only works on Windows)
    image = single(imread(fullfile(train_path_pos, image_files(i).name)));
    hog_image = vl_hog(image,feature_params.hog_cell_size);
    features_pos(i,:) = reshape(hog_image,[],1)';
end

Note: Since the code is only a few lines, I have included it here for convenience.

Getting random negative features

In this implementation, as seen below, I choose random indices to extract a template_size x template_size patch from the image, and then compute the HoG features of that patch. From every image I extract ceil(total samples / number of images) patches; because of the ceil function, the total number of HoG features may slightly exceed the required number of samples.

In my default implementation I have used the vl_hog function for the computations, but I have mentioned the instructions in README.txt for switching to the hog function that I have implemented.

The code in this file:


% Sample random template_size x template_size patches from each
% non-face scene and compute their HoG features
image_files = dir( fullfile( non_face_scn_path, '*.jpg' ));
num_images = length(image_files);
num_samples_per_image = ceil(num_samples/num_images);
features_neg = zeros(num_images * num_samples_per_image,(feature_params.template_size / feature_params.hog_cell_size)^2 * 31);
for i = 1:num_images
    image = imread(fullfile(non_face_scn_path, image_files(i).name));
    image = single(rgb2gray(image));
    [m,n] = size(image);
    for j = 1:num_samples_per_image
        % top-left corner of a random patch that fits inside the image
        index1 = randi(m - feature_params.template_size + 1);
        index2 = randi(n - feature_params.template_size + 1);
        image_patch = image(index1 : index1 + feature_params.template_size - 1, index2 : index2 + feature_params.template_size - 1);
        hog_image = vl_hog(image_patch,feature_params.hog_cell_size);
        features_neg((i-1)*num_samples_per_image + j,:) = reshape(hog_image,[],1)';
    end
end

Classifier Training

This section of the code is in the proj5.m file itself. After computing the positive and negative features as described above, I vertically concatenate them and assign labels of +1 for faces and -1 for non-faces.

Then I call the function vl_svmtrain to find the values of w and b. I have used a lambda of 0.0001, since this value gave me the best results.

The code for this section :


% Train a linear SVM: +1 labels for faces, -1 for non-faces
X = [features_pos; features_neg];
Y = [ones(size(features_pos,1),1); -ones(size(features_neg,1),1)];
[w,b] = vl_svmtrain(X',Y,0.0001);

Face template HoG visualization.

As you can see from the above image, the learned template resembles a face at the center.

Run Detector

In this implementation, for every image I repeatedly rescale the image and collect detection boxes, doing this a fixed number of 'steps' times (40 in my implementation). At every step:

  1. I compute the HoG features of the image using a cell size of 3.
  2. In this HoG image, I take every patch of feature_params.template_size/feature_params.hog_cell_size x feature_params.template_size/feature_params.hog_cell_size cells.
  3. On each patch I apply w' * patch + b to get a confidence value. If this confidence is greater than a certain threshold, I compute the (X,Y) coordinates of this patch.
  4. I then scale these coordinates by the cumulative rescaling factor of the current step, mapping them back to the original image.
  5. I store the (X,Y) coordinates, confidences and image ids of all such boxes.
  6. I resize the current image by the rescaling factor and go back to step 1.
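The per-step loop described above can be sketched as follows. This is only an illustrative sketch, not my exact run_detector.m: apart from feature_params (whose fields appear elsewhere in this report) and the parameter values listed below, the variable names and bookkeeping here are my own assumptions, and vl_hog is VLFeat's function.

```matlab
% Sketch of the multi-scale sliding-window detector for one image
cell_size = feature_params.hog_cell_size;            % 3 in my runs
cells_per_template = feature_params.template_size / cell_size;
scale = 1.0;                                         % cumulative rescaling factor
bboxes = []; confidences = [];
for step = 1:40                                      % number of steps
    hog = vl_hog(single(image), cell_size);
    [hm, hn, ~] = size(hog);
    for r = 1:(hm - cells_per_template + 1)
        for c = 1:(hn - cells_per_template + 1)
            % one template-sized window of HoG cells
            patch = hog(r:r+cells_per_template-1, c:c+cells_per_template-1, :);
            conf = w' * reshape(patch, [], 1) + b;
            if conf > 0.10                           % confidence threshold
                % map cell coordinates back to original-image pixels
                x = (c - 1) * cell_size / scale + 1;
                y = (r - 1) * cell_size / scale + 1;
                s = feature_params.template_size / scale;
                bboxes = [bboxes; x, y, x + s - 1, y + s - 1];
                confidences = [confidences; conf];
            end
        end
    end
    scale = scale * 0.85;                            % rescale factor
    image = imresize(image, 0.85);
    if min(size(image)) < feature_params.template_size, break; end
end
```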

Parameters used:

  1. hog cell size : 3
  2. number of steps : 40
  3. scaling factor : 0.85
  4. confidence threshold : 0.10
  5. lambda for svm : 0.0001

In my default implementation I have used the vl_hog function to make computations, but I have mentioned the instructions in README.txt to change this to the hog function that I have implemented.

Result

Cell size:

  HoG cell size | Average precision
  3             | 0.921
  4             | 0.905
  6             | 0.829

For HoG cell size 3:

  Confidence threshold | Average precision
  -0.1                 | 0.920
  0.0                  | 0.920
  0.1                  | 0.921
  0.25                 | 0.915
  0.45                 | 0.906
  0.50                 | 0.909

Rescale factor, with HoG cell size 3 and confidence threshold 0.1:

  Rescale factor | Average precision
  0.9            | 0.919
  0.85           | 0.921
  0.75           | 0.918

With the above parameters and varying lambda:

  Lambda  | Average precision
  0.0001  | 0.921
  0.0005  | 0.917
  0.001   | 0.918
  0.00001 | 0.911

I get an average precision of 0.921 using the above parameters on the test images. The precision-recall curve and some of the images that I get after this implementation are:

Precision Recall curve.

Detection on an image with several faces.

Detection on an image with moderately many faces.

Detection on an image with few faces.

Detection on an image with a single face.

The results of implementing the above algorithm on class images:

Extra Credit

Mining Hard Negatives

For implementing this, once I have the positive and negative features as described above, I run the detector on the non-face dataset and collect the boxes it returns. I have a separate file for this detector, run_detector_hard, which skips non-maximum suppression and returns the raw boxes.

In get_hard_negative_features I take each box obtained above and extract a template_size x template_size patch from the center of the box, to accommodate the scaling.

Then I use vl_hog to compute the HoG features of this patch and add them to the negative features.
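The center-crop and HoG step for a single box can be sketched like this; the variable names (box as [x1 y1 x2 y2], image, features_neg) are hypothetical stand-ins for what my actual get_hard_negative_features uses:

```matlab
% Extract a template_size x template_size patch centered on the detection box
t = feature_params.template_size;
cx = round((box(1) + box(3)) / 2);                   % box center, x
cy = round((box(2) + box(4)) / 2);                   % box center, y
% clamp the top-left corner so the patch stays inside the image
x1 = min(max(1, cx - floor(t/2)), size(image,2) - t + 1);
y1 = min(max(1, cy - floor(t/2)), size(image,1) - t + 1);
patch = image(y1:y1 + t - 1, x1:x1 + t - 1);
% HoG of the patch becomes one new hard negative feature row
hog_patch = vl_hog(single(patch), feature_params.hog_cell_size);
features_neg(end+1, :) = reshape(hog_patch, [], 1)';
```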

Once I have these additional negative features, I append them to the previous dataset with negative labels, retrain, and then call run_detector again on the test dataset.

I implemented this both with and without suppression. I got slightly better precision and fewer false positives without suppression.


% Train an initial SVM on the positive and random negative features
X = [features_pos; features_neg];
Y = [ones(size(features_pos,1),1); -ones(size(features_neg,1),1)];
[w,b] = vl_svmtrain(X',Y,0.0001);

% Run the detector (without non-maximum suppression) on the non-face
% scenes and turn its false positives into hard negative features
[bboxes_nf, confidences_nf, image_ids_nf] = run_detector_hard(non_face_scn_path, w, b, feature_params);
features_neg = get_hard_negative_features(non_face_scn_path, bboxes_nf, image_ids_nf, feature_params);

% Retrain with the hard negatives added, then run on the test scenes
X = [X; features_neg];
Y = [Y; -ones(size(features_neg,1),1)];
[w,b] = vl_svmtrain(X',Y,0.0001);
[bboxes, confidences, image_ids] = run_detector(test_scn_path, w, b, feature_params);

Result

There isn't much difference in average precision after implementing this part. But the number of false positives reduces for many images.

Precision Recall curve.

Detection on an image.

  Implementation      | Average precision
  With suppression    | 0.891
  Without suppression | 0.896

HoG Descriptor

The code for this part is in the hog_descriptor.m file. In the HoG descriptor function:

  1. Calculate the gradient magnitude and gradient direction using the imgradient function.
  2. Take a patch of size cell_size x cell_size and take the absolute values of the gradient directions in this patch.
  3. Classify each pixel into the appropriate bin of the histogram and add that pixel's gradient magnitude to the bin. I have used 9 bins.
  4. This is done for every patch in the image.
  5. After finding the histograms, I normalize them over 2 x 2 blocks of cells to return an m x n x 36 feature matrix. The third dimension is 36 because 2 x 2 x 9 = 36.
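The per-cell histogram step (steps 1-4 above) can be sketched as follows. This is an illustrative sketch under the assumption of unsigned gradients in 0-180 degrees; the variable names are my own, not necessarily those in hog_descriptor.m:

```matlab
% One 9-bin orientation histogram per cell_size x cell_size cell
[g_mag, g_dir] = imgradient(image);            % magnitude and direction (degrees)
g_dir = abs(g_dir);                            % unsigned direction, 0..180
num_bins = 9; bin_width = 180 / num_bins;
cells_y = floor(size(image,1) / cell_size);
cells_x = floor(size(image,2) / cell_size);
cell_hist = zeros(cells_y, cells_x, num_bins);
for i = 1:cells_y
    for j = 1:cells_x
        rows = (i-1)*cell_size + (1:cell_size);
        cols = (j-1)*cell_size + (1:cell_size);
        d = g_dir(rows, cols); m = g_mag(rows, cols);
        b = min(floor(d / bin_width) + 1, num_bins);   % bin index, 1..9
        for k = 1:numel(d)                             % accumulate magnitudes
            cell_hist(i, j, b(k)) = cell_hist(i, j, b(k)) + m(k);
        end
    end
end
% 2 x 2 block normalization of cell_hist then yields the m x n x 36 features
```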

Result

There isn't much difference in average precision with this implementation. It is a bit slower than the vl_hog implementation.