The output on the friends image after running the algorithm. Even the algorithm recognizes them perfectly :)
The algorithm consists of the following stages: extracting positive (face) HoG features, sampling random negative features, training a linear SVM, running a multiscale sliding-window detector, hard negative mining (extra credit), and a custom HoG descriptor (extra credit).
The following sections describe the implementation of each stage. For running the hog_descriptor and the hard_negative_mining (extra credit), the lines to be commented and uncommented (in different files) are listed in README.txt.
In this implementation I compute the HoG features of every image in the training set, using a template_size of 36 and a hog_cell_size of 3.
In my default implementation I have used the vl_hog function to make the computations, but README.txt contains instructions to switch to the hog function that I have implemented.
The code in this file:
image_files = dir(fullfile(train_path_pos, '*.jpg'));
num_images = length(image_files);
% Each HoG feature has (template_size / cell_size)^2 cells x 31 dimensions.
features_pos = zeros(num_images, (feature_params.template_size / feature_params.hog_cell_size)^2 * 31);
for i = 1:num_images
    % fullfile is portable across platforms, unlike a hard-coded '\' separator.
    image = single(imread(fullfile(train_path_pos, image_files(i).name)));
    hog_image = vl_hog(image, feature_params.hog_cell_size);
    features_pos(i,:) = reshape(hog_image, [], 1)';
end
Note: Since the code is only a few lines long, it is included here for convenience.
In this implementation, as seen below, I choose random indices to extract a template_size x template_size patch from each image and then compute the HoG features of that patch. From every image I extract ceil(total samples required / number of images) patches; because of the ceil, the total number of negative features can slightly exceed the number of samples requested.
In my default implementation I have used the vl_hog function to make the computations, but README.txt contains instructions to switch to the hog function that I have implemented.
The code in this file:
image_files = dir(fullfile(non_face_scn_path, '*.jpg'));
num_images = length(image_files);
num_samples_per_image = ceil(num_samples / num_images);
features_neg = zeros(num_images * num_samples_per_image, (feature_params.template_size / feature_params.hog_cell_size)^2 * 31);
for i = 1:num_images
    image = imread(fullfile(non_face_scn_path, image_files(i).name));
    image = single(rgb2gray(image));
    [m, n] = size(image);
    for j = 1:num_samples_per_image
        % Random top-left corner of a template_size x template_size patch.
        index1 = randi(m - feature_params.template_size + 1);
        index2 = randi(n - feature_params.template_size + 1);
        image_patch = image(index1 : index1 + feature_params.template_size - 1, index2 : index2 + feature_params.template_size - 1);
        hog_image = vl_hog(image_patch, feature_params.hog_cell_size);
        features_neg((i-1)*num_samples_per_image + j, :) = reshape(hog_image, [], 1)';
    end
end
This section of the code is in the proj5.m file itself. After computing the positive and negative features as described above, I vertically concatenate them and assign labels of +1 for faces and -1 for non-faces. Then I call vl_svmtrain to obtain the weights w and bias b. I have used lambda = 0.0001, since this value gave me the best results.
The code for this section:
X = [features_pos; features_neg];
Y = [ones(size(features_pos,1),1); -ones(size(features_neg,1),1)];
[w,b] = vl_svmtrain(X',Y,0.0001);
Face template HoG visualization.
As you can see from the above image, it looks like a face at the center.
In this implementation, for every test image I rescale the image and search for faces 'steps' number of times; I have used steps = 40. For every step:
- Rescale the image cumulatively by the rescale factor and compute the HoG features of the rescaled image.
- Slide a template-sized window over the HoG feature map and compute the SVM confidence w'*x + b for each window.
- Keep the windows whose confidence exceeds the confidence threshold, and map their bounding boxes back to the coordinates of the original image.
Parameters used: template_size = 36, hog_cell_size = 3, steps = 40, rescale factor = 0.85, confidence threshold = 0.1.
In my default implementation I have used the vl_hog function to make the computations, but README.txt contains instructions to switch to the hog function that I have implemented.
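The per-step detection loop can be sketched as follows. This is only a sketch of the idea under the parameters described above; variable names such as rescale_factor and threshold are hypothetical, and the actual run_detector.m differs in its details:

```matlab
% Sketch of the multiscale sliding-window detector (hypothetical names;
% assumes image, w, b, steps, threshold, rescale_factor, feature_params).
scale = 1.0;
bboxes = zeros(0, 4); confidences = zeros(0, 1);
cell_sz = feature_params.hog_cell_size;
tmpl_cells = feature_params.template_size / cell_sz;     % 36 / 3 = 12 cells
for step = 1:steps
    scaled = imresize(image, scale);
    if min(size(scaled)) < feature_params.template_size, break; end
    hog = vl_hog(single(scaled), cell_sz);
    for r = 1:(size(hog,1) - tmpl_cells + 1)
        for c = 1:(size(hog,2) - tmpl_cells + 1)
            window = hog(r:r+tmpl_cells-1, c:c+tmpl_cells-1, :);
            conf = reshape(window, 1, []) * w + b;       % SVM confidence
            if conf > threshold                          % e.g. 0.1
                % Map the window back to original-image coordinates.
                x1 = ((c-1)*cell_sz + 1) / scale;
                y1 = ((r-1)*cell_sz + 1) / scale;
                x2 = ((c-1+tmpl_cells)*cell_sz) / scale;
                y2 = ((r-1+tmpl_cells)*cell_sz) / scale;
                bboxes(end+1,:) = [x1, y1, x2, y2];
                confidences(end+1,1) = conf;
            end
        end
    end
    scale = scale * rescale_factor;                      % e.g. 0.85
end
```

After all steps, non-maximum suppression is applied to the collected boxes per image.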
| HoG Cell Size | Average Precision |
| --- | --- |
| 3 | 0.921 |
| 4 | 0.905 |
| 6 | 0.829 |
For HoG cell size 3:
| Confidence Threshold | Average Precision |
| --- | --- |
| -0.1 | 0.920 |
| 0.0 | 0.920 |
| 0.1 | 0.921 |
| 0.25 | 0.915 |
| 0.45 | 0.906 |
| 0.50 | 0.909 |
| Rescale Factor | Average Precision |
| --- | --- |
| 0.9 | 0.919 |
| 0.85 | 0.921 |
| 0.75 | 0.918 |
| Lambda | Average Precision |
| --- | --- |
| 0.0001 | 0.921 |
| 0.0005 | 0.917 |
| 0.001 | 0.918 |
| 0.00001 | 0.911 |
I get an average precision of 0.921 using the above parameters on the test images. The precision-recall curve and some of the detections from this implementation are:
Detection on an image with several faces.
Detection on an image with moderately many faces.
Detection on an image with few faces.
Detection on an image with a single face.
The results of implementing the above algorithm on class images:
For this part, once I have the positive and negative features as described above, I run the detector on the non-face dataset and collect the boxes it returns; since these scenes contain no faces, every box is a false positive (a hard negative). I use a separate file for this detector, run_detector_hard, which skips non-maximum suppression and returns all the boxes.
In get_hard_negative_features I take each box obtained above and extract a template_size x template_size patch from the center of the box, to accommodate the scaling.
I then use vl_hog to compute the HoG features of this patch and add them to the negative features.
Once I have these hard negative features, I append them to the training set with negative labels, retrain the SVM, and call run_detector again on the test dataset.
I implemented this with and without non-maximum suppression; I got slightly better precision and fewer false positives without suppression.
% Train the initial SVM on positive and random negative features.
X = [features_pos; features_neg];
Y = [ones(size(features_pos,1),1); -ones(size(features_neg,1),1)];
[w, b] = vl_svmtrain(X', Y, 0.0001);
% Mine hard negatives: run the detector (without non-maximum suppression)
% on scenes with no faces, so every detection is a false positive.
[bboxes_nf, confidences_nf, image_ids_nf] = run_detector_hard(non_face_scn_path, w, b, feature_params);
features_neg = get_hard_negative_features(non_face_scn_path, bboxes_nf, image_ids_nf, feature_params);
% Retrain with the hard negatives appended as additional negative examples.
X = [X; features_neg];
Y = [Y; -ones(size(features_neg,1),1)];
[w, b] = vl_svmtrain(X', Y, 0.0001);
[bboxes, confidences, image_ids] = run_detector(test_scn_path, w, b, feature_params);
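A minimal sketch of what get_hard_negative_features might look like, following the description above. The signature matches the call in the driver code, but the body is an assumption, not the actual file:

```matlab
% Sketch (hypothetical): for each mined box, crop a template_size x
% template_size patch from the box center and compute its HoG features.
function features_neg = get_hard_negative_features(scn_path, bboxes, image_ids, feature_params)
    t = feature_params.template_size;
    dim = (t / feature_params.hog_cell_size)^2 * 31;
    features_neg = zeros(size(bboxes,1), dim);
    for i = 1:size(bboxes,1)
        img = single(rgb2gray(imread(fullfile(scn_path, image_ids{i}))));
        % Center of the detected box ([x1 y1 x2 y2]).
        cx = round((bboxes(i,1) + bboxes(i,3)) / 2);
        cy = round((bboxes(i,2) + bboxes(i,4)) / 2);
        % Clamp so the t x t patch stays inside the image.
        x1 = min(max(cx - floor(t/2), 1), size(img,2) - t + 1);
        y1 = min(max(cy - floor(t/2), 1), size(img,1) - t + 1);
        patch = img(y1:y1+t-1, x1:x1+t-1);
        hog = vl_hog(patch, feature_params.hog_cell_size);
        features_neg(i,:) = reshape(hog, [], 1)';
    end
end
```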
There isn't much difference in average precision after implementing this part, but the number of false positives is reduced for many images.
Detection on an image.
| Implementation | Average Precision |
| --- | --- |
| With Suppression | 0.891 |
| Without Suppression | 0.896 |
The code for this part is in the hog_descriptor.m file.
There isn't much difference in average precision after implementing this part, though my implementation is a bit slower than vl_hog.
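The core idea behind a HoG descriptor can be sketched as below. This is a simplified illustration with 9 unsigned orientation bins per cell, not the 31-dimensional UoCTTI features that vl_hog produces, and it is not the actual hog_descriptor.m:

```matlab
% Simplified HoG sketch: 9 unsigned orientation bins per cell.
% (Illustrative only; the actual hog_descriptor.m differs.)
function hog = simple_hog(image, cell_size)
    image = single(image);
    gx = imfilter(image, [-1 0 1]);         % horizontal gradient
    gy = imfilter(image, [-1 0 1]');        % vertical gradient
    mag = sqrt(gx.^2 + gy.^2);              % gradient magnitude
    ori = mod(atan2(gy, gx), pi);           % unsigned orientation in [0, pi)
    bin = min(floor(ori / (pi/9)) + 1, 9);  % orientation bin index 1..9
    rows = floor(size(image,1) / cell_size);
    cols = floor(size(image,2) / cell_size);
    hog = zeros(rows, cols, 9);
    for r = 1:rows
        for c = 1:cols
            ys = (r-1)*cell_size + (1:cell_size);
            xs = (c-1)*cell_size + (1:cell_size);
            for b = 1:9                     % accumulate magnitudes per bin
                mask = (bin(ys, xs) == b);
                hog(r, c, b) = sum(sum(mag(ys, xs) .* mask));
            end
        end
    end
    % L2-normalize each cell's histogram.
    norms = sqrt(sum(hog.^2, 3)) + eps;
    hog = hog ./ repmat(norms, [1 1 9]);
end
```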