Example of a face detection result.
The aim of this project is to detect faces in various scenes using a sliding window. A histogram of oriented gradients (HoG) is built for the entire image, and windows over it are classified using a trained classifier. The HoG features serve as the template representation and are used to train a linear Support Vector Machine (SVM) that locates faces in test images.
The major steps involved are as follows:
The faces are 36x36 pixels, and the HoG cell size is set to 6 (although other values, such as 4 and 3, were also tried). The number of orientations was set to 9; changing it did not affect the precision by a great deal. The lambda value is fixed at 0.0001 throughout.
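These settings can be gathered in a single parameter struct. The sketch below assumes the field names of the project's `feature_params` struct; `num_orientations` is an illustrative name, not confirmed by the source.

```matlab
% Parameter settings used throughout (a sketch; field names assumed
% to match the project's feature_params struct).
feature_params = struct();
feature_params.template_size = 36;   % faces are 36x36 pixels
feature_params.hog_cell_size = 6;    % 4 or 3 improve precision but cost more time
feature_params.num_orientations = 9; % little effect on average precision (assumed field name)
lambda = 0.0001;                     % SVM regularization, fixed throughout
```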
HoG features are extracted from the positive and negative image sets. The template size is set to 36 and the cell size to 6 for faster computation; a cell size of 4 or 3 gives higher precision but takes more compute time. The HoG is calculated with the vl_hog method, and the resulting feature array is reshaped into a flat vector and returned. Training on the negative image set is similar to the positive one, with one change: hard negative mining is also performed (see the graduate credit section).
image_files = dir( fullfile( train_path_pos, '*.jpg') ); % Caltech Faces stored as .jpg
num_images = length(image_files);
% HoG dimensionality: (template/cell)^2 cells, 31 values per cell
feat_dim = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
features_pos = zeros(num_images, feat_dim);
for i = 1:num_images
    im = imread(fullfile(train_path_pos, image_files(i).name));
    im = im2single(im);
    hog_f = vl_hog(im, feature_params.hog_cell_size);
    features_pos(i, :) = reshape(hog_f, 1, feat_dim);
end
Setting the lambda value to 0.0001, vl_svmtrain is called with the feature data and output labels: +1 represents the positive features and -1 the negative features. The average precision changed only by a very minor factor when the number of negative training samples was increased.
% example code: train a linear SVM on the stacked features
svm_data = [features_pos; features_neg];   % one example per row
svm_labels = [ones(size(features_pos,1),1); -ones(size(features_neg,1),1)];
lambda = 0.0001;
% vl_svmtrain expects one example per column, hence the transpose
[w, b] = vl_svmtrain(svm_data', svm_labels, lambda);
The major part of this project is running the detector on test images, using a sliding-window method. A window is moved over the HoG representation of the image and a confidence score is computed at each position; a detection is kept if its confidence score exceeds the threshold of 0.75. For each positive result, the bounding box is computed and added to the list of positive detections. After the sliding window has covered the whole image, non-maximum suppression is performed to avoid duplicate detections at multiple scales.
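The sliding-window step for a single image could be sketched as below. This is illustrative only: `test_image_path` is a hypothetical variable, the image is assumed to be RGB, and `w`, `b` are the weights and bias returned by vl_svmtrain.

```matlab
% Sketch of the sliding-window detection step for one image
% (test_image_path is a hypothetical name; w, b come from vl_svmtrain).
im = im2single(rgb2gray(imread(test_image_path)));  % assumes an RGB test image
cell_size = feature_params.hog_cell_size;
cells_per_template = feature_params.template_size / cell_size;
threshold = 0.75;

hog = vl_hog(im, cell_size);   % HoG for the entire image at once
[num_y, num_x, ~] = size(hog);
bboxes = zeros(0, 4);
confidences = zeros(0, 1);
for y = 1:(num_y - cells_per_template + 1)
    for x = 1:(num_x - cells_per_template + 1)
        % slide a template-sized window over the HoG cells
        window = hog(y:y+cells_per_template-1, x:x+cells_per_template-1, :);
        score = reshape(window, 1, []) * w + b;   % SVM confidence
        if score > threshold
            % convert cell coordinates back to pixel coordinates
            x_min = (x - 1) * cell_size + 1;
            y_min = (y - 1) * cell_size + 1;
            bboxes(end+1, :) = [x_min, y_min, ...
                x_min + feature_params.template_size - 1, ...
                y_min + feature_params.template_size - 1];
            confidences(end+1, 1) = score;
        end
    end
end
% duplicate detections are then removed with non-maximum suppression
```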
HoG Face Template
Precision Recall curve for cell size 3
For graduate credit, I performed hard negative mining on the negative set, with a step size of 6.
Hard negative mining improves the classifier by using more negative examples. To do this, the face detector is run on a set of non-face negative training images; if the detector finds a "face", it is added to the negative data set. I ran this case both with and without non-maximum suppression. The SVM classifier is then retrained with the augmented data set.
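The mining-and-retraining loop could be sketched as below. `run_detector_single` is an assumed helper (the sliding-window scoring described above, returning the HoG features of each above-threshold window); the other names follow the earlier snippets.

```matlab
% Sketch of hard negative mining (run_detector_single is a hypothetical
% helper: sliding window + threshold, returning features of detections).
% Any detection on a non-face image is a false positive ("hard negative").
hard_negs = zeros(0, size(features_neg, 2));
neg_files = dir(fullfile(non_face_scn_path, '*.jpg'));
for i = 1:length(neg_files)
    im = im2single(imread(fullfile(non_face_scn_path, neg_files(i).name)));
    feats = run_detector_single(im, w, b, feature_params);
    hard_negs = [hard_negs; feats];   % collect false-positive features
end
% retrain the SVM with the augmented negative set
svm_data = [features_pos; features_neg; hard_negs];
svm_labels = [ones(size(features_pos,1),1); ...
              -ones(size(features_neg,1) + size(hard_negs,1), 1)];
[w, b] = vl_svmtrain(svm_data', svm_labels, lambda);
```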
HOG
Precision Recall Curve
The amount of red (false positive boxes) is reduced here
Performing hard negative mining reduces the number of false positives obtained, and hence the number of red boxes in the images.