In this project we attempt to create a face detector using a sliding window technique.
We first need to collect positive features and random negative features to train our SVM. We're given 36x36 pixel faces to use as our positive examples. We're also given a collection of non-face scenes, so we repeatedly pick a random scene and sample a random 36x36 patch of pixels as a negative example. Each face and each patch is converted to a HOG descriptor. We tried different values of hog_cell_size (6, 4, 3), and a cell size of 3 was the most precise overall, with 93% average precision. This could be because smaller cells capture more spatial information, which is important when it comes to locating face areas.
function features_pos = get_positive_features(train_path_pos, feature_params)
    image_files = dir( fullfile( train_path_pos, '*.jpg') ); %Caltech Faces stored as .jpg
    num_images = length(image_files);
    % each template holds (template_size / hog_cell_size)^2 cells of 31 HOG values
    dimensionality = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
    features_pos = zeros(num_images, dimensionality);
    for i = 1:num_images
        image_path = fullfile(train_path_pos, image_files(i).name);
        image = single(imread(image_path));
        hog = vl_hog(image, feature_params.hog_cell_size);
        % flatten the grid of cells into one row per face
        features_pos(i, :) = reshape(hog, [1 dimensionality]);
    end
end
function features_neg = get_random_negative_features(non_face_scn_path, feature_params, num_samples)
    image_files = dir( fullfile( non_face_scn_path, '*.jpg' ));
    template_size = feature_params.template_size;
    dimensionality = (template_size / feature_params.hog_cell_size)^2 * 31;
    features_neg = zeros(num_samples, dimensionality);
    s = 1;
    while s <= num_samples
        % pick a random scene; the same scene may be picked more than once
        image_file = datasample(image_files, 1);
        image_path = fullfile(non_face_scn_path, image_file.name);
        image = imread(image_path);
        % only convert color images; rgb2gray expects an RGB input
        if size(image, 3) > 1
            image = rgb2gray(image);
        end
        image = single(image);
        [r, c] = size(image);
        % skip scenes smaller than the template, otherwise randi gets a non-positive range
        if r >= template_size && c >= template_size
            rand_row = randi(r - template_size + 1);
            rand_col = randi(c - template_size + 1);
            patch = image(rand_row:rand_row + template_size - 1, ...
                          rand_col:rand_col + template_size - 1);
            hog = vl_hog(patch, feature_params.hog_cell_size);
            features_neg(s, :) = reshape(hog, [1 dimensionality]);
            s = s + 1;
        end
    end
end
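To make the effect of hog_cell_size concrete, the short sketch below computes the descriptor length for the three cell sizes we tried, using the same formula as the dimensionality line in the functions above. It assumes the 36-pixel template and the 31 values per cell used in that code. hog_dim is a hypothetical helper name for illustration, not part of the project code.

```matlab
% Sketch only: hog_dim is a hypothetical helper, not part of the project code.
% Assumes a square template and 31 HOG values per cell, as in the code above.
hog_dim = @(template_size, cell_size) (template_size / cell_size)^2 * 31;

template_size = 36;
cell_sizes = [6, 4, 3];
dims = zeros(size(cell_sizes));
for k = 1:numel(cell_sizes)
    cells_per_side = template_size / cell_sizes(k);
    dims(k) = hog_dim(template_size, cell_sizes(k));
    fprintf('cell size %d -> %d x %d cells, %d values\n', ...
        cell_sizes(k), cells_per_side, cells_per_side, dims(k));
end
```

Going from a cell size of 6 to 3 quadruples the descriptor length (1116 to 4464 values), which is consistent with the finer cells carrying more spatial detail, at the cost of more computation per window.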
We use these training features to train our linear SVM. We use a lambda of 0.0001, as recommended; it also produced the best results in this and previous projects.
num_pos = size(features_pos, 1);
num_neg = size(features_neg, 1);
lambda = 0.0001;
X = [features_pos; features_neg];
Y = [ones(num_pos, 1); -ones(num_neg, 1)]; % +1 for faces, -1 for non-faces
% vl_svmtrain takes one example per column and returns the weights w and the bias b
[w, b] = vl_svmtrain(X', Y', lambda);
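One way to sanity-check the trained model is to score the training features with w and b and compare the signs against the labels. The sketch below does this on made-up 2-D points, with a hand-picked w and b standing in for the output of vl_svmtrain, so the numbers are illustrative only and are not results from this project.

```matlab
% Sketch only: toy 2-D features stand in for the HOG descriptors.
features_pos = [2 2; 3 1; 2.5 3];
features_neg = [-1 -2; -2 0; -3 -1; 0 -2.5];
X = [features_pos; features_neg];
Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)];

% Hand-picked hyperplane (assumption); in the project w and b come from vl_svmtrain.
w = [1; 1];
b = -0.5;

% A linear SVM scores each row as x * w + b; the sign is the predicted class.
confidences = X * w + b;
predictions = sign(confidences);
accuracy = mean(predictions == Y);
fprintf('training accuracy: %.2f\n', accuracy);
```

A high training accuracy only shows that the two training sets are separable. It says little about false positives on unseen scenes.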
Though our SVM tests well against the training data, we decided to run it on a set of non-face pictures to find false positives, and then use these false positives to retrain the SVM. We run the current SVM through the sliding window procedure on the non-face images, and whenever a window is classified as a face with high enough confidence (above 0.5), we save its HOG feature. After collecting the false positives, we append them to the negative features found earlier and retrain the SVM. This code is very similar to the run_detector() code. Including hard negatives in the training actually lowered our score: the retrained SVM was overtrained and threw out too many positives. Results are in the table below.
The function below runs this process to collect the false positives used to retrain the SVM.
function [features_false_pos] = false_positive_detector(non_face_scn_path, w, b, feature_params)
    non_face_scenes = dir( fullfile( non_face_scn_path, '*.jpg' ));
    %initialize these as empty and incrementally expand them.
    bboxes = zeros(0,4);
    confidences = zeros(0,1);
    image_ids = cell(0,1);
    SVM_CONFIDENCE_THRESH = 0.5;
    template_size = feature_params.template_size;
    hog_cell_size = feature_params.hog_cell_size;
    patch_width = template_size / hog_cell_size; % template width in HOG cells
    dimensionality = patch_width^2 * 31;
    features_false_pos = zeros(0, dimensionality);
    for i = 1:length(non_face_scenes)
        fprintf('Detecting faces in %s\n', non_face_scenes(i).name)
        img = imread( fullfile( non_face_scn_path, non_face_scenes(i).name ));
        img = single(img)/255;
        if(size(img,3) > 1)
            img = rgb2gray(img);
        end
        cur_bboxes = zeros(0, 4);
        cur_confidences = zeros(0, 1);
        cur_image_ids = {};
        cur_false_pos = zeros(0, dimensionality);
        scales = [1, 0.9, 0.8];
        for s = scales
            % Scale image
            s_img = imresize(img, s);
            % Hog the image
            hog = vl_hog(s_img, hog_cell_size);
            [h_rows, h_cols, ~] = size(hog);
            % Go through image in hog space
            for r = 1:h_rows - patch_width + 1
                for c = 1:h_cols - patch_width + 1
                    % Get patch and test on SVM
                    patch = hog(r:r + patch_width - 1, c:c + patch_width - 1, :);
                    patch = reshape(patch, [1 dimensionality]);
                    confidence = patch * w + b;
                    % a confident detection in a non-face scene is a false positive:
                    % store its box in original image coordinates and keep its feature
                    if confidence > SVM_CONFIDENCE_THRESH
                        tlc = ((c - 1) * hog_cell_size + 1) / s;
                        tlr = ((r - 1) * hog_cell_size + 1) / s;
                        brc = ((c + patch_width - 1) * hog_cell_size) / s;
                        brr = ((r + patch_width - 1) * hog_cell_size) / s;
                        cur_bboxes = [cur_bboxes; [tlc, tlr, brc, brr]];
                        cur_confidences = [cur_confidences; confidence];
                        cur_false_pos = [cur_false_pos; patch];
                    end
                end
            end
        end
        num_patches = size(cur_bboxes, 1);
        cur_image_ids(1:num_patches,1) = {non_face_scenes(i).name};
        % keep only the locally strongest detections, so that near-duplicate
        % windows are not all added as hard negatives
        [is_maximum] = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img));
        cur_confidences = cur_confidences(is_maximum,:);
        cur_bboxes = cur_bboxes( is_maximum,:);
        cur_image_ids = cur_image_ids( is_maximum,:);
        cur_false_pos = cur_false_pos( is_maximum,:);
        bboxes = [bboxes; cur_bboxes];
        confidences = [confidences; cur_confidences];
        image_ids = [image_ids; cur_image_ids];
        features_false_pos = [features_false_pos; cur_false_pos];
    end
end
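The retraining step itself comes down to thresholding the scores, appending the surviving rows to the negatives, and rebuilding the labels. Here is a minimal sketch of that bookkeeping on made-up 2-D data. The candidate windows, the toy w and b, and the name features_neg_aug are all assumptions for illustration; in the project the classifier comes from vl_svmtrain and the windows from the non-face scenes.

```matlab
% Sketch only: made-up descriptors for windows cut from non-face scenes.
candidates = [0.9 0.8; -0.5 -0.2; 0.4 0.7; -1.0 0.1];
w = [1; 1];      % toy classifier weights (assumption)
b = -0.5;        % toy bias (assumption)
thresh = 0.5;    % plays the role of SVM_CONFIDENCE_THRESH

% Any non-face window scoring above the threshold is a false positive.
conf = candidates * w + b;
is_false_pos = conf > thresh;
features_false_pos = candidates(is_false_pos, :);

% Append the hard negatives to the original negatives and rebuild the labels.
features_pos = [2 2; 3 1];
features_neg = [-1 -1; -2 0];
features_neg_aug = [features_neg; features_false_pos];
X = [features_pos; features_neg_aug];
Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg_aug, 1), 1)];
fprintf('%d hard negatives added, %d training rows in total\n', ...
    size(features_false_pos, 1), size(X, 1));
```

Deriving the label counts from the feature matrices, rather than from a fixed number of negatives, keeps X and Y the same length after the hard negatives are appended.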
With the SVM parameters, we then run the sliding window process on our test data. We first convert the test image into HOG space, then look at each window of HOG cells, score it with the SVM weights and bias, and evaluate the confidence. We kept any window whose confidence was greater than a threshold (-0.2). We also applied this technique to 10 different scales of the image (1.0 down to 0.1) to handle faces of different sizes, such as close-ups. Scaling the image this many times may be why cell sizes 3 and 4 showed almost no difference: with these two cell sizes, the precision and bounding boxes are nearly identical, so the HOG features must be picking up enough information in both cases to make a decision. The program also had a hard time detecting faces in very detailed images such as Audrey's image, and tended to detect faces in photos with a little blur. Because our HOG template is only 36 pixels wide, high-frequency detail cannot be represented accurately and is lost.
function [bboxes, confidences, image_ids] = run_detector(test_scn_path, w, b, feature_params)
    test_scenes = dir( fullfile( test_scn_path, '*.jpg' ));
    %initialize these as empty and incrementally expand them.
    bboxes = zeros(0,4);
    confidences = zeros(0,1);
    image_ids = cell(0,1);
    SVM_CONFIDENCE_THRESH = -0.2;
    template_size = feature_params.template_size;
    hog_cell_size = feature_params.hog_cell_size;
    patch_width = template_size / hog_cell_size;
    for i = 1:length(test_scenes)
        fprintf('Detecting faces in %s\n', test_scenes(i).name)
        img = imread( fullfile( test_scn_path, test_scenes(i).name ));
        img = single(img)/255;
        if(size(img,3) > 1)
            img = rgb2gray(img);
        end
        cur_bboxes = zeros(0, 4);
        cur_confidences = zeros(0, 1);
        cur_image_ids = {};
        scales = [1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1];
        for s = scales
            % Scale image
            s_img = imresize(img, s);
            % Hog the image
            hog = vl_hog(s_img, hog_cell_size);
            [h_rows, h_cols, ~] = size(hog);
            % Go through image in hog space
            for r = 1:h_rows - patch_width + 1
                for c = 1:h_cols - patch_width + 1
                    % Get patch and test on SVM
                    patch = hog(r:r + patch_width - 1, c:c + patch_width - 1, :);
                    patch = reshape(patch, [1 patch_width^2 * 31]);
                    confidence = patch * w + b;
                    % if good confidence, add original image coordinates
                    if confidence > SVM_CONFIDENCE_THRESH
                        tlc = ((c - 1) * hog_cell_size + 1) / s;
                        tlr = ((r - 1) * hog_cell_size + 1) / s;
                        brc = ((c + patch_width - 1) * hog_cell_size) / s;
                        brr = ((r + patch_width - 1) * hog_cell_size) / s;
                        cur_bboxes = [cur_bboxes; [tlc, tlr, brc, brr]];
                        cur_confidences = [cur_confidences; confidence];
                    end
                end
            end
        end
        num_patches = size(cur_bboxes, 1);
        cur_image_ids(1:num_patches,1) = {test_scenes(i).name};
        [is_maximum] = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img));
        cur_confidences = cur_confidences(is_maximum,:);
        cur_bboxes = cur_bboxes( is_maximum,:);
        cur_image_ids = cur_image_ids( is_maximum,:);
        bboxes = [bboxes; cur_bboxes];
        confidences = [confidences; cur_confidences];
        image_ids = [image_ids; cur_image_ids];
    end
end
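The step that maps a window from HOG cells at scale s back to pixels in the original image is easy to get wrong, so a worked example may help. The sketch below applies the same four formulas as run_detector to one made-up window. The cell size, scale and window position are assumed values chosen for illustration.

```matlab
% Sketch only: the window position and scale are made-up values.
hog_cell_size = 6;
template_size = 36;
patch_width = template_size / hog_cell_size;  % template is 6 cells wide
s = 0.5;   % the image was shrunk to half size before computing HOG
r = 3;     % top row of the window, in HOG cells
c = 5;     % left column of the window, in HOG cells

% Cell index -> pixel in the scaled image -> pixel in the original image.
tlc = ((c - 1) * hog_cell_size + 1) / s;
tlr = ((r - 1) * hog_cell_size + 1) / s;
brc = ((c + patch_width - 1) * hog_cell_size) / s;
brr = ((r + patch_width - 1) * hog_cell_size) / s;
bbox = [tlc, tlr, brc, brr];
fprintf('bbox = [%g, %g, %g, %g]\n', bbox);
```

Here the box comes out as [50, 26, 120, 96], about 70 pixels per side in the original image. That is roughly twice the 36-pixel template, which is how shrinking the image lets the fixed-size template match larger faces.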