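In this project, the task was to implement an object detection system for faces. Given an image, the program should draw a bounding box around every face. The process has several parts: extracting positive HoG features from face images, sampling random negative features from non-face scenes, mining hard negatives, and running a multi-scale sliding-window detector.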
Extracting positive features is straightforward. Each 36x36 face image is converted to a HoG feature using the vl_hog function. Each feature has (template_size/hog_cell_size)^2*31 dimensions.
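As a quick check of the dimension formula, the arithmetic can be worked through for one configuration. The template size of 36 comes from the text; the cell size of 6 is one of the values tried in this report and is picked here only for illustration.

```matlab
% Worked example of the feature-dimension formula.
% template_size = 36 is from the text; hog_cell_size = 6 is an assumed,
% illustrative choice (one of the cell sizes tried in this report).
template_size = 36;
hog_cell_size = 6;

cells_per_side = template_size / hog_cell_size;  % 6 cells along each side
dim = cells_per_side^2 * 31;                     % 31 values per HoG cell

fprintf('feature dimension = %d\n', dim);        % prints: feature dimension = 1116
```

With a cell size of 3 the same formula gives 12^2*31 = 4464 dimensions, which is one reason smaller cells cost more running time.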
% Sample random negative HoG features from non-face scenes at multiple scales.
while i < num_images && total_samples < num_samples
    disp(i)
    img = single(rgb2gray(imread(fullfile(non_face_scn_path, image_files(i).name))))/255;
    % Randomize the per-image quota by up to +/-25%.
    num_hog_rand = hog_per_img*(1+(rand-0.5)*0.5);
    example_mined = 0;
    % Number of 0.9x downscales until the image is smaller than the template;
    % also used as the number of windows sampled at each scale.
    NumPerScales = log(feature_params.template_size/min(size(img)))/log(0.9);
    while example_mined < num_hog_rand && min(size(img))>feature_params.template_size
        hog_tmp = vl_hog(img,feature_params.hog_cell_size);
        num_tmp_scale = 0;
        while num_tmp_scale < NumPerScales
            % Pick a random top-left cell so the template fits inside the HoG grid.
            i_row = randsample(size(hog_tmp,1)-feature_params.template_size/feature_params.hog_cell_size+1,1);
            i_col = randsample(size(hog_tmp,2)-feature_params.template_size/feature_params.hog_cell_size+1,1);
            feature_tmp = hog_tmp(i_row:(i_row+feature_params.template_size/feature_params.hog_cell_size-1),...
                i_col:(i_col+feature_params.template_size/feature_params.hog_cell_size-1),:);
            feature_tmp = reshape(feature_tmp, [1,dim]);
            features_neg(total_samples+1,:) = feature_tmp;
            num_tmp_scale = num_tmp_scale +1;
            total_samples = total_samples +1;
            example_mined = example_mined +1;
        end
        img = imresize(img,0.9);
    end
    i = i+1;
end
% Drop any samples collected beyond the requested count.
features_neg = features_neg(1:num_samples,:);
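The NumPerScales expression in the listing above solves min_side * 0.9^n = template_size for n. The sketch below checks that closed form against repeated shrinking, using a hypothetical image whose shorter side is 200 pixels. It ignores the integer rounding that imresize applies to image sizes, so the real loop may differ by a step.

```matlab
% Compare the closed-form scale count with a simulated shrink loop.
% min_side = 200 is a hypothetical value chosen for illustration.
template_size = 36;
min_side = 200;
scale_step = 0.9;

% Closed form: solve min_side * scale_step^n = template_size for n.
n_closed = log(template_size / min_side) / log(scale_step);

% Simulation: shrink until the side is no longer larger than the template.
n_loop = 0;
side = min_side;
while side > template_size
    side = side * scale_step;
    n_loop = n_loop + 1;
end

% n_loop is 17 here, the closed-form value rounded up.
fprintf('closed form: %.2f, loop count: %d\n', n_closed, n_loop);
```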
% Mine hard negatives: score every window in the non-face scenes with the
% linear classifier (w, b) and keep the windows it wrongly scores above thres.
for i = 1:length(test_scenes)
    img = imread( fullfile( non_face_scn_path, test_scenes(i).name ));
    img = single(img)/255;
    if(size(img,3) > 1)
        img = rgb2gray(img);
    end
    resize_ratio = 1.2;
    ratio = 0.9;
    while (min(size(img)) > feature_params.template_size)
        hog_tmp = vl_hog(img, feature_params.hog_cell_size);
        % (i_row, i_col) is the bottom-right cell of the current window.
        for i_row = hog_per_template : size(hog_tmp,1)
            for i_col = hog_per_template : size(hog_tmp,2)
                feature_tmp = hog_tmp((i_row - hog_per_template + 1):i_row,...
                    (i_col - hog_per_template + 1):i_col, :);
                feature_tmp = reshape(feature_tmp, [1 dim]);
                conf = feature_tmp*w+b;
                % Any detection in a non-face scene is a false positive.
                if (conf > thres)
                    features_hard_neg = [features_hard_neg;feature_tmp];
                end
            end
        end
        img = imresize(img, ratio);
        resize_ratio = resize_ratio * ratio;
    end
end
A sliding window is used to decide whether each location holds a potential detection. The linear classifier trained earlier is applied to the HoG cells inside the window to compute a confidence. If the confidence is greater than the threshold, the window is treated as a positive detection and added to the list. After the sliding window finishes, non-maximum suppression removes nearby duplicates. Because image sizes and face sizes differ from image to image, detection has to run at multiple scales. I shrink the image by a factor of 0.95 at each step until it is smaller than the template size.
% Multi-scale sliding-window detection on one test image.
while min(size(img_resize))>feature_params.template_size
    hog_tmp = vl_hog(img_resize,feature_params.hog_cell_size);
    % (i_row, i_col) is the bottom-right cell of the current window.
    for i_row = hog_per_template : size(hog_tmp,1)
        for i_col = hog_per_template : size(hog_tmp,2)
            feature_tmp = hog_tmp((i_row - hog_per_template+1):i_row, ...
                (i_col - hog_per_template+1):i_col, :);
            feature_tmp = reshape(feature_tmp, [1 dim]);
            conf = feature_tmp*w+b;
            if conf > thres
                % Map [x_min, y_min, x_max, y_max] from HoG cell indices at
                % this scale back to pixel coordinates in the original image.
                bbox = [(i_col - hog_per_template+1),...
                    (i_row - hog_per_template+1),i_col,i_row]/resize_ratio*...
                    feature_params.hog_cell_size;
                cur_bboxes = [cur_bboxes; bbox];
                cur_confidences = [cur_confidences; conf];
                cur_image_ids = [cur_image_ids; {test_scenes(i).name}];
            end
        end
    end
    img_resize = imresize(img_resize, ratio);
    resize_ratio = resize_ratio*ratio;
end
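The non-maximum suppression step mentioned in the description is not part of the listing above. The sketch below shows one common greedy variant on made-up boxes, to illustrate the idea. It is not the suppression routine this project actually used, and the boxes, confidences, and the 0.3 overlap threshold are all assumptions.

```matlab
% Greedy non-maximum suppression sketch (illustrative, not the project's routine).
% Hypothetical data: each row is [x_min, y_min, x_max, y_max].
bboxes = [ 10  10  46  46;
           12  12  48  48;
          100 100 136 136];
confidences = [0.9; 0.8; 0.7];
iou_thresh = 0.3;                        % assumed overlap threshold

% Visit boxes from most to least confident.
[~, order] = sort(confidences, 'descend');
keep = false(size(confidences));
for k = 1:numel(order)
    idx = order(k);
    cur = bboxes(idx, :);
    kept = bboxes(keep, :);              % boxes accepted so far (may be empty)

    % Intersection of the current box with every accepted box.
    iw = min(cur(3), kept(:,3)) - max(cur(1), kept(:,1));
    ih = min(cur(4), kept(:,4)) - max(cur(2), kept(:,2));
    inter = max(iw, 0) .* max(ih, 0);

    % Intersection over union against every accepted box.
    area_cur = (cur(3) - cur(1)) * (cur(4) - cur(2));
    area_kept = (kept(:,3) - kept(:,1)) .* (kept(:,4) - kept(:,2));
    iou = inter ./ (area_cur + area_kept - inter);

    % Accept the box only if it does not overlap any accepted box too much.
    if all(iou <= iou_thresh)
        keep(idx) = true;
    end
end

disp(find(keep)');                       % rows 1 and 3 survive
```

In this example the second box overlaps the first with an IoU of about 0.8, so it is suppressed, while the third box does not overlap anything and is kept.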
neg = 20000, accuracy = 88.3, hard_accuracy = 89.5
neg = 10000, accuracy = 86.5, hard_accuracy = 86.0
neg = 5000, accuracy = 85.9, hard_accuracy = 87.6
I also varied the HoG cell size and found that the smaller cell size did not improve accuracy, even though it increased the running time.
neg = 20000, hog_cell = 3, accuracy = 85.3
neg = 20000, hog_cell = 6, accuracy = 89.5
Face template HoG visualization for the starter code. This is completely random, but it should actually look like a face once you train a reasonable classifier.
Precision Recall curve for the starter code.
Example of detection on the test set from the starter code.