This function should return all positive training examples (faces). Each face is converted into a HOG template according to 'feature_params'.
image_files = dir( fullfile( train_path_pos, '*.jpg') ); %Caltech Faces stored as .jpg
N = length(image_files);
D = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
features_pos = zeros(N, D);
for i = 1 : N
fprintf('get positive feature of image %d\n', i);
image = im2single(imread(fullfile(train_path_pos, image_files(i).name)));
hog_feature = vl_hog(image, feature_params.hog_cell_size);
hog_feature = reshape(hog_feature, [1, D]);
features_pos(i, :) = hog_feature;
end
This function should return negative training examples (non-faces). Negative training examples are selected randomly. I try to extract the same number of features from each face-free image.
image_files = dir( fullfile( non_face_scn_path, '*.jpg' ));
N = length(image_files);
D = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
nSamplePerImage = ceil(num_samples / N);
features_neg = zeros(N * nSamplePerImage, D); %one row per sampled patch; zeros(N, D) would be too small and force reallocation inside the loop
for i = 1 : N
fprintf('get negative non-face feature of image %d\n', i);
image = im2single(rgb2gray(imread(fullfile(non_face_scn_path, image_files(i).name))));
maxPatchXIndex = size(image, 2) - feature_params.template_size;
maxPatchYIndex = size(image, 1) - feature_params.template_size;
sampleNum = min([nSamplePerImage, maxPatchXIndex, maxPatchYIndex]);
samplePatchXindex = randsample(maxPatchXIndex, sampleNum);
samplePatchYindex = randsample(maxPatchYIndex, sampleNum);
for j = 1 : sampleNum
patch = image(samplePatchYindex(j) : samplePatchYindex(j) + feature_params.template_size - 1, samplePatchXindex(j) : samplePatchXindex(j) + feature_params.template_size - 1);
hog_feature = vl_hog(patch, feature_params.hog_cell_size);
hog_feature = reshape(hog_feature, [1, D]);
features_neg((i - 1) * nSamplePerImage + j, :) = hog_feature;
end
end
I choose lambda = 0.0001.
X = cat(1, features_pos, features_neg);
Y = cat(1, ones(size(features_pos, 1), 1), -1 * ones(size(features_neg, 1), 1));
lambda = 0.0001;
[w, b] = vl_svmtrain(X', Y', lambda);
I choose a confidence threshold of 0.2. HOG features are extracted at Scales = [1.0, 0.8, 0.6, 0.5, 0.25, 0.125]. When the image is scaled, the coordinates of each detection bounding box must also be scaled back to the original image.
test_scenes = dir(fullfile(test_scn_path, '*.jpg'));
%initialize these as empty and incrementally expand them.
bboxes = zeros(0, 4);
confidences = zeros(0, 1);
image_ids = cell(0, 1);
threshold = 0.2;
nCellPerTemplate = feature_params.template_size / feature_params.hog_cell_size;
D = nCellPerTemplate ^ 2 * 31;
Scales = [1.0, 0.8, 0.6, 0.5, 0.25, 0.125];
for i = 1 : length(test_scenes)
fprintf('Detecting faces in %s\n', test_scenes(i).name)
image = imread(fullfile( test_scn_path, test_scenes(i).name));
image = single(image) / 255;
if(size(image,3) > 1)
image = rgb2gray(image);
end
cur_x_min = []; cur_y_min = [];
cur_x_max = []; cur_y_max = [];
cur_confidences = [];
for s = 1 : length(Scales)
scaled_image = imresize(image, Scales(s));
hog_features = vl_hog(scaled_image, feature_params.hog_cell_size);
for j = 1 : size(hog_features, 1) - nCellPerTemplate + 1 %+1 so the last valid window position is included
for k = 1 : size(hog_features, 2) - nCellPerTemplate + 1
template_hog_feature = hog_features(j : j + nCellPerTemplate - 1, k : k + nCellPerTemplate - 1, :);
template_hog_feature = reshape(template_hog_feature, [1, D]);
score = template_hog_feature * w + b;
if score > threshold
y_min = (j - 1) * feature_params.hog_cell_size;
x_min = (k - 1) * feature_params.hog_cell_size;
y_max = y_min + feature_params.template_size - 1;
x_max = x_min + feature_params.template_size - 1;
y_min = floor(y_min / Scales(s)) + 1; x_min = floor(x_min / Scales(s)) + 1;
y_max = floor(y_max / Scales(s)) + 1; x_max = floor(x_max / Scales(s)) + 1;
if x_max > size(image, 2) || y_max > size(image, 1)
fprintf('j: %d k: %d\n', j, k);
fprintf('image size: %d %d\n', size(image, 1), size(image, 2));
fprintf('x_min: %d y_min: %d y_max: %d x_max: %d\n', x_min, y_min, y_max, x_max);
error('out of bound!');
else
cur_x_min = [cur_x_min; x_min]; cur_y_min = [cur_y_min; y_min];
cur_x_max = [cur_x_max; x_max]; cur_y_max = [cur_y_max; y_max];
cur_confidences = [cur_confidences; score];
end
end
end
end
end
cur_bboxes = [cur_x_min, cur_y_min, cur_x_max, cur_y_max];
cur_image_ids = cell(size(cur_bboxes, 1), 1); %rebuild per scene so stale ids from a previous image cannot leak in
cur_image_ids(:) = {test_scenes(i).name};
if size(cur_bboxes, 1) ~= 0
[is_maximum] = non_max_supr_bbox(cur_bboxes, cur_confidences, size(image));
cur_confidences = cur_confidences(is_maximum,:);
cur_bboxes = cur_bboxes( is_maximum,:);
cur_image_ids = cur_image_ids( is_maximum,:);
bboxes = [bboxes; cur_bboxes];
confidences = [confidences; cur_confidences];
image_ids = [image_ids; cur_image_ids];
else
fprintf('0 detections\n');
end
end
I extract hard examples at the same scales as in run_detector.m. To avoid extracting too many hard examples, I set nMax to limit their number.
image_files = dir( fullfile( non_face_scn_path, '*.jpg' ));
nCellPerTemplate = feature_params.template_size / feature_params.hog_cell_size;
D = nCellPerTemplate ^ 2 * 31;
nMax = 5000;
features_neg_hard = zeros(nMax, D);
Scales = [1.0, 0.8, 0.6, 0.5, 0.25, 0.125];
threshold = 0.2;
index = 1;
for i = 1 : length(image_files)
if(index > nMax) %stop before reading another image once nMax hard negatives are collected
break;
end
fprintf('get hard negative feature of image %d\n', i);
image = im2single(rgb2gray(imread(fullfile(non_face_scn_path, image_files(i).name))));
for s = 1 : length(Scales)
scaled_image = imresize(image, Scales(s));
hog_features = vl_hog(scaled_image, feature_params.hog_cell_size);
for j = 1 : size(hog_features, 1) - nCellPerTemplate + 1 %+1 so the last valid window position is included
for k = 1 : size(hog_features, 2) - nCellPerTemplate + 1
template_hog_feature = hog_features(j : j + nCellPerTemplate - 1, k : k + nCellPerTemplate - 1, :);
template_hog_feature = reshape(template_hog_feature, [1, D]);
score = template_hog_feature * w + b;
if score > threshold && index <= nMax
features_neg_hard(index, :) = template_hog_feature;
index = index + 1;
end
end
end
end
end
features_neg_hard = features_neg_hard(1 : index - 1, :);
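After mining, the hard negatives can be appended to the random negatives and the SVM retrained. A minimal sketch, assuming features_pos, features_neg, and the same lambda from the earlier training step are still in scope (the report does not show this retraining code explicitly):

```matlab
%retrain the linear SVM on the combined negative set
X = cat(1, features_pos, features_neg, features_neg_hard);
Y = cat(1, ones(size(features_pos, 1), 1), ...
        -1 * ones(size(features_neg, 1) + size(features_neg_hard, 1), 1));
lambda = 0.0001;
[w, b] = vl_svmtrain(X', Y', lambda); %vl_svmtrain expects one column per example
```

The retrained w and b are then used for a second detection pass.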
In my experiment, hard negative mining does not improve accuracy. In fact, it even reduces accuracy, even though I restrict the number of random negatives to 5000.
Perhaps learning from the hard negatives helps exclude false positives but at the same time rejects some true positives. Since the accuracy computation provided in the template code does not penalize false positives, removing them gains nothing, while the lost true positives lower the measured accuracy.
[Figures: detection results with hard negatives vs. without hard negatives]
I flip and rotate the original face images and extract HOG features from the transformed copies to augment the positive training data.
%rotate the face by a small angle (bilinear interpolation, crop to original size)
rotate_image = imrotate(image, rotate_angle(j), 'bilinear', 'crop');
hog_feature1 = vl_hog(rotate_image, feature_params.hog_cell_size);
%mirror the face horizontally
flip_image = fliplr(image);
hog_feature2 = vl_hog(flip_image, feature_params.hog_cell_size);
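The snippets above can be assembled into a complete augmentation pass over the positive set. A sketch, reusing N, D, image_files, and train_path_pos from the positive-feature code; the rotation angles and the helper names rotate_angle, aug_features, and idx are my own choices for illustration, since the report does not state the exact values:

```matlab
rotate_angle = [-15, -10, 10, 15]; %assumed angles, in degrees
aug_features = zeros(N * (length(rotate_angle) + 1), D);
idx = 1;
for i = 1 : N
    image = im2single(imread(fullfile(train_path_pos, image_files(i).name)));
    for j = 1 : length(rotate_angle)
        %rotated copy, cropped back to the template size
        rotate_image = imrotate(image, rotate_angle(j), 'bilinear', 'crop');
        hog_feature1 = vl_hog(rotate_image, feature_params.hog_cell_size);
        aug_features(idx, :) = reshape(hog_feature1, [1, D]);
        idx = idx + 1;
    end
    %horizontally mirrored copy
    flip_image = fliplr(image);
    hog_feature2 = vl_hog(flip_image, feature_params.hog_cell_size);
    aug_features(idx, :) = reshape(hog_feature2, [1, D]);
    idx = idx + 1;
end
features_pos = cat(1, features_pos, aug_features); %train on originals plus transforms
```

The enlarged features_pos then feeds the same vl_svmtrain call as before.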
Augmentation may not show an improvement if the original method already achieves high accuracy at a low threshold. So, to see whether augmentation can increase accuracy, I set a high threshold of 1.5 to lower the baseline accuracy.
[Figures: detection results with augmentation vs. original]