Project 5: Face Detection with a Sliding Window

Introduction

This project explores face detection in static scenes using a sliding window method. The implementation uses histogram of oriented gradients (HoG) features to train a linear SVM, and subwindows are sampled at multiple scales. As an extension, hard negative mining is also implemented, which moderately boosted performance.

Training

Examples of positive cases (cropped faces) are provided pre-processed, while negative cases are randomly sampled from images that contain no faces. Negative samples are taken at multiple scales, and all samples are then converted to HoG features. Increasing the number of negative samples reduces the number of false positives, but it also reduces the number of true faces detected.
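The random negative sampling described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the project: the 36-pixel template side, the sample count, and the use of plain subsampling in place of a proper anti-aliased resize are all hypothetical, and a random matrix stands in for a real non-face scene.

```matlab
% Sketch: draw random template-sized patches from a face-free grayscale image.
% All sizes below are assumptions chosen for illustration.
img = rand(120, 160);                    % stand-in for a non-face scene
template_size = 36;                      % hypothetical template side, in pixels
num_samples = 10;                        % hypothetical number of negatives
steps = [1 2];                           % subsampling steps, i.e. scales 1 and 1/2

patches = zeros(template_size, template_size, num_samples);
for k = 1:num_samples
    step = steps(randi(numel(steps)));   % pick one of the scales at random
    scaled = img(1:step:end, 1:step:end);% crude downsampling (no smoothing)
    [h, w] = size(scaled);
    r = randi(h - template_size + 1);    % top-left corner, kept inside the image
    c = randi(w - template_size + 1);
    patches(:, :, k) = scaled(r:r+template_size-1, c:c+template_size-1);
end
```

Each patch would then be passed to the HoG extractor (for example VLFeat's vl_hog, which is an assumption here) and stored as one row of the negative feature matrix.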

Hard Negative Mining

Hard negative mining is the process of gathering false positives and adding them to the negative samples used for training. Here the detector is run on the non-face scenes, so every detection it returns is a false positive. This improves performance by giving the classifier additional negative examples drawn from the samples that are hardest to classify. The retrained classifier performs moderately better, at the cost of additional runtime. Code shown below:


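% Run the initial detector on the face-free scenes; every detection is a false positive.
thresh_hard_neg = 0;
[~,~,~,features_set_hard] = run_detector(non_face_scn_path,w,b,feature_params,thresh_hard_neg);
% Stack the original positives and negatives with the mined hard negatives.
hard_neg_feats = [features_pos;features_neg;features_set_hard];
hard_neg_labels = [ones(size(features_pos,1),1);-1.*ones(size(features_neg,1),1);-1*ones(size(features_set_hard,1),1)];
% Retrain the linear SVM (vl_svmtrain expects one sample per column).
[w,b] = vl_svmtrain(hard_neg_feats',hard_neg_labels',lambda);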

Running Detector

The input image is converted to HoG feature space, and the sliding window detector is moved across the converted image. A confidence is evaluated for each window, and the bounding box corresponding to the running highest value is stored. The image is then repeatedly downsampled, and the detection step repeated, until the image is smaller than the template size.
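A single-scale pass of the window scoring described above might look like the sketch below. It is a simplified illustration, not the project's run_detector: the cell size, template size, descriptor length, and threshold are hypothetical values, random numbers stand in for real HoG features and trained SVM weights, and every window above the threshold is kept.

```matlab
% Sketch: score every template-sized window on a grid of HoG cells.
% Assumptions: hog is an H x W x D array of cell descriptors, the template
% spans t x t cells, and (w, b) would come from a trained linear SVM.
cell_size = 6;  t = 6;  D = 31;          % hypothetical sizes
hog = rand(10, 12, D);                   % stand-in for a real HoG feature map
w = randn(t * t * D, 1);                 % stand-in for trained weights
b = -0.5;                                % stand-in for the trained bias
thresh = 0.3;                            % confidence threshold

bboxes = zeros(0, 4);                    % [x_min, y_min, x_max, y_max] in pixels
confidences = zeros(0, 1);
for r = 1:size(hog, 1) - t + 1
    for c = 1:size(hog, 2) - t + 1
        window = hog(r:r+t-1, c:c+t-1, :);
        score = w' * window(:) + b;      % linear SVM confidence for this window
        if score > thresh
            x_min = (c - 1) * cell_size + 1;   % convert cell index to pixels
            y_min = (r - 1) * cell_size + 1;
            side = t * cell_size;
            bboxes(end+1, :) = [x_min, y_min, x_min + side - 1, y_min + side - 1];
            confidences(end+1, 1) = score;
        end
    end
end
```

The flattening order in window(:) has to match the order used when the training features were built. In a multi-scale version, the loop would be repeated on each downsampled image and the box coordinates divided by the scale factor to map them back to the original image.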

Results

The sliding window detector was evaluated at varying threshold values, with and without hard negative mining. Without hard negative mining, the average precision is 83.9% at a threshold of 0.5 and 84.1% at a threshold of 0.3. Using hard negative mining resulted in an average precision of

Threshold = 0.30