This project explores face detection in static scenes using a sliding window method. The implementation uses histogram of oriented gradients (HoG) features to train a linear SVM, and subwindows are sampled at multiple scales. As an extension, hard negative mining is also implemented, which moderately boosted performance.
Examples of positive cases are provided pre-processed (cropped), while negative cases are randomly generated from images without faces. Negative samples are taken at multiple scales, and all samples are then converted to HoG features. Increasing the number of negative samples reduces the number of true faces detected, but also reduces the number of false positives.
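The random multi-scale sampling of negatives described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the random image, the 36-pixel template, the scale list, and the patch count are all assumed values, and nearest-neighbour index selection stands in for a proper image resize so that no toolbox is required.

```matlab
% Sketch: draw random negative patches from one face-free image at several scales.
% All names and values are illustrative assumptions.
img = rand(240, 320);            % stand-in for a grayscale non-face scene
template_size = 36;              % assumed template side length in pixels
scales = [1.0, 0.7, 0.5];        % assumed downsampling factors
patches_per_scale = 4;           % assumed number of crops per scale

patches = zeros(template_size, template_size, 0);
for s = scales
    % Nearest-neighbour downsampling by index selection (no toolbox needed)
    rows = round(linspace(1, size(img, 1), round(s * size(img, 1))));
    cols = round(linspace(1, size(img, 2), round(s * size(img, 2))));
    scaled = img(rows, cols);
    if min(size(scaled)) < template_size
        continue;                % too small to hold a full template at this scale
    end
    for k = 1:patches_per_scale
        % Random top-left corner chosen so the crop stays inside the image
        r = randi(size(scaled, 1) - template_size + 1);
        c = randi(size(scaled, 2) - template_size + 1);
        patches(:, :, end + 1) = scaled(r:r + template_size - 1, c:c + template_size - 1);
    end
end
```

Each slice of `patches` would then be passed through the same HoG encoder as the positive crops, giving one negative feature vector per patch.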
Hard negative mining is the process of gathering false positive detections and using them as additional negative samples for training. This improves performance by giving the classifier extra negative examples drawn from the samples it finds difficult to classify. The retrained classifier performs moderately better, although at the cost of additional runtime. Code shown below:
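% Any detection on face-free scenes is a false positive, so keep everything above the threshold
thresh_hard_neg = 0;
[~,~,~,features_set_hard] = run_detector(non_face_scn_path,w,b,feature_params,thresh_hard_neg);
% Append the mined hard negatives to the original training set
hard_neg_feats = [features_pos;features_neg;features_set_hard];
hard_neg_labels = [ones(size(features_pos,1),1);-1*ones(size(features_neg,1),1);-1*ones(size(features_set_hard,1),1)];
% Retrain the linear SVM (vl_svmtrain expects one sample per column)
[w,b] = vl_svmtrain(hard_neg_feats',hard_neg_labels',lambda);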
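For readers without the detector or VLFeat at hand, one mining round can be reproduced on toy data. The sketch below is only an illustration under stated assumptions: the 2-D Gaussian features, the pool of face-free samples, and the value of `lambda` are made up, and a ridge-regularised least-squares classifier is swapped in for the linear SVM so the example needs no external library.

```matlab
% Toy hard negative mining round. A ridge-regularised least-squares classifier
% stands in for the linear SVM; all data and parameter values are illustrative.
n = 200;
features_pos = randn(n, 2) + 2;        % toy "face" features
features_neg = randn(n, 2) - 2;        % toy easy "non-face" features
neg_pool     = randn(500, 2) + 0.5;    % face-free pool, partly overlapping the positives
lambda = 1e-3;                         % assumed regularisation strength
thresh_hard_neg = 0;                   % pool samples scoring above this are false positives

augment = @(X) [X, ones(size(X, 1), 1)];                        % append a bias column
fit = @(A, y) (A' * A + lambda * eye(size(A, 2))) \ (A' * y);   % ridge solution

% Initial training on positives and easy negatives
X = [features_pos; features_neg];
y = [ones(n, 1); -ones(n, 1)];
theta = fit(augment(X), y);            % theta = [w; b]

% Mining: every pool sample above the threshold is a false positive
scores = augment(neg_pool) * theta;
features_hard = neg_pool(scores > thresh_hard_neg, :);

% Retrain on the original data plus the mined hard negatives
X_hard = [X; features_hard];
y_hard = [y; -ones(size(features_hard, 1), 1)];
theta_hard = fit(augment(X_hard), y_hard);

fp_before = sum(scores > thresh_hard_neg);
fp_after  = sum(augment(neg_pool) * theta_hard > thresh_hard_neg);
fprintf('false positives on pool: %d before, %d after\n', fp_before, fp_after);
```

The printed counts vary from run to run because the data are random. The structure mirrors the snippet above: score a face-free pool, keep what crosses the threshold, append it with label -1, and retrain.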
The input image is converted to HoG feature space, and the sliding window detector is moved across the converted image. A confidence measure is evaluated for each window, and the bounding box corresponding to the running highest value is stored. The image is then repeatedly downsampled and re-scanned until it is smaller than the template size.
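The scan described above might look roughly like the following. This is a sketch under several assumptions rather than the project's detector: a per-cell mean intensity stands in for the HoG encoder, the weights `w` and bias `b` are random placeholders for a trained classifier, and the template size, cell size, and 0.8 downsampling step are illustrative values.

```matlab
% Sketch: multi-scale sliding window keeping the single highest-confidence box.
% Per-cell mean intensity stands in for HoG; w and b are random placeholders.
img = rand(120, 160);                  % stand-in grayscale test image
template_size = 36;                    % assumed template side in pixels
cell_size = 6;                         % assumed cell side in pixels
tc = template_size / cell_size;        % template side measured in cells
w = randn(tc * tc, 1);                 % placeholder classifier weights
b = 0;                                 % placeholder bias
step = 0.8;                            % assumed downsampling factor per level

best_conf = -inf;
best_bbox = [];
scale = 1;
cur = img;
while min(size(cur)) >= template_size
    % Stand-in feature map: mean intensity of every cell_size x cell_size cell
    nr = floor(size(cur, 1) / cell_size);
    nc = floor(size(cur, 2) / cell_size);
    cells = reshape(cur(1:nr * cell_size, 1:nc * cell_size), cell_size, nr, cell_size, nc);
    feat = reshape(mean(mean(cells, 1), 3), nr, nc);

    % Slide the template over the feature map one cell at a time
    for r = 1:(nr - tc + 1)
        for c = 1:(nc - tc + 1)
            x = feat(r:r + tc - 1, c:c + tc - 1);
            conf = w' * x(:) + b;
            if conf > best_conf
                best_conf = conf;
                % Map cell coordinates back to pixels in the original image
                x_min = max(1, round(((c - 1) * cell_size + 1) / scale));
                y_min = max(1, round(((r - 1) * cell_size + 1) / scale));
                x_max = min(size(img, 2), round((c - 1 + tc) * cell_size / scale));
                y_max = min(size(img, 1), round((r - 1 + tc) * cell_size / scale));
                best_bbox = [x_min, y_min, x_max, y_max];
            end
        end
    end

    % Downsample from the original image (nearest neighbour) for the next level
    scale = scale * step;
    rows = round(linspace(1, size(img, 1), round(scale * size(img, 1))));
    cols = round(linspace(1, size(img, 2), round(scale * size(img, 2))));
    cur = img(rows, cols);
end
```

Resampling from the original image at each level, rather than from the previous level, is a design choice made here to avoid accumulating interpolation error. A detector that reports several faces per image would keep every window above a threshold instead of only the running maximum.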
The sliding window detector was evaluated at varying threshold values, with and without hard negative mining. Without hard negative mining, the average precision is 83.9% at a threshold of 0.5 and 84.1% at a threshold of 0.3. Using hard negative mining resulted in an average precision of