Project 5 / Face Detection with a Sliding Window

In this project, the sliding window model is implemented: every image patch is independently classified as object or non-object. For each patch (window), the presence of a face is determined. This is done in three steps:

  1. Getting positive and negative features from the training examples
  2. Training the classifier with the above mentioned features
  3. Face detection on test set using the classifier

Obtaining features from the training set

The positive and negative features are obtained from the two training sets. The vl_hog function is used to generate a HoG template for each image. For the positive (face) images the HoG template is used directly, whereas the negative features are obtained by randomly sampling template-sized patches from the HoG representations of the non-face images.
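The feature-extraction step can be sketched as follows. This is an illustrative Python/NumPy stand-in for the MATLAB/VLFeat pipeline, not the project code: the toy `hog_template` below bins gradient orientations per cell (a much simpler descriptor than vl_hog's), and the image sizes and random sampling are assumptions made for illustration.

```python
import numpy as np

def hog_template(img, cell_size=6, n_bins=9):
    """Minimal HoG sketch: one gradient-orientation histogram per cell.
    Illustrative only -- vl_hog uses a richer per-cell descriptor."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    ch, cw = img.shape[0] // cell_size, img.shape[1] // cell_size
    hog = np.zeros((ch, cw, n_bins))
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell_size, (i + 1) * cell_size),
                  slice(j * cell_size, (j + 1) * cell_size))
            hog[i, j] = np.bincount(bins[sl].ravel(),
                                    weights=mag[sl].ravel(),
                                    minlength=n_bins)
    return hog

rng = np.random.default_rng(0)

# Positive feature: the full HoG template of a (hypothetical) 36x36 face crop.
face = rng.random((36, 36))
pos_feat = hog_template(face).ravel()

# Negative feature: a random template-sized crop from a non-face scene.
scene = rng.random((200, 300))
y, x = rng.integers(0, 200 - 36), rng.integers(0, 300 - 36)
neg_feat = hog_template(scene[y:y + 36, x:x + 36]).ravel()
```

Both feature vectors have the same dimensionality (6x6 cells x 9 bins here), so they can be stacked directly into one training matrix.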

Training the classifier

A linear classifier is trained on the positive and negative features with a call to vl_svmtrain. The regularization parameter lambda strongly affects the average precision: a larger lambda penalizes the weights more heavily and biases the model towards underfitting, while a smaller lambda lets it fit (or overfit) the training data more closely. A lambda value of 0.0001 gave the best results.
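The project itself calls VLFeat's SVM trainer; the NumPy sketch below only illustrates lambda's role in a regularized linear SVM (hinge loss plus an L2 penalty scaled by lambda), trained with plain subgradient descent. The toy data, learning rate, and epoch count are assumptions for the example.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, lr=0.01, epochs=200, seed=0):
    """Subgradient descent on lam*||w||^2 + mean hinge loss.
    Larger lam shrinks w (underfitting); smaller lam fits the data harder."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            # Hinge-loss subgradient is -y*x when the margin is violated.
            grad_w = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
            grad_b = -y[i] if margin < 1 else 0.0
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Toy separable data: positives clustered near +1, negatives near -1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 0.3, (50, 2)),
               rng.normal(-1.0, 0.3, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)
w, b = train_linear_svm(X, y, lam=1e-4)
acc = np.mean(np.sign(X @ w + b) == y)
```

On this well-separated toy set a small lambda trains a near-perfect classifier; on real face features the same lambda has to be tuned against overfitting.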

Face Detection on the test set

Each test image is converted to HoG feature space using the vl_hog function. The HoG cells are stepped over, taking groups of cells the same size as the learned template, and each window is scored by the classifier. If the score is above the chosen confidence threshold, the detection is retained. The minimum confidence was varied and the average precision observed for different thresholds: lowering the threshold admits more detections, which raises the average precision but also increases the number of false positives. A value of -0.8 gave a good trade-off. Non-maximum suppression, which removes duplicate overlapping detections (the evaluation counts a duplicate detection as a false positive), is then run on the detections to improve performance.
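The detection stage described above can be sketched in NumPy as follows. The feature map is treated as an H x W grid of cells with D values each; the template shape, weights, and the toy example at the bottom are assumptions for illustration, not the project's trained detector.

```python
import numpy as np

def iou(a, b, th, tw):
    """Intersection-over-union of two template-sized windows,
    each given as a (score, row, col) tuple in cell coordinates."""
    y1, x1 = max(a[1], b[1]), max(a[2], b[2])
    y2, x2 = min(a[1], b[1]) + th, min(a[2], b[2]) + tw
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    return inter / (2 * th * tw - inter)

def detect(feat, tmpl_shape, w, b, conf_thresh=-0.8):
    """Step the learned template over the HoG feature map and keep
    every window whose SVM score beats the confidence threshold."""
    th, tw = tmpl_shape
    H, W, _ = feat.shape
    dets = []
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            score = feat[i:i + th, j:j + tw].ravel() @ w + b
            if score > conf_thresh:
                dets.append((score, i, j))
    return dets

def nms(dets, tmpl_shape, max_overlap=0.3):
    """Greedy non-maximum suppression: keep the best-scoring window,
    drop overlapping duplicates, then repeat with what remains."""
    th, tw = tmpl_shape
    keep, rest = [], sorted(dets, reverse=True)
    while rest:
        best = rest.pop(0)
        keep.append(best)
        rest = [d for d in rest if iou(best, d, th, tw) <= max_overlap]
    return keep

# Toy example: a 10x10 map of 9-D cells with one bright 3x3 region.
feat = np.zeros((10, 10, 9))
feat[2:5, 2:5, :] = 1.0
w = np.full(3 * 3 * 9, 1.0 / 81)   # toy weights, not a trained SVM
dets = detect(feat, (3, 3), w, b=0.0, conf_thresh=0.5)
kept = nms(dets, (3, 3))
```

Several overlapping windows fire around the bright region, and non-maximum suppression collapses them to the single best-scoring detection.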

Results

Effect of cell size

Average precision graphs and example face detections for cell sizes 6, 4, and 3 are shown in the figures below.


The face template HoG visualization for cell size 3 and confidence -0.8 is shown below.

Precision Recall curve for the above configuration.

Example of detection on the test set from the starter code.

Extra Credit

Hard Negative Mining

Hard negative mining collects the features of non-face patches that the detector has falsely classified as faces. These features are appended to the random negative features obtained previously and the SVM is retrained. Below is a comparison showing the effect of hard negative mining: the average precision improves and the number of false positives is greatly reduced.
The results below are obtained for a cell size of 6.
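The mining loop can be sketched as below. This is an illustrative Python snippet, not the project code: the linear scorer, the toy negative windows, and the threshold are all assumptions, and `mine_hard_negatives` stands in for running the current detector over the non-face training images.

```python
import numpy as np

def mine_hard_negatives(detector_score, neg_windows, thresh=0.0):
    """Run the current detector on windows known to contain no faces and
    collect every one it wrongly scores above the threshold -- these
    false positives are the 'hard' negatives."""
    return [x for x in neg_windows if detector_score(x) > thresh]

# Toy linear scorer standing in for the trained SVM (w, b are assumptions).
w, b = np.array([1.0, -1.0]), 0.0
score = lambda x: x @ w + b

negs = [np.array([2.0, 0.5]),    # scores +1.5 -> a hard negative
        np.array([-1.0, 0.2])]   # scores -1.2 -> already classified correctly
hard = mine_hard_negatives(score, negs)

# The SVM is then retrained on the enlarged negative set.
training_negs = negs + hard
```

In practice this detect-collect-retrain cycle can be repeated until few new hard negatives are found.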

Average Precision

Without hard negative mining

With hard negative mining

Face Detection

Without hard negative mining

With hard negative mining

Conclusion

The sliding window approach for face detection was implemented and detects faces efficiently. It was observed that parameters such as lambda, the confidence threshold, the number of samples, and the amount of training data influence the accuracy to a large extent and must be tuned properly. An improvement in accuracy was also observed on increasing the number of positive and negative features, including through hard negative mining.