In this project, the sliding window model, which independently classifies each image patch as object or non-object, is implemented. For each patch (window), the presence of a face is determined. This is done in three steps:
The positive and negative features are obtained from the two training sets. The vl_hog function is used to generate a HoG template for each image. For positive features the HoG template is used directly, whereas negative features are sampled at random locations from the HoG representation of each non-face image.
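The random negative-sampling step can be sketched as follows. This is a NumPy illustration, not the actual MATLAB code; `sample_negative_features` is a hypothetical helper, and the feature map stands in for vl_hog's (H x W x 31) output:

```python
import numpy as np

def sample_negative_features(hog_map, template_cells, n_samples, rng):
    """Randomly crop template-sized windows from a HoG feature map
    and flatten each crop into one negative feature vector."""
    h, w, d = hog_map.shape
    t = template_cells
    feats = np.empty((n_samples, t * t * d))
    for i in range(n_samples):
        r = rng.integers(0, h - t + 1)   # random top-left cell (row)
        c = rng.integers(0, w - t + 1)   # random top-left cell (col)
        feats[i] = hog_map[r:r + t, c:c + t, :].ravel()
    return feats

rng = np.random.default_rng(0)
hog_map = rng.standard_normal((20, 30, 31))   # stand-in for vl_hog output
neg = sample_negative_features(hog_map, template_cells=6, n_samples=10, rng=rng)
print(neg.shape)  # (10, 1116): each row is a flattened 6x6x31 crop
```

Each sampled crop has the same shape as the learned template, so positives and negatives live in the same feature space.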
A linear classifier is built from the positive and negative features with a call to vl_svmtrain. Lambda, the regularization parameter, is important because it controls the amount of bias in the model and hence the degree of underfitting or overfitting to the training data, which directly affects the average precision. A lambda value of 0.0001 gave the best results and is chosen.
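The role of lambda can be illustrated with a minimal linear SVM trained by sub-gradient descent on the regularized hinge loss. This is a sketch, not VLFeat's solver; the toy data and all names are assumptions for illustration:

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, lr=0.01, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean(hinge loss) by full-batch
    sub-gradient descent. Labels y must be +1 / -1."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                    # margin violators
        grad_w = lam * w - (y[active] @ X[active]) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# toy separable data: positives around +2, negatives around -2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2, 0.5, (20, 2)), rng.normal(-2, 0.5, (20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])
w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
print(acc)  # 1.0 on this separable toy set
```

A larger lambda shrinks w harder (more bias, risk of underfitting); a smaller one fits the training margins more aggressively (risk of overfitting), which is the trade-off tuned in the report.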
Each test image is converted to HoG feature space using the vl_hog function. The HoG cells are stepped over, taking groups of cells that are of the same size as the learned template. If the classification score is above the chosen confidence threshold, the detection is retained. The minimum confidence was varied and the average precision observed for different values of the threshold. The general trend was that lowering the threshold increased the average precision but also increased the number of false positives; hence an optimum value of -0.8 is chosen. Since the evaluation counts a duplicate detection as wrong, non-maximum suppression is run on the detections to improve performance.
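The non-maximum suppression step can be sketched as greedy IoU-based suppression. This is an assumed formulation for illustration; the starter code's overlap criterion may differ:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.3):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    then drop any remaining box whose IoU with it exceeds iou_thresh.
    boxes: (N, 4) array of [x1, y1, x2, y2]."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # intersection of box i with all lower-scoring boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the two overlapping boxes collapse to one
```

Because sliding windows at neighboring positions fire on the same face, this deduplication is what prevents those extra hits from being scored as false positives.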
| Cell Size | Average Precision Graph | Face Detection |
|-----------|-------------------------|----------------|
| 6         |                         |                |
| 4         |                         |                |
| 3         |                         |                |
Precision-recall curve for the above configuration.
Example of detection on the test set from the starter code.
Hard negative mining takes the features of all non-faces that were falsely detected as faces by the detector. These features are appended to the random negative features obtained previously and the SVM is retrained. Below is a comparison showing the effect of hard negative mining: the average precision improves and the number of false positives is greatly reduced.
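The mining step can be sketched as follows, reusing the detector's linear scoring rule and the -0.8 confidence threshold from above. The helper name, toy model, and data are assumptions for illustration:

```python
import numpy as np

def mine_hard_negatives(w, b, neg_feats, conf_thresh=-0.8):
    """Return the negative features the current model wrongly scores
    as faces, i.e. above the detection confidence threshold."""
    scores = neg_feats @ w + b
    return neg_feats[scores > conf_thresh]

# toy linear model and pool of negative features
rng = np.random.default_rng(2)
w, b = np.array([1.0, 1.0]), 0.0
neg_feats = rng.normal(0, 1, (100, 2))
hard = mine_hard_negatives(w, b, neg_feats)
print(hard.shape)
# the mined hard negatives are then appended to the training negatives
# and the SVM is retrained: X = vstack([pos, neg, hard]); fit again
```

Retraining on these margin-violating negatives pushes the decision boundary away from exactly the patterns the detector currently confuses with faces, which is why the false-positive count drops.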
The results below are obtained for a cell size of 6.
|                              | Average Precision | Accuracy |
|------------------------------|-------------------|----------|
| Without hard negative mining |                   |          |
| With hard negative mining    |                   |          |
|                              | Face Detection |
|------------------------------|----------------|
| Without hard negative mining |                |
| With hard negative mining    |                |
The sliding window approach to face detection was implemented and detects faces effectively. It was observed that parameters such as lambda, the confidence threshold, the number of samples, and the amount of training data influence the accuracy to a large extent and must be tuned properly. An improvement in accuracy was also observed on increasing the number of positive and negative features, including through hard negative mining.