Project 5 / Face Detection with a Sliding Window

Effects of implementation decisions on performance:

1) Test features:

To extract negative features for the SVM classifier, I randomly sampled 36 x 36 patches from the images in the non-face training set, 10,000 samples in total. It was suggested that extracting negative features at multiple scales would improve performance. However, single-scale negative features already gave an average precision of 84%, so I chose to forgo multi-scale extraction.
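The random sampling step can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `sample_negative_patches`, the fixed seed, and the example image sizes are all hypothetical, and only crop coordinates are generated rather than pixel data.

```python
import random

PATCH = 36  # patch side in pixels, as stated in the text


def sample_negative_patches(image_sizes, num_samples, patch=PATCH, seed=0):
    """Return (image_index, x, y) top-left corners of random patch-sized crops.

    image_sizes is a list of (width, height) pairs for the non-face images.
    """
    rng = random.Random(seed)
    # Only images at least as large as the patch can contribute a crop.
    usable = [i for i, (w, h) in enumerate(image_sizes) if w >= patch and h >= patch]
    if not usable:
        raise ValueError("no image is large enough for a patch")
    corners = []
    for _ in range(num_samples):
        i = rng.choice(usable)
        w, h = image_sizes[i]
        x = rng.randint(0, w - patch)  # inclusive bounds keep the crop inside the image
        y = rng.randint(0, h - patch)
        corners.append((i, x, y))
    return corners


# Hypothetical image sizes; the third is too small and is skipped.
sizes = [(640, 480), (320, 240), (30, 30)]
crops = sample_negative_patches(sizes, 10_000)
```

Each returned corner identifies one 36 x 36 crop, from which a HOG feature would then be computed.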

2) Face Detector:

My detector runs a sliding window over the HOG representation of an image. The window extracts patches from the HOG with a step size equivalent to template_size/cell_size. The bounding box corresponding to each extracted patch is also recorded, scaled by the cell size. Each patch is scored with a linear classifier, and detections with a confidence below a certain threshold are discarded. This process is repeated for each image at scales from 1.0 down to 0.1 in steps of 0.1. At a cell size of 6 this gives 84% average precision.
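The window enumeration, box scaling, and thresholding described above might look roughly like the sketch below. The helper names (`window_boxes`, `linear_score`, `keep_confident`) are hypothetical, HOG extraction itself is omitted, and dividing by the image scale to map boxes back to original-image pixels is an assumption rather than something the text states.

```python
TEMPLATE_SIZE = 36  # template side in pixels, matching the 36 x 36 patches

# Scales 1.0, 0.9, ..., 0.1 as described in the text.
SCALES = [k / 10 for k in range(10, 0, -1)]


def window_boxes(cells_y, cells_x, cell_size, scale, step=None):
    """Yield (row, col, box) for each window position on a HOG cell grid.

    box is (x_min, y_min, x_max, y_max) in original-image pixels.
    Dividing by `scale` to undo the image resize is an assumption.
    """
    win = TEMPLATE_SIZE // cell_size  # window side, in cells
    if step is None:
        step = win  # step of template_size/cell_size, following the text
    for row in range(0, cells_y - win + 1, step):
        for col in range(0, cells_x - win + 1, step):
            box = (
                col * cell_size / scale,
                row * cell_size / scale,
                (col + win) * cell_size / scale,
                (row + win) * cell_size / scale,
            )
            yield row, col, box


def linear_score(features, weights, bias):
    """Confidence of a linear classifier: w . x + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def keep_confident(scored_boxes, threshold):
    """Drop (box, confidence) pairs whose confidence is below the threshold."""
    return [(box, conf) for box, conf in scored_boxes if conf >= threshold]


# Hypothetical 12 x 18 cell grid at full scale with a cell size of 6.
positions = list(window_boxes(12, 18, cell_size=6, scale=1.0))
```

Passing `step=1` instead would scan every cell position, which yields many more overlapping windows per image.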

Results:

Results by cell size

The results above show the average precision and the HOG template for each cell size. For cell sizes of 6, 4, and 3 the average precisions were 84.8%, 87.6%, and 89.1%, respectively. All three results were obtained with a confidence threshold of 0.78. The HOG template appears to have a more defined shape as the cell size decreases, and the more defined template corresponds to a higher precision at each cell size. Depending on the desired result, however, the HOG template can actually produce an undesired result.
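One way to see why the template becomes more defined is to count how many cells the 36-pixel template spans at each cell size. The short sketch below does that arithmetic. The figure of 31 values per cell is an assumption (it matches the UoCTTI HOG variant); the text does not state the descriptor dimensionality, and `template_shape` is a hypothetical helper.

```python
TEMPLATE_SIZE = 36   # template side in pixels, from the text
HOG_CHANNELS = 31    # assumed values per cell (UoCTTI-style HOG); not stated in the text


def template_shape(cell_size, template_size=TEMPLATE_SIZE, channels=HOG_CHANNELS):
    """Return (cells_per_side, feature_dimension) for a square HOG template."""
    cells = template_size // cell_size
    return cells, cells * cells * channels


for cell_size in (6, 4, 3):
    cells, dim = template_shape(cell_size)
    print(f"cell size {cell_size}: {cells} x {cells} cells, {dim} features")
```

Halving the cell size from 6 to 3 quadruples the number of cells in the template, which gives the classifier a finer spatial grid at the cost of a larger feature vector.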

Detections on board with cell size of 6.

Detections on board with cell size of 4.

Although the smaller cell size gives a higher overall precision, more faces are detected in this photo of a whiteboard with the larger cell size. This shows the influence the HOG template can have on individual detections.

Detections on Audrey Hepburn with cell size of 6.

Detections on Audrey Hepburn with cell size of 4.

In the photo of Audrey Hepburn, the detector with a cell size of 6 fails to detect any face. This is another example of the HOG template's influence.

Conclusion:

Lowering the confidence threshold can raise average precision by recovering more faces, at the cost of more false positives.
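The effect of the confidence threshold can be illustrated with a toy example. The detections below are made-up values for illustration only, not results from this project, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(detections, num_faces, threshold):
    """Return (precision, recall, false_positives) at a confidence threshold.

    detections is a list of (confidence, is_true_face) pairs.
    """
    kept = [is_face for conf, is_face in detections if conf >= threshold]
    true_pos = sum(kept)
    false_pos = len(kept) - true_pos
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / num_faces
    return precision, recall, false_pos


# Hypothetical scored detections for an image set containing 5 real faces.
toy = [
    (0.95, True), (0.90, True), (0.85, False), (0.80, True),
    (0.60, False), (0.50, True), (0.30, False),
]

strict = precision_recall(toy, num_faces=5, threshold=0.78)
loose = precision_recall(toy, num_faces=5, threshold=0.40)
```

In this toy data the lower threshold finds one more real face (recall rises from 0.6 to 0.8) but also admits one more false positive.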

Detections: