Project 5 / Face Detection with a Sliding Window

In this project I implemented a face detection algorithm using the sliding window model. Instead of the SIFT features used in previous projects, I used the HoG representation as the descriptor. The project consists of three parts: handling the training data, training the classifier, and using the trained classifier to detect faces with sliding windows at different scales of the test images.

1. Handling training data

The training data consists of two parts: images with faces (positives) and images without faces (negatives). The positive images are already configured properly: they are all the same size and the faces are approximately the same size. The negative images, however, come in different sizes and scales. Therefore, to achieve better performance, I used a scale factor of 0.9 when extracting HoG features from the negative images, meaning that I recursively scale each image down to 0.9 of its previous size and extract additional features until the image is too small to yield new features. I used 10000 negative images to collect the features. More images would improve performance, but training would take much longer, so I kept it at 10000 negative images.
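As a rough illustration, the loop below sketches this multi-scale negative mining. It assumes VLFeat's vl_hog; the template size (36), cell size (6), and the file-listing variables are placeholders for illustration, not necessarily the exact values and names in my code.

scale_factor = 0.9;
cell_size    = 6;                          % assumed HoG cell size
template     = 36;                         % assumed template size in pixels
features_neg = [];

for i = 1:num_images                       % num_images was 10000 in my runs
    img = imread(fullfile(neg_dir, image_files(i).name));
    if size(img, 3) > 1
        img = rgb2gray(img);               % vl_hog expects a single-channel image
    end
    img = single(img);
    while min(size(img)) >= template       % stop once the image is too small
        % sample a random template-sized patch at this scale
        r = randi(size(img, 1) - template + 1);
        c = randi(size(img, 2) - template + 1);
        patch = img(r:r+template-1, c:c+template-1);
        hog = vl_hog(patch, cell_size);    % HoG descriptor of the patch
        features_neg(end+1, :) = hog(:)';  % flatten into one row per sample
        img = imresize(img, scale_factor); % move to the next, smaller scale
    end
end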

2. Training classifier

This part is pretty straightforward; the only parameter to tune is the regularization strength lambda. After trying different values of lambda (0.1, 0.001, and 0.0001), the smallest worked the best, so I chose 0.0001.
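For reference, a minimal sketch of this step, assuming VLFeat's vl_svmtrain and feature matrices features_pos / features_neg stored one example per row:

lambda = 0.0001;                                        % the value that worked best
X = single([features_pos; features_neg])';              % vl_svmtrain expects D x N
Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)]';
[w, b] = vl_svmtrain(X, Y, lambda);                     % linear SVM weights and bias

% the confidence of a new HoG feature row vector f is then f * w + b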

3. Sliding window detection

In this part I ran the detector at multiple scales with 0.9 as the scale factor. I also tried 0.7, but the average accuracy dropped by around 6%, so the larger the scale factor (i.e. the more scales per image), the better the performance. However, the run time grows significantly as the scale factor increases: with a scale factor of 0.7 and a step size of 6, the entire pipeline runs in less than a minute, whereas with a scale factor of 0.9 and a step size of 6 the run time is approximately 3 minutes.
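The sketch below shows the basic structure of this loop, again assuming vl_hog, a 36x36 template with cell size 6 (placeholders), and the trained w and b from the previous step. For clarity it recomputes HoG for every window and omits non-maximum suppression; a faster implementation computes HoG once per scale and slides over its cells, then suppresses overlapping detections.

scale_factor = 0.9;
step         = 6;                 % step size in pixels
thresh       = 0.7;               % confidence threshold
template     = 36;                % assumed template size
cell_size    = 6;                 % assumed HoG cell size

bboxes = []; confidences = [];
img   = single(test_img);         % grayscale test image
scale = 1.0;
while min(size(img)) >= template
    for r = 1:step:(size(img, 1) - template + 1)
        for c = 1:step:(size(img, 2) - template + 1)
            hog  = vl_hog(img(r:r+template-1, c:c+template-1), cell_size);
            conf = hog(:)' * w + b;                        % linear SVM score
            if conf > thresh
                % map the window back to original image coordinates
                bboxes(end+1, :) = round([c, r, c+template-1, r+template-1] / scale);
                confidences(end+1, 1) = conf;
            end
        end
    end
    img   = imresize(img, scale_factor);                   % next, smaller scale
    scale = scale * scale_factor;
end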
Another parameter I adjusted is the confidence threshold. When the threshold gets smaller the accuracy gets higher, but the false positive rate gets higher as well. I set the threshold to 0.7 when measuring the highest accuracy, whereas for the class images at the bottom of the page, where I wanted to detect faces with fewer false positives, I set the threshold to 0.3.
The step size also affects the run time significantly: the smaller the step size, the slower the pipeline runs. With a step size of 4 the total runtime is 8 minutes, and with a step size of 3 it is 14.5 minutes.
The performance of the algorithm with different step sizes and a fixed scale factor (0.9) is shown below. In general, a smaller step size gives noticeably better accuracy.

Results table: accuracy for different step sizes (scale factor = 0.9).

Face template HoG visualization for the starter code. This is completely random, but it should actually look like a face once you train a reasonable classifier.

Precision Recall curve for the starter code.

Example of detection on the test set from the starter code.

Face template HoG visualization with a single scale, threshold = 0.7, step size = 6. We can see that the HoG visualization starts to look like a face, with a roughly circular outline.

Precision-recall curve for this configuration.

Example of detections on the test set for this configuration.

Face template HoG visualization when threshold = 0.7, step size = 6, scale factor = 0.9. We can see that the accuracy improved significantly compared to the single-scale version.

Precision-recall curve for this configuration.

Example of detections on the test set for this configuration.

Face template HoG visualization when threshold = 0.7, step size = 4, scale factor = 0.9. We can see that the HoG visualization gains more detail, and the eyes and mouth become visible.

Precision-recall curve for this configuration.

Example of detections on the test set for this configuration.

Face template HoG visualization when threshold = 0.7, step size = 3, scale factor = 0.9. We can see that the HoG visualization gains even more detail.

Precision-recall curve for this configuration.

Example of detections on the test set for this configuration. We can see that the performance in this image is much better than before.

Class images. We can see that the faces in the first image were successfully detected, whereas most faces in the second image were not detected.

Extra credit: decision tree

Up to this point I had been using a linear SVM to train the classifier, but I also tried a decision tree. Decision trees are a strong, easy-to-understand classification algorithm that has been applied to many classification problems.


However, the accuracy achieved by the decision tree classifier was very low, and I attribute the errors to overfitting. I tried different decision trees, including the default one and a pruned one. I used MATLAB's cvLoss(tree,'SubTrees','All') function to find the optimal prune level, and the precision-recall curve became slightly better balanced after pruning. Even though the overall accuracy was still very low, the curve became flatter and converged to a higher precision. The left chart shows the curve for the unpruned tree and the right chart the curve for the pruned tree.
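For reference, a minimal sketch of the pruning step, assuming MATLAB's fitctree / cvloss / prune from the Statistics and Machine Learning Toolbox; X_rows, Y_rows, and X_test are placeholder names for the training features (one example per row), training labels, and test-window features.

tree = fitctree(X_rows, Y_rows);                     % default, unpruned tree

% cross-validated loss over all subtrees; BestLevel is the optimal prune level
[E, SE, Nleaf, BestLevel] = cvloss(tree, 'SubTrees', 'all', 'TreeSize', 'min');
pruned_tree = prune(tree, 'Level', BestLevel);       % pruned tree

% per-window class labels and scores when detecting
[labels, scores] = predict(pruned_tree, X_test);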

When analyzing the mean squared error of the decision trees on the training data, I found that the error was very low (0.0038 for the unpruned tree and 0.015 for the pruned tree), so I believe the trees were overfitting and that performance should improve significantly with more training samples. I therefore increased the number of negative samples from 10000 to 20000, which gave slightly better accuracy and a better precision-recall curve.

Overall, however, the decision tree classifier's performance was very low, mainly due to an extremely high false positive rate. I think my current implementation is only slightly better than random; this might be due to some unknown error in my implementation, or it might simply need more negative samples to train on.