Project 5 / Face Detection with a Sliding Window

The aim of this project is to detect faces in images. We use a sliding-window approach: a window is slid across the image, and the image patch at each window position is passed through a classifier that decides whether it contains a face. This is done at multiple scales because faces appear at different sizes in different images. The following components were implemented and tested.

  1. HoG feature formation (positive examples)
  2. Random Sampling (negative examples)
  3. Training a SVM classifier and tuning its parameters
  4. Running the classifier at multiple scales
  5. Adding extra positive training dataset (Extra Credit)
  6. Results and Analysis

The final best average precision I achieved was 0.833 on the test dataset.

HoG features

HoG (Histogram of Oriented Gradients) features are commonly used for object detection. The image is decomposed into small square cells, a histogram of oriented gradients is computed in each cell, the result is normalized, and a final descriptor is formed. A HoG cell size of 6 is used. Each training image is 36*36 pixels, so the descriptor returned by vl_hog is 6*6*31, which is reshaped into a vector of size 1*1116.


for i=1:num_images
  img_path = strcat(train_path_pos, '/', image_files(i).name);
  img = im2single(imread(img_path));
  if(size(img,3) > 1)
    img = rgb2gray(img);
  end
  features_pos(i,:) = reshape(vl_hog(img, hog_cell_size), 1, D);
end
The initial accuracy on the training data is 1.000, which suggests the model is overfitting the training data.
 
accuracy:   1.000
true positive rate: 0.398
false positive rate: 0.000
true negative rate: 0.602
false negative rate: 0.000
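These statistics come from scoring the training set with the learned SVM. A minimal sketch of how they can be computed, assuming w and b are the weights and bias returned by vl_svmtrain (trained later in this write-up). Note that the reported true positive and true negative "rates" sum to 1.000, so they are fractions of all training examples rather than per-class rates:

```matlab
% Score every training example with the linear SVM (w: D x 1, b: scalar).
confidences = [features_pos; features_neg] * w + b;
labels = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)];

accuracy = mean((confidences > 0) == (labels > 0));
tp_rate = mean(confidences >  0 & labels >  0); % fraction of all examples
fp_rate = mean(confidences >  0 & labels <  0);
tn_rate = mean(confidences <= 0 & labels <  0);
fn_rate = mean(confidences <= 0 & labels >  0);
```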
HoG descriptors (image missing)

Random Sampling (negative examples)

We pick random patches from non-face scene images to form the negative sample dataset. Each sample is again 36*36 pixels, and we pick num_samples_per_image patches per image.
 
num_images = length(image_files);
template_size = feature_params.template_size;
hog_cell_size = feature_params.hog_cell_size;
D = (template_size / hog_cell_size)^2 * 31; % HoG descriptor dimensionality
features_neg = []; % will grow to num_samples x D

num_samples_per_image = ceil(num_samples/num_images);

for i=1:num_images
  img_path = strcat(non_face_scn_path, '/', image_files(i).name);
  img = im2single(imread(img_path));
  if(size(img,3) > 1)
    img = rgb2gray(img);
  end
  [height, width] = size(img);
  x = randi([1, width - template_size], 1, num_samples_per_image);
  y = randi([1, height - template_size], 1, num_samples_per_image);
  
  for j=1:num_samples_per_image
    % index the j-th random corner; the original code used the whole x/y vectors
    sample = img(y(j):y(j)+template_size-1, x(j):x(j)+template_size-1);
    features_neg = [features_neg; reshape(vl_hog(sample, hog_cell_size), 1, D)];
  end
end

Training an SVM classifier and tuning its parameters

A linear SVM was trained with lambda = 0.0001. Performance degraded for higher values of lambda.
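Training the classifier is a single call to VLFeat's vl_svmtrain; a minimal sketch, assuming features_pos and features_neg are the HoG feature matrices built above:

```matlab
% vl_svmtrain expects a D x N single matrix and +/-1 labels.
X = single([features_pos; features_neg]');
Y = [ones(size(features_pos, 1), 1); -ones(size(features_neg, 1), 1)];
lambda = 0.0001; % regularization strength; larger values hurt performance
[w, b] = vl_svmtrain(X, Y, lambda);
```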

Running the classifier without scales

No scales (image missing)

Running the classifier with scales

With scales (image missing)
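The multi-scale detector can be sketched as follows. The image is repeatedly downscaled; at each scale the HoG map is computed once and every template-sized block of cells is scored with the SVM. The scale factor, threshold, and variable names here are illustrative, not the exact submitted code:

```matlab
cells = template_size / hog_cell_size; % cells per template side (6)
threshold = 0.5;                       % assumed confidence cutoff
bboxes = zeros(0, 4); confidences = zeros(0, 1);
for s = 0.9 .^ (0:20)                  % assumed scale pyramid
  scaled = imresize(img, s);
  if min(size(scaled)) < template_size, break; end
  hog = vl_hog(scaled, hog_cell_size);
  for r = 1:size(hog, 1) - cells + 1
    for c = 1:size(hog, 2) - cells + 1
      window = hog(r:r+cells-1, c:c+cells-1, :);
      score = reshape(window, 1, D) * w + b;
      if score > threshold
        % map cell indices back to pixel coordinates in the original image
        x1 = ((c-1) * hog_cell_size + 1) / s;
        y1 = ((r-1) * hog_cell_size + 1) / s;
        bboxes = [bboxes; x1, y1, x1 + template_size/s, y1 + template_size/s];
        confidences = [confidences; score];
      end
    end
  end
end
```

Overlapping detections of the same face are then typically merged with non-maximum suppression.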

Effect of num_samples on average precision

Increasing the number of negative examples from 10000 to 20000 (image missing)

Performance after adding extra dataset (Extra credit)

Extra dataset (image missing)

The extra data used was the BioID Face Database [https://www.bioid.com/About/BioID-Face-Database]. There were about 1522 faces, which were converted to 36*36 images using OpenCV in a Python script; the script is attached in the submission. The dataset, which is 2.4 MB in size, has also been attached with the code submission.

Results

(result images missing)
As the number of faces in a scene increases, the number of false positives also increases, producing many spurious red boxes.

ROC Curve

ROC curve (image missing)

Bonus Test Scenes

(images missing)