The key steps in this project were:
Extracting features from the positive images was straightforward: since they were already template-sized, I only had to run vl_hog over them, which gave 6713 positive features. Extracting features from the negative images took more work. These images were not necessarily template-sized and, unlike the positive images, were RGB. So I first converted them to grayscale and then randomly chose template-sized patches from them to run HOG over. Randomly choosing some 20-40 patches from each image was enough because the positive training set was much smaller than the pool of available negative examples, and choosing many more negative examples would bias the classifier. Hence, I chose some 10000 negative features randomly.
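The negative sampling step can be sketched roughly as follows. This is a minimal illustration in plain Python rather than the MATLAB/vl_hog code used in the project: the helper names `to_grayscale` and `sample_patches` are hypothetical, the grayscale weights are an assumption (the report does not say which conversion was used), and a synthetic image stands in for a real non-face scene. The template size of 36 is the one quoted later in the report.

```python
import random

def to_grayscale(rgb_image):
    """Convert an H x W image of (r, g, b) tuples to grayscale floats."""
    # ITU-R BT.601 luma weights; an assumption, not taken from the report.
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def sample_patches(gray, template_size, num_patches, rng):
    """Crop num_patches random template_size x template_size windows."""
    height, width = len(gray), len(gray[0])
    if height < template_size or width < template_size:
        return []  # image too small to hold a single template
    patches = []
    for _ in range(num_patches):
        top = rng.randint(0, height - template_size)
        left = rng.randint(0, width - template_size)
        patches.append([row[left:left + template_size]
                        for row in gray[top:top + template_size]])
    return patches

rng = random.Random(0)
# Synthetic 60 x 80 RGB image standing in for a non-face scene.
image = [[(rng.randint(0, 255),) * 3 for _ in range(80)] for _ in range(60)]
patches = sample_patches(to_grayscale(image), template_size=36,
                         num_patches=30, rng=rng)
```

Each returned patch would then be passed to the HOG extractor in the same way as a positive image.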
The next step was building a classifier. I chose an SVM with a regularization value of 0.0001.
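To make the role of the regularization value concrete, here is a small sketch of a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. It is an illustration only: the report does not say which solver or kernel was used, so the linear model, the learning rate, the epoch count, and the function names `train_linear_svm` and `score` are all assumptions, and the 2-D points are toy data rather than HOG features.

```python
def train_linear_svm(features, labels, lam=0.0001, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    by full-batch subgradient descent."""
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def score(w, b, x):
    """Signed confidence: positive means 'face', negative means 'non-face'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Toy 2-D "features": +1 for face, -1 for non-face.
X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
```

A small `lam` such as 0.0001 penalizes large weights only weakly, so the fit is driven mostly by the hinge term.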
I took the following steps to build the detector:
I initially ran it with only one scale and obtained the following results:
Following this, I ran my detector at multiple scales: [1,0.8,0.8^2,0.8^3,0.8^4,0.8^5,0.8^7,0.8^9]. This led to a very large increase in the average precision values. The following are the results obtained with different thresholds in the range of 0.5-0.9.
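The bookkeeping behind the multi-scale run can be sketched as below. The scale list is the one given above; the helper names `scaled_size` and `to_original_coords`, the box layout (x_min, y_min, x_max, y_max), and the example coordinates are assumptions made for illustration, not the project's code.

```python
# Scales listed in the report: 0.8^k for these exponents.
exponents = [0, 1, 2, 3, 4, 5, 7, 9]
scales = [0.8 ** k for k in exponents]

def scaled_size(height, width, scale):
    """Image size after resizing by `scale`, rounded to whole pixels."""
    return round(height * scale), round(width * scale)

def to_original_coords(box, scale):
    """Map an (x_min, y_min, x_max, y_max) box found on the resized image
    back to the coordinate frame of the original image."""
    return tuple(round(v / scale) for v in box)

template_size = 36
# A 36 x 36 window found at scale 0.8^3 covers a larger region of the original.
box_at_scale = (10, 20, 10 + template_size, 20 + template_size)
box_in_original = to_original_coords(box_at_scale, 0.8 ** 3)
```

Because the template stays fixed at 36 pixels, shrinking the image lets it match larger faces: at the smallest scale, 0.8^9 (about 0.134), a 36-pixel window corresponds to roughly 268 pixels in the original image.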
Here are some of the test images. I noticed that even though lowering the threshold seemed to increase average precision, it also produced a lot of red boxes in the resulting images.
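The trade-off can be illustrated with a toy calculation. The confidences and labels below are made up for the example and are not results from this project; `precision_recall` is a hypothetical helper that simply counts what survives a given threshold.

```python
def precision_recall(detections, num_ground_truth, threshold):
    """detections: list of (confidence, is_true_positive) pairs.
    Returns (precision, recall, number_of_false_positives)."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    true_pos = sum(kept)
    false_pos = len(kept) - true_pos
    precision = true_pos / len(kept) if kept else 1.0
    recall = true_pos / num_ground_truth
    return precision, recall, false_pos

# Toy detections (confidence, matches a real face?) for 5 ground-truth faces.
detections = [(1.4, True), (1.1, True), (0.95, False), (0.8, True),
              (0.7, False), (0.6, False), (0.55, True), (0.5, False)]
num_faces = 5

high = precision_recall(detections, num_faces, threshold=0.9)
low = precision_recall(detections, num_faces, threshold=0.5)
```

In this toy case the lower threshold doubles recall (0.4 to 0.8) but also raises the number of false positives from 1 to 4, which is the same pattern as the extra boxes seen in the images.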
Following this, I tried smaller HOG cell sizes of 3 and 4 and obtained the following improved results.
I implemented hard negative mining as follows:
This is more useful in a setting where there are fewer negative examples to train the model on. In our case, since plenty of training examples were already available, hard negative mining made little difference to the average precision values.
I noticed that the detections were much cleaner, i.e. there were fewer false positives in the image. However, to achieve the same precision as before I had to vary the threshold, and this led to more red boxes in the image than with the initial threshold.
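For reference, the general idea of hard negative mining can be sketched as follows. This is a generic outline and not necessarily the exact procedure used here: the fixed linear scorer stands in for a trained classifier, the feature values are toy numbers, and `mine_hard_negatives` is a hypothetical helper.

```python
def mine_hard_negatives(score_fn, negative_features, threshold=0.0):
    """Return the negative features that the current classifier wrongly
    scores above `threshold`, i.e. its false positives."""
    return [f for f in negative_features if score_fn(f) > threshold]

# Hypothetical stand-in for a trained classifier: a fixed linear scorer.
weights, bias = (1.0, -0.5), -0.2

def score_fn(feature):
    return sum(w * x for w, x in zip(weights, feature)) + bias

# Toy features extracted from windows of face-free images.
scene_features = [(0.1, 0.9), (0.8, 0.1), (0.3, 0.3), (1.0, 0.2)]
hard_negatives = mine_hard_negatives(score_fn, scene_features)

# The mined examples are appended to the negative set before retraining.
negative_set = [(0.0, 1.0), (0.2, 0.8)]
augmented_negative_set = negative_set + hard_negatives
```

The classifier is then retrained on the positives plus the augmented negative set, and the mining step can be repeated if desired.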
Here is an example of a test image where the number of false detections went down drastically with hard negative mining.
With a detector of template size 36 and HOG cell size 6