Project 4 / Face Detection with a Sliding Window

Example of Face Detection after classification

To start this assignment I first implemented get_positive_features. In this function I first check the number of images passed in as a parameter. I then iterate through the images, building the path for each one by concatenating the training path string with the image file name. I read each image using imread() and then compute its Histogram of Gradients (HoG) template using the very helpful vl_hog function, passing the feature_params struct's cell size and template size as parameters. I could not actually verify that my positive features were working until the end, once the rest of the pipeline was finished. The overall purpose of this function is to load the positive (face) training examples for later use.

Below is a code snippet showing how I accessed different images by concatenating the path.


	%Create the path using string concatenation
	path = strcat(train_path_pos, '/', image_files(ima).name);

	%read the image
	image = imread(path);
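As a rough sketch of the step that follows, each positive image's HoG can then be flattened into one row of the feature matrix (the loop index ima and the matrix features_pos follow the snippet above; the exact indexing is illustrative rather than copied from my code):

	%compute the HoG template for this positive example
	hog = vl_hog(single(image), feature_params.hog_cell_size);

	%flatten the template into a single row of the positive feature matrix
	D = (feature_params.template_size / feature_params.hog_cell_size)^2 * 31;
	features_pos(ima, :) = reshape(hog, 1, D);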

After this I worked on the random negative features. As with the positive features, I first iterate through the number of features to sample, but this time I draw a random selection of images from a directory. I read each one, convert it to grayscale, and scale it to the new size. I then compute row and col, which are used later in the Histogram of Gradients calculation, and compute the HoG of each randomly sampled negative patch, again with the vl_hog function. The purpose of this function is mainly to sample non-face examples to help with classifier training later on. To test whether the negative feature sampling worked I had to finish the other parts first.

Below is a code snippet showing how I created the Histogram of Gradients.


	%perform the HoG calculation on the sampled patch
	template = vl_hog(single(newImage(row : row + feature_params.template_size - 1, col : col + feature_params.template_size - 1)), feature_params.hog_cell_size);

	%store the flattened template in features_neg
	features_neg(number, :) = reshape(template, 1, (feature_params.template_size / feature_params.hog_cell_size)^2 * 31);
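Before that call, the random top-left corner of each sampled patch has to be chosen. A minimal sketch of how row and col could be drawn (newImage follows the snippet above; randi is just one way to do the sampling):

	%pick a random valid top-left corner for a template-sized patch
	[height, width] = size(newImage);
	row = randi(height - feature_params.template_size + 1);
	col = randi(width - feature_params.template_size + 1);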

Now that I had finished calculating the features, I moved on to the classifier training. I first transposed the positive and negative feature matrices, stored the size of the features, and created the classif vector, which holds the labels assigned to each example. The bulk of the classification work is done by vl_svmtrain, which returns the [w, b] pair that is later used in run_detector. The classifier training took quite a while, since it had to go through the Caltech Web Faces set, the Wu et al. data, and the SUN scene database. One thing I had to balance in this part was the size of the training data against accuracy: while more training data did increase the accuracy of my results, it also took considerably more time to run. To cut down on time I limited my training data while testing my different methods.

Here is an example snippet of the code I used to train the classifier with the help of the vl_svmtrain function.


[w,b] = vl_svmtrain([pos_feats, neg_feats], classif, 0.0001);
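For context, here is a minimal sketch of how the classif label vector used above could be built, assuming pos_feats and neg_feats already hold one example per column after the transpose (+1 for faces, -1 for non-faces):

	%labels: +1 for the positive (face) features, -1 for the negatives
	num_pos = size(pos_feats, 2);
	num_neg = size(neg_feats, 2);
	classif = [ones(1, num_pos), -1 * ones(1, num_neg)];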

For the last part of the project I worked on the run_detector method, which took the most time to get right. I first step through the test images, converting each one to Histogram of Gradients space with the vl_hog function at each scale. From there I step over the HoG cells, taking template-sized blocks of cells and classifying them. Detections that pass the default confidence threshold are then passed to non-maximal suppression via the non_max_supr_bbox function. Some decisions that went into the code were the step size, which I initially defaulted to the pixel width of the HoG cells, and the step size across scales, which I also left at its default value. After this method was finished I was finally able to test my previous methods. The overall purpose of this function is to run the classifier on the test set: each image is scanned over multiple scales, and non-maximal suppression then removes duplicate detections.
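As a rough sketch of the per-window classification step described above (the names hog, r, c, and threshold are illustrative; cell_width stands for the template size measured in HoG cells):

	%number of HoG cells spanned by one template
	cell_width = feature_params.template_size / feature_params.hog_cell_size;

	%crop one template-sized block of cells and flatten it
	window = hog(r : r + cell_width - 1, c : c + cell_width - 1, :);
	feat = reshape(window, 1, cell_width^2 * 31);

	%score the window with the linear SVM and keep confident detections
	confidence = feat * w + b;
	if confidence > threshold
	    %record the bounding box, scaled back to original image coordinates
	end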

Here is a code snippet showing how I did the non-maximal suppression. Debugging around the non-maximal suppression took quite a while across many test cases, but overall the code was quite sound.


	%non_max_supr_bbox is run for non-maximal suppression
	[maxi] = non_max_supr_bbox(box, conf, size(img));

	conf = conf(maxi, :);
	box  = box(maxi, :);
	imID = imID(maxi, :);

Graphs displaying the results are shown below.

Graph of average precision

Plot of Figure 8 from Viola-Jones

HoG template