Project 5 / Scene Recognition with Bag of Words

image

Project is about face recognition. It is divided into three devisions. First is the getting the data part and second is the training part and third is the classification part .Here experiment is done with hog feature and without it. Multiple scales are tried. Furthermore, hard negative mining method is tried to check the increase in the accuracy.

  1. Get positive image feature
  2. Get positive image feature
  3. SVM classifier
  4. Testing data
  5. Multiple scales
  6. Hard negative mining

  1. Get positive image feature
  2. Here in get positive images, we have taken images from the file and extracted hog features from them. There is no need to reduce the size as we have taken because the original image size is give as 36*36. Here we use vl_hog to extract hog features. The they are flattened and saved in fatures_pos.

    
    for i= 1: num_images
        img=imread(strcat(train_path_pos,'/',image_files(i).name)); 
        hog = vl_hog(im2single(img), CELLSIZE); 
        hog_new=reshape(hog,[1,(feature_params.template_size / feature_params.hog_cell_size)^2 * 31]);
        features_pos=[features_pos;hog_new];   
    end
    
    

  3. Get negative image feature
  4. Here in get negative images i.e the images which doesnt contain faces in it. Thus we extract hog features from them. Here the size of the images is not even thus we take random images and crop to the size of the template. Here aswell we use vl_hog to extract hog features. Then they are flattened and saved in negative_pos.

    
    num_crops=floor(num_samples / num_images);
    random_number = randi(274,1,num_samples);
    for i= 1: num_samples
        img=imread(strcat(non_face_scn_path,'/',image_files(random_number(1,i)).name));  
        img=rgb2gray(single(img));    
        [n m]= size(img);
        L = feature_params.template_size;
        img_crop = img(randi(n-L+1)+(0:L-1),randi(m-L+1)+(0:L-1));  
        hog = vl_hog((img_crop), CELLSIZE);
        hog_new=reshape(hog,[1,(feature_params.template_size / feature_params.hog_cell_size)^2 * 31]);
        features_neg=[features_neg;hog_new];
    end
    

  5. SVM classifier
  6. Here we use SVM classifier to classify the data. The positive and negative training data is then used as parameter in vl_svmtrain. LAMDA is tuned to 0.0001 for best performance. We get 'w' and 'b'.

    
    X=[features_pos;features_neg];
    Y=[ones(6713,1);-1*ones(10000,1)];
    lambda=0.0001;
    [w b] = vl_svmtrain(X', Y, lambda);  
    

  7. Testing data
  8. Here we take test image and extract hog features on it. Then we iterative sliding window of length cell size on the test image hog fetures. On each of window we classify with svm and check the confidence. If it is above a perticular limit the we pass it furhter in the pipeline. After sliding window ends we do maximum supression and get confidences and boxes.

    
    for z = 1:length(test_scenes)    
        for scale = 1: length(scales)   
            img=imresize(img, scales(scale));
            hog = vl_hog(im2single(img), CELLSIZE);
            for i=1: size(hog,1)-CELLSIZE
                for j=1:size(hog,2)-CELLSIZE
                    hog_patch=hog(i:i+CELLSIZE-1,j:j+CELLSIZE-1,:);     
                    hog_new=reshape(hog_patch,[1,(feature_params.template_size /
    		 feature_params.hog_cell_size)^2 * 31]);
                    threshold=hog_new*w + b;
                    if threshold > 0.25
                        cur_bboxes = [cur_bboxes;cur_x_min*CELLSIZE*scales(scale), 
    			cur_y_min*CELLSIZE*scales(scale), (cur_x_min + CELLSIZE)*CELLSIZE*scales(scale),
    			 (cur_y_min + CELLSIZE)*CELLSIZE*scales(scale)];                   
                        cur_confidences=[cur_confidences;threshold];
                        cur_image_ids = [cur_image_ids;{test_scenes(z).name}];
    		    flag ==1	  
        
        if flag==1
            [is_maximum] = non_max_supr_bbox(cur_bboxes, cur_confidences, size(img1));
            if size(is_maximum,1) ~=0 
                cur_confidences = cur_confidences(is_maximum,:);
                cur_bboxes      = cur_bboxes(     is_maximum,:);
                cur_image_ids   = cur_image_ids(  is_maximum,:);
    
                bboxes      = [bboxes;      cur_bboxes];
                confidences = [confidences; cur_confidences];
                image_ids   = [image_ids;   cur_image_ids];
            end
        end    
    end
    

  9. Multiple scales
  10. We have considered three scale of 0.9 and its powers. Thus after all the scales are done we do maximum supression. Futhermore, we need to multiply by scale while drawing boxes.

  11. Hard negative mining
  12. In hard negative mining, we have to extract positive and negative training features. Clasify them and get w and b values from svm train. We then pass negative images in the detector program to get some false positives. We take that feature points of these false positives and add them as negative in our training data features. And we train the classifier again and give it to detector to find actual test images.

    
    for z = 1:length(test_scenes)
            hog = vl_hog(im2single(img), CELLSIZE);
            for i=1: size(hog,1)-CELLSIZE
                for j=1:size(hog,2)-CELLSIZE
                    hog_patch=hog(i:i+CELLSIZE-1,j:j+CELLSIZE-1,:);     
                    hog_new=reshape(hog_patch,[1,(feature_params.template_size 
    					/ feature_params.hog_cell_size)^2 * 31]);
                    threshold=hog_new*w + b;
                    if threshold > 0.5
                        cur_bbox = [cur_x_min*CELLSIZE*scales(scale), cur_y_min*CELLSIZE*scales(scale), 
    		(cur_x_min + CELLSIZE)*CELLSIZE*scales(scale), (cur_y_min + CELLSIZE)
    		*CELLSIZE*scales(scale)];
                        hog_for_further_use=[hog_for_further_use;hog_new];
    
    

    Face template HoG visualization.

    Example of detection on the test set from the starter code.