Project 4 / Scene Recognition with Bag of Words

This project performs scene recognition using two image representations (tiny images and bags of SIFT features) and two classifiers (nearest neighbors and support vector machines). I tried different combinations of representation and classifier to improve recognition accuracy:

  1. Tiny Images + Nearest Neighbors
  2. SIFT Features + Nearest Neighbors
  3. SIFT Features + SVMs

Tiny Images + Nearest Neighbors

To create the tiny image representation, I scaled each input image down to 16x16 pixels and flattened it into a 256-dimensional vector. For nearest neighbor classification, I computed the distance between every test vector and every training vector across all dimensions and gave each test image the label of its closest training image (k=1). This achieved an accuracy of 19.1%.


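%Tiny Images
RES = 16;                                  % tiny images are RES x RES pixels
d = RES * RES;                             % feature dimension (256)
image_feats = zeros(size(image_paths, 1), d);
for i=1:size(image_paths, 1)
   pic = imread(image_paths{i});           % assumes grayscale input images
   tiny_pic = reshape(imresize(pic, [RES, RES]), [d, 1]);
   image_feats(i,:) = double(tiny_pic);    % one flattened tiny image per row
end

%Nearest Neighbor
% D(i,j) = distance between training image i and test image j
D = vl_alldist2(train_image_feats.',test_image_feats.');
[~, idx] = sort(D);                        % sort each column (test image) ascending
predicted_categories = train_labels(idx(1,:));  % label of the closest training image (k=1)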
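vl_alldist2 comes from the VLFeat toolbox. As a toolbox-free sketch of the same 1-nearest-neighbor rule, the snippet below builds a squared Euclidean distance matrix (to my understanding, the metric vl_alldist2 uses by default) with plain matrix operations and picks the closest training vector for each test vector. The feature vectors and labels are made-up toy data, not values from this project.

```matlab
% Hypothetical toy data: each ROW is one feature vector, as in image_feats.
train_feats  = [0 0; 10 10; 0 9];          % 3 training vectors
train_labels = {'Coast'; 'Forest'; 'Street'};
test_feats   = [1 1; 9 8];                 % 2 test vectors

% Squared Euclidean distance via |a-b|^2 = |a|^2 + |b|^2 - 2*a'*b.
% D(i,j) = distance from training vector i to test vector j.
D = bsxfun(@plus, sum(train_feats.^2, 2), sum(test_feats.^2, 2).') ...
    - 2 * (train_feats * test_feats.');

% Sort each column; row 1 holds the nearest training index per test vector.
[~, idx] = sort(D, 1);
predicted_categories = train_labels(idx(1, :));
```

Here D is [2 145; 162 5; 65 82], so the two test vectors are labeled Coast and Forest.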

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.191

Category name  Accuracy  False positives (true label)   False negatives (wrong predicted label)
Kitchen        0.050     InsideCity, LivingRoom         Highway, Coast
Store          0.020     InsideCity, Office             Coast, Forest
Bedroom        0.090     Office, Industrial             Coast, InsideCity
LivingRoom     0.070     Store, Suburb                  Highway, Mountain
Office         0.050     Kitchen, Industrial            Bedroom, TallBuilding
Industrial     0.020     Office, InsideCity             Highway, Forest
Suburb         0.200     Store                          OpenCountry, Highway
InsideCity     0.110     Industrial, Coast              Coast, Coast
TallBuilding   0.110     Office, Store                  Industrial, Coast
Street         0.390     TallBuilding, Bedroom          Bedroom, Highway
Highway        0.680     Mountain, Industrial           Coast, Coast
OpenCountry    0.310     Mountain, Highway              Coast, Mountain
Coast          0.290     Suburb, InsideCity             Forest, OpenCountry
Mountain       0.130     Forest, Forest                 Highway, OpenCountry
Forest         0.350     Street, Street                 Mountain, Highway

SIFT Features + Nearest Neighbors

To build the bag-of-SIFT representation, I first built a visual vocabulary by clustering the SIFT features from each image into the specified number of clusters (the vocabulary size) using k-means; the cluster centers are the visual words. Then, for each training and test image, I extracted its SIFT features, assigned each feature to its nearest cluster center, and built a histogram counting how many features fell into each cluster. Each image is therefore represented by a normalized histogram of visual-word counts.
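The vocabulary-building code is not shown here. The snippet below is a minimal base-MATLAB sketch of k-means (Lloyd's algorithm) under stated assumptions: it stands in for a library routine such as VLFeat's vl_kmeans and is not the code used for these results. The toy 2-D descriptors, max_iters, and the deterministic initialization from the first vocab_size descriptors are all hypothetical; real dense-SIFT descriptors are 128-dimensional and initialization is usually random.

```matlab
% Hypothetical toy descriptors, one per COLUMN (the layout vl_dsift returns).
descriptors = [0 1 0 10 11 10;
               0 0 1 10 10 11];
vocab_size = 2;
max_iters  = 20;

% Deterministic initialization for this sketch: the first vocab_size descriptors.
centers = descriptors(:, 1:vocab_size);

for iter = 1:max_iters
    % Assignment step: squared distance from every center (rows) to every
    % descriptor (columns), then the nearest center for each descriptor.
    D = bsxfun(@plus, sum(centers.^2, 1).', sum(descriptors.^2, 1)) ...
        - 2 * (centers.' * descriptors);
    [~, assignment] = min(D, [], 1);

    % Update step: move each center to the mean of its assigned descriptors.
    new_centers = centers;
    for c = 1:vocab_size
        members = descriptors(:, assignment == c);
        if ~isempty(members)               % an empty cluster keeps its old center
            new_centers(:, c) = mean(members, 2);
        end
    end

    if isequal(new_centers, centers)       % assignments stopped changing
        break;
    end
    centers = new_centers;
end

vocab = centers.';                         % vocab_size x dim, one visual word per row
```

On this toy data the two visual words converge to (1/3, 1/3) and (31/3, 31/3), the means of the two obvious groups of points.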

I varied the SIFT sampling density (the step size of the dense SIFT grid) to see how it affected accuracy for both kNN and the SVMs. Sampling with the same density used to build the vocabulary produced higher accuracy than sampling more sparsely.


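% vocab: vocab_size x 128 matrix of k-means cluster centers (one visual word per row)
vocab_size = size(vocab, 1);
image_feats = zeros(size(image_paths, 1), vocab_size);
for i=1:size(image_paths, 1)
    pic = imread(image_paths{i});
    % Dense SIFT with a 16-pixel step; features is 128 x num_descriptors
    [~, features] = vl_dsift(single(pic), 'fast','step',16);
    D = vl_alldist2(double(features), double(vocab.'));  % descriptors x words
    [~, centers] = min(D, [], 2);           % nearest visual word per descriptor
    % Count descriptors per word with explicit bins 1..vocab_size
    N = accumarray(centers, 1, [vocab_size, 1]).';
    image_feats(i, :) = N./norm(N);         % L2-normalized histogram
end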
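The histogram step can be checked by hand on a tiny example. This toolbox-free sketch uses a hypothetical 3-word vocabulary in a 2-D descriptor space and four made-up descriptors from one image; it assigns each descriptor to its nearest word, counts words with explicit bins 1..vocab_size (so a word with no descriptors keeps a count of 0), and L2-normalizes the counts.

```matlab
% Hypothetical toy vocabulary: 3 visual words, one per ROW.
vocab = [0 0; 10 0; 0 10];
vocab_size = size(vocab, 1);

% Four toy descriptors from one image, one per COLUMN (vl_dsift layout).
features = [1 9 0 1;
            0 1 1 1];

% Squared distance from every descriptor (rows) to every word (columns).
D = bsxfun(@plus, sum(features.^2, 1).', sum(vocab.^2, 2).') ...
    - 2 * (features.' * vocab.');
[~, nearest_word] = min(D, [], 2);          % closest word for each descriptor

% Count descriptors per word; empty words stay at 0.
N = accumarray(nearest_word, 1, [vocab_size, 1]).';
image_feat = N ./ norm(N);                  % L2-normalized histogram
```

Three descriptors land on word 1 and one on word 2, so N is [3 1 0] and the normalized feature is [3 1 0]/sqrt(10).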

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.472

Category name  Accuracy  False positives (true label)   False negatives (wrong predicted label)
Kitchen        0.280     InsideCity, LivingRoom         Bedroom, Bedroom
Store          0.330     Bedroom, Industrial            Forest, Street
Bedroom        0.270     InsideCity, Kitchen            Office, Kitchen
LivingRoom     0.240     Industrial, Industrial         InsideCity, Kitchen
Office         0.690     Kitchen, LivingRoom            Kitchen, Bedroom
Industrial     0.260     Street, Coast                  Mountain, TallBuilding
Suburb         0.760     Mountain, Coast                Coast, Highway
InsideCity     0.350     Coast, TallBuilding            Kitchen, Kitchen
TallBuilding   0.400     Mountain, Mountain             Industrial, Industrial
Street         0.420     TallBuilding, InsideCity       Suburb, LivingRoom
Highway        0.750     Street, Street                 OpenCountry, Suburb
OpenCountry    0.430     Store, Mountain                Forest, Mountain
Coast          0.540     OpenCountry, Office            OpenCountry, Industrial
Mountain       0.460     Coast, Street                  Highway, Forest
Forest         0.900     OpenCountry, OpenCountry       OpenCountry, Mountain

I also implemented k-nearest neighbors with a majority vote over the k closest training images and varied k; with k=20, I achieved an accuracy of 50.1%.


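% idx: training indices sorted by distance to each test image, one column
% per test image (from sorting the vl_alldist2 distance matrix)
k = 20;
top_k = train_labels(idx(1:k,:));              % k x num_test cell array of labels
predicted_categories = train_labels(idx(1,:)); % preallocate with the 1-NN labels
for i=1:size(top_k,2)
    % Majority vote; 'stable' keeps labels in nearest-first order, so max()
    % breaks ties in favor of the label of the closer neighbor
    [labels, ~, label_idx] = unique(top_k(:,i), 'stable');
    votes = accumarray(label_idx, 1);
    [~, best] = max(votes);
    predicted_categories{i} = labels{best};
end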

SIFT Features + SVMs

To classify with Support Vector Machines, I trained one linear SVM per category, turning the 15-way problem into 15 binary one-vs-all problems (images of that category as positives, all other training images as negatives) rather than a single multiclass problem. To classify a test image, I evaluated all 15 SVMs on it and assigned the category whose SVM produced the highest confidence score, w'x + b.


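LAMBDA = 0.0001;                            % SVM regularization parameter
num_categories = length(categories);
W = zeros(num_categories, size(train_image_feats, 2));
B = zeros(num_categories, 1);
for i=1:num_categories
    % One-vs-all labels: +1 for images of category i, -1 for all others
    match_idx = strcmp(categories{i}, train_labels);
    category_labels = -ones(size(train_labels, 1), 1);
    category_labels(match_idx) = 1;
    [w, b] = vl_svmtrain(train_image_feats.', category_labels, LAMBDA);
    W(i,:) = w.';
    B(i,1) = b;
end

% scores(c,j) = w_c' * x_j + b_c for category c and test image j
scores = W*test_image_feats.';
scores = scores + repmat(B, 1, size(scores,2));
[~, best] = max(scores, [], 1);             % most confident SVM per test image
predicted_categories = reshape(categories(best), [], 1);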
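With one weight vector per row of W and one bias per entry of B, the one-vs-all decision is an argmax down each column of the score matrix. The toy check below uses hypothetical, hand-picked weights, biases and features for three categories (not learned values) and needs no toolbox.

```matlab
% Hypothetical one-vs-all classifiers on 2-D features, one row per category.
categories = {'Coast'; 'Forest'; 'Street'};
W = [ 1  0;
      0  1;
     -1 -1];
B = [0; -0.5; 0.2];

test_image_feats = [0.9 0.1;     % one test image per row
                    0.2 0.9;
                   -1  -1];

% scores(c,j) = w_c' * x_j + b_c
scores = bsxfun(@plus, W * test_image_feats.', B);
[~, best] = max(scores, [], 1);  % highest-scoring classifier per test image
predicted_categories = reshape(categories(best), [], 1);
```

Each test image wins in a different classifier, giving Coast, Forest and Street in that order.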

Varying lambda, the regularization parameter shared by all of the SVMs, I saw the highest accuracy with lambda = 0.0001.
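For reference, each per-category SVM roughly minimizes a regularized hinge loss of the form given in the VLFeat documentation for vl_svmtrain, sketched below for category $c$ with $N$ training histograms $\mathbf{x}_i$. Here $y_i^{(c)} = +1$ if training image $i$ belongs to category $c$ and $-1$ otherwise. VLFeat folds the bias into $\mathbf{w}_c$ through a constant bias-multiplier feature, so in practice $b_c$ is treated slightly differently than written. A smaller $\lambda$ weakens the penalty on $\lVert \mathbf{w}_c \rVert$, letting each SVM fit the training data more closely.

```latex
E(\mathbf{w}_c, b_c) = \frac{\lambda}{2}\lVert \mathbf{w}_c \rVert^2
  + \frac{1}{N}\sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i^{(c)}\,(\mathbf{w}_c^\top \mathbf{x}_i + b_c)\bigr)

\hat{c}(\mathbf{x}) = \arg\max_{c}\; \bigl(\mathbf{w}_c^\top \mathbf{x} + b_c\bigr)
```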

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.574

Category name  Accuracy  False positives (true label)   False negatives (wrong predicted label)
Kitchen        0.420     Store, LivingRoom              Office, Office
Store          0.310     Highway, InsideCity            Forest, Highway
Bedroom        0.350     LivingRoom, OpenCountry        Office, Kitchen
LivingRoom     0.300     Office, TallBuilding           Highway, Kitchen
Office         0.790     Store, Kitchen                 LivingRoom, LivingRoom
Industrial     0.360     Bedroom, Bedroom               InsideCity, InsideCity
Suburb         0.870     Industrial, OpenCountry        InsideCity, OpenCountry
InsideCity     0.430     Street, Industrial             Forest, Kitchen
TallBuilding   0.690     InsideCity, InsideCity         Street, Industrial
Street         0.510     InsideCity, InsideCity         Store, InsideCity
Highway        0.770     Street, Industrial             Store, Mountain
OpenCountry    0.410     Industrial, Forest             Highway, Coast
Coast          0.740     OpenCountry, OpenCountry       Suburb, OpenCountry
Mountain       0.750     Forest, OpenCountry            Suburb, TallBuilding
Forest         0.910     OpenCountry, OpenCountry       LivingRoom, Mountain