This project involved scene recognition using two image representations (tiny images and bags of SIFT features) and two classifiers (nearest neighbors and support vector machines). I tried different combinations of representation and classifier to improve scene recognition accuracy. They are as follows:
To create the tiny image representation, I scaled each input image down to 16x16 pixels. To implement nearest neighbors, I measured the distance between each test image and every training image across all dimensions and chose the closest neighbor (k=1). This combination reached an accuracy of 19.1%.
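```matlab
% Tiny images: resize each image to RES x RES and flatten it into one row.
RES = 16;
d = RES * RES;   % feature length, assuming grayscale images
image_feats = zeros(size(image_paths, 1), d);
for i = 1:size(image_paths, 1)
    pic = imread(image_paths{i});
    tiny_pic = reshape(imresize(pic, [RES, RES]), [d, 1]);
    image_feats(i, :) = tiny_pic;
end
```

```matlab
% Nearest neighbor: D(i, j) is the squared L2 distance (the vl_alldist2
% default) between training image i and test image j.
D = vl_alldist2(train_image_feats.', test_image_feats.');
[sorted, idx] = sort(D);                          % closest training image first
predicted_categories = train_labels(idx(1, :));   % k = 1
```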
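To make the orientation of the distance matrix concrete, here is a minimal sketch of the same 1-nearest-neighbor rule on toy data. The feature values and labels are made up for illustration, and the distances are computed in plain MATLAB instead of `vl_alldist2` so that the snippet runs without VLFeat.

```matlab
% Toy data (hypothetical): 3 training images and 2 test images, one row each.
train_feats  = [0 0; 10 10; 0 10];
train_labels = {'Coast'; 'Forest'; 'Street'};
test_feats   = [1 1; 9 8];

% Squared Euclidean distances, D(i, j) = ||train_i - test_j||^2,
% i.e. one row per training image and one column per test image.
num_train = size(train_feats, 1);
num_test  = size(test_feats, 1);
D = zeros(num_train, num_test);
for j = 1:num_test
    diffs = train_feats - repmat(test_feats(j, :), num_train, 1);
    D(:, j) = sum(diffs .^ 2, 2);
end

[~, idx] = sort(D);                     % sort each column in ascending order
predicted = train_labels(idx(1, :));    % label of the closest training image
```

Because each column of `D` belongs to one test image, sorting along columns and reading the first row of `idx` gives the nearest training image for every test image at once.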
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.050 | InsideCity, LivingRoom | Highway, Coast
Store | 0.020 | InsideCity, Office | Coast, Forest
Bedroom | 0.090 | Office, Industrial | Coast, InsideCity
LivingRoom | 0.070 | Store, Suburb | Highway, Mountain
Office | 0.050 | Kitchen, Industrial | Bedroom, TallBuilding
Industrial | 0.020 | Office, InsideCity | Highway, Forest
Suburb | 0.200 | Store, OpenCountry, Highway (only three labels survive; their split across the two columns is unclear) |
InsideCity | 0.110 | Industrial, Coast | Coast, Coast
TallBuilding | 0.110 | Office, Store | Industrial, Coast
Street | 0.390 | TallBuilding, Bedroom | Bedroom, Highway
Highway | 0.680 | Mountain, Industrial | Coast, Coast
OpenCountry | 0.310 | Mountain, Highway | Coast, Mountain
Coast | 0.290 | Suburb, InsideCity | Forest, OpenCountry
Mountain | 0.130 | Forest, Forest | Highway, OpenCountry
Forest | 0.350 | Street, Street | Mountain, Highway
To implement SIFT features, I first built the vocabulary by clustering the SIFT features sampled from each image into the specified number of clusters using k-means. Then, for each training and test image, I created a histogram that counts how many of the image's SIFT features fall nearest to each k-means cluster center. Each image is therefore represented by a histogram over the cluster centers.
I varied the SIFT sampling step to produce different accuracies for kNN and the SVMs. I found that sampling with the same frequency as was used to create the vocabulary produced a higher accuracy than sampling more sparsely.
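```matlab
% Bag of SIFT: histogram of nearest vocabulary words for each image.
image_feats = zeros(size(image_paths, 1), vocab_size);
for i = 1:size(image_paths, 1)
    pic = imread(image_paths{i});
    [loc, features] = vl_dsift(single(pic), 'fast', 'step', 16);
    D = vl_alldist2(double(features), vocab.');   % one row per SIFT feature
    [sorted, idx] = sort(D.');
    centers = idx(1, :);                          % nearest cluster center per feature
    % Explicit bin centers keep bin j tied to cluster j even when an image
    % never hits cluster 1 or cluster vocab_size.
    N = hist(centers.', 1:vocab_size);
    image_feats(i, :) = N ./ norm(N);
end
```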
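The vocabulary-building step described above is not shown in the original code. The following is a rough sketch of how it could look, assuming VLFeat is on the MATLAB path. The random stand-in images, `vocab_size = 10`, and the variable names are assumptions for illustration, not the settings behind the reported results.

```matlab
% Assumption: VLFeat is installed and vl_setup has been run.
% Stand-in "images" (hypothetical): random grayscale arrays, not the real dataset.
rng(0);
images = {rand(128, 128), rand(128, 128), rand(128, 128)};
vocab_size = 10;   % illustrative value only
step = 16;         % same sampling step as the histogram code

% Collect dense SIFT descriptors from every image, one column per descriptor.
all_feats = [];
for i = 1:numel(images)
    [~, features] = vl_dsift(single(images{i}), 'fast', 'step', step);
    all_feats = [all_feats, single(features)];
end

% Cluster the descriptors; each column of centers is one visual word.
centers = vl_kmeans(all_feats, vocab_size);
vocab = double(centers.');   % vocab_size x 128, matching the use of vocab.' above
```

Storing `vocab` with one row per cluster center matches the `vocab.'` transpose in the histogram code, where `vl_alldist2` expects one column per point.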
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.280 | InsideCity, LivingRoom | Bedroom, Bedroom
Store | 0.330 | Bedroom, Industrial | Forest, Street
Bedroom | 0.270 | InsideCity, Kitchen | Office, Kitchen
LivingRoom | 0.240 | Industrial, Industrial | InsideCity, Kitchen
Office | 0.690 | Kitchen, LivingRoom | Kitchen, Bedroom
Industrial | 0.260 | Street, Coast | Mountain, TallBuilding
Suburb | 0.760 | Mountain, Coast | Coast, Highway
InsideCity | 0.350 | Coast, TallBuilding | Kitchen, Kitchen
TallBuilding | 0.400 | Mountain, Mountain | Industrial, Industrial
Street | 0.420 | TallBuilding, InsideCity | Suburb, LivingRoom
Highway | 0.750 | Street, Street | OpenCountry, Suburb
OpenCountry | 0.430 | Store, Mountain | Forest, Mountain
Coast | 0.540 | OpenCountry, Office | OpenCountry, Industrial
Mountain | 0.460 | Coast, Street | Highway, Forest
Forest | 0.900 | OpenCountry, OpenCountry | OpenCountry, Mountain
I also implemented kNN with varying k and found that k=20 achieved an accuracy of 50.1%.
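```matlab
% k-nearest neighbors: majority vote over the k closest training images.
k = 20;
top_k = train_labels(idx(1:k, :));                % k x num_test cell array of labels
predicted_categories = train_labels(idx(1, :));   % initialize with the 1-NN prediction
for i = 1:size(top_k, 2)
    % str_mode is a helper (not shown) that returns the most frequent string.
    % The result is not stored in a variable named "mode", which would shadow
    % the built-in function.
    predicted_categories{i} = str_mode(top_k(:, i));
end
```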
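`str_mode` is not listed in this write-up. One possible way to take the majority vote over a column of string labels, shown here on made-up labels and not necessarily how the original helper works, is:

```matlab
% Labels of the k nearest training images for one test image (toy data).
top_labels = {'Coast'; 'Forest'; 'Coast'; 'Street'; 'Coast'};

[names, ~, ic] = unique(top_labels);   % names: distinct labels, ic: index into names
counts = accumarray(ic(:), 1);         % number of votes for each distinct label
[~, best] = max(counts);               % ties go to the alphabetically first label
winner = names{best};
```

The tie-breaking rule here is a side effect of `unique` sorting the labels and `max` returning the first maximum; a different rule, such as preferring the label of the single nearest neighbor, would be equally reasonable.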
To create a set of support vector machines, I iterated through the categories and trained one binary (one-vs-all) SVM per category rather than a single multiclass classifier. To find the class to which a test image most probably belonged, I evaluated all of the SVMs on it and assigned the category whose SVM produced the highest confidence score.
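```matlab
% Train one binary linear SVM per category.
W = zeros(num_categories, size(train_image_feats, 2));
B = zeros(num_categories, 1);
for i = 1:num_categories
    match_idx = strcmp(categories(i), train_labels);
    category_labels = -ones(size(train_labels, 1), 1);   % -1 for every other category
    category_labels(match_idx) = 1;                      % +1 for this category
    [w, b] = vl_svmtrain(train_image_feats.', category_labels, .0001);
    W(i, :) = w.';
    B(i, 1) = b;
end
% Score every test image with every SVM: scores(i, j) = W(i, :) * x_j + B(i).
scores = W * test_image_feats.';
scores = scores + repmat(B, 1, size(scores, 2));
```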
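The scoring code stops at `scores`. A minimal sketch of the final step, picking the category whose SVM gives the highest score, is shown below on toy weights. `W`, `B`, and the features here are made-up values, not trained models.

```matlab
categories = {'Coast', 'Forest', 'Street'};

% Toy one-vs-all models (hypothetical): one weight row and one bias per category.
W = [ 1  0;
      0  1;
     -1 -1];
B = [0; 0; 0.5];

% Each row is one test image with two features.
test_image_feats = [ 2.0  0.5;
                     0.2  3.0;
                    -1.0 -1.0];

scores = W * test_image_feats.' + repmat(B, 1, size(test_image_feats, 1));
[~, best] = max(scores, [], 1);            % highest-scoring SVM for each test image
predicted_categories = categories(best);
```

Each column of `scores` holds one test image's confidence under every SVM, so taking the maximum down each column selects a single category per image.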
Manipulating lambda for each SVM, I saw the highest accuracy with lambda = 0.0001.
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.420 | Store, LivingRoom | Office, Office
Store | 0.310 | Highway, InsideCity | Forest, Highway
Bedroom | 0.350 | LivingRoom, OpenCountry | Office, Kitchen
LivingRoom | 0.300 | Office, TallBuilding | Highway, Kitchen
Office | 0.790 | Store, Kitchen | LivingRoom, LivingRoom
Industrial | 0.360 | Bedroom, Bedroom | InsideCity, InsideCity
Suburb | 0.870 | Industrial, OpenCountry | InsideCity, OpenCountry
InsideCity | 0.430 | Street, Industrial | Forest, Kitchen
TallBuilding | 0.690 | InsideCity, InsideCity | Street, Industrial
Street | 0.510 | InsideCity, InsideCity | Store, InsideCity
Highway | 0.770 | Street, Industrial | Store, Mountain
OpenCountry | 0.410 | Industrial, Forest | Highway, Coast
Coast | 0.740 | OpenCountry, OpenCountry | Suburb, OpenCountry
Mountain | 0.750 | Forest, OpenCountry | Suburb, TallBuilding
Forest | 0.910 | OpenCountry, OpenCountry | LivingRoom, Mountain