Confusion Matrix.
For my project I implemented normalized, zero-mean tiny image features and Bag of SIFT features. For the Bag of SIFT I implemented a standard k-means 200-word vocabulary, but I also implemented Fisher features using a Gaussian Mixture Model. In addition, I appended a GIST feature to the final Fisher feature to further improve accuracy. Finally, I use 15 linear SVM classifiers, one per category, for more accurate classification. The rest of the write-up covers, in order: the tiny image features, the nearest neighbor classifier, the Bag of SIFT features, the linear SVM classifier, and the results.
My implementation of tiny image features is fairly standard: I use imresize to shrink the image to a 16x16 patch. However, I also shift the resulting 256-dimensional vector to zero mean and then normalize it. This improved accuracy from 19.1% to 22.5%.
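The zero-mean and normalization step can be sketched in a few lines. This is a minimal illustration rather than my exact code: the `patch` matrix stands in for the output of imresize, and "normalized" is assumed here to mean scaling to unit length.

```matlab
% Stand-in for the 16x16 patch that imresize(img, [16 16]) would return.
patch = magic(16);

% Flatten to a 1x256 row vector.
feat = double(patch(:))';

% Shift to zero mean, then scale to unit length (assumed normalization).
feat = feat - mean(feat);
feat = feat / norm(feat);

fprintf('mean = %.3g, norm = %.3g\n', mean(feat), norm(feat));
```

Removing the mean makes the feature insensitive to overall brightness, and the unit-length scaling makes it insensitive to contrast, which is a plausible reason for the accuracy gain.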
My implementation of the nearest neighbor classifier was simple as well. I used vl_alldist2 and min to find the single nearest neighbor. I only implemented 1-NN, as I was already getting 22.5% accuracy with tiny images and 52.3% accuracy with Bag of SIFT.
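The 1-NN step amounts to a pairwise distance matrix followed by a column-wise min. The sketch below uses toy 2-D features and computes squared Euclidean distances with base MATLAB instead of vl_alldist2, so it runs without VLFeat. The variable names and data are made up for illustration.

```matlab
% Toy data: one row per image (hypothetical values).
train_feats  = [0 0; 10 10; 0 10];
train_labels = {'Coast'; 'Forest'; 'Street'};
test_feats   = [1 1; 9 8];

% Squared Euclidean distances, size num_train x num_test.
D = bsxfun(@plus, sum(train_feats.^2, 2), sum(test_feats.^2, 2)') ...
    - 2 * (train_feats * test_feats');

% For each test image (column), pick the closest training image.
[~, nearest] = min(D, [], 1);
predicted = train_labels(nearest);
```

Each test image simply inherits the label of its closest training image, so there is nothing to train.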
My implementation of the Bag of SIFT features was more complicated, and it has two versions. The first uses vl_kmeans to create a 200-word vocabulary and vl_alldist2 to match each feature to its nearest word. This version is for use with the 1-NN classifier, because the other version produces such a high-dimensional feature that 1-NN becomes too slow and inaccurate. The second version uses vl_gmm to fit a Gaussian Mixture Model, which is then used to build a Fisher encoding vector. After constructing the Fisher vector I also compute a GIST feature for the image using the code linked to in the assignment description. The code for my Bag of SIFT is below:
for i = 1 : dim(1)
    % read image and convert it to single-precision grayscale in [0, 1]
    img = imread(char(image_paths(i)));
    gray = single(mat2gray(img));
    % extract dense SIFT descriptors, one 128-dimensional column per location
    [locations, SIFT_features] = vl_dsift(gray, 'step', 10, 'fast');
    if (fisher)
        % Fisher encoding against the GMM, followed by the GIST descriptor
        encoding = vl_fisher(single(SIFT_features), MEANS, COVARIANCES, PRIORS, 'Normalized');
        [gist, p] = LMgist(img, '', param);
        ret(i, 1 : 51200) = encoding(:);
        ret(i, 51201 : 51208) = gist';
    else
        % assign each descriptor to its nearest vocabulary word
        dist = vl_alldist2(vocab, single(SIFT_features));
        [confidence, index] = min(dist);
        % count word occurrences, then normalize the histogram to unit length
        hist = zeros(1, vocab_size);
        for j = 1 : size(SIFT_features, 2)
            hist(1, index(j)) = hist(1, index(j)) + 1;
        end
        hist = hist / norm(hist(:));
        ret(i, :) = hist(:);
    end
end
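The vocabulary branch above can also be written without the inner loop. The following is a small sketch on toy 2-D "descriptors", with base MATLAB distances in place of vl_alldist2 and accumarray in place of the counting loop. The data and the name `counts` are illustrative only.

```matlab
% Toy vocabulary and descriptors, one per column (hypothetical values).
vocab = [0 10; 0 10];               % two words: (0,0) and (10,10)
feats = [1 9 8 2 0; 0 10 9 1 1];    % five descriptors
vocab_size = size(vocab, 2);

% Squared distances, size vocab_size x num_descriptors.
D = bsxfun(@plus, sum(vocab.^2, 1)', sum(feats.^2, 1)) - 2 * (vocab' * feats);

% Nearest word per descriptor, then count the occurrences of each word.
[~, index] = min(D, [], 1);
counts = accumarray(index(:), 1, [vocab_size 1])';

% Normalize to unit length, as in the loop version.
counts = counts / norm(counts);
```

Passing the size argument `[vocab_size 1]` to accumarray keeps words that never occur as zero bins instead of dropping them.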
Adding the Fisher encoding and GIST feature raised my accuracy from 59.1% to 70.3%.
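As a quick sanity check on the feature size: a Fisher vector produced by vl_fisher has 2 x D x K entries for D-dimensional data and K Gaussians. With 128-dimensional SIFT descriptors, the 51200 columns used in the code above imply the number of GMM components. The arithmetic below is a sketch; the variable names are illustrative.

```matlab
sift_dim   = 128;      % dimension of one SIFT descriptor
fisher_dim = 51200;    % columns 1:51200 of ret in the code above

% Fisher vector length is 2 * D * K, so solve for K.
num_gaussians = fisher_dim / (2 * sift_dim);
fprintf('GMM components implied: %d\n', num_gaussians);
```

The result, 200, matches the size of the k-means vocabulary.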
My implementation of the linear SVM classifier was also fairly standard. I run vl_svmtrain 15 times to train one SVM per category. To optimize the code, instead of immediately running each trained SVM on the images, I first train all 15 SVMs and save their learned parameters in an array. I then run all 15 SVMs on each image and pick the maximally responding SVM as the category for that image. The code for saving all of the SVMs in an array is shown below:
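% one row per category: column 1 holds the bias B, columns 2..s hold the weights W
s = size(train_image_feats, 2) + 1;
svms = zeros(num_categories, s);
for i = 1 : num_categories
    % one-vs-all labels: +1 for the current category, -1 for every other
    labels = double(strcmp(categories(i), train_labels));
    labels(labels == 0) = -1;
    [W, B] = vl_svmtrain(train_image_feats', labels', 0.00001);
    svms(i, 1) = B;
    svms(i, 2:s) = W;
end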
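Once the parameters are stored this way, scoring every test image against all SVMs reduces to one matrix product. The sketch below shows the idea with a hand-made `svms` array in the same layout (bias in column 1, weights after it). The numbers, categories, and `test_image_feats` values are made up for illustration and are not my trained models.

```matlab
% Hypothetical stored SVMs: each row is [B, W] for one category.
svms = [0  1  0;
        0  0  1;
        1 -1 -1];
categories = {'Coast', 'Forest', 'Street'};

% Toy test features, one row per image.
test_image_feats = [2 0; 0 3; 0 0];

% Scores W*x + B for every (category, image) pair at once.
B = svms(:, 1);
W = svms(:, 2:end);
scores = bsxfun(@plus, W * test_image_feats', B);

% The maximally responding SVM decides the category of each image.
[~, best] = max(scores, [], 1);
predicted = categories(best);
```

Taking the max over raw scores, rather than thresholding each SVM at zero, guarantees that every image receives exactly one label.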
Type | Accuracy | Total time | Time for features | Time for classifier |
---|---|---|---|---|
Tiny Image - 1-NN | 22.5% | 18.9374 sec | 13.2347 sec | 5.7027 sec |
Bag of SIFT - 1-NN | 52.3% | 3.5165 min | 3.43197 min | 4.99244 sec |
Bag of SIFT (Fisher + GIST) - Lin. SVM | 70.3% | 7.61695 min | 6.67193 min | 56.7012 sec |
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
---|---|---|---|
Kitchen | 0.580 | TallBuilding, Bedroom | Office, Bedroom |
Store | 0.770 | Kitchen, Industrial | InsideCity, Industrial |
Bedroom | 0.500 | LivingRoom, LivingRoom | LivingRoom, LivingRoom |
LivingRoom | 0.340 | Bedroom, Bedroom | Industrial, Office |
Office | 0.920 | Bedroom, Store | Bedroom, LivingRoom |
Industrial | 0.530 | LivingRoom, Store | Store, TallBuilding |
Suburb | 0.980 | Coast, OpenCountry | TallBuilding, Store |
InsideCity | 0.800 | Highway, Street | Bedroom, TallBuilding |
TallBuilding | 0.780 | InsideCity, Suburb | InsideCity, Highway |
Street | 0.560 | Forest, Coast | Highway, Store |
Highway | 0.850 | Street, Street | Forest, LivingRoom |
OpenCountry | 0.450 | Mountain, Coast | Suburb, Coast |
Coast | 0.860 | Mountain, OpenCountry | OpenCountry, OpenCountry |
Mountain | 0.730 | Coast, Forest | OpenCountry, OpenCountry |
Forest | 0.900 | Mountain, OpenCountry | Store, Mountain |
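The per-category accuracies can be tied back to the overall figure with a short calculation. Assuming every category carries equal weight (for example, the same number of test images each, which is not stated above), their mean should match the 70.3% reported for the linear SVM.

```matlab
% Per-category accuracies in table order, Kitchen through Forest.
acc = [0.580 0.770 0.500 0.340 0.920 0.530 0.980 0.800 ...
       0.780 0.560 0.850 0.450 0.860 0.730 0.900];

% Unweighted mean over the 15 categories.
overall = mean(acc);
fprintf('Mean per-category accuracy: %.1f%%\n', 100 * overall);
```

The mean comes out at about 0.703, consistent with the linear SVM row of the summary table.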