This is the simplest approach. Each input image is scaled down to 16x16 pixels, and the flattened result is shifted to zero mean and normalized to unit length.
image = imresize(image, [image_size image_size]);
image = double(image(:));
% zero mean first, then unit length (scaling afterwards keeps the mean at zero)
image = image - mean(image);
image = image / norm(image);
image_feats(i,:) = image;
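The order of the two normalization steps matters: scaling a zero-mean vector keeps its mean at zero, whereas subtracting the mean after scaling changes the length again. A minimal sketch on a toy vector (the vector `v` is a made-up stand-in for a flattened 16x16 image, not data from the project):

```matlab
% Toy stand-in for a flattened 16x16 image (values are arbitrary).
v = double((1:256)');

% Zero mean first, then scale to unit Euclidean length.
v = v - mean(v);
% Assumes the image is not constant, otherwise norm(v) is 0.
v = v / norm(v);

fprintf('mean = %.2e, length = %.6f\n', mean(v), norm(v));
```

Both properties hold at the same time only in this order; a constant image would need a special case, since its norm after mean subtraction is zero.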
For each test image we find its K nearest neighbors among the training images and predict the most common label among those neighbors. Using this approach with K=1 I got an accuracy of 0.199.
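K = 1;
% I is K x M: column m holds the indices of the K nearest training images of test image m
[D, I] = pdist2(train_image_feats, test_image_feats, 'euclidean','Smallest',K);
% vote over the neighbors' labels, not over their indices
[label_names, ~, label_ids] = unique(train_labels);
votes = reshape(label_ids(I), K, []);
predicted_categories = label_names(mode(votes, 1));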
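For K > 1 the vote has to be taken over the labels of the neighbors. The following is a minimal sketch of that majority vote on toy 2-D data; the points, the labels, and the variable names are made up for illustration, and it uses only base functions rather than `pdist2`:

```matlab
% Toy data: 6 training points in 2-D with hypothetical labels, 2 test points.
train_feats  = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
train_labels = {'Forest'; 'Forest'; 'Forest'; 'Street'; 'Street'; 'Street'};
test_feats   = [0.2 0.1; 5.5 5.2];
K = 3;

% Map label strings to integer ids so that mode() can count votes.
[label_names, ~, label_ids] = unique(train_labels);

predicted = cell(size(test_feats, 1), 1);
for t = 1:size(test_feats, 1)
    % Squared Euclidean distance from test point t to every training point.
    diffs = train_feats - repmat(test_feats(t, :), size(train_feats, 1), 1);
    d2 = sum(diffs .^ 2, 2);
    [~, order] = sort(d2);
    % Most frequent label id among the K nearest neighbors.
    winner = mode(label_ids(order(1:K)));
    predicted{t} = label_names{winner};
end
disp(predicted);
```

Note that `mode` breaks ties by returning the smallest id, so with an even K a tie goes to the alphabetically first label.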
Extracting SIFT features instead of using miniaturized images improves the accuracy further. This is done in two steps.
First we build a vocabulary from less densely sampled SIFT features taken from all our training images. These features are then clustered into K clusters.
FEATURE_STEP_SIZE=5;
FEATURE_SIZE=3;
all_features = [];
for i=1:numel(image_paths)
    path = image_paths{i};
    image = single(imread(path));
    [positions, features] = vl_dsift(image, 'Fast', 'Step', FEATURE_STEP_SIZE, 'Size', FEATURE_SIZE);
    % features is d x N
    all_features = [all_features, features];
end
% cluster all descriptors; centers is d x vocab_size
[centers, assignments] = vl_kmeans(single(all_features), vocab_size);
vocab = centers';
K here is the given parameter vocab_size. I use FEATURE_STEP_SIZE=5 and FEATURE_SIZE=3, which gives a balance between speed and accuracy.
The next step is to extract dense features from the test image and find their nearest neighbors in the vocabulary built from the training images.
path = image_paths{i};
image = single(imread(path));
[positions, features] = vl_dsift(image, 'Fast', 'Step', FEATURE_STEP_SIZE, 'Size', FEATURE_SIZE);
features = double(features);
% index of the nearest vocabulary word for every descriptor (I is 1 x N)
[D, I] = pdist2(vocab, features', 'euclidean','Smallest', 1);
% [I, D] = vl_kdtreequery(forest, vocab', features);
% create histogram
hist = zeros([vocab_size 1]);
for j=1:size(I, 2)
    cluster = I(j);
    hist(cluster) = hist(cluster) + 1;
end
% normalize so that the bins sum to 1
hist = hist / sum(hist);
% appended as a column here; the classifiers index one row per image
image_feats = [image_feats, hist];
Experimentally I found that FEATURE_STEP_SIZE=5 and FEATURE_SIZE=3 gave a good balance between speed and accuracy. This yields an accuracy of 0.509.
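The histogram step can also be written without an explicit loop. This is a minimal sketch using `accumarray`; the index vector `I` below is a made-up example of nearest-word indices, not output from the project:

```matlab
vocab_size = 4;
% Hypothetical nearest-word index for each of 6 descriptors in one image.
I = [2 4 2 1 2 4];

% Count how many descriptors fall on each visual word; unused words stay 0.
h = accumarray(I(:), 1, [vocab_size 1]);
% Normalize so that images with different descriptor counts are comparable.
h = h / sum(h);

disp(h');
```

Passing the size `[vocab_size 1]` explicitly matters: without it the result would be only as long as the largest index that happens to occur in the image.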
The final improvement uses one-vs-all linear SVM classifiers. That is, for each of the 15 categories an SVM is trained to separate "category X" from "not category X". Each test image is then scored by all SVMs, and the category whose classifier is most confident is used.
% build classifiers
% number of training images (assumed; N is not defined in this excerpt)
N = size(train_image_feats, 1);
W = [];
B = [];
for i=1:num_categories
    matching_indices = strcmp(categories(i), train_labels);
    % +1 for images of category i, -1 for all others
    labels = ones([N 1]) * -1;
    labels(matching_indices) = 1;
    labels = double(labels);
    [w b] = vl_svmtrain(train_image_feats', labels, LAMBDA);
    W = [W, w];
    B = [B, b];
end
% classify data
M = size(test_image_feats, 1);
predicted_categories = cell(M, 1);
for i=1:M
    % one score per classifier; the highest score wins
    Y = W' * test_image_feats(i,:)' + B';
    [~, label_id] = max(Y);
    predicted_categories{i} = categories{label_id};
end
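The classification loop can be expressed as a single matrix product. The sketch below shows the idea on hand-picked toy values; the category names, weights, and features are hypothetical and chosen only so that the scores are easy to follow:

```matlab
% Hypothetical setup: 3 categories, 2-D features, hand-picked weights.
categories = {'Coast', 'Forest', 'Street'};
% W is d x C (one weight column per classifier), B is 1 x C (one bias each).
W = [1 -1 0; 0 1 -1];
B = [0 0 0.5];
% test_feats is M x d, one row per test image.
test_feats = [2 0; -1 3; 0 -2];

% Score every image against every classifier at once: M x C.
scores = test_feats * W + repmat(B, size(test_feats, 1), 1);
% The classifier with the largest signed score wins.
[~, label_id] = max(scores, [], 2);
predicted = categories(label_id);
disp(predicted);
```

Taking the maximum of the raw scores, rather than of their signs, is what lets the most confident classifier win even when several (or none) of them vote positive.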
LAMBDA = 0.000001 experimentally gave the best result, together with FEATURE_STEP_SIZE=2 and FEATURE_SIZE=3 when extracting the dense SIFT features from the test images. The resulting accuracy was 0.709, and the result is visualized below. This is a bit slow, however; setting FEATURE_STEP_SIZE=5 decreases the run time drastically while still giving an acceptable accuracy of 0.68.
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.670 | Store, Bedroom | LivingRoom, Bedroom
Store | 0.590 | Street, Kitchen | Kitchen, Mountain
Bedroom | 0.500 | Kitchen, Kitchen | Office, Office
LivingRoom | 0.310 | Store, Kitchen | Kitchen, Kitchen
Office | 0.910 | LivingRoom, LivingRoom | Kitchen, Kitchen
Industrial | 0.560 | LivingRoom, Bedroom | Kitchen, OpenCountry
Suburb | 0.980 | Mountain, Highway | InsideCity, Highway
InsideCity | 0.650 | Street, Coast | Store, TallBuilding
TallBuilding | 0.820 | Industrial, Industrial | Mountain, Industrial
Street | 0.690 | TallBuilding, Store | InsideCity, InsideCity
Highway | 0.800 | Industrial, Industrial | Mountain, Store
OpenCountry | 0.520 | Coast, Coast | Forest, Coast
Coast | 0.810 | OpenCountry, OpenCountry | Highway, OpenCountry
Mountain | 0.820 | Highway, Highway | Forest, OpenCountry
Forest | 0.950 | Store, Store | Store, Mountain