Project 4 / Scene Recognition with Bag of Words

Project description

The purpose of this project is to perform scene recognition using machine learning. Two kinds of features are used to represent image information: tiny images and SIFT features. Two classifiers are trained on these features: nearest neighbour and linear SVM. The following three combinations are evaluated:

  1. Tiny images + nearest neighbour
  2. SIFT features + nearest neighbour
  3. SIFT features + linear SVM

1, Tiny images + nearest neighbour

The tiny image feature resizes each input image to a small 16x16 image. The K-nearest-neighbour algorithm is then used to classify the images: the label assigned to each testing image is determined by the labels of its nearest training images. The value k is the number of neighbours that vote on the label, and the label with the highest frequency among them is chosen. The algorithm and results are shown below:


K = 5;
% Pairwise distances: D(n, m) compares training image n with testing image m
D = vl_alldist2(train_image_feats', test_image_feats');
M = size(test_image_feats, 1);
predicted_categories = cell(M, 1);
labelsets = unique(train_labels);
N = numel(labelsets);
for i = 1:M
    % Indices of the training images, sorted from nearest to farthest
    [~, I] = sort(D(:, i));
    labels = train_labels(I(1:K));
    % Count the votes for each label among the K nearest neighbours
    buffer = zeros(N, 1);
    for current = 1:numel(labels)
        matching_indices = strcmp(labels{current}, labelsets);
        buffer(matching_indices) = buffer(matching_indices) + 1;
    end
    % Pick the most frequent label (ties go to the first label in sorted order)
    [~, most_freq] = max(buffer);
    predicted_categories(i, 1) = labelsets(most_freq);
end
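
The feature extraction step itself is not listed in this report. A minimal sketch of a tiny image feature, assuming the Image Processing Toolbox function imresize is available and the image is grayscale, might look as follows. The random stand-in image and the zero-mean, unit-length normalisation are assumptions added for illustration and are not confirmed by the report.

```matlab
% Sketch of a tiny image feature (illustrative, not the report's code).
img = rand(200, 300);                 % stand-in for a grayscale image; use imread in practice
tiny = imresize(img, [16 16]);        % shrink to 16x16, ignoring the aspect ratio
feat = single(tiny(:))';              % flatten to a 1x256 row vector
feat = feat - mean(feat);             % assumption: remove the mean brightness
feat = feat / max(norm(feat), eps);   % assumption: scale to unit length
```

Stacking one such row per image would give matrices shaped like train_image_feats and test_image_feats in the listing above.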

Results of Tiny images + nearest neighbour

K value    Accuracy
1          0.225
3          0.219
5          0.215
10         0.213

Results visualization for poorly performing recognition pipeline.


Accuracy (mean of diagonal of confusion matrix) is 0.215 for k = 5
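
The accuracy figure quoted in this report is the mean of the diagonal of the confusion matrix. As a rough sketch of how such a number can be computed, assuming each row of the matrix is normalised by the number of test images in that category, consider the toy example below. The three-category label lists are made up for illustration and are not results from this project.

```matlab
% Sketch: accuracy as the mean of the diagonal of a row-normalised confusion matrix.
categories = {'Kitchen'; 'Store'; 'Office'};                  % toy category list
test_labels = {'Kitchen'; 'Kitchen'; 'Store'; 'Store'; 'Office'; 'Office'};
predicted_categories = {'Kitchen'; 'Store'; 'Store'; 'Store'; 'Office'; 'Kitchen'};

num_categories = numel(categories);
confusion = zeros(num_categories);
for i = 1:numel(test_labels)
    row = find(strcmp(test_labels{i}, categories));           % true category
    col = find(strcmp(predicted_categories{i}, categories));  % predicted category
    confusion(row, col) = confusion(row, col) + 1;
end
confusion = bsxfun(@rdivide, confusion, sum(confusion, 2));   % each row now sums to 1
accuracy = mean(diag(confusion));                             % 2/3 for this toy data
```

Averaging the diagonal gives every category equal weight, so a category with few test images counts as much as one with many.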

2, SIFT features + nearest neighbour

The second part of the project uses SIFT features to represent the images. First, a vocabulary is built by clustering the SIFT descriptors found in the training data; the cluster centres become the visual words. The VLFeat function vl_kmeans was used for the clustering:


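N = size(image_paths, 1);
buffer = [];
for i = 1:N
    current = imread(image_paths{i, 1});
    % Dense SIFT descriptors (128 x num_descriptors), sampled every 10 pixels
    [~, SIFT_features] = vl_dsift(single(current), 'step', 10);
    buffer = [buffer single(SIFT_features)];
end
% Cluster all descriptors; the centres (128 x vocab_size) form the vocabulary
[centers, ~] = vl_kmeans(buffer, vocab_size);
vocab = centers;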

Each image is then represented by a frequency histogram over the vocabulary. The SIFT descriptors of an image are extracted with the VLFeat function vl_dsift, each descriptor is assigned to its nearest vocabulary centre, and the number of descriptors assigned to each centre forms the histogram. The KNN algorithm, with k = 5, was then applied to the histograms of the testing set.
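
The histogram step is not listed in this report. A minimal sketch, assuming the vocabulary is stored as a 128 x vocab_size matrix as returned by vl_kmeans, could look like this. Random matrices stand in for the vocabulary and for the descriptors of one image, the distances are computed with built-in functions so the snippet runs without VLFeat, and the final normalisation is an assumption rather than something the report states.

```matlab
% Sketch: turn the descriptors of one image into a visual-word histogram.
vocab = single(rand(128, 50));     % stand-in for the centres from vl_kmeans
descs = single(rand(128, 400));    % stand-in for the descriptors from vl_dsift
vocab_size = size(vocab, 2);

% Squared Euclidean distance between every centre and every descriptor,
% using ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a'*b (vl_alldist2 would also work).
D = bsxfun(@plus, sum(vocab.^2, 1)', sum(descs.^2, 1)) - 2 * (vocab' * descs);

[~, nearest] = min(D, [], 1);                               % closest centre per descriptor
hist_counts = accumarray(nearest(:), 1, [vocab_size 1])';   % descriptors per centre
feat = hist_counts / sum(hist_counts);                      % assumption: histogram sums to 1
```

Normalising the histogram keeps images with different numbers of descriptors comparable when distances between histograms are computed.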


Accuracy (mean of diagonal of confusion matrix) is 0.547

3, SIFT features + linear SVM

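Finally, a linear SVM was trained on the SIFT features. One binary classifier is trained per category (that category against all others) with the VLFeat function vl_svmtrain, and each testing image is assigned the category whose classifier gives the highest score.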


categories = unique(train_labels);
num_categories = length(categories);
M = size(test_image_feats, 1);
N = size(train_image_feats, 1);
buffer = zeros(num_categories, M);   % buffer(i, j): score of category i for testing image j
LAMBDA = 0.00001;
max_num_of_iter = 1000000;
for i = 1:num_categories
    % Binary labels: +1 for the current category, -1 for all others
    tempt = -ones(1, N);
    tempt(strcmp(categories{i}, train_labels)) = 1;
    [W, B] = vl_svmtrain(train_image_feats', tempt, LAMBDA, 'MaxNumIterations', max_num_of_iter);
    % Score every testing image with the learned hyperplane
    for j = 1:M
        buffer(i, j) = W' * test_image_feats(j, :)' + B;
    end
end
% Each testing image takes the category with the highest score
[~, idx] = max(buffer, [], 1);
predicted_categories = categories(idx);

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.672

Per-category results. The sample training images and sample true positives were shown as images for each category.

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.600      Bedroom, Bedroom                  Office, Bedroom
Store           0.490      Bedroom, OpenCountry              Mountain, Kitchen
Bedroom         0.480      Kitchen, Kitchen                  Kitchen, Store
LivingRoom      0.440      TallBuilding, Bedroom             Industrial, Office
Office          0.880      TallBuilding, Kitchen             LivingRoom, LivingRoom
Industrial      0.500      Store, InsideCity                 LivingRoom, Store
Suburb          0.940      Industrial, Mountain              OpenCountry, InsideCity
InsideCity      0.570      Store, Store                      Kitchen, Kitchen
TallBuilding    0.750      Highway, Street                   Industrial, Coast
Street          0.650      Store, OpenCountry                TallBuilding, InsideCity
Highway         0.770      Coast, OpenCountry                LivingRoom, Mountain
OpenCountry     0.540      Highway, Suburb                   Highway, Store
Coast           0.730      OpenCountry, OpenCountry          OpenCountry, OpenCountry
Mountain        0.810      OpenCountry, Coast                Coast, TallBuilding
Forest          0.930      TallBuilding, Store               Mountain, Store