This project introduces various recognition techniques for images. This is the beginning of the machine learning module, wherein a model for classification is generated through learning from example, annotated data, wherein the ground truth is known and we attempt to train a model that is as accurate to the annotated data as possible.
The basic process of scene recognition is getting the features of training data, creating a model from the labeled images, and applying this model to trying and predict new data (images) and classifying the data under certain categories.
This project creates features utilizing two methods, a tiny image representation wherein the image pixels are scaled down, and a bag of words representation, where SIFT descriptors are clustered and compared with training data. Additionally, this project also explores two classifier prediction models, the KNN and linear SVM techniques. KNN is a clustering technique within K most similiar images of the training data, and SVM is a classifier that creates a boundary plane upon a hyperplane dimension of the data.
Each image is taken and then downsized. This is used as the features with other classifiers, such as KNN once they become classified.
image_feats = [];
for i = 1:length(image_paths)
image = imread(image_paths{i});
image = imresize(image,[16 16]);
image = reshape(image,1,256);
image_feats(i,:) = image;
end
For this portion, a vocabulary is built to represent the training and testing image datasets, getting a bag of feature histogram. SIFT extracts features and clusters them. Vocabcular word size was 200.
In the code, an image is read in, and th VL dense sift featureing was used. Sift them randomly selects a set of features.
imageLength = length(image_paths);
SIFT = [];
for i = 1:imageLength
image = imread(image_paths{i});
image = single(image);
[locations, SIFT_features] = vl_dsift(image,'fast','step',10);
randomPermutations = randperm(size(SIFT_features,2));
SIFT = [SIFT SIFT_features(:,randomPermutations(1:20))];
end
SIFT = single(SIFT);
[vocab, assigned] = vl_kmeans(SIFT, vocab_size);
Here, in the bag of sift file, we read in an image, apply SIFT, and then get the distances and clustering to generate the model in order to try classifying test data.
N = size(image_paths, 1);
image_feats = zeros(N, vocab_size);
for i = 1: N
image = imread(image_paths{i});
[location, sift_feats] = vl_dsift(single(image), 'fast', 'step', 5);
all_dist = vl_alldist2(single(sift_feats), vocab);
[sorted, index] = sort(all_dist, 2);
histogram = zeros(1, vocab_size);
for j = 1:size(index)
histogram(index(j, 1:5)) = histogram(index(j, 1:5))+[10 6 3 2 1];
end
histogram = histogram/norm(histogram,2);
image_feats(i,:) = histogram;
KNN deals in getting the nearest neighbors and then clustering them. Getting distances is quite straightforward through utilizing VLfeat.
D = vl_alldist2(test_image_feats',train_image_feats');
[B,I] = sort(D,2);
permutations = I(:,1);
predicted_categories = train_labels(permutations);
An SVM is a classifier than attempts to explode the data points to a higher dimension, and create a boundary line in this higher dimension (allowing for a more complex line), after which data/line is reduced to the normal dimension but keeping the complex linear seperator line. In this case, the line separates images within or ouside a category.
In this code, we get all the labels, and use VLfeat in order to do the SVM process. We choose the line of best confidence given the results.
categories = unique(train_labels);
num_categories = length(categories);
lambda = 0.3;
W = [];
B = [];
for i=1:num_categories
correctLabel = strcmp(categories(i),train_labels);
binaryLabel = ones(size(correctLabel)) .* -1;
binaryLabel(correctLabel) = 1;
[w,b] = vl_svmtrain(train_image_feats',binaryLabel',lambda);
W(i,:) = w;
B(i,:) = b;
end
for i=1:size(test_image_feats);
conf = [];
for j=1:num_categories
conf(j,:) = dot(W(j,:),test_image_feats(i,:)) + B(j,:);
end
[maxConf,index] = max(conf);
predicted_categories(i) = categories(index);
end
predicted_categories = predicted_categories';
Lambda = .0003 |
Lambda = .003 |
Lambda = .03 |
Lambda = .3 |
Accuracy = 0.661 |
Accuracy = 0.561 |
Accuracy = 0.525 |
Accuracy = 0.408 |
Category name | Accuracy | Sample training images | Sample true positives | False positives with true label | False negatives with wrong predicted label | ||||
---|---|---|---|---|---|---|---|---|---|
Kitchen | 0.520 | Bedroom |
Store |
Store |
InsideCity |
||||
Store | 0.390 | LivingRoom |
Industrial |
Industrial |
LivingRoom |
||||
Bedroom | 0.320 | LivingRoom |
LivingRoom |
LivingRoom |
LivingRoom |
||||
LivingRoom | 0.360 | Bedroom |
Kitchen |
Bedroom |
Bedroom |
||||
Office | 0.990 | Kitchen |
LivingRoom |
Kitchen |
|||||
Industrial | 0.410 | Bedroom |
LivingRoom |
Store |
InsideCity |
||||
Suburb | 0.970 | Industrial |
OpenCountry |
Street |
InsideCity |
||||
InsideCity | 0.690 | Street |
LivingRoom |
Street |
LivingRoom |
||||
TallBuilding | 0.770 | LivingRoom |
Industrial |
Kitchen |
InsideCity |
||||
Street | 0.690 | InsideCity |
Suburb |
InsideCity |
Mountain |
||||
Highway | 0.750 | Store |
Industrial |
OpenCountry |
Mountain |
||||
OpenCountry | 0.460 | Highway |
Highway |
Mountain |
Coast |
||||
Coast | 0.770 | Highway |
OpenCountry |
Highway |
Suburb |
||||
Mountain | 0.870 | OpenCountry |
Highway |
Store |
Suburb |
||||
Forest | 0.960 | OpenCountry |
Store |
Mountain |
Mountain |
||||
Category name | Accuracy | Sample training images | Sample true positives | False positives with true label | False negatives with wrong predicted label |
Lambda has a noticeable impact on accuracy. A larger lambda resulted in much worse score. I suspect excessively smaller score would be good up until a point, from which accuracy plateaus or falls (as such models tend to do).
KNN is a straightforward algorithm wherein the results don't necessarily change/are randomized given a configuration, so multiple runs with knn do not result in changed accuracies.
As expected, running a placeholder feature/classifier situation leads to low scoring, randomized accuracies.
Accuracies followed as predicted from the proj4 homework outline.