Project 4 / Scene Recognition with Bag of Words

The goal of this project was to compare different feature extraction/encoding methods and different classification algorithms for classifying 3000 images from a 15-category scene database.

Feature Extraction and Encoding

I implemented two main feature representations: tiny image features and bag of SIFT features. A tiny image is simply a version of the image resized to be very small (16x16), whose greyscale pixel values are then used as features. This is not a very good representation, since it discards most of the high-frequency image content and is not invariant to spatial or brightness shifts. The vector was normalized and made zero-mean.
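The tiny image feature can be sketched in a few lines. This is a minimal pure-Python sketch, not the author's Matlab code: it assumes the greyscale image is given as a list of pixel rows at least 16x16 in size, uses simple box averaging for the resize (the original resizing method is not stated), and interprets "normalized" as unit L2 length. The name `tiny_image_feature` is hypothetical.

```python
import math

def tiny_image_feature(gray, size=16):
    """Box-downsample a greyscale image (list of rows) to size x size,
    flatten it, then make the vector zero-mean and unit-length."""
    h, w = len(gray), len(gray[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size")
    feat = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size
            # Average all source pixels that fall into this output cell.
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feat.append(sum(block) / len(block))
    mean = sum(feat) / len(feat)
    feat = [v - mean for v in feat]            # zero mean
    norm = math.sqrt(sum(v * v for v in feat))
    return [v / norm for v in feat] if norm > 0 else feat  # unit length
```

A 256-dimensional vector results for any input that is at least 16x16; a perfectly flat image maps to the zero vector, since it has no contrast left after mean removal.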

For the bag of SIFT features, I used the VLFeat library for much of the computation. My first implementation sampled SIFT features from all of the images in the training set and then used k-means to cluster them into a "dictionary" of 50 clusters. Each image was then represented as a normalized histogram in which each of the image's SIFT features is counted toward the cluster whose center is closest to it.
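A minimal pure-Python sketch of the dictionary and histogram steps follows. It assumes the SIFT descriptors have already been extracted (the author used VLFeat for this; the sketch does not call it) and are given as lists of floats. It uses plain Lloyd's k-means with random initialization, and it assumes the histogram is L1-normalized, since the normalization used is not stated. All function names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_word(desc, vocab):
    """Index of the vocabulary center closest to desc."""
    return min(range(len(vocab)), key=lambda c: sq_dist(desc, vocab[c]))

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Lloyd's k-means; returns k cluster centers (the visual words)."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iters):
        members = [[] for _ in range(k)]
        for d in descriptors:
            members[nearest_word(d, centers)].append(d)
        for c, group in enumerate(members):
            if group:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def bag_of_words(descriptors, vocab):
    """Histogram of nearest visual words, L1-normalized to sum to 1."""
    hist = [0.0] * len(vocab)
    for d in descriptors:
        hist[nearest_word(d, vocab)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

In the setup described above, `build_vocabulary` would run once on descriptors sampled from the training set with `k=50`, and `bag_of_words` once per image.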

In addition, I implemented a variant of the bag of SIFT features that uses Gaussian mixture model (GMM) clustering. I fit a GMM with 10 components to the SIFT features by maximizing likelihood, and then encoded each image as a standard Fisher vector. These vectors were normalized and used as the image representations.
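The Fisher vector encoding can be sketched as below, assuming the GMM has already been fit (for example by EM) with diagonal covariances and is given as per-component weights, means, and variances. For each component, the vector holds the image's averaged, normalized gradients with respect to the mean and the standard deviation, and the result is L2-normalized. The output layout (all mean gradients, then all deviation gradients) and the name `fisher_vector` are assumptions; the common "improved" variant would also apply a signed square root before the L2 step, which is omitted here.

```python
import math

def fisher_vector(descriptors, weights, means, variances):
    """Fisher vector of an image's descriptors under a diagonal-covariance GMM.

    For component k and dimension t, with z = (x_t - mu_kt) / sigma_kt:
      mean part:  sum_i gamma_ik * z            / (N * sqrt(w_k))
      sigma part: sum_i gamma_ik * (z**2 - 1)   / (N * sqrt(2 * w_k))
    The output has length 2 * K * D and is L2-normalized.
    """
    n, n_comp, dim = len(descriptors), len(means), len(means[0])
    g_mu = [[0.0] * dim for _ in range(n_comp)]
    g_sigma = [[0.0] * dim for _ in range(n_comp)]
    for x in descriptors:
        # log(w_k * N(x | mu_k, diag(var_k))) for every component k
        log_p = []
        for k in range(n_comp):
            lp = math.log(weights[k])
            for t in range(dim):
                diff = x[t] - means[k][t]
                lp -= 0.5 * (math.log(2 * math.pi * variances[k][t])
                             + diff * diff / variances[k][t])
            log_p.append(lp)
        # Posteriors gamma_k (soft assignments), computed stably via a max shift
        top = max(log_p)
        exps = [math.exp(lp - top) for lp in log_p]
        gamma = [e / sum(exps) for e in exps]
        for k in range(n_comp):
            for t in range(dim):
                z = (x[t] - means[k][t]) / math.sqrt(variances[k][t])
                g_mu[k][t] += gamma[k] * z
                g_sigma[k][t] += gamma[k] * (z * z - 1.0)
    fv = []
    for k in range(n_comp):
        fv += [v / (n * math.sqrt(weights[k])) for v in g_mu[k]]
    for k in range(n_comp):
        fv += [v / (n * math.sqrt(2.0 * weights[k])) for v in g_sigma[k]]
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv
```

With 10 components and 128-dimensional SIFT descriptors, this gives a 2 x 10 x 128 = 2560-dimensional vector per image.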

Image Classification Methods

I implemented 2 main methods to classify each image. The first was a K-Nearest Neighbor algorithm and the second was a linear SVM.

Our K-Nearest Neighbor implementation is simple: for each test image, we find the K training images whose representations are closest to the test image's representation. Based on our testing, we chose K = 9, and each test image was assigned the label held by the majority of those 9 neighbors. I tried improving on this with the Naive-Bayes Nearest Neighbor (NBNN) image classifier, but was unable to achieve better results.
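Both nearest-neighbor classifiers are sketched below under stated assumptions: Euclidean distance between feature vectors, majority-vote ties broken in favor of the tied label whose member ranks nearest, and hypothetical function names. `nbnn_classify` follows the NBNN rule of Boiman, Shechtman, and Irani (CVPR 2008): it works on each image's raw descriptors rather than on a histogram, sums each query descriptor's squared distance to its nearest descriptor from each class, and picks the class with the smallest total.

```python
import math
from collections import Counter

def knn_classify(test_feat, train_feats, train_labels, k=9):
    """Majority vote among the k training features nearest to test_feat."""
    ranked = sorted(zip(train_feats, train_labels),
                    key=lambda pair: math.dist(test_feat, pair[0]))
    top = [label for _, label in ranked[:k]]
    votes = Counter(top)
    best = max(votes.values())
    # Tie-break (assumption): the tied label with the closest member wins.
    return next(label for label in top if votes[label] == best)

def nbnn_classify(test_descs, descs_by_class):
    """Naive-Bayes Nearest Neighbor: pick the class minimizing the summed
    squared distance from each test descriptor to its nearest class descriptor."""
    def image_to_class(class_descs):
        return sum(min(math.dist(d, c) ** 2 for c in class_descs)
                   for d in test_descs)
    return min(descs_by_class,
               key=lambda label: image_to_class(descs_by_class[label]))
```

Note that plain KNN compares one vector per image, while NBNN compares every descriptor of the query against every descriptor of each class, which is much more expensive on a dataset of this size.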

I used Matlab's implementation of the linear SVM classifier in a one-vs-all scheme: I trained 15 binary classifiers, one for each label, each on all of the images with that label versus all of the images with any other label. To classify an image, I ran all 15 SVMs on it and chose the label whose classifier placed the image farthest on its positive side, as measured by its distance from that SVM's hyperplane. I also tried kernels other than the linear kernel to improve accuracy.
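The one-vs-all scheme is sketched below. Matlab's SVM solver is not reproduced; as a stand-in, each binary classifier is trained by batch subgradient descent on the regularized hinge loss, with a constant 1 appended to every feature vector so that the last weight acts as a (regularized) bias. The hyperparameters (`lam`, `lr`, `iters`) and all function names are illustrative assumptions. Prediction takes the label with the largest decision value w·x + b, which is proportional to the signed distance from that classifier's hyperplane.

```python
def train_linear_svm(feats, targets, lam=0.01, lr=0.01, iters=2000):
    """Batch subgradient descent on lam/2*||w||^2 + mean(max(0, 1 - y * w.x)).

    targets are +1/-1. A constant 1.0 is appended to every feature vector,
    so the last weight acts as a (regularized) bias term.
    """
    xs = [list(f) + [1.0] for f in feats]
    n, dim = len(xs), len(xs[0])
    w = [0.0] * dim
    for _ in range(iters):
        grad = [lam * wj for wj in w]
        for x, y in zip(xs, targets):
            if y * sum(wj * xj for wj, xj in zip(w, x)) < 1.0:  # hinge active
                for j in range(dim):
                    grad[j] -= y * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def decision_value(w, feat):
    """w.x + b for one trained classifier."""
    return sum(wj * xj for wj, xj in zip(w, list(feat) + [1.0]))

def train_one_vs_all(feats, labels, **svm_kwargs):
    """One binary SVM per label: that label (+1) versus every other label (-1)."""
    return {
        c: train_linear_svm(feats, [1.0 if lab == c else -1.0 for lab in labels],
                            **svm_kwargs)
        for c in sorted(set(labels))
    }

def predict(models, feat):
    """Label whose classifier places feat farthest on its positive side."""
    return max(models, key=lambda c: decision_value(models[c], feat))
```

For the 15-category setup, `train_one_vs_all` would build 15 classifiers, and `predict` evaluates all of them on each test image.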

The best result was about 64% accuracy, obtained with the Fisher vector bag of SIFT representation and an SVM using a radial basis function (RBF) kernel.
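For reference, the RBF kernel and the kernelized one-vs-all decision rule can be written as follows. Here y_i^(c) is +1 if training image i has label c and -1 otherwise, and the alpha_i^(c) are the dual coefficients learned by the c-th SVM. The bandwidth gamma and the soft-margin parameter used in these experiments are not stated, so they are left symbolic. With the linear kernel K(x, x') = x · x', f_c reduces to w_c · x + b_c.

```latex
\begin{aligned}
K(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\gamma\,\lVert \mathbf{x} - \mathbf{x}' \rVert^2\right), \quad \gamma > 0,\\
f_c(\mathbf{x}) &= \sum_{i=1}^{N} \alpha_i^{(c)}\, y_i^{(c)}\, K(\mathbf{x}_i, \mathbf{x}) + b_c,\\
\hat{y}(\mathbf{x}) &= \operatorname*{arg\,max}_{c \in \{1,\dots,15\}} f_c(\mathbf{x}).
\end{aligned}
```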

Results

Accuracy with Tiny Images & KNN:

Accuracy is 0.214
10-fold Average Cross Validation Accuracy: 0.1360
10-fold Cross Validation Standard Deviation: 0.0334

Accuracy with VLAD Bag of SIFT & KNN:

Accuracy is 0.485
10-fold Average Cross Validation Accuracy: 0.3250
10-fold Cross Validation Standard Deviation: 0.0486

Accuracy with VLAD Bag of SIFT & Naive-Bayes Nearest Neighbor (NBNN) classifier:

Accuracy is 0.354
10-fold Average Cross Validation Accuracy: 0.2834
10-fold Cross Validation Standard Deviation: 0.0386

Accuracy with VLAD Bag of SIFT & Linear SVM (Linear Kernel):

Accuracy is 0.561
10-fold Average Cross Validation Accuracy: 0.3570
10-fold Cross Validation Standard Deviation: 0.0540

Accuracy with VLAD Bag of SIFT & SVM (RBF Kernel):

Accuracy is 0.598
10-fold Average Cross Validation Accuracy: 0.3840
10-fold Cross Validation Standard Deviation: 0.0334

Accuracy with Fisher Vector Bag of SIFT & SVM (RBF Kernel):

Accuracy (mean of diagonal of confusion matrix) is 0.641
10-fold Average Cross Validation Accuracy: 0.3960
10-fold Cross Validation Standard Deviation: 0.0659
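The two kinds of numbers reported above can be computed as sketched below. `confusion_accuracy` builds a row-normalized confusion matrix and averages its diagonal (the per-category accuracies); when every category has the same number of test images, this equals overall accuracy. `k_fold_scores` shuffles the item indices into 10 folds and reports the mean and sample standard deviation (dividing by n - 1, which is also MATLAB's `std` default) of the per-fold accuracies. `evaluate` is a hypothetical callback that trains on one index list and returns accuracy on the other; how the folds were actually built here is not stated.

```python
import random
import statistics

def confusion_accuracy(true_labels, pred_labels, categories):
    """Row-normalized confusion matrix and the mean of its diagonal."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    counts = [[0] * k for _ in range(k)]
    for t, p in zip(true_labels, pred_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([v / total if total else 0.0 for v in row])
    return matrix, sum(matrix[i][i] for i in range(k)) / k

def k_fold_scores(n_items, evaluate, folds=10, seed=0):
    """Mean and sample standard deviation of evaluate(train_idx, test_idx)
    over `folds` disjoint test folds of a shuffled index list."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    scores = []
    for f in range(folds):
        test = idx[f::folds]                      # every folds-th shuffled item
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        scores.append(evaluate(train, test))
    return statistics.mean(scores), statistics.stdev(scores)
```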

Scene classification results visualization

Category name  Accuracy  False positives (true label)  False negatives (wrong predicted label)
Kitchen        0.460     Bedroom, Bedroom              LivingRoom, Office
Store          0.470     Kitchen, Industrial           Bedroom, Industrial
Bedroom        0.440     LivingRoom, LivingRoom        Kitchen, Office
LivingRoom     0.350     Store, Office                 Kitchen, Industrial
Office         0.970     Suburb, Kitchen               LivingRoom, InsideCity
Industrial     0.320     Store, LivingRoom             TallBuilding, Bedroom
Suburb         0.900     Street, InsideCity            Store, TallBuilding
InsideCity     0.750     TallBuilding, Street          TallBuilding, Kitchen
TallBuilding   0.650     InsideCity, InsideCity        Mountain, Street
Street         0.590     Mountain, Industrial          Highway, Highway
Highway        0.800     Industrial, OpenCountry       Mountain, Coast
OpenCountry    0.410     Coast, Industrial             Highway, Coast
Coast          0.820     InsideCity, Industrial        OpenCountry, InsideCity
Mountain       0.770     TallBuilding, Store           Office, Coast
Forest         0.910     OpenCountry, OpenCountry      Mountain, Mountain