Project 4 / Scene Recognition with Bag of Words

The goal of this project is to perform scene recognition with several simple methods, then moving on to more advanced methods. We begin with the simplest implmenetation of using tiny images as a representation and nearest neighbor classifier.

Tiny images representation and nearest neighbor classifier

For the tiny images representation I took the center square of each image and resized it to 16x16. I used a nearest neighbor classifier with K=10. The mean diagonal of the confusion matrix was 0.191.

Bag of Sift representation and nearest neighbor classifier

Next I created the Bag of Sift representation to use with the nearest neighbor classifier. To build the vocabulary for the bag of SIFT method, I used the 'fast' method along with 1000 randomly selected features from each image, and a step size of 10. I used a vocabulary size of 50. When creating the bag of sifts for each image I again used a stepsize of 5 and the 'fast' method, along with 2000 randomly selected features. Other parameters were the same as the previous section. Run time (excluding building the vocabulary) was about 2min and the mean diagonal of the confusion matrix was 0.51.

Bag of Sift representation and linear SVM

Using the same parameters for the Bag of Sift representation, I changed the nearest neighbor classifier to a linear SVM. I used the vl_svmtrain() package. I found good performance with a Lambda value of 1.0e-6. Other parameters were the same as the previous section. The mean diagonal of the confusion matrix was 0.583.

EXTRA: Bag of Sift and kernel SVM

Next I used a kernel SVM instead of the linear SVM. I used MATLAB's svmtrain package for this. I found that a polynomial kernel with polynomial order of the vocabulary size worked well. Other parameters were the same as the previous section. The mean diagonal of the confusion matrix was 0.632.

EXTRA: "Soft assignment" of Bag of Sift and kernel SVM

I next implemented the "soft assignment" method for Bag of SIFT found in Kernel codebooks for scene categorization by Gemert et al, ECCV 2008. They outline 4 different kernel-based assignment methods, the best one being "codeword uncertainty" weights, which are defined as taking the gaussian kernel of the distances between the features and the codewords, and normalizing them to sum to 1. Each feature then contributes to every codeword a weighted contribution. I used a sigma value for the gaussian kernel of 3.0e4. Other parameters were the same as the previous section. The mean diagonal of the confusion matrix was 0.635.

EXTRA: Comparison of Vocabulary Sizes

Using the same pipeline with the same parameters found in the previous section, I tested the performance on different vocabulary sizes. The mean diagonal of the confusion matrices are shown in the table below. I did however change the polynomial order for the kernel SVM because it would overfit with high polynomial orders on low vocabulary sizes. I used the same polynomial order as the vocabulary size, except for 400 and 1000, where I used a polynomial kernel of order 200.

Vocab Size Score
10 .445
20 .553
50 .625
100 .669
200 .689
400 .701
1000 .689

Final Result

The full details of the best result I was able to achieve is shown below. I used all the same parameters as the previous section, with a vocabulary size of 400.

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.702

Category name Accuracy Sample training images Sample true positives False positives with true label False negatives with wrong predicted label
Kitchen 0.690
LivingRoom

Bedroom

Store

Office
Store 0.570
LivingRoom

LivingRoom

Kitchen

Highway
Bedroom 0.450
InsideCity

LivingRoom

TallBuilding

Store
LivingRoom 0.280
Bedroom

Bedroom

Office

Bedroom
Office 0.940
TallBuilding

LivingRoom

Kitchen

Kitchen
Industrial 0.580
TallBuilding

Street

Street

InsideCity
Suburb 0.970
Industrial

OpenCountry

Store

Street
InsideCity 0.610
Store

Street

TallBuilding

TallBuilding
TallBuilding 0.810
LivingRoom

InsideCity

Street

Store
Street 0.670
Coast

TallBuilding

Highway

InsideCity
Highway 0.820
Industrial

Street

Coast

InsideCity
OpenCountry 0.650
Highway

Highway

Mountain

Coast
Coast 0.750
Highway

OpenCountry

OpenCountry

OpenCountry
Mountain 0.830
Forest

Store

Coast

Suburb
Forest 0.910
Mountain

TallBuilding

OpenCountry

Street
Category name Accuracy Sample training images Sample true positives False positives with true label False negatives with wrong predicted label