Project 4 / Scene Recognition with Bag of Words

Fig 1. Confusion matrix obtained with combined SIFT and GIST descriptors. This is the best result I obtained in this project.

Basic Implementation

In this project, we perform scene recognition on a 15-scene database. We use two image representations for feature extraction, tiny images and bag of SIFT, and we test the effectiveness of both KNN and SVM for scene classification. After implementing these algorithms individually, we test their performance in three combinations. The results below were obtained before applying the advanced steps in the extra credit part (vocabulary size: 200, K in KNN: 5, LAMBDA: 0.0001).

  1. Tiny images representation and nearest neighbor classifier (accuracy is 22.3%). Confusion matrix and the table of classifier results.
  2. Bag of SIFT representation and nearest neighbor classifier (accuracy is 52.1%). Confusion matrix and the table of classifier results.
  3. Bag of SIFT representation and linear SVM classifier (accuracy is 63.9%). Confusion matrix and the table of classifier results.
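
As a rough illustration of the nearest neighbor classifier used in the first two combinations, here is a minimal sketch of K-nearest-neighbor voting. The toy data, the variable names, and the choice of squared Euclidean distance are assumptions for illustration and are not taken from the project code.

```matlab
% Toy data: rows are feature vectors (hypothetical values, not project data).
train_feats  = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
train_labels = [1; 1; 1; 2; 2; 2];      % numeric category ids
test_feats   = [0.5 0.5; 5.5 5.5];
K = 3;                                   % the report uses K = 5

% Squared Euclidean distances, one row per test item.
D = bsxfun(@plus, sum(test_feats.^2, 2), sum(train_feats.^2, 2)') ...
    - 2 * (test_feats * train_feats');

% Sort each row and keep the labels of the K closest training items.
[~, order] = sort(D, 2);
nearest_labels = reshape(train_labels(order(:, 1:K)), [], K);

% Majority vote per test item; mode breaks ties toward the smallest id.
predicted = mode(nearest_labels, 2);
```

With K = 1 this reduces to the plain nearest neighbor rule. Larger K trades sensitivity to single noisy neighbors for smoother decisions.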

Extra Credit / Graduate Credit

1. Add additional, complementary features - Add gist descriptors

For this task, I generate the GIST descriptor of each image and append it to the normalized SIFT descriptor. All parameter settings stay the same. The GIST descriptor implementation comes from http://people.csail.mit.edu/torralba/code/spatialenvelope/ . Adding the GIST features raises the classification accuracy to 70.8%, which is the best performance I obtained, so I show its confusion matrix in Fig 1. To test the combination of SIFT and GIST features, change the setup as follows (the table of classifier results is shown in the later part of this report):

%code for this part is in get_bags_of_gists
FEATURE = 'sift and gist';
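
The concatenation step can be sketched as follows. This is only an illustration under stated assumptions: the random matrices stand in for real features, the GIST length of 512 and the L1 normalization are assumptions, and none of the variable names come from the project code.

```matlab
% Hypothetical inputs: one row per image (random stand-ins for real features).
num_images = 4;
sift_hist = rand(num_images, 200);   % bag-of-SIFT histograms, vocab size 200
gist_desc = rand(num_images, 512);   % GIST length of 512 is an assumption

% L1-normalize each histogram (the norm used in the project is not stated).
row_sums = sum(sift_hist, 2);
row_sums(row_sums == 0) = 1;         % guard against empty histograms
sift_norm = bsxfun(@rdivide, sift_hist, row_sums);

% Append the GIST descriptor to the normalized histogram.
image_feats = [sift_norm, gist_desc];
```

Normalizing before concatenation keeps the histogram part on a fixed scale, so neither feature type dominates only because of its raw magnitude.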

2. Experiment with many different vocabulary sizes and report performance

For this task, I experiment with vocabulary sizes of 10, 20, 50, 100, 200, and 400, using the SIFT descriptors and a 1-vs-all linear SVM for classification. The corresponding accuracy values are shown in Fig 2. The accuracy tends to level off as the vocabulary grows. Since a larger vocab_size leads to longer processing time, it is important to select an appropriate vocab_size that balances accuracy against efficiency.

Fig 2. Accuracy of bag of SIFT + 1-vs-all linear SVM for different vocabulary sizes.
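
To make the role of the vocabulary size concrete, here is a minimal sketch of how a bag-of-words histogram could be built for one image. The random descriptors and vocabulary are hypothetical stand-ins, and the names are not from the project code. The point is that the feature length equals the vocabulary size, and the distance matrix grows with it, which is where the extra processing time comes from.

```matlab
% Hypothetical stand-ins: 128-D descriptors for one image and a vocabulary.
rng(0);                              % reproducible toy data
vocab_size  = 50;
vocab       = rand(vocab_size, 128); % cluster centers, one per row
descriptors = rand(300, 128);        % local descriptors from one image

% Squared distance from every descriptor to every visual word.
D = bsxfun(@plus, sum(descriptors.^2, 2), sum(vocab.^2, 2)') ...
    - 2 * (descriptors * vocab');

% Index of the nearest visual word for each descriptor.
[~, word_idx] = min(D, [], 2);

% Count word occurrences; the histogram length equals the vocabulary size.
hist_counts = accumarray(word_idx, 1, [vocab_size 1])';
bow = hist_counts / sum(hist_counts);
```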

3. Train the SVM with a chi-squared kernel

For this task, I train the SVM using a chi-squared kernel. Following the instructions at http://www.vlfeat.org/overview/svm.html, I first construct a chi-squared dataset from the training data, then feed the new dataset to vl_svmtrain. The code is provided below.

Confusion matrix and the table of classifier results

%training for the SVM with chi-sqr
hom.kernel = 'KChi2';
hom.order = 2;
dataset = vl_svmdataset(train_image_feats', 'homkermap', hom);
[W, B] = vl_svmtrain(dataset, train_label', LAMBDA);

%score of the jth test item for the ith category (the test feature is expanded with the same order-2 kernel map)
score=(ws{i}')*(vl_homkermap(test_image_feats(j,:)', 2))+bs{i};

Fig 3. Confusion matrix obtained with bag of SIFT and an SVM trained with the chi-squared kernel.
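
Once a score is available for every category, a 1-vs-all classifier assigns each test item to the category with the highest score. A minimal sketch of that final step, using a hypothetical toy model (the weights, biases, and variable names below are made up for illustration):

```matlab
% Hypothetical toy model: 2-D features, 3 categories.
W = [ 1  -1   0;      % one weight column per category (d x C)
      0   1  -1];
B = [0 0 0.5];        % one bias per category (1 x C)
test_feats = [ 2  0;  % one row per test item
              -1  3;
               0 -2];

% scores(j, i) = W(:, i)' * test_feats(j, :)' + B(i)
scores = bsxfun(@plus, test_feats * W, B);

% Pick the category with the largest score for each test item.
[~, predicted] = max(scores, [], 2);
```

Computing all scores with one matrix product avoids the double loop over items and categories.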

Best Result Showcase

To get the best result, I compute the GIST descriptors and use them together with the normalized SIFT descriptor as image features. The code for generating these mixed features is in get_sift_and_gist.m. I use a linear SVM as the classifier. To reproduce the result, set the parameters to the following values in the respective files.

%In proj4.m
FEATURE = 'sift and gist';
CLASSIFIER = 'support vector machine';
vocab_size = 200;

%In svm_classify.m
LAMBDA=0.0001;

Results visualization for my best performing recognition pipeline.


Accuracy (mean of diagonal of confusion matrix) is 0.708
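
For reference, here is a minimal sketch of how an accuracy defined as the mean of the confusion matrix diagonal can be computed. The toy labels and variable names are hypothetical. The last lines average the per-class accuracies listed in this report's results table.

```matlab
% Toy labels as numeric category ids (hypothetical, not project data).
true_labels = [1 1 1 2 2 3 3 3 3]';
pred_labels = [1 1 2 2 2 3 3 1 3]';
num_classes = 3;

% confusion(i, j) counts items of true class i predicted as class j.
confusion = accumarray([true_labels pred_labels], 1, [num_classes num_classes]);

% Normalize each row so the diagonal holds per-class accuracy.
confusion = bsxfun(@rdivide, confusion, sum(confusion, 2));

per_class_acc = diag(confusion);
accuracy = mean(per_class_acc);

% Per-class accuracies copied from the results table in this report.
reported = [0.560 0.610 0.480 0.490 0.890 0.690 0.940 0.570 ...
            0.780 0.660 0.800 0.620 0.800 0.820 0.910];
reported_mean = mean(reported);      % matches the reported 0.708
```

Because every row is normalized first, each category contributes equally to the final number regardless of how many test images it has.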

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.560      LivingRoom, LivingRoom            LivingRoom, Industrial
Store           0.610      Bedroom, Forest                   Industrial, Kitchen
Bedroom         0.480      LivingRoom, Kitchen               Store, LivingRoom
LivingRoom      0.490      Bedroom, Bedroom                  Industrial, Street
Office          0.890      Bedroom, TallBuilding             LivingRoom, Bedroom
Industrial      0.690      TallBuilding, Bedroom             Suburb, Street
Suburb          0.940      Store, Mountain                   InsideCity, Industrial
InsideCity      0.570      LivingRoom, Highway               Store, Street
TallBuilding    0.780      Industrial, Industrial            LivingRoom, LivingRoom
Street          0.660      Highway, Mountain                 Industrial, Store
Highway         0.800      OpenCountry, Street               Coast, Mountain
OpenCountry     0.620      Mountain, Industrial              Coast, Highway
Coast           0.800      OpenCountry, Highway              OpenCountry, Mountain
Mountain        0.820      Kitchen, Coast                    OpenCountry, Highway
Forest          0.910      Store, OpenCountry                Mountain, Mountain