Scene Recognition with Bag of Words

In this project, we perform scene recognition on a 15-scene database. We use two image representations for feature extraction, tiny images and bag of SIFT, and we test the effectiveness of two classifiers, KNN and SVM, for scene classification. After implementing these algorithms individually, we test their performance in three different combinations. Below are the results we obtained before applying the advanced steps in the extra part. (vocabulary size: 200, K in KNN: 5, LAMBDA: 0.0001)
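The KNN classifier with K = 5 mentioned above can be sketched as a majority vote among the nearest training features. This is a minimal illustration on toy data, not the project's own classifier code: the variable names, the 2-D toy features, and the choice of squared Euclidean distance are all assumptions.

```matlab
% Toy 2-D features for two categories (hypothetical data).
train_feats  = [0 0; 0 1; 1 0; 1 1; 0.5 0.5; 5 5; 5 6; 6 5; 6 6; 5.5 5.5];
train_labels = [1 1 1 1 1 2 2 2 2 2]';
test_feats   = [0.2 0.3; 5.8 5.9];
K = 5;   % same K as in the report

num_test  = size(test_feats, 1);
predicted = zeros(num_test, 1);
for j = 1:num_test
    % Squared Euclidean distance from test item j to every training item
    % (the distance metric is an assumption).
    diffs = bsxfun(@minus, train_feats, test_feats(j, :));
    dists = sum(diffs .^ 2, 2);
    [~, order] = sort(dists);
    % Majority vote among the K nearest training labels.
    predicted(j) = mode(train_labels(order(1:K)));
end
```

With this toy data, the first test point is labeled 1 and the second is labeled 2.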

1. Add additional, complementary features - Add gist descriptors

For this task, I generate the gist descriptor of each image and append it to the normalized SIFT descriptor. All parameter settings stay the same. The gist descriptor implementation comes from http://people.csail.mit.edu/torralba/code/spatialenvelope/ . Adding the gist features boosts the classification accuracy to 70.8%, which is the best performance I obtained, so I show its confusion matrix in Fig 1. (The table of classifier results is shown in the latter part of this report.) To test the combination of SIFT and gist features, change the setup to:

%code for this part is in get_bags_of_gists
FEATURE = 'sift and gist';
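The feature combination described above can be sketched as follows. This is only an illustration under stated assumptions, not the contents of get_bags_of_gists: the variable names, the random stand-in data, the L1 normalization, and the 512-dimensional gist descriptor are all hypothetical choices.

```matlab
% Hypothetical inputs: one row per image.
num_images = 3;
sift_hist  = rand(num_images, 200);   % bag-of-SIFT histograms (vocabulary size 200)
gist_desc  = rand(num_images, 512);   % gist descriptors (dimension is an assumption)

% L1-normalize each histogram so the number of SIFT features per image
% does not affect the scale (the normalization type is an assumption).
row_sums = sum(sift_hist, 2);
row_sums(row_sums == 0) = 1;          % guard against empty histograms
sift_norm = bsxfun(@rdivide, sift_hist, row_sums);

% Append the gist descriptor to the normalized SIFT histogram.
image_feats = [sift_norm, gist_desc];
```

Each row of image_feats then holds 200 histogram bins followed by the gist values, and can be passed to the classifier like any other feature vector.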

2. Experiment with many different vocabulary sizes and report performance

For this task, I experiment with vocabulary sizes of 10, 20, 50, 100, 200, and 400, using SIFT descriptors and a 1-vs-all linear SVM for classification. The corresponding accuracy values are shown in Fig 2. The accuracy tends to converge as the vocabulary size becomes larger. Since a larger vocab_size leads to longer processing time, it is important to select an appropriate vocab_size considering the tradeoff between accuracy and efficiency.

Fig 2. Accuracy of bag of SIFT + 1-vs-all linear SVM for different vocabulary sizes.

3. Train the SVM with chi-sqr

For this task, I train the SVM using a chi-sqr kernel. Following the instructions at http://www.vlfeat.org/overview/svm.html, I first construct a chi-sqr dataset from the training data, then feed the new dataset to vl_svmtrain. The code is provided below.

Confusion matrix and the table of classifier results

%training for the SVM with chi-sqr
hom.kernel = 'KChi2';
hom.order = 2;
dataset = vl_svmdataset(train_image_feats', 'homkermap', hom);
[W, B] = vl_svmtrain(dataset, train_label', LAMBDA);

%score of the jth test image under the ith category's classifier
score=(ws{i}')*(vl_homkermap(test_image_feats(j,:)', 2))+bs{i};
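The score line above is evaluated once per category, and the test image is assigned to the category with the highest score. A minimal sketch of that 1-vs-all decision follows. The weights, biases, and test features are toy values, and feature_map is a hypothetical identity stand-in so the sketch runs without VLFeat. In the chi-sqr setup it would be vl_homkermap(x, 2), which expands each input dimension into 2*2+1 = 5 dimensions, so each ws{i} would be correspondingly longer.

```matlab
% Hypothetical trained 1-vs-all classifiers for 3 categories on 2-D features.
ws = {[1; 0], [0; 1], [-1; -1]};   % one weight vector per category
bs = {0, 0, 0.5};                  % one bias per category
feature_map = @(x) x;              % stand-in for vl_homkermap(x, 2) (assumption)

test_image_feats = [2 0.1; 0.1 3; 0 0];   % one row per test image
num_test  = size(test_image_feats, 1);
num_cats  = numel(ws);
predicted = zeros(num_test, 1);
for j = 1:num_test
    scores = zeros(num_cats, 1);
    for i = 1:num_cats
        % Same form as the score line in the report.
        scores(i) = ws{i}' * feature_map(test_image_feats(j, :)') + bs{i};
    end
    % The most confident classifier wins.
    [~, predicted(j)] = max(scores);
end
```

With these toy values the three test rows are assigned to categories 1, 2, and 3 respectively.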

Fig 3. Confusion matrix obtained with bag of SIFT and an SVM trained with chi-sqr.

Best Result Showcase

To get the best result, I compute the GIST descriptors and use them together with the normalized SIFT descriptors as image features. The code for generating these mixed features is in get_sift_and_gist.m. I use a linear SVM for classification. To reproduce the result, set the parameters to the following values in the respective files.

%In proj4.m
FEATURE = 'sift and gist';
CLASSIFIER = 'support vector machine';
vocab_size = 200;

%In svm_classify.m
LAMBDA=0.0001;

Results visualization for my best performing recognition pipeline.

Accuracy (mean of diagonal of confusion matrix) is 0.708
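As a sketch of how this number is defined, the snippet below builds a row-normalized confusion matrix from toy labels and takes the mean of its diagonal. The toy labels and variable names are assumptions for illustration. The last lines apply the same mean to the per-category accuracies listed in the results table of this report.

```matlab
% Toy labels for 3 categories (hypothetical data, integers 1..3).
true_labels = [1 1 1 1 2 2 2 2 3 3 3 3]';
pred_labels = [1 1 1 2 2 2 3 3 3 3 3 3]';
num_cats = 3;

% Count matrix: row = true category, column = predicted category.
confusion = accumarray([true_labels, pred_labels], 1, [num_cats, num_cats]);
% Row-normalize so each diagonal entry is a per-category accuracy.
confusion = bsxfun(@rdivide, confusion, sum(confusion, 2));
per_category = diag(confusion);      % [0.75; 0.5; 1]
accuracy = mean(per_category);       % 0.75 for this toy data

% Per-category accuracies copied from the results table in this report.
table_acc = [0.560 0.610 0.480 0.490 0.890 0.690 0.940 0.570 ...
             0.780 0.660 0.800 0.620 0.800 0.820 0.910];
table_mean = mean(table_acc);        % 0.708, matching the reported accuracy
```

Because every category is weighted equally, this is the average of the per-category accuracies rather than the fraction of all test images classified correctly. The two coincide only when every category has the same number of test images.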

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.560      LivingRoom, LivingRoom            LivingRoom, Industrial
Store           0.610      Bedroom, Forest                   Industrial, Kitchen
Bedroom         0.480      LivingRoom, Kitchen               Store, LivingRoom
LivingRoom      0.490      Bedroom, Bedroom                  Industrial, Street
Office          0.890      Bedroom, TallBuilding             LivingRoom, Bedroom
Industrial      0.690      TallBuilding, Bedroom             Suburb, Street
Suburb          0.940      Store, Mountain                   InsideCity, Industrial
InsideCity      0.570      LivingRoom, Highway               Store, Street
TallBuilding    0.780      Industrial, Industrial            LivingRoom, LivingRoom
Street          0.660      Highway, Mountain                 Industrial, Store
Highway         0.800      OpenCountry, Street               Coast, Mountain
OpenCountry     0.620      Mountain, Industrial              Coast, Highway
Coast           0.800      OpenCountry, Highway              OpenCountry, Mountain
Mountain        0.820      Kitchen, Coast                    OpenCountry, Highway
Forest          0.910      Store, OpenCountry                Mountain, Mountain

Jianling Wang 903143548

Project 4 / Scene Recognition with Bag of Words

