Project 4 / Scene Recognition with Bag of Words

The goal of this project is to perform scene recognition with three different methods (best accuracy in parentheses):

  1. Tiny image representation and nearest neighbor classifier (20.133%)
  2. Bags of SIFT representation and nearest neighbor classifier (50.400%)
  3. Bags of SIFT representation and linear SVM classifier (69.400%)

The image shown at the right is the confusion matrix for the bag of SIFT representation + linear SVM classifier with vocabulary size=1000 and LAMBDA=0.0001.

Tiny image representation and nearest neighbor classifier

The tiny image representation simply resizes each image to a small, fixed resolution, which is 16x16 in my implementation. This representation discards high-frequency image content and is not invariant to spatial or brightness shifts.
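The resizing step can be sketched as follows. This is a minimal illustration, not my actual code: it assumes a grayscale image stored as a list of pixel rows and shrinks it by averaging blocks of pixels. The function name tiny_image and the block-averaging choice are assumptions made for the example; a library resize function would normally be used instead.

```python
def tiny_image(pixels, size=16):
    """Shrink a grayscale image (list of rows) to size x size by block averaging.

    Returns the result flattened into one feature vector of length size * size.
    Block averaging is an assumption of this sketch, not a required method.
    """
    height, width = len(pixels), len(pixels[0])
    feature = []
    for r in range(size):
        # Range of source rows that map to output row r (at least one row).
        r0 = r * height // size
        r1 = max(r0 + 1, (r + 1) * height // size)
        for c in range(size):
            # Range of source columns that map to output column c.
            c0 = c * width // size
            c1 = max(c0 + 1, (c + 1) * width // size)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            feature.append(sum(block) / len(block))
    return feature


# Usage: a 32x32 image whose right half is bright.
image = [[0] * 16 + [255] * 16 for _ in range(32)]
feature = tiny_image(image)
print(len(feature))  # 256 values, one per tiny-image pixel
```

Averaging over each block is what removes the high-frequency detail: any texture finer than a block collapses into a single mean value.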
The nearest neighbor classifier finds the nearest training example (K=1) and assigns the test case the label of that neighbor. This classifier is vulnerable to training noise and tends to overfit, so I use K nearest neighbors (KNN) to reduce the effect of noisy training examples. KNN finds the K nearest training examples and assigns the test case the label that most of those K neighbors agree on.
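The voting rule can be sketched as follows. This is an illustration under stated assumptions: squared Euclidean distance as the metric, and ties between labels broken in favor of the label whose member is closest to the test case. Neither choice is specified above, and the function names are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_feats, train_labels, test_feat, k=1):
    """Label a test feature by majority vote among its k nearest training examples."""
    # Training indices sorted from nearest to farthest.
    order = sorted(range(len(train_feats)),
                   key=lambda i: squared_distance(train_feats[i], test_feat))
    nearest = [train_labels[i] for i in order[:k]]
    votes = Counter(nearest)
    top = max(votes.values())
    # Assumed tie-break: the first (closest) neighbor whose label has the top vote count.
    for label in nearest:
        if votes[label] == top:
            return label


# Usage with two small clusters of 2-D features.
feats = [[0, 0], [0, 1], [5, 5], [5, 6], [6, 5]]
labels = ["a", "a", "b", "b", "b"]
print(knn_predict(feats, labels, [0.2, 0.2], k=3))
```

With k=1 the function reduces to the plain nearest neighbor classifier, so the same code covers every column of the K experiments.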
 
The performance with different values of K is shown below. The highest accuracy is 20.133% with K=1.

K         1        4        7        9
Accuracy  20.133%  18.933%  18.607%  18.333%
 
 

Bags of SIFT representation and nearest neighbor classifier

Bag of words models ignore spatial information and build a histogram based on the frequency of visual words. I build the vocabulary by clustering a large corpus of local features with k-means. To build the histogram for an image, I count how many of its SIFT features fall into each cluster in my vocabulary. I then normalize the histogram so that the image size does not significantly change the magnitude of the bag of features.
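The histogram step can be sketched as follows, assuming the vocabulary is already available as a list of cluster centers. The sketch normalizes the histogram to sum to 1 (L1 normalization), which is an assumption, since the text above does not say which norm is used. The function names are hypothetical, and the toy features are 2-D rather than 128-D SIFT descriptors.

```python
def nearest_center(feature, centers):
    """Index of the cluster center closest to a feature (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, center in enumerate(centers):
        dist = sum((f - c) ** 2 for f, c in zip(feature, center))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bag_of_words(features, centers):
    """Normalized histogram of visual-word counts for one image."""
    hist = [0.0] * len(centers)
    for feature in features:
        hist[nearest_center(feature, centers)] += 1.0
    total = sum(hist)
    if total > 0:
        # Assumed L1 normalization, so the number of features does not matter.
        hist = [h / total for h in hist]
    return hist


# Usage: a two-word vocabulary and four local features from one image.
vocab = [[0, 0], [10, 10]]
print(bag_of_words([[1, 0], [0, 1], [9, 9], [1, 1]], vocab))  # [0.75, 0.25]
```

Because of the normalization, an image with twice as many features but the same proportions of visual words produces the same histogram.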
 
I set the vocabulary size to 1000. The performance with different values of K is shown below. The highest accuracy is 50.400% with K=9.

K         1        4        7        9
Accuracy  49.067%  48.267%  50.000%  50.400%
 
 

Bags of SIFT representation and linear SVM classifier

To decide which category a test case belongs to, I first trained 15 binary, 1-vs-all SVMs. Then I evaluated each test case with all 15 classifiers, and the classifier with the largest score wins. The score is calculated as W*X + B, where W and B are the learned hyperplane parameters. When learning an SVM, I used the function vl_svmtrain(features, labels, LAMBDA), which has a free parameter LAMBDA. Different values of LAMBDA can have a great effect on the accuracy.
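The 1-vs-all decision rule can be sketched as follows. The sketch assumes the trained parameters are held in a dictionary mapping each category name to its (W, B) pair. That layout and the function names are illustrative choices, and the weights in the usage lines are made-up toy values, not learned ones.

```python
def svm_score(w, b, x):
    """Linear SVM score W*X + B for one feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_one_vs_all(classifiers, x):
    """Return the category whose binary classifier gives the largest score.

    classifiers maps a category name to its (w, b) pair.
    """
    return max(classifiers, key=lambda name: svm_score(*classifiers[name], x))


# Usage with two toy classifiers on 2-D features (illustrative values only).
classifiers = {
    "Forest": ([1.0, 0.0], -0.5),
    "Coast": ([0.0, 1.0], -0.5),
}
print(predict_one_vs_all(classifiers, [0.9, 0.1]))  # Forest
```

Taking the largest raw score means a test case is always assigned some category, even when every binary classifier scores it as negative.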
 
I set the vocabulary size to 1000. The performance with different values of LAMBDA is shown below. The highest accuracy is 69.400% with LAMBDA=0.0001.

LAMBDA    0.000001  0.00001  0.0001   0.001    0.01     0.1      1        10
Accuracy  66.867%   66.933%  69.400%  62.667%  50.467%  41.933%  37.333%  42.200%
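One way to read the sensitivity to LAMBDA is through the training objective. I assume here, without having verified it in this write-up, that vl_svmtrain minimizes the usual regularized hinge-loss objective for a linear SVM. In that objective, y_i is +1 or -1 for the binary 1-vs-all label of training example X_i, and n is the number of training examples:

```latex
E(W, B) = \frac{\lambda}{2}\,\lVert W \rVert^{2}
        + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(W \cdot X_i + B)\bigr)
```

Under this assumption, a large LAMBDA puts most of the weight on keeping W small rather than on classifying the training examples correctly. That is consistent with the accuracy falling off for LAMBDA=0.01 and above. A very small LAMBDA leaves the hinge-loss term dominant, and in these results that costs only a few points relative to LAMBDA=0.0001.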
 

Extra credit

For extra credit, I experimented with different vocabulary sizes. The performance with each vocabulary size is shown below (bag of SIFT + linear SVM, LAMBDA=0.0001):

Vocabulary size  10       20       50       100      200      400      1000     2000
Accuracy         45.533%  55.467%  61.133%  64.467%  66.200%  67.267%  69.400%  68.600%
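The vocabulary size is the number of clusters passed to k-means. As a rough illustration of what that parameter controls, here is a minimal sketch of Lloyd's k-means algorithm. It is not the clustering routine I actually used. It initializes the centers with the first k points to stay deterministic, and it runs for a fixed number of iterations; both are simplifying assumptions.

```python
def closest(point, centers):
    """Index of the center nearest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))


def kmeans(points, k, iterations=20):
    """Cluster points into k groups and return the k cluster centers."""
    # Assumed initialization: the first k points (deterministic, not robust).
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: group every point with its nearest center.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[closest(point, centers)].append(point)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[i] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers


# Usage: six 2-D features forming two groups, vocabulary size 2.
points = [[0, 0], [10, 10], [0, 1], [10, 11], [1, 0], [11, 10]]
print(kmeans(points, 2))
```

A larger k splits the feature space into finer visual words. That matches the steady gains from vocab=10 to vocab=1000 in the table, although the cost of clustering and histogram building grows with k.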
 
 
 

Results for bag of SIFT representation and linear SVM classifier with vocabulary size=1000, LAMBDA=0.0001

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 69.400%
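The accuracy figure can be sketched as follows: build a confusion matrix whose rows are true categories and whose columns are predicted categories, normalize each row to sum to 1, and average the diagonal. The function names and the row-normalization convention are assumptions made for this illustration.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true labels, columns are predictions."""
    index = {name: i for i, name in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for truth, guess in zip(true_labels, predicted_labels):
        counts[index[truth]][index[guess]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Each row becomes the fraction of that category's test cases per prediction.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def mean_diagonal(matrix):
    """Average per-category accuracy, i.e. the mean of the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Usage with two categories and four test cases.
truth = ["A", "A", "B", "B"]
guess = ["A", "B", "B", "B"]
matrix = confusion_matrix(truth, guess, ["A", "B"])
print(mean_diagonal(matrix))  # 0.75
```

Each diagonal entry is the accuracy of one category, so the mean of the diagonal equals the overall accuracy whenever every category has the same number of test cases.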

Category name  Accuracy  False positives with true label  False negatives with wrong predicted label
Kitchen        0.520     Store, Industrial                LivingRoom, Office
Store          0.570     Bedroom, Kitchen                 Bedroom, Industrial
Bedroom        0.530     Kitchen, Kitchen                 Industrial, Office
LivingRoom     0.400     Kitchen, Bedroom                 Bedroom, Kitchen
Office         0.890     LivingRoom, Industrial           LivingRoom, LivingRoom
Industrial     0.610     Store, Highway                   TallBuilding, Bedroom
Suburb         0.960     OpenCountry, InsideCity          Street, InsideCity
InsideCity     0.590     Street, Store                    Industrial, Bedroom
TallBuilding   0.800     Kitchen, Industrial              Office, Bedroom
Street         0.680     Store, Office                    InsideCity, Industrial
Highway        0.820     Mountain, Coast                  Coast, Coast
OpenCountry    0.550     Coast, Mountain                  Coast, Highway
Coast          0.780     Highway, Highway                 OpenCountry, OpenCountry
Mountain       0.780     Store, Forest                    Coast, OpenCountry
Forest         0.930     OpenCountry, Mountain            Mountain, Mountain