Project 4 / Scene Recognition with Bag of Words

This project implements scene recognition using tiny images, nearest neighbor classification, bags of quantized local features, and linear classifiers learned by support vector machines. The table below compares the feature and classifier combinations and their accuracies.

| Feature | Classifier | Accuracy |
| --- | --- | --- |
| Tiny Image | Nearest Neighbor | 21.3% |
| Bag of SIFT | Nearest Neighbor | 52.1% |
| Bag of SIFT | SVM | 68.1% |
| Bag of SIFT & GIST | Nearest Neighbor | 60.7% |
| Bag of SIFT & GIST | SVM | 76.7% |
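
To make the bag-of-features idea concrete, here is a minimal sketch of quantizing local descriptors into a visual-word histogram and classifying it with a 1-nearest-neighbor rule. It is not the project's code: the function names, the three-word vocabulary, and the 2-D toy descriptors (standing in for 128-D SIFT descriptors) are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_words(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the normalized histogram of word counts."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(d, vocabulary[k]))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def nearest_neighbor_label(query, train_feats, train_labels):
    """Return the label of the training feature closest to the query."""
    best = min(range(len(train_feats)),
               key=lambda i: squared_distance(query, train_feats[i]))
    return train_labels[best]

# Hypothetical toy data: a 3-word vocabulary and one training image per class.
vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_images = {
    "Forest": [[0.1, 0.0], [0.2, 0.1], [0.9, 1.0]],
    "Street": [[5.1, 4.9], [4.8, 5.2], [1.1, 0.9]],
}
train_labels = list(train_images)
train_feats = [bag_of_words(train_images[c], vocabulary) for c in train_labels]

# Classify an unseen image from its descriptors.
test_hist = bag_of_words([[0.0, 0.1], [0.1, 0.2], [1.0, 1.1]], vocabulary)
predicted = nearest_neighbor_label(test_hist, train_feats, train_labels)
print(predicted)
```

In a real pipeline the vocabulary would typically come from clustering (for example k-means) descriptors sampled from the training set, and the SVM rows of the table would replace the nearest-neighbor step with one linear classifier per category.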

Confusion Matrix

The visualization of the scene classification results is shown below. The best configuration (Bag of SIFT & GIST with SVM) reaches an accuracy of 0.767, computed as the mean of the diagonal of the confusion matrix.
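
The accuracy figure can be reproduced from the per-category numbers in this report. The sketch below builds a row-normalized confusion matrix from label lists and averages its diagonal. The helper names and the two-category toy labels are assumptions; only the 15 per-category accuracies are taken from the results table.

```python
def build_confusion(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    images of category i that were predicted as category j."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0.0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1.0
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([v / total for v in row] if total > 0 else row)
    return rows

def mean_diagonal_accuracy(confusion):
    """Mean of the per-category accuracies on the diagonal."""
    n = len(confusion)
    return sum(confusion[i][i] for i in range(n)) / n

# Hypothetical toy example with two categories.
cats = ["Kitchen", "Bedroom"]
true = ["Kitchen", "Kitchen", "Bedroom", "Bedroom"]
pred = ["Kitchen", "Bedroom", "Bedroom", "Bedroom"]
toy_accuracy = mean_diagonal_accuracy(build_confusion(true, pred, cats))

# Per-category accuracies as reported in the results table.
per_category = [0.670, 0.660, 0.650, 0.510, 0.960, 0.680, 0.980, 0.750,
                0.820, 0.800, 0.850, 0.590, 0.820, 0.870, 0.900]
overall = sum(per_category) / len(per_category)
print(round(toy_accuracy, 3), round(overall, 3))
```

Because every category contributes equally to this mean, a weak category such as LivingRoom (0.510) lowers the overall score as much as a strong one such as Suburb (0.980) raises it.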

Scene classification results visualization



(Sample training images and sample true positives were shown as image thumbnails.)

| Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
| --- | --- | --- | --- |
| Kitchen | 0.670 | Bedroom, LivingRoom | Store, Bedroom |
| Store | 0.660 | InsideCity, InsideCity | TallBuilding, TallBuilding |
| Bedroom | 0.650 | Kitchen, Kitchen | Kitchen, LivingRoom |
| LivingRoom | 0.510 | Street, Bedroom | Bedroom, Bedroom |
| Office | 0.960 | LivingRoom, TallBuilding | Kitchen, Kitchen |
| Industrial | 0.680 | Store, OpenCountry | OpenCountry, Mountain |
| Suburb | 0.980 | OpenCountry, Industrial | LivingRoom, Industrial |
| InsideCity | 0.750 | Store, Street | Office, Store |
| TallBuilding | 0.820 | Industrial, Store | Coast, InsideCity |
| Street | 0.800 | Highway, InsideCity | Highway, Industrial |
| Highway | 0.850 | Coast, Coast | Street, OpenCountry |
| OpenCountry | 0.590 | Coast, Bedroom | Coast, Coast |
| Coast | 0.820 | Mountain, OpenCountry | OpenCountry, OpenCountry |
| Mountain | 0.870 | Forest, OpenCountry | OpenCountry, OpenCountry |
| Forest | 0.900 | Highway, TallBuilding | Mountain, Mountain |

The results show that the SVM outperforms nearest neighbor (NN): accuracy rises from 52.1% to 68.1% with bag of SIFT, and from 60.7% to 76.7% with bag of SIFT plus GIST, a gain of 16.0 percentage points in both cases. Adding GIST raises accuracy by 8.6 percentage points with both NN and SVM. There may still be room for improvement if the GIST and SIFT descriptors are combined in a smarter way.
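
The report does not say how the two descriptors were joined. One possible refinement, sketched here purely as an assumption, is to L2-normalize each descriptor block separately and concatenate them with a tunable weight, so that neither block dominates the distance simply because of its scale or length. The function names and the `sift_weight` parameter are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def combine_features(sift_hist, gist_desc, sift_weight=0.5):
    """Concatenate the two normalized blocks. Scaling by square roots of the
    weights makes squared Euclidean distance on the result equal to
    sift_weight * d_sift^2 + (1 - sift_weight) * d_gist^2."""
    if not 0.0 <= sift_weight <= 1.0:
        raise ValueError("sift_weight must lie in [0, 1]")
    a = math.sqrt(sift_weight)
    b = math.sqrt(1.0 - sift_weight)
    return ([a * v for v in l2_normalize(sift_hist)] +
            [b * v for v in l2_normalize(gist_desc)])

# Hypothetical toy vectors standing in for a SIFT histogram and a GIST descriptor.
combined = combine_features([3.0, 4.0], [0.0, 2.0], sift_weight=0.25)
print(combined)
```

The weight would normally be chosen by cross-validation on the training set rather than on the test images, so that the reported accuracy stays an honest estimate.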