This project implements scene recognition using tiny images, nearest neighbor classification, bags of quantized local features, and linear classifiers learned by support vector machines. The table below compares the accuracy of the different feature and classifier combinations.

Feature	Classifier	Accuracy
Tiny Image	Nearest Neighbor	21.3%
Bag of SIFT	Nearest Neighbor	52.1%
Bag of SIFT	SVM	68.1%
Bag of SIFT & GIST	Nearest Neighbor	60.7%
Bag of SIFT & GIST	SVM	76.7%
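
To make the "bag of quantized local features" and nearest neighbor rows more concrete, here is a minimal pure-Python sketch. It is not the code used for this project: the function names, the two-word vocabulary, and the 2-D "descriptors" are all hypothetical toy values chosen for illustration (real SIFT descriptors are 128-dimensional and the vocabulary would come from clustering training descriptors).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word and
    return the L1-normalized histogram of word counts."""
    hist = [0.0] * len(vocabulary)
    for d in descriptors:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(d, vocabulary[k]))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def nearest_neighbor_label(query, train_features, train_labels):
    """1-nearest-neighbor classification: return the label of the
    closest training feature vector."""
    best = min(range(len(train_features)),
               key=lambda i: squared_distance(query, train_features[i]))
    return train_labels[best]


# Hypothetical toy data (assumption, not from the project).
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
train_descriptors = {
    "Kitchen": [[0.1, 0.0], [0.0, 0.2], [0.9, 1.0]],
    "Forest": [[1.0, 0.9], [1.1, 1.0], [0.1, 0.1]],
}
train_labels = list(train_descriptors)
train_features = [bag_of_words(train_descriptors[name], vocabulary)
                  for name in train_labels]

query_hist = bag_of_words(
    [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1], [1.0, 1.0]], vocabulary)
predicted = nearest_neighbor_label(query_hist, train_features, train_labels)
print(query_hist, predicted)
```

The same histogram features could be passed to a linear SVM instead of the nearest neighbor lookup, which is the comparison the table reports.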

Confusion Matrix

The scene classification results are visualized below. The accuracy (mean of the diagonal of the confusion matrix) reaches 0.767.
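
As a quick illustration of the "mean of the diagonal" metric, the sketch below computes it from a confusion matrix of raw counts and then checks that the per-category accuracies listed in this report's results table average to the stated 0.767. The helper name and the 3-class toy matrix are assumptions made for the example; only the 15 per-category values come from the report.

```python
def mean_diagonal_accuracy(confusion):
    """confusion[i][j] = number of test images with true class i that
    were predicted as class j. Each row is normalized by its total, and
    the mean of the resulting diagonal is returned."""
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)
        per_class.append(row[i] / total if total else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical 3-class toy matrix (assumption, not project data).
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
toy_accuracy = mean_diagonal_accuracy(toy_confusion)

# Per-category accuracies as reported in the results table.
reported = {
    "Kitchen": 0.670, "Store": 0.660, "Bedroom": 0.650,
    "LivingRoom": 0.510, "Office": 0.960, "Industrial": 0.680,
    "Suburb": 0.980, "InsideCity": 0.750, "TallBuilding": 0.820,
    "Street": 0.800, "Highway": 0.850, "OpenCountry": 0.590,
    "Coast": 0.820, "Mountain": 0.870, "Forest": 0.900,
}
reported_mean = sum(reported.values()) / len(reported)
print(round(reported_mean, 3))  # 0.767
```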

Scene classification results visualization

Category name	Accuracy	False positive 1 (true label)	False positive 2 (true label)	False negative 1 (predicted label)	False negative 2 (predicted label)
Kitchen	0.670	Bedroom	LivingRoom	Store	Bedroom
Store	0.660	InsideCity	InsideCity	TallBuilding	TallBuilding
Bedroom	0.650	Kitchen	Kitchen	Kitchen	LivingRoom
LivingRoom	0.510	Street	Bedroom	Bedroom	Bedroom
Office	0.960	LivingRoom	TallBuilding	Kitchen	Kitchen
Industrial	0.680	Store	OpenCountry	OpenCountry	Mountain
Suburb	0.980	OpenCountry	Industrial	LivingRoom	Industrial
InsideCity	0.750	Store	Street	Office	Store
TallBuilding	0.820	Industrial	Store	Coast	InsideCity
Street	0.800	Highway	InsideCity	Highway	Industrial
Highway	0.850	Coast	Coast	Street	OpenCountry
OpenCountry	0.590	Coast	Bedroom	Coast	Coast
Coast	0.820	Mountain	OpenCountry	OpenCountry	OpenCountry
Mountain	0.870	Forest	OpenCountry	OpenCountry	OpenCountry
Forest	0.900	Highway	TallBuilding	Mountain	Mountain

The results show that the SVM outperforms nearest neighbor (NN): accuracy rises from 52.1% to 68.1% with bag of SIFT, and from 60.7% to 76.7% with bag of SIFT plus GIST, a gain of 16.0 percentage points in both cases. Adding GIST in turn improves accuracy by 8.6 percentage points for both NN and SVM. There may still be room for improvement if the GIST and SIFT descriptors are combined in a smarter way.
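
The report does not say how the SIFT and GIST features were combined. One possible approach, shown here purely as an assumption-laden sketch, is to L2-normalize each feature block and concatenate them with a tunable weight, so that neither block dominates the distance computation just because of its scale or length. The function names and the `gist_weight` parameter are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are copied)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine_features(sift_hist, gist_vec, gist_weight=0.5):
    """Concatenate L2-normalized blocks, scaling each by the square root
    of its weight. The squared Euclidean distance between two combined
    vectors is then (1 - w) * d_sift^2 + w * d_gist^2."""
    if not 0.0 <= gist_weight <= 1.0:
        raise ValueError("gist_weight must be in [0, 1]")
    w_sift = math.sqrt(1.0 - gist_weight)
    w_gist = math.sqrt(gist_weight)
    return ([w_sift * x for x in l2_normalize(sift_hist)] +
            [w_gist * x for x in l2_normalize(gist_vec)])


# Hypothetical toy vectors (assumption, far shorter than real features).
combined = combine_features([3.0, 4.0], [0.0, 2.0], gist_weight=0.25)
print(combined)
```

In a setup like this, `gist_weight` would be chosen by cross-validation on the training set rather than fixed in advance.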

Li Yi

Project 4 / Scene Recognition with Bag of Words
