Project 4 / Scene Recognition with Bag of Words

Tiny Images and Nearest Neighbor

Tiny images with nearest neighbor gave the worst performance of all the methods. In my implementation, each image is resized to 16x16 without preserving the original aspect ratio, so no image data is discarded by cropping. Nearest neighbor simply uses vl_alldist2() and min() to find the training image with the smallest Euclidean distance. Together, these achieved 19.1% accuracy. Below is the confusion matrix for this part.

Scene classification results visualization

Accuracy (mean of diagonal of confusion matrix) is 0.191
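
As a rough illustration of this pipeline, here is a minimal Python sketch (the project itself used MATLAB with vl_alldist2() and min()). The function names are hypothetical, and the nearest-neighbor pixel sampling used for resizing is an assumption standing in for whatever resize routine was actually used. Images are plain lists of grayscale rows.

```python
import math

def resize_nearest(image, out_h=16, out_w=16):
    """Resize a 2D grayscale image (list of rows) by nearest-neighbor sampling.

    Height and width are scaled independently, so the aspect ratio is
    ignored and no pixels are lost to cropping.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def tiny_image_feature(image, size=16):
    """Flatten the resized image into a single feature vector."""
    small = resize_nearest(image, size, size)
    return [float(p) for row in small for p in row]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """Label each test feature with the label of its closest training feature."""
    predictions = []
    for feat in test_feats:
        dists = [euclidean(feat, t) for t in train_feats]
        best = min(range(len(dists)), key=dists.__getitem__)
        predictions.append(train_labels[best])
    return predictions
```

With a 16x16 target, every image becomes a 256-element vector regardless of its original size, which is what makes the direct distance comparison possible.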

Bag Of SIFT and Nearest Neighbor

Bag of SIFT with nearest neighbor performed considerably better than tiny images with nearest neighbor, but it took significantly longer: tiny images with nearest neighbor took only about 30 seconds, whereas bag of SIFT with nearest neighbor took about 8 minutes. I increased the vocabulary size to 500 and used a fairly large bin and step size for the SIFT descriptors when building the vocabulary. Then, when computing the 'bags', I used a fairly small bin and step size. This brought my accuracy up to 50.5%. Below is the confusion matrix for this part.

Scene classification results visualization

Accuracy (mean of diagonal of confusion matrix) is 0.505
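
The 'bag' for one image can be sketched as a histogram over the vocabulary: each descriptor votes for its nearest visual word. The Python below is a minimal sketch under assumptions, not the project code. The function names are hypothetical, the normalization to unit sum is an assumption the report does not state, and the toy vectors are 2-dimensional where real SIFT descriptors have 128 dimensions and the vocabulary here had 500 words.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; enough for finding the nearest word."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_words_histogram(descriptors, vocabulary):
    """Count how many descriptors fall closest to each vocabulary word.

    Returns a histogram normalized to sum to 1 (an assumed choice), so
    images with different numbers of descriptors remain comparable.
    """
    counts = [0] * len(vocabulary)
    for d in descriptors:
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: squared_distance(d, vocabulary[k]),
        )
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors were extracted; return an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]

# Toy usage: two visual words, four descriptors.
vocabulary = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [1.0, 1.0]]
histogram = bag_of_words_histogram(descriptors, vocabulary)
```

A smaller step size when computing the bags produces more descriptors per image, so each histogram is estimated from more samples, at the cost of longer run time.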

Bag Of SIFT and Linear SVM

Bag of SIFT with a linear SVM increased accuracy further and took only slightly longer than bag of SIFT with nearest neighbor. For the linear SVM, I used a lambda of 0.0001; decreasing lambda did not noticeably improve results, while increasing it reduced accuracy drastically. Switching from nearest neighbor to the linear SVM raised accuracy to 67.2%. Below is the confusion matrix for this part, as well as some sample classifications.

Scene classification results visualization

Accuracy (mean of diagonal of confusion matrix) is 0.672
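
To make the role of lambda concrete, the sketch below shows a regularized hinge-loss objective of the usual form, lambda/2 * ||w||^2 plus the mean hinge loss, together with a one-vs-all prediction rule. Both are assumptions for illustration: the report does not state the exact objective or how the 15 categories were combined, and the function names are hypothetical.

```python
def dot(w, x):
    """Dot product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))

def svm_objective(w, b, features, labels, lam=0.0001):
    """Regularized hinge-loss objective for one binary linear SVM.

    labels are +1 or -1. A larger lam penalizes large weights more
    heavily, trading training fit for a wider margin.
    """
    regularizer = 0.5 * lam * dot(w, w)
    hinge = sum(
        max(0.0, 1.0 - y * (dot(w, x) + b))
        for x, y in zip(features, labels)
    )
    return regularizer + hinge / len(features)

def one_vs_all_predict(models, feature):
    """Pick the category whose binary SVM gives the highest score.

    models maps a category name to its (w, b) pair.
    """
    return max(
        models,
        key=lambda name: dot(models[name][0], feature) + models[name][1],
    )

# Toy usage with two hypothetical 2-D classifiers.
models = {
    "Forest": ([1.0, 0.0], 0.0),
    "Coast": ([0.0, 1.0], -0.5),
}
predicted = one_vs_all_predict(models, [0.2, 0.9])
```

Under this form of the objective, a very small lambda leaves the hinge term dominant, which is consistent with the observation that lowering lambda below 0.0001 changed little.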

Category name	Accuracy	False positives with true label	False negatives with wrong predicted label
Kitchen	0.640	LivingRoom, Store	Bedroom, InsideCity
Store	0.560	Industrial, LivingRoom	InsideCity, TallBuilding
Bedroom	0.430	LivingRoom, LivingRoom	Industrial, Store
LivingRoom	0.340	Industrial, Bedroom	Bedroom, Bedroom
Office	0.910	Bedroom, Kitchen	Kitchen, Kitchen
Industrial	0.550	Street, Kitchen	Highway, Suburb
Suburb	0.960	InsideCity, OpenCountry	TallBuilding, Bedroom
InsideCity	0.550	Store, Street	Highway, Suburb
TallBuilding	0.700	InsideCity, Suburb	Industrial, Industrial
Street	0.700	InsideCity, LivingRoom	InsideCity, Highway
Highway	0.810	Street, Industrial	Coast, Mountain
OpenCountry	0.340	Highway, Highway	Mountain, Mountain
Coast	0.810	OpenCountry, OpenCountry	Mountain, InsideCity
Mountain	0.850	Coast, Store	Forest, Forest
Forest	0.930	OpenCountry, Mountain	Mountain, Mountain

From the results above, this representation and classifier do a good job of separating scenes that look very different. For example, forest trees were not mistaken for city buildings. However, within a group of similar categories, such as interior rooms (Kitchen, Bedroom, LivingRoom, etc.), scenes were sometimes confused with one another. LivingRoom and OpenCountry tied for the lowest accuracy at 34%. In the sampled errors, LivingRoom was confused mostly with Bedroom and in one case with Industrial, while OpenCountry was confused with other outdoor scenes such as Highway and Mountain.
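
The accuracy figures in this report are the mean of the diagonal of the confusion matrix. A minimal Python sketch of that computation follows. The function names are hypothetical, and normalizing each row by the number of test images in that category is an assumption, so that the diagonal holds per-category accuracy.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Build a row-normalized confusion matrix.

    Row i, column j holds the fraction of images whose true category is
    categories[i] that were predicted as categories[j].
    """
    index = {name: i for i, name in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def mean_diagonal_accuracy(matrix):
    """Average the per-category accuracies found on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Toy usage: one Bedroom image is mistaken for LivingRoom.
categories = ["Bedroom", "LivingRoom"]
true_labels = ["Bedroom", "Bedroom", "LivingRoom", "LivingRoom"]
predicted = ["Bedroom", "LivingRoom", "LivingRoom", "LivingRoom"]
matrix = confusion_matrix(true_labels, predicted, categories)
accuracy = mean_diagonal_accuracy(matrix)
```

Off-diagonal entries show which categories absorb the errors, which is how confusions such as Bedroom versus LivingRoom become visible.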

Zachary Peterson
