Project 4 / Scene Recognition with Bag of Words

In this project, we explore the capabilities of scene recognition using Tiny Images and Bag of Words with KNN and SVM classifiers.

Tiny Images with K-Nearest Neighbors

Tiny Image features are obtained by converting each image to a zero-mean, normalized 16x16 image. With those features, we perform K-Nearest Neighbors classification: for each test image, we find the k nearest training features by Euclidean distance and select the label with the most votes among them.
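As a minimal sketch of the Tiny Image feature, assuming a grayscale image stored as a 2-D NumPy array: the function name `tiny_image`, the block-averaging downscale, and the choice of unit Euclidean length for the normalization step are our assumptions for illustration, not necessarily what the project code does.

```python
import numpy as np

def tiny_image(image, size=16):
    """Downscale a grayscale image to size x size, then make it zero-mean and unit-length."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    # Assumption: h and w are both at least `size`. Crop so each dimension
    # is an exact multiple of `size`, which makes block averaging possible.
    bh, bw = h // size, w // size
    img = img[:bh * size, :bw * size]
    # Average every (bh x bw) block of pixels down to a single pixel.
    small = img.reshape(size, bh, size, bw).mean(axis=(1, 3))
    feat = small.ravel()
    feat = feat - feat.mean()          # zero mean
    norm = np.linalg.norm(feat)
    # Assumption: "normalized" means unit Euclidean length; a constant image stays all zeros.
    return feat / norm if norm > 0 else feat
```

Each image then becomes a 256-dimensional vector, which keeps the distance computation in the classifier cheap.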

The highest accuracy was 0.203, using 1-Nearest Neighbors. With K = 1, the model fits the training data very closely. Accuracy dropped as we increased K. This is likely because Tiny Image features usually differ only slightly from one another, so including more neighbors increases the chance that images from other categories outvote the correct label.
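The K-Nearest Neighbors vote can be sketched as follows. The function name `knn_predict` and the tie-breaking rule (the smallest label in sorted order wins a tie) are assumptions made for this example; the report does not say how ties were handled.

```python
import numpy as np

def knn_predict(train_feats, train_labels, test_feats, k=1):
    """Label each test feature by majority vote among its k nearest training features."""
    train = np.asarray(train_feats, dtype=np.float64)
    test = np.asarray(test_feats, dtype=np.float64)
    labels = np.asarray(train_labels)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b, shape (n_test, n_train).
    d2 = (test ** 2).sum(axis=1)[:, None] + (train ** 2).sum(axis=1)[None, :] - 2.0 * test @ train.T
    # Indices of the k closest training features for every test feature.
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    preds = []
    for row in nearest:
        names, counts = np.unique(labels[row], return_counts=True)
        # Assumption: ties go to the first label in sorted order.
        preds.append(names[np.argmax(counts)])
    return np.array(preds)
```

Sweeping `k` over a few values with this function is one way to reproduce the kind of comparison described above.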

In the resulting confusion matrix, we find that Tiny Images with 1-Nearest Neighbors performs best on Highway images and worst on Store images. This makes sense because Highways tend to look the same even when downscaled, whereas Stores lose their details when downscaled and start to resemble other images. We also find that the classifier tended to return false positives for Coast. Because a Coast image is mostly just sky, land, and water, this suggests that once images are downsampled and normalized, many of them start to resemble a Coast.

Bag of SIFT with K-Nearest Neighbors

The Bag of SIFT features were constructed by finding the nearest "word" centroid, by Euclidean distance, for every SIFT feature found in the image, and then creating a normalized histogram of the number of features assigned to each centroid.
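The histogram construction described above can be sketched as follows, assuming the SIFT descriptors and the vocabulary of centroids have already been computed. The function name `bag_of_words_histogram` is hypothetical, and normalizing the counts to sum to one is an assumption, since the report only says the histogram is normalized.

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocab):
    """Count how many descriptors fall nearest to each vocabulary centroid, then normalize."""
    desc = np.asarray(descriptors, dtype=np.float64)   # (n_descriptors, n_dims)
    words = np.asarray(vocab, dtype=np.float64)        # (n_words, n_dims)
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = (desc ** 2).sum(axis=1)[:, None] + (words ** 2).sum(axis=1)[None, :] - 2.0 * desc @ words.T
    # Index of the nearest centroid ("word") for each descriptor.
    nearest = np.argmin(d2, axis=1)
    hist = np.bincount(nearest, minlength=len(words)).astype(np.float64)
    total = hist.sum()
    # Assumption: L1 normalization, so images with different descriptor counts are comparable.
    return hist / total if total > 0 else hist
```

The resulting vector has one entry per vocabulary word and can be passed to either classifier in place of the Tiny Image feature.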

The highest accuracy with Bag of SIFT and KNN was 0.527, using a step size of 5 and 4-nearest neighbors. Accuracy did not change monotonically with K: 4-nearest gave the highest accuracy and 1-nearest the second highest. The single most similar histogram is already a reliable predictor, but a better model can be obtained by tuning K to a sweet spot.

Bag of SIFT with KNN performed best on Forests, Highways, Suburbs, and Offices. As with Tiny Images, the good performance on Highways can be attributed to the lack of detailed features across Highway images. The success on Forests is likely due to forest features being distinctly different from the features of other categories. Most of the difficulty lies in distinguishing between Kitchen, Store, Bedroom, and LivingRoom, which share similar features with one another.

Bag of SIFT with Linear SVM

A Linear SVM performs binary classification by splitting the data with a hyperplane. For this implementation, we created 15 1-vs-all binary classifiers, one for each of our classes. Each SVM classifier finds the optimal S and B for the hyperplane represented by S*X + B. We then apply each classifier's S and B to the test features to get confidence values.
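A minimal sketch of the scoring step, assuming the 15 classifiers have already been trained and their S vectors are stacked as rows of one matrix. The function name `one_vs_all_predict` is hypothetical, and choosing the category with the highest confidence is the usual 1-vs-all rule, which we assume here because the report does not spell it out.

```python
import numpy as np

def one_vs_all_predict(S, B, test_feats, categories):
    """Score test features with every 1-vs-all classifier and pick the most confident one."""
    S = np.asarray(S, dtype=np.float64)            # (n_classes, n_dims), one S per row
    B = np.asarray(B, dtype=np.float64)            # (n_classes,), one offset per classifier
    X = np.asarray(test_feats, dtype=np.float64)   # (n_test, n_dims)
    # Confidence of classifier c on test image i is S_c . X_i + B_c.
    confidences = X @ S.T + B                      # (n_test, n_classes)
    # Assumption: the predicted category is the classifier with the largest confidence.
    best = np.argmax(confidences, axis=1)
    return [categories[i] for i in best], confidences
```

A positive confidence means the test feature lies on the "is this class" side of that classifier's hyperplane; comparing confidences across classifiers resolves cases where several, or none, of them claim the image.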

The highest accuracy with Bag of SIFT using Linear SVMs was 0.639, using a step size of 5 and a lambda value of 0.0001. Accuracy increased as we decreased lambda.
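To see why lambda matters, it helps to write down a training objective. The report does not state the exact objective, so the following assumes the common regularized hinge loss formulation for each 1-vs-all classifier, with labels y_i in {-1, +1} and n training features x_i:

```latex
\min_{S,\,B} \;\; \frac{\lambda}{2}\,\lVert S \rVert^{2}
  \;+\; \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(S \cdot x_i + B)\bigr)
```

Under this assumption, lambda weights the regularization term against the classification error term, so a smaller lambda lets each classifier fit the training data more tightly.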

Scene classification results visualization

Accuracy (mean of diagonal of confusion matrix) is 0.634
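The accuracy figure above can be computed as in the following sketch. The function names `confusion_matrix` and `mean_diagonal_accuracy` are hypothetical, and we assume each row of the matrix is normalized by the number of test images in that true category.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true categories, columns are predictions."""
    index = {name: i for i, name in enumerate(categories)}
    counts = np.zeros((len(categories), len(categories)))
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t], index[p]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Divide each row by its total; rows with no test images are left as zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def mean_diagonal_accuracy(conf):
    """Average of the per-category accuracies on the diagonal."""
    return float(np.mean(np.diag(conf)))
```

With row normalization, each diagonal entry is the per-category accuracy, so the mean of the diagonal weights every category equally regardless of how many test images it has.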

Category name	Accuracy	Sample training images	Sample true positives	False positives with true label		False negatives with wrong predicted label
Kitchen	0.520			Office	LivingRoom	Bedroom	LivingRoom
Store	0.460			LivingRoom	Coast	Forest	LivingRoom
Bedroom	0.390			Kitchen	LivingRoom	LivingRoom	Industrial
LivingRoom	0.380			Store	Store	Mountain	TallBuilding
Office	0.870			Kitchen	LivingRoom	Bedroom	Bedroom
Industrial	0.450			TallBuilding	Store	Kitchen	InsideCity
Suburb	0.910			Forest	Coast	Store	Industrial
InsideCity	0.480			Highway	TallBuilding	Industrial	OpenCountry
TallBuilding	0.650			Street	Industrial	InsideCity	InsideCity
Street	0.660			InsideCity	InsideCity	TallBuilding	Highway
Highway	0.820			OpenCountry	InsideCity	Coast	OpenCountry
OpenCountry	0.570			Suburb	Highway	Industrial	Mountain
Coast	0.700			OpenCountry	Bedroom	OpenCountry	InsideCity
Mountain	0.790			OpenCountry	Forest	Forest	Forest
Forest	0.860			Store	Store	Mountain	Mountain

Kenny Shu
