In this project, we explore the capabilities of scene recognition using Tiny Images and Bag of Words with KNN and SVM classifiers.
Tiny Image features are obtained by converting each image to a zero-mean, normalized 16x16 image. With those features, we perform K-Nearest Neighbors classification: for each test image, we find the k nearest training features by Euclidean distance and select the label that receives the most votes among them.
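The feature step above can be sketched as follows. This is a minimal illustration rather than our exact implementation: the function name `tiny_image_feature` is hypothetical, downscaling is done by simple block averaging instead of an interpolating resize, and "normalized" is assumed to mean scaling the feature vector to unit length.

```python
import numpy as np

def tiny_image_feature(image, size=16):
    """Turn a 2-D grayscale image into a zero-mean, unit-length size*size vector.

    Assumes the image is at least `size` pixels in each dimension.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    # Crop so both dimensions split evenly into `size` blocks
    # (an assumption made to keep the sketch dependency-free).
    h_crop, w_crop = (h // size) * size, (w // size) * size
    cropped = image[:h_crop, :w_crop]
    # Average each block of pixels to get a size x size thumbnail.
    blocks = cropped.reshape(size, h_crop // size, size, w_crop // size)
    small = blocks.mean(axis=(1, 3))
    feat = small.ravel()
    feat = feat - feat.mean()          # zero mean
    norm = np.linalg.norm(feat)
    # Unit length; a constant image has zero norm and is left as all zeros.
    return feat / norm if norm > 0 else feat
```

Each image then becomes a 256-dimensional vector that can be compared with Euclidean distance.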
The highest accuracy was 0.203, obtained with 1-Nearest Neighbors. With K = 1 the model fits the training data very closely, and accuracy decreased as we increased K. This is likely because Tiny Image features usually differ only slightly from one another, so including more neighbors increases the chance of pulling in votes from an incorrect class.
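To make the role of K concrete, the voting procedure can be sketched as below. The function name `knn_classify` and the tie-breaking rule (the label of the closest tied neighbor wins) are assumptions for illustration, not necessarily what our implementation does.

```python
import numpy as np
from collections import Counter

def knn_classify(train_feats, train_labels, test_feats, k=1):
    """Label each test feature by majority vote among its k nearest training features."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    predictions = []
    for row in dists:
        # Indices of the k closest training features, nearest first.
        nearest = np.argsort(row, kind="stable")[:k]
        votes = Counter(train_labels[i] for i in nearest)
        best = max(votes.values())
        # Assumed tie-break: among the top-voted labels, take the closest neighbor's.
        label = next(train_labels[i] for i in nearest if votes[train_labels[i]] == best)
        predictions.append(label)
    return predictions
```

With `k=1` this reduces to copying the label of the single closest training image, which is the setting that performed best for Tiny Images.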
In the resulting confusion matrix, we find that Tiny Images with 1-Nearest Neighbors performs best on Highway images and worst on Store images. This makes sense because Highways tend to look alike even when downscaled, whereas the details that distinguish Stores are lost when downscaled, so they end up resembling other categories. We also find that the classifier tended to return false positives for Coast. Because a Coast image is mostly just sky, land, and water, this suggests that once images are downsampled and normalized, many of them start to resemble a Coast.
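The per-category figures discussed above come from a matrix of true versus predicted labels (commonly called a confusion matrix). A minimal sketch of how such a matrix, the per-category accuracy, and the false-positive counts can be computed is shown below; all function names here are hypothetical.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, categories):
    """counts[i, j] = number of images of category i that were predicted as category j."""
    index = {name: i for i, name in enumerate(categories)}
    counts = np.zeros((len(categories), len(categories)), dtype=np.int64)
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t], index[p]] += 1
    return counts

def per_class_accuracy(counts):
    """Fraction of each category's images that were labeled correctly (row-wise)."""
    totals = counts.sum(axis=1)
    return np.divide(np.diag(counts), totals,
                     out=np.zeros(len(counts)), where=totals > 0)

def false_positive_counts(counts):
    """Number of images wrongly assigned to each category (column sum minus diagonal)."""
    return counts.sum(axis=0) - np.diag(counts)
```

A category that attracts many false positives, as Coast does here, shows up as a column with large off-diagonal entries.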
Bag of SIFT features were constructed by finding, for every SIFT feature in the image, the nearest "word" centroid by Euclidean distance, and then building a normalized histogram of the number of features assigned to each centroid.
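The histogram construction can be sketched as follows, assuming the SIFT descriptors and the vocabulary of centroids have already been computed. The function name `bag_of_words_histogram` is hypothetical, and the histogram is assumed to be normalized so that its bins sum to 1.

```python
import numpy as np

def bag_of_words_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest centroid and return a normalized histogram.

    descriptors: (n_descriptors, dim) array, e.g. dim = 128 for SIFT.
    vocabulary:  (n_words, dim) array of "word" centroids.
    """
    vocabulary = np.asarray(vocabulary, dtype=np.float64)
    descriptors = np.asarray(descriptors, dtype=np.float64)
    if descriptors.size == 0:
        # No features found in the image: return an all-zero histogram.
        return np.zeros(len(vocabulary))
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(vocabulary)).astype(np.float64)
    # Assumed normalization: divide by the number of descriptors.
    return hist / hist.sum()
```

Normalizing the histogram keeps images with many SIFT features comparable to images with few.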
The highest accuracy with Bag of SIFT and KNN was 0.527, using a step size of 5 and 4-Nearest Neighbors. Accuracy did not change monotonically with K: 4-nearest gave the highest accuracy and 1-nearest the second highest. The single most similar histogram is already a reliable predictor, but a better model can be obtained by tuning K to a sweet spot.
Bag of SIFT with KNN performed best on Forests, Highways, Suburbs, and Offices. As with Tiny Images, the good performance on Highways can be attributed to their lack of detailed features across images. The success on Forests is likely due to forest features being distinctly different from the features of other categories. We see, though, that most of the difficulty lies in distinguishing between Kitchen, Store, Bedroom, and LivingRoom, which share similar features with one another.
A linear SVM is a binary classifier that separates the data with a hyperplane. For this implementation, we created 15 1-vs-all binary classifiers, one for each of our classes. Each SVM classifier finds the optimal S and B for the hyperplane represented by S*X + B. We then apply each classifier's S and B to the test features to get confidence values.
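The scoring step can be sketched as below, assuming the 15 classifiers have already been trained and their S vectors and B offsets are stacked into arrays. The function name `one_vs_all_predict` and the array layout are hypothetical, and choosing the category with the highest confidence is the usual 1-vs-all decision rule, assumed here.

```python
import numpy as np

def one_vs_all_predict(features, weights, biases, categories):
    """Score every test feature with every binary classifier and pick the best.

    features: (n_samples, dim) test features X.
    weights:  (n_classes, dim), one S vector per 1-vs-all classifier.
    biases:   (n_classes,), one B offset per classifier.
    """
    features = np.asarray(features, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    biases = np.asarray(biases, dtype=np.float64)
    # Confidence of classifier c on sample n is S_c . X_n + B_c.
    scores = features @ weights.T + biases
    # Assumed decision rule: the category whose classifier is most confident.
    predicted = [categories[i] for i in scores.argmax(axis=1)]
    return predicted, scores
```

A positive confidence means the feature lies on the "this category" side of that classifier's hyperplane, and larger values lie farther from the boundary.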
The highest accuracy with Bag of SIFT and linear SVMs was 0.639, using a step size of 5 and a lambda value of 0.0001. Accuracy increased as we decreased lambda.
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.520 | Office, LivingRoom | Bedroom, LivingRoom
Store | 0.460 | LivingRoom, Coast | Forest, LivingRoom
Bedroom | 0.390 | Kitchen, LivingRoom | LivingRoom, Industrial
LivingRoom | 0.380 | Store, Store | Mountain, TallBuilding
Office | 0.870 | Kitchen, LivingRoom | Bedroom, Bedroom
Industrial | 0.450 | TallBuilding, Store | Kitchen, InsideCity
Suburb | 0.910 | Forest, Coast | Store, Industrial
InsideCity | 0.480 | Highway, TallBuilding | Industrial, OpenCountry
TallBuilding | 0.650 | Street, Industrial | InsideCity, InsideCity
Street | 0.660 | InsideCity, InsideCity | TallBuilding, Highway
Highway | 0.820 | OpenCountry, InsideCity | Coast, OpenCountry
OpenCountry | 0.570 | Suburb, Highway | Industrial, Mountain
Coast | 0.700 | OpenCountry, Bedroom | OpenCountry, InsideCity
Mountain | 0.790 | OpenCountry, Forest | Forest, Forest
Forest | 0.860 | Store, Store | Mountain, Mountain