Project 4 / Scene Recognition with Bag of Words

[Image: confusion matrix of bag of SIFT features + nearest neighbor classifier]

For this project we were required to implement scene recognition in several different ways. We started with a simple, easy-to-understand tiny image feature paired with a nearest neighbor classifier. From there we moved on to implementing a bag of SIFT feature representation, used with the same nearest neighbor classifier. Lastly, we were tasked with improving on the nearest neighbor classifier by using a more advanced linear SVM classifier.

Best results achieved

I'll start with the best results I got while testing. Most of these came from strategic (and sometimes lucky) parameter tweaking. Unfortunately, the bag of SIFT + nearest neighbor pipeline took far longer to run than the other methods.

  1. Chance performance - 7.1%
  2. Tiny images and nearest neighbor classifier - 22.4%
  3. Bag of SIFT and nearest neighbor classifier - 47.2%
  4. Bag of SIFT and linear SVM classifier - 0%

The parameters used for the highest percentages are the ones in the submitted source code, with one exception: the bag of SIFT + nearest neighbor classifier, which is where my luck came in. I tried different combinations of vl_dsift parameters (mainly adjusting the size and step) without the 'fast' option and had accepted that I wasn't going to do better than 43% accuracy. I then set out to make it faster for submission (I was seeing 20+ minute runtimes) by adding the 'fast' option to my vl_dsift calls; to my surprise this dramatically decreased runtime and increased accuracy to 44.7%. I ran it again to make sure it wasn't a fluke and got the same result. I then removed 'fast' from the vl_dsift call in build_vocabulary, rebuilt vocab.mat, and my accuracy rose to 47.2%. That was the highest accuracy I achieved with this method; the submitted code differs only in that 'fast' has been re-added to build_vocabulary.

Unfortunately I was not able to tweak parameters as much as I had wanted, partly due to my own procrastination and partly due to the long runtime of each iteration, so my accuracy is not as high as it could have been.

Tiny Image features

  1. Resize each image to a small fixed resolution (16x16)
  2. Make the image have zero mean and unit length
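The normalization in step 2 can be sketched as follows. This is a minimal Python sketch of the idea, not the submitted MATLAB code; it assumes the 16x16 resize has already been done and the image arrives as a flat list of pixel values:

```python
import math

def tiny_image_feature(pixels):
    """Turn a flattened 16x16 patch into a zero-mean, unit-length vector."""
    mean = sum(pixels) / len(pixels)
    centered = [p - mean for p in pixels]           # zero mean
    norm = math.sqrt(sum(c * c for c in centered))  # L2 norm
    if norm == 0:
        return centered                             # constant patch: all zeros
    return [c / norm for c in centered]             # unit length
```

Normalizing this way makes the feature somewhat invariant to overall brightness and contrast, which is most of what the tiny image representation has going for it.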

Nearest neighbor classifier

  1. Calculate the distances between the set of features from the test images and the training features.
  2. Iterate over each test image feature and find the minimum distance from the distances calculated above.
  3. Label the test image with the category from the lowest distance feature.
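The three steps above can be sketched like this (a plain Python stand-in for the MATLAB implementation, using brute-force Euclidean distances):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_labels(train_feats, train_labels, test_feats):
    """Label each test feature with the category of its closest training feature."""
    predictions = []
    for t in test_feats:
        dists = [euclidean(t, f) for f in train_feats]        # step 1: all distances
        best = min(range(len(dists)), key=dists.__getitem__)  # step 2: minimum distance
        predictions.append(train_labels[best])                # step 3: copy its label
    return predictions
```

Brute force is quadratic in the number of features, which is part of why the bag of SIFT runs took so long.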

Bag of SIFT features

The first step of creating these features is to build the vocabulary from the training set.

  1. Iterate over each training image and compute sift features.
  2. Cluster the large set of sift features using kmeans and return the resulting centers.
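Step 2 is plain Lloyd's k-means; a minimal Python sketch of it is below, as a stand-in for VLFeat's clustering (the "features" here are short toy vectors, not real 128-dimensional SIFT descriptors):

```python
import random

def kmeans(features, k, iters=20, seed=0):
    """Cluster feature vectors with Lloyd's algorithm; the centers are the vocabulary."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)   # initialize centers from the data
    for _ in range(iters):
        # assignment step: each feature goes to its closest center
        clusters = [[] for _ in range(k)]
        for f in features:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(f, centers[c])))
            clusters[i].append(f)
        # update step: move each center to the mean of its cluster
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(vals) / len(members) for vals in zip(*members)]
    return centers
```

The returned centers are the "visual words"; every image is later described by how its descriptors distribute over them.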

After building the vocabulary we can move to creating bags of SIFT features. I opted to use VLFeat's kd-tree mechanics, which brought my runtimes down to a reasonable level.

  1. Build a kdtree from the vocabulary built above.
  2. Transpose the vocabulary, since MATLAB and VLFeat disagree on row/column ordering.
  3. Iterate over each training image and compute sift features.
  4. Query the kdtree to find the closest cluster center from the vocabulary to the sift features.
  5. Build and normalize a histogram showing the popularity of each cluster.
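Steps 3-5 boil down to the sketch below (plain Python, brute-force nearest-center search standing in for the kd-tree query, which only changes the speed, not the result):

```python
def bag_of_words_histogram(descriptors, vocab):
    """Count how often each vocabulary word is the closest center, then normalize."""
    counts = [0] * len(vocab)
    for d in descriptors:
        nearest = min(range(len(vocab)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(d, vocab[i])))
        counts[nearest] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]  # normalize so descriptor count doesn't matter
```

The normalization matters because images yield different numbers of SIFT descriptors; without it, larger or more textured images would dominate the distance comparisons.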

Linear SVM classifier

I was not able to get any proper results from my attempts at using the SVM classifier. I'm not sure where I went wrong, but I think part of it was the switching of rows and columns between MATLAB and VLFeat. Trying to keep track of which matrices needed to be transposed was driving me insane, and in the end that, combined with 5+ minute runtimes per attempt, got the best of me.
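For reference, the prediction side of the intended one-vs-all scheme is simple once training works: score the feature against every category's hyperplane (w . x + b) and pick the highest. A hedged Python sketch, with made-up weights and categories rather than anything actually trained:

```python
def predict_one_vs_all(feature, weights, biases, categories):
    """One-vs-all linear classification: return the category whose
    hyperplane gives the feature the highest score w . x + b."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, feature)) + b
              for w, b in zip(weights, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return categories[best]
```

Training one such (w, b) pair per category on the bag-of-SIFT histograms is the part I never got working.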

Bag of SIFT + nearest neighbor classifier results


Accuracy (mean of diagonal of confusion matrix) is 0.447

(The sample training, true positive, false positive, and false negative images from the original results table are omitted here. The labels in the "False positives" column are the true labels of the sample images wrongly classified as that category; the labels in the "False negatives" column are the categories wrongly predicted for sample images of that category.)

Category name   Accuracy   False positives (true label)   False negatives (predicted label)
Kitchen         0.410      Store, Industrial              Store, LivingRoom
Store           0.320      TallBuilding, Bedroom          LivingRoom, Kitchen
Bedroom         0.220      LivingRoom, TallBuilding       LivingRoom, TallBuilding
LivingRoom      0.290      TallBuilding, Store            Bedroom, Office
Office          0.580      Bedroom, Kitchen               Kitchen, Kitchen
Industrial      0.160      Kitchen, InsideCity            Suburb, LivingRoom
Suburb          0.690      LivingRoom, Street             Store, TallBuilding
InsideCity      0.320      Suburb, Coast                  Bedroom, Store
TallBuilding    0.400      Mountain, Street               Street, InsideCity
Street          0.470      Industrial, LivingRoom         InsideCity, Office
Highway         0.610      Mountain, Coast                TallBuilding, Coast
OpenCountry     0.470      Mountain, Coast                Suburb, Mountain
Coast           0.480      Mountain, OpenCountry          Bedroom, Highway
Mountain        0.490      Industrial, Highway            TallBuilding, Store
Forest          0.790      OpenCountry, Store             LivingRoom, Mountain