Project 4 / Scene Recognition with Bag of Words

In this project we explored different uses of the bag-of-words model for scene recognition, using two types of features and two types of classifiers. For details about the assignment, see the project prompt.

Features

  1. Tiny images: we simply resize each image to a small fixed size and use its raw pixel values as the feature vector.
  2. Bag of SIFT features:
    1. Given an image to classify, we extract a large number of SIFT descriptors from it.
    2. For each SIFT descriptor, we find its single nearest neighbor among the visual words in our vocabulary.
    3. We count how many descriptors are assigned to each word in the vocabulary.
    4. Our feature is the normalized histogram of these counts.
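The tiny-image feature can be sketched as follows. This is a minimal illustration rather than our exact code: it assumes a grayscale image stored as a list of pixel rows, a hypothetical helper name `tiny_image_feature`, and a default 16 by 16 target size, none of which are specified above. The resize is done by simple block averaging.

```python
def tiny_image_feature(image, size=16):
    """Resize a grayscale image (list of rows of numbers) to size x size by
    averaging pixel blocks, then flatten it into one feature vector.

    The 16x16 default is an assumed value, not one stated in the write-up.
    """
    height, width = len(image), len(image[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size")
    feature = []
    for i in range(size):
        # Input rows covered by output row i.
        r0, r1 = i * height // size, (i + 1) * height // size
        for j in range(size):
            # Input columns covered by output column j.
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feature.append(sum(block) / len(block))
    return feature
```

For example, shrinking the 4 by 4 image `[[0, 0, 4, 4], [0, 0, 4, 4], [8, 8, 2, 2], [8, 8, 2, 2]]` with `size=2` averages each 2 by 2 quadrant and yields `[0.0, 4.0, 8.0, 2.0]`.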
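The bag-of-SIFT steps above can be sketched in a few lines. This assumes the SIFT descriptors have already been extracted and that the vocabulary of visual words (its cluster centers) has already been built; the write-up does not say how either is done, so neither step is shown. The helper names, the squared Euclidean distance, and the choice of L1 normalization are our assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words_histogram(descriptors, vocabulary):
    """Assign each descriptor to its nearest visual word and return the
    L1-normalized histogram of word counts (L1 is an assumed choice)."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Single nearest neighbor: index of the closest vocabulary word.
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(d, vocabulary[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        # No descriptors: return an all-zero histogram instead of dividing by 0.
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]
```

For example, with the two-word vocabulary `[[0, 0], [10, 10]]`, the descriptors `[1, 0]`, `[0, 2]`, `[9, 9]` and `[0.5, 0.5]` map to words 0, 0, 1 and 0, giving the histogram `[0.75, 0.25]`.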

Classifiers

  1. Nearest neighbor: we compute the distance from each test image's feature to every training image's feature and assign the test image the category of its closest training image.
  2. Linear SVMs: we train a linear one-vs-all SVM for each category and assign each image the category whose SVM gives the highest confidence.
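A minimal 1-nearest-neighbor classifier matching the description above. The feature vectors are assumed to be plain lists of floats (tiny-image or bag-of-SIFT features), squared Euclidean distance is an assumption because the distance metric is not stated, and the function name is hypothetical.

```python
def nearest_neighbor_classify(train_features, train_labels, test_features):
    """Label each test feature with the category of its closest training
    feature (1-nearest neighbor, squared Euclidean distance)."""
    predictions = []
    for x in test_features:
        # Distance from this test feature to every training feature.
        distances = [sum((a - b) ** 2 for a, b in zip(x, t))
                     for t in train_features]
        best = min(range(len(distances)), key=distances.__getitem__)
        predictions.append(train_labels[best])
    return predictions
```

With training features `[[0, 0], [10, 10]]` labeled `["Forest", "Coast"]`, the test features `[1, 1]` and `[9, 8]` are labeled `["Forest", "Coast"]`.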
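The one-vs-all linear SVM scheme can be sketched as below. This is an illustrative stand-in, not necessarily the solver used in this project: each binary SVM is trained with plain full-batch subgradient descent on the L2-regularized hinge loss, and the hyperparameters `lam`, `epochs` and `lr` (like the function names) are hypothetical. A category's "confidence" is taken to be the raw decision value w·x + b of its SVM.

```python
def train_linear_svm(features, targets, lam=0.01, epochs=200, lr=0.1):
    """Train one binary linear SVM (targets are +1/-1) by full-batch
    subgradient descent on the L2-regularized hinge loss."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        # Gradient of the regularizer (lam / 2) * ||w||^2.
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(features, targets):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for k in range(dim):
                    grad_w[k] -= y * x[k] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def one_vs_all_svm_classify(train_features, train_labels, test_features, **kwargs):
    """Train one SVM per category (that category vs. all others) and pick the
    category whose SVM gives the highest confidence w.x + b."""
    models = {}
    for c in sorted(set(train_labels)):
        targets = [1.0 if label == c else -1.0 for label in train_labels]
        models[c] = train_linear_svm(train_features, targets, **kwargs)
    predictions = []
    for x in test_features:
        confidence = {c: sum(wi * xi for wi, xi in zip(w, x)) + b
                      for c, (w, b) in models.items()}
        predictions.append(max(confidence, key=confidence.get))
    return predictions
```

Taking the arg-max of the decision values resolves the cases where several binary SVMs (or none of them) claim an image as positive.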

Scene classification results visualization


Accuracy (mean of the diagonal of the confusion matrix) for bag of SIFT features with linear SVMs is 0.616.
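The accuracy figure can be computed as sketched below. We assume the confusion matrix is row-normalized (rows are true categories, columns are predicted categories), so each diagonal entry is the accuracy for one category and the mean of the diagonal is the average per-category accuracy; the function name is hypothetical. As a check, the fifteen per-category accuracies listed below sum to 9.24, and 9.24 / 15 = 0.616.

```python
def confusion_matrix_accuracy(true_labels, predicted_labels, categories):
    """Build a row-normalized confusion matrix (rows = true category,
    columns = predicted category) and return it with the mean of its diagonal."""
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Normalize each row so it holds the fraction of that true category.
        matrix.append([v / total if total else 0.0 for v in row])
    accuracy = sum(matrix[i][i] for i in range(n)) / n
    return matrix, accuracy
```

For example, if two Kitchen images are predicted as Kitchen and Office and two Office images are both predicted as Office, the matrix is `[[0.5, 0.5], [0.0, 1.0]]` and the accuracy is 0.75.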

Category name   Accuracy   False positives (true label)   False negatives (wrong predicted label)
Kitchen         0.520      Office, Bedroom                InsideCity, Bedroom
Store           0.390      Industrial, LivingRoom         InsideCity, LivingRoom
Bedroom         0.390      LivingRoom, LivingRoom         LivingRoom, Kitchen
LivingRoom      0.270      Bedroom, Street                Industrial, Office
Office          0.950      LivingRoom, LivingRoom         Kitchen, Kitchen
Industrial      0.320      InsideCity, LivingRoom         Coast, LivingRoom
Suburb          0.950      Highway, OpenCountry           TallBuilding, InsideCity
InsideCity      0.620      Street, Street                 Coast, TallBuilding
TallBuilding    0.750      InsideCity, InsideCity         OpenCountry, InsideCity
Street          0.460      LivingRoom, Forest             InsideCity, Highway
Highway         0.710      Street, Store                  Coast, Mountain
OpenCountry     0.280      Coast, Highway                 Forest, Coast
Coast           0.830      Industrial, OpenCountry        LivingRoom, OpenCountry
Mountain        0.850      Street, Industrial             Suburb, LivingRoom
Forest          0.950      InsideCity, Industrial         Mountain, Mountain

Results

Our overall accuracies are reported in the table below. As the per-category results above show, however, accuracy varied widely between categories, from 0.270 for LivingRoom to 0.950 for Office, Suburb, and Forest.

Results in a table

Features \ Classifiers    Nearest Neighbor    Linear SVMs
Tiny Images               16.1%               N/A
Bag of SIFT               48.5% (~50%)        61.6% (~60%)