Project 4 / Scene Recognition with Bag of Words

Required Pipelines

Tiny Images + NN

Each image is represented by a single "tiny image," created by scaling the image down to 16x16 pixels without preserving its aspect ratio. Each 16x16 tiny image is then vectorized (flattened into a 256-element vector) and normalized.
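As a rough illustration of the tiny-image feature, the sketch below downsamples a grayscale image to 16x16, flattens it, and normalizes it. The function name `tiny_image`, the nearest-neighbor sampling, and the choice of zero-mean, unit-length normalization are assumptions made for this example; the write-up does not say which resizing filter or normalization was used.

```python
import math

def tiny_image(pixels, size=16):
    """Downsample a grayscale image (list of rows) to size x size and normalize it."""
    height, width = len(pixels), len(pixels[0])
    # Nearest-neighbor sampling. Rows and columns are scaled independently,
    # so the aspect ratio of the input is ignored.
    small = [
        [float(pixels[(r * height) // size][(c * width) // size]) for c in range(size)]
        for r in range(size)
    ]
    # Vectorize: flatten the size x size grid into one list (256 values for 16x16).
    vec = [v for row in small for v in row]
    # Assumed normalization: subtract the mean, then scale to unit length.
    mean = sum(vec) / len(vec)
    vec = [v - mean for v in vec]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

# Usage on a synthetic 32x64 image (not square, to show the aspect ratio is dropped).
image = [[(r * c) % 256 for c in range(64)] for r in range(32)]
feature = tiny_image(image)
```

The resulting 256-element vectors can be compared directly with a nearest-neighbor classifier.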

SIFT + NN

The SIFT vocabulary is built by extracting SIFT features from the training images with a step size of 100 and clustering the resulting descriptors into 200 centers.
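The clustering step could look roughly like the sketch below. It assumes plain k-means (Lloyd's algorithm), which the write-up does not name explicitly, and it uses small 2-D points in place of real 128-dimensional SIFT descriptors. `build_vocabulary` and `squared_distance` are hypothetical helper names.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_vocabulary(descriptors, num_centers, iterations=20, seed=0):
    """Cluster descriptors with k-means (an assumption) and return the centers."""
    rng = random.Random(seed)
    # Initialize the centers with distinct, randomly chosen descriptors.
    centers = [list(d) for d in rng.sample(descriptors, num_centers)]
    for _ in range(iterations):
        # Assignment step: group every descriptor under its nearest center.
        groups = [[] for _ in centers]
        for d in descriptors:
            nearest = min(range(num_centers),
                          key=lambda k: squared_distance(d, centers[k]))
            groups[nearest].append(d)
        # Update step: move each center to the mean of its group.
        for k, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(group) for col in zip(*group)]
    return centers

# Toy usage: two well-separated blobs stand in for SIFT descriptors.
toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
vocab = build_vocabulary(toy, num_centers=2)
```

For the pipeline described here, `num_centers` would be 200 and the input would be the descriptors sampled with a step size of 100.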

Each image's representation is then built by extracting SIFT features with a step size of 10 and assigning each feature to its nearest vocabulary center, which yields a 200-bin histogram. The histogram counts are normalized to form the final feature representation.
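A minimal sketch of the histogram step follows. It assumes Euclidean nearest-center assignment and normalization by the total count (so the bins sum to 1); the write-up only says the counts are normalized. `bag_of_words` is a hypothetical name, and the 3-word vocabulary is a toy stand-in for the 200 centers.

```python
def bag_of_words(descriptors, vocabulary):
    """Return a normalized histogram of nearest-center assignments."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        # Find the index of the closest vocabulary center (squared Euclidean distance).
        nearest = min(
            range(len(vocabulary)),
            key=lambda k: sum((x - y) ** 2 for x, y in zip(d, vocabulary[k])),
        )
        counts[nearest] += 1
    total = sum(counts)
    # Assumed normalization: divide by the number of descriptors.
    return [c / total for c in counts] if total else [0.0] * len(counts)

# Toy usage: four 2-D "descriptors" against a 3-word vocabulary.
vocabulary = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]
descriptors = [[0.2, 0.1], [4.8, 5.3], [5.1, 4.9], [0.0, 0.4]]
hist = bag_of_words(descriptors, vocabulary)  # [0.5, 0.5, 0.0]
```

Normalizing makes histograms comparable across images that produce different numbers of descriptors.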

SIFT + SVM

A separate SVM is trained for each of the 15 image categories in the data set (one vs. all). Every SVM uses a regularization parameter of lambda = 0.0001. Each test image is scored by every category's SVM, and the category whose SVM returns the most confident response is the prediction.
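The prediction rule can be sketched as below, under the assumption that each category's SVM is linear and is stored as a weight vector plus a bias (the write-up does not state the kernel). Training is omitted; `predict_category` and the toy classifiers are hypothetical.

```python
def predict_category(feature, classifiers):
    """Return the category whose SVM gives the highest (most confident) score."""
    def score(params):
        weights, bias = params
        # Linear decision value w . x + b (assumes linear SVMs).
        return sum(w * x for w, x in zip(weights, feature)) + bias
    return max(classifiers, key=lambda name: score(classifiers[name]))

# Toy usage with three categories and 2-D features; real features would be
# the 200-bin histograms and there would be 15 classifiers.
classifiers = {
    "Kitchen": ([1.0, -1.0], 0.0),
    "Forest": ([-1.0, 1.0], 0.0),
    "Street": ([0.5, 0.5], -1.0),
}
prediction = predict_category([0.2, 0.9], classifiers)  # "Forest"
```

Taking the maximum raw score means a test image is always assigned to some category, even when every SVM returns a negative response.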

Results

Pipeline            Accuracy
Tiny Images + NN    22.6%
SIFT + NN           42.7%
SIFT + SVM          56.7%

Best Results

These results were obtained using the SIFT + SVM pipeline.


Accuracy (mean of the diagonal of the confusion matrix) is 0.567.
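For reference, this metric can be computed as in the sketch below: build a confusion matrix, normalize each row by the number of test images in that category, and average the diagonal. The function name and toy labels are illustrative only. The last lines check that the 15 per-category accuracies reported in the table below average to the stated 0.567.

```python
def mean_diagonal_accuracy(true_labels, predicted_labels, categories):
    """Mean of the diagonal of the row-normalized confusion matrix."""
    index = {name: i for i, name in enumerate(categories)}
    n = len(categories)
    confusion = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, predicted_labels):
        confusion[index[t]][index[p]] += 1  # rows: true label, columns: prediction
    # Per-category accuracy is the diagonal entry divided by its row sum.
    per_class = [confusion[i][i] / sum(confusion[i])
                 for i in range(n) if sum(confusion[i]) > 0]
    return sum(per_class) / len(per_class)

# Toy usage with two categories.
truth = ["Coast", "Coast", "Coast", "Coast", "Forest", "Forest"]
preds = ["Coast", "Coast", "Coast", "Forest", "Forest", "Coast"]
toy_accuracy = mean_diagonal_accuracy(truth, preds, ["Coast", "Forest"])  # 0.625

# Per-category accuracies as reported in the results table.
reported = [0.470, 0.430, 0.330, 0.290, 0.830, 0.380, 0.840, 0.460,
            0.530, 0.540, 0.740, 0.450, 0.700, 0.630, 0.880]
overall = sum(reported) / len(reported)  # rounds to 0.567
```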

[Sample training images and sample true positives were shown as images and are not reproduced here.]

Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
Kitchen         0.470      LivingRoom, LivingRoom            Bedroom, Mountain
Store           0.430      InsideCity, InsideCity            Mountain, Kitchen
Bedroom         0.330      LivingRoom, LivingRoom            Kitchen, Office
LivingRoom      0.290      Kitchen, Bedroom                  Kitchen, Bedroom
Office          0.830      Kitchen, LivingRoom               Highway, Kitchen
Industrial      0.380      Street, Street                    Highway, Coast
Suburb          0.840      OpenCountry, OpenCountry          Mountain, Industrial
InsideCity      0.460      TallBuilding, TallBuilding        Store, Kitchen
TallBuilding    0.530      Coast, InsideCity                 Store, Bedroom
Street          0.540      Industrial, TallBuilding          InsideCity, InsideCity
Highway         0.740      Store, Coast                      Coast, Mountain
OpenCountry     0.450      Coast, Mountain                   TallBuilding, Highway
Coast           0.700      Suburb, Highway                   OpenCountry, LivingRoom
Mountain        0.630      Street, OpenCountry               Highway, InsideCity
Forest          0.880      OpenCountry, Mountain             Mountain, TallBuilding