Project 4 / Scene Recognition with Bag of Words

The purpose of this project was to take a large sample of labeled images and find a way to accurately predict which category each new image belonged to. Three different setups were used, with varying degrees of accuracy and speed.

Tiny Images and Nearest Neighbor

The first technique simply resized each image down to 16x16 pixels, then compared each test image against these tiny training images and picked the label of the closest one. There really aren't many parameters to choose here other than the size of the tiny images. 16 was the recommended size, so that is what was used, and an accuracy of around 20 percent was found. While not great, this took only 10-15 seconds on my desktop, so it was very fast. Playing around with larger sizes did not seem to improve accuracy but did increase run time. A rough sketch of this setup is shown below.
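This is a minimal sketch of the tiny-image feature and the 1-nearest-neighbor step, assuming the image lists and labels (image_paths, train_feats, test_feats, train_labels) are already loaded; the variable names and the zero-mean/unit-length normalization are illustrative, not the exact project code.

```matlab
% Tiny-image features: shrink each image to 16x16, flatten it, and
% normalize the resulting vector (normalization is optional but common).
dim = 16;                                        % recommended tiny-image size
feats = zeros(numel(image_paths), dim * dim);    % image_paths: cell array of file names
for i = 1:numel(image_paths)
    im = im2single(imread(image_paths{i}));
    if size(im, 3) == 3
        im = rgb2gray(im);                       % most dataset images are already grayscale
    end
    tiny = imresize(im, [dim dim]);
    v = tiny(:)';
    v = v - mean(v);                             % zero mean
    v = v / (norm(v) + eps);                     % unit length
    feats(i, :) = v;
end

% 1-nearest-neighbor classification: each test image takes the label of the
% closest training image (vl_alldist2 gives pairwise squared L2 distances).
D = vl_alldist2(train_feats', test_feats');      % num_train x num_test
[~, nearest] = min(D, [], 1);
predicted_labels = train_labels(nearest);
```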

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.201

Category       Accuracy   False positives (true label)   False negatives (predicted label)
Kitchen        0.030      Bedroom, Store                 OpenCountry, OpenCountry
Store          0.020      LivingRoom, Kitchen            Highway, TallBuilding
Bedroom        0.090      Kitchen, LivingRoom            Mountain, Mountain
LivingRoom     0.070      Office, Suburb                 Coast, OpenCountry
Office         0.110      Forest, Forest                 Forest, Mountain
Industrial     0.090      Mountain, Highway              Highway, Coast
Suburb         0.170      OpenCountry, Highway           Highway, OpenCountry
InsideCity     0.030      Kitchen, Mountain              Mountain, Coast
TallBuilding   0.130      Kitchen, Mountain              Forest, Coast
Street         0.430      LivingRoom, Bedroom            Coast, Forest
Highway        0.600      Industrial, Industrial         Coast, Coast
OpenCountry    0.360      Coast, InsideCity              Mountain, Highway
Coast          0.440      Store, TallBuilding            InsideCity, Industrial
Mountain       0.190      Highway, OpenCountry           Office, TallBuilding
Forest         0.260      Store, Mountain                Highway, Office

Bag of Sifts and Nearest Neighbor

The second technique was to find SIFT descriptors in the images, compare which training image had the most similar set of SIFT descriptors, and use that image's classification as the prediction. The main parameter here is essentially how many SIFT descriptors are used. In these tests more SIFT descriptors seemed to help, but the best my computer could handle was a step size of 3 and a bin size of 3 using VLFeat's dsift with the fast option. Any lower step size would have taken overnight and I didn't feel like testing that. As can be seen below this gave an accuracy of almost 51%, barely hitting the target (phew). The quicker run was done with a step size of 10, which gave an accuracy of around 47%. This is a lot more accurate than using tiny images, but it does take a long time: the run below took around 20 minutes, and the "fast" run still takes around 3-4 minutes. The feature extraction is sketched below.
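This is roughly how one image's bag-of-SIFT histogram is built under the settings described above; vocab and image_path are assumed to already exist, and the code is a sketch rather than the exact project implementation.

```matlab
% Bag-of-SIFT feature for one image: dense SIFT descriptors quantized
% against the visual vocabulary (vocab is 128 x vocab_size, built below).
im = im2single(imread(image_path));              % image_path: placeholder file name
if size(im, 3) == 3
    im = rgb2gray(im);
end
[~, descriptors] = vl_dsift(im, 'step', 3, 'size', 3, 'fast');   % step 10 for the quick run

% Assign every descriptor to its nearest vocabulary word and histogram the counts.
D = vl_alldist2(vocab, single(descriptors));     % vocab_size x num_descriptors
[~, words] = min(D, [], 1);
feat = histcounts(words, 1:(size(vocab, 2) + 1));
feat = feat / (sum(feat) + eps);                 % normalize so image size doesn't matter

% Classification then works as in the tiny-image case: 1-NN between the
% test histograms and the training histograms.
```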

Additionally, the vocabulary was created with a step size of 3. Any lower caused my computer to eat through its 16 GB of memory and die.
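The vocabulary itself was built roughly as sketched below (vocab_size and train_image_paths are assumed names); the growing descriptor matrix is where the memory goes.

```matlab
% Build the visual vocabulary: collect dense SIFT descriptors from the
% training images and cluster them with k-means.
all_descriptors = [];
for i = 1:numel(train_image_paths)
    im = im2single(imread(train_image_paths{i}));
    if size(im, 3) == 3
        im = rgb2gray(im);
    end
    [~, d] = vl_dsift(im, 'step', 3, 'size', 3, 'fast');
    all_descriptors = [all_descriptors, single(d)]; %#ok<AGROW> % this is what eats the memory
end
vocab = vl_kmeans(all_descriptors, vocab_size);  % 128 x vocab_size cluster centers
```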

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.509

Category       Accuracy   False positives (true label)   False negatives (predicted label)
Kitchen        0.430      LivingRoom, LivingRoom         Bedroom, Office
Store          0.450      Suburb, InsideCity             LivingRoom, LivingRoom
Bedroom        0.150      Kitchen, Industrial            Kitchen, Office
LivingRoom     0.430      Office, Industrial             Bedroom, Kitchen
Office         0.750      Bedroom, Bedroom               LivingRoom, LivingRoom
Industrial     0.360      TallBuilding, Store            LivingRoom, Kitchen
Suburb         0.910      Street, Coast                  OpenCountry, Highway
InsideCity     0.380      Industrial, Kitchen            Store, Industrial
TallBuilding   0.350      OpenCountry, Bedroom           Forest, Street
Street         0.550      Kitchen, Mountain              Store, InsideCity
Highway        0.620      TallBuilding, Coast            Suburb, Suburb
OpenCountry    0.370      Mountain, Mountain             Mountain, Suburb
Coast          0.550      OpenCountry, Street            Highway, Office
Mountain       0.470      OpenCountry, OpenCountry       OpenCountry, OpenCountry
Forest         0.870      TallBuilding, TallBuilding     OpenCountry, OpenCountry

Bag of Sifts and SVM

The last and best of the techniques was to use SIFT descriptors and linear SVM classifiers. The parameters that could be changed here were again the number of SIFT descriptors, plus a lambda value for vl_svmtrain. Again a step size of 3 and a bin size of 3 were used for the run below, and 10 and 3 were used for the "fast" run. Playing around with the fast run, a lambda value of 0.000001 gave the best results, with both 0.00001 and 0.0000001 giving worse results. If I wanted to spend hours I could have found the perfect value for lambda and gotten above 70%, but this hit 69.7%, so it was good enough for me. With the "fast" number of SIFT descriptors an accuracy of around 62% was achieved. The one-vs-all SVM setup is sketched below.
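This is a sketch of the one-vs-all linear SVM training and prediction with vl_svmtrain; the variable names (train_feats, test_feats, train_labels) are assumptions, not the exact project code.

```matlab
% One-vs-all linear SVMs on the bag-of-SIFT histograms.
% train_feats / test_feats: N x vocab_size matrices of histograms,
% train_labels: cell array of category names, one per training image.
lambda = 0.000001;                               % best value found above
categories = unique(train_labels);
num_categories = numel(categories);
W = zeros(size(train_feats, 2), num_categories);
B = zeros(1, num_categories);
for c = 1:num_categories
    % +1 for images of this category, -1 for everything else.
    binary_labels = 2 * double(strcmp(train_labels, categories{c})) - 1;
    [W(:, c), B(c)] = vl_svmtrain(train_feats', binary_labels', lambda);
end

% Each test image is assigned to the category whose SVM gives the
% highest decision value w'*x + b.
scores = test_feats * W + repmat(B, size(test_feats, 1), 1);
[~, best] = max(scores, [], 2);
predicted_labels = categories(best);
```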

Scene classification results visualization


Accuracy (mean of diagonal of confusion matrix) is 0.697

Category       Accuracy   False positives (true label)   False negatives (predicted label)
Kitchen        0.670      LivingRoom, Store              Office, Bedroom
Store          0.520      LivingRoom, LivingRoom         Forest, Kitchen
Bedroom        0.470      Kitchen, LivingRoom            Street, Store
LivingRoom     0.270      Kitchen, Industrial            Office, Bedroom
Office         0.940      LivingRoom, Kitchen            Kitchen, LivingRoom
Industrial     0.620      Coast, Suburb                  TallBuilding, LivingRoom
Suburb         0.940      Mountain, InsideCity           Industrial, OpenCountry
InsideCity     0.690      Store, Store                   Suburb, Street
TallBuilding   0.740      Industrial, Store              Street, Office
Street         0.670      Bedroom, TallBuilding          Industrial, InsideCity
Highway        0.810      Coast, Industrial              Coast, Coast
OpenCountry    0.550      Highway, Coast                 Coast, Street
Coast          0.780      OpenCountry, OpenCountry       Industrial, Highway
Mountain       0.840      OpenCountry, OpenCountry       OpenCountry, Suburb
Forest         0.950      OpenCountry, Street            Mountain, Mountain