The aim of this project was to perform scene recognition using feature-description methods such as tiny images and bags of SIFT features, and classification methods such as nearest neighbor and support vector machines.
These feature-description and classification modules were applied in different combinations; the results are reported below.
Methodology | Accuracy |
---|---|
Tiny Images + Nearest Neighbor | 0.200 |
Bag of SIFT + Nearest Neighbor | 0.457 |
Bag of SIFT + 1-vs-all Linear SVM | 0.579 |
Extra Credit: Tiny Images (features at different scales) + Nearest Neighbor | 0.204 |
Extra Credit: Tiny Images (features at different scales) + Naive Bayes Nearest Neighbor | 0.217 |
Extra Credit: Tiny Images + Naive Bayes Nearest Neighbor | 0.213 |
Extra Credit: Bag of SIFT + Naive Bayes Nearest Neighbor | 0.497 |
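The best-performing combination above, bag of SIFT + 1-vs-all linear SVM, trains one binary linear SVM per scene category and assigns each test image to the category whose SVM scores its histogram highest. A minimal NumPy sketch of this scheme, using simple hinge-loss subgradient training (the project itself is in MATLAB; the function names here are illustrative, not the actual project code):

```python
import numpy as np

def train_1_vs_all_svm(X, y, classes, lam=1e-3, n_epochs=50, lr=0.1, seed=0):
    """Train one binary linear SVM per class via hinge-loss
    subgradient descent (a simplified stand-in for the SVM module)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((len(classes), d))
    b = np.zeros(len(classes))
    for c, cls in enumerate(classes):
        t = np.where(y == cls, 1.0, -1.0)  # +1 for this class, -1 for the rest
        for _ in range(n_epochs):
            for i in rng.permutation(n):
                margin = t[i] * (W[c] @ X[i] + b[c])
                # Subgradient of lam*||w||^2/... + hinge loss for sample i.
                grad_w = lam * W[c] - (t[i] * X[i] if margin < 1 else 0)
                W[c] -= lr * grad_w
                if margin < 1:
                    b[c] += lr * t[i]
    return W, b

def predict(X, W, b, classes):
    """Assign each sample to the class whose SVM scores it highest."""
    return [classes[i] for i in (X @ W.T + b).argmax(axis=1)]
```

In the actual experiment, `X` would be the normalized bag-of-SIFT histograms of the training images.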
For extra credit, I completed four tasks. The most significant of these was Naive Bayes Nearest Neighbor (NBNN). Each experiment is explained briefly below.
This methodology was proposed by Boiman, Shechtman, and Irani at CVPR 2008. It differs from nearest neighbor in that, for each test image, we measure its distance to the nearest descriptors of each class and summarize that distance as a per-class score; the image is then assigned to the class with the lowest score. Using this method, I saw a significant improvement (around 4%) over the traditional 1-NN approach, as shown in the results.
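The per-class scoring described above can be sketched as follows in NumPy (the project itself is in MATLAB; this is an illustrative re-implementation, not the project code):

```python
import numpy as np

def nbnn_classify(test_descriptors, class_descriptors):
    """Naive Bayes Nearest Neighbor (Boiman, Shechtman, Irani, CVPR 2008).

    test_descriptors: (n, d) array of local descriptors from one test image.
    class_descriptors: dict mapping class name -> (m, d) array of all
    descriptors pooled from that class's training images.
    """
    scores = {}
    for cls, pool in class_descriptors.items():
        # Squared Euclidean distance from every test descriptor to every
        # descriptor in this class's pool.
        dists = ((test_descriptors[:, None, :] - pool[None, :, :]) ** 2).sum(-1)
        # Image-to-class score: sum over test descriptors of the distance
        # to the single nearest descriptor in the class.
        scores[cls] = dists.min(axis=1).sum()
    # Assign the image to the class with the lowest score.
    return min(scores, key=scores.get)
```

Note that, unlike 1-NN on whole-image features, descriptors are compared against each class's entire descriptor pool rather than against individual training images.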
I tried different vocabulary sizes with the bag of SIFT + SVM methodology. The accuracy and execution time of the program (including vocabulary generation) are reported below.
Vocabulary Size | Accuracy | Execution Time (seconds) |
---|---|---|
10 | 0.397 | 95 |
20 | 0.431 | 119 |
50 | 0.549 | 169 |
100 | 0.569 | 275 |
200 | 0.571 | 432 |
500 | 0.507 | 993 |
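Vocabulary generation is the step whose cost grows with vocabulary size: the pooled SIFT descriptors are clustered into that many visual words, and each image is then encoded as a histogram over those words. A minimal NumPy sketch of both steps, using plain Lloyd's k-means (the project uses MATLAB and a library k-means; this is an illustrative stand-in):

```python
import numpy as np

def build_vocabulary(descriptors, vocab_size, n_iters=20, seed=0):
    """Cluster pooled SIFT descriptors into vocab_size visual words
    with a few iterations of Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), vocab_size, replace=False)]
    for _ in range(n_iters):
        # Assign each descriptor to its nearest center.
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned descriptors.
        for k in range(vocab_size):
            if np.any(labels == k):
                centers[k] = descriptors[labels == k].mean(axis=0)
    return centers

def bag_of_words(descriptors, centers):
    """Encode one image as a normalized histogram of visual-word counts."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(axis=1), minlength=len(centers)).astype(float)
    return hist / hist.sum()
```

The timing trend in the table follows from the assignment step, whose cost is linear in the number of centers; the accuracy drop at 500 words suggests the histograms become too sparse for the training set size.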
I tested cross-validation by dividing the input into sets of 100 training images. I found that accuracy degraded with this setup, especially for nearest neighbor, because there are fewer images in the training set that a test image can be compared against for the 1-NN distance. So I haven't included it in the main file, but have submitted another file, proj4_withCrossValidation.m, with this change.
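The splitting step can be sketched as follows in Python (the submitted implementation is MATLAB; fold size 100 matches the experiment above, the function name is illustrative):

```python
import numpy as np

def folds_of_n(n_samples, fold_size=100, seed=0):
    """Shuffle sample indices and split them into folds of fold_size
    (the last fold may be smaller). Each fold is held out in turn while
    the remaining folds serve as the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    return [idx[i:i + fold_size] for i in range(0, n_samples, fold_size)]
```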
I tried reducing the images with Gaussians at different scales for the tiny-images feature extraction. I reduced each image twice using the impyramid function, and concatenated the tiny-image vectors formed from all of these images (original and reduced). I noticed a slight improvement for tiny images when using this.
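A NumPy sketch of this multi-scale feature, with simple 2x2 block averaging standing in for impyramid's Gaussian reduction (the project itself is MATLAB; all names here are illustrative):

```python
import numpy as np

def reduce_2x(img):
    """Halve a grayscale image by averaging non-overlapping 2x2 blocks
    (a crude stand-in for MATLAB's impyramid 'reduce')."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def tiny_image(img, size=16):
    """Shrink to size x size by block averaging, then zero-mean and
    unit-normalize, as in the standard tiny-images feature."""
    h, w = img.shape
    ys = np.arange(size + 1) * h // size
    xs = np.arange(size + 1) * w // size
    out = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            out[i, j] = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    out -= out.mean()
    n = np.linalg.norm(out)
    return (out / n if n else out).ravel()

def multiscale_tiny_feature(img, levels=3):
    """Concatenate tiny-image vectors from the original image and its
    successive 2x reductions (the multi-scale experiment)."""
    feats = []
    for _ in range(levels):
        feats.append(tiny_image(img))
        img = reduce_2x(img)
    return np.concatenate(feats)
```

The concatenated vector is then used in place of the single-scale tiny-image feature for nearest-neighbor classification.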
Per-category results are listed below. The sample training, true-positive, false-positive, and false-negative images from the results page are not reproduced here; only the labels attached to the false-positive and false-negative examples are kept.

Category name | Accuracy | False positives (true label) | False negatives (predicted label) |
---|---|---|---|
Kitchen | 0.540 | Bedroom, Store | Office, Office |
Store | 0.540 | Kitchen, LivingRoom | Kitchen, Mountain |
Bedroom | 0.200 | Kitchen, LivingRoom | Store, Office |
LivingRoom | 0.210 | Bedroom, Industrial | Store, Kitchen |
Office | 0.870 | LivingRoom, LivingRoom | LivingRoom, Kitchen |
Industrial | 0.410 | Street, OpenCountry | Store, TallBuilding |
Suburb | 0.830 | OpenCountry, InsideCity | OpenCountry, OpenCountry |
InsideCity | 0.490 | LivingRoom, Street | TallBuilding, Street |
TallBuilding | 0.670 | Street, Highway | Mountain, Industrial |
Street | 0.560 | Mountain, TallBuilding | Industrial, Industrial |
Highway | 0.790 | Mountain, Industrial | Coast, TallBuilding |
OpenCountry | 0.470 | Mountain, Forest | Forest, Suburb |
Coast | 0.700 | Suburb, OpenCountry | OpenCountry, OpenCountry |
Mountain | 0.580 | TallBuilding, OpenCountry | Forest, Street |
Forest | 0.820 | TallBuilding, Mountain | Industrial, Mountain |
As shown here, bag of SIFT + 1-vs-all linear SVM gives the best results in my experiments. I also learned that Naive Bayes Nearest Neighbor, though simple to implement, shows a good improvement over the traditional 1-NN algorithm.