Project 4 / Scene Recognition with Bag of Words

The aim of this project was to perform scene recognition using feature representations such as tiny images and bags of features, and classification methods such as nearest neighbor and support vector machines.

The modules applied here were

  1. Tiny Images
  2. Bag of SIFT features
  3. Nearest Neighbor
  4. SVMs

Below are the results for different combinations of these modules.

Results

Methodology                                                                               Accuracy
Tiny images + nearest neighbor                                                            0.200
Bag of SIFT + nearest neighbor                                                            0.457
Bag of SIFT + 1-vs-all linear SVM                                                         0.579
Extra credit: tiny images (features at different scales) + nearest neighbor               0.204
Extra credit: tiny images (features at different scales) + Naive Bayes Nearest Neighbor   0.217
Extra credit: tiny images + Naive Bayes Nearest Neighbor                                  0.213
Extra credit: bag of SIFT + Naive Bayes Nearest Neighbor                                  0.497
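Of these combinations, bag of SIFT + 1-vs-all linear SVM performed best. Below is a minimal sketch of that classification step, assuming the VLFeat toolbox's vl_svmtrain and a column-per-image feature layout; the variable names are illustrative, not my exact submission code.

    % 1-vs-all linear SVM sketch (illustrative). Assumes VLFeat's vl_svmtrain.
    % train_feats is D x N (one bag-of-SIFT histogram per column);
    % train_labels and categories are cell arrays of category name strings.
    lambda = 0.0001;                                  % regularization strength
    num_categories = numel(categories);
    W = zeros(size(train_feats, 1), num_categories);
    B = zeros(1, num_categories);
    for c = 1:num_categories
        % +1 for images of this category, -1 for all others.
        binary_labels = 2 * double(strcmp(train_labels, categories{c})) - 1;
        [W(:, c), B(c)] = vl_svmtrain(train_feats, binary_labels, lambda);
    end
    % Each test image is assigned the category whose classifier scores highest.
    scores = bsxfun(@plus, W' * test_feats, B');      % num_categories x num_test
    [~, best] = max(scores, [], 1);
    predicted_categories = categories(best);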

Extra Credit Work

For extra credit, I completed four tasks:

  • Naive Bayes Nearest Neighbor
  • Different vocabulary sizes
  • Cross validation
  • Tiny images with Gaussians at different scales

The most significant of these was Naive Bayes Nearest Neighbor. Each experiment is explained briefly below.

Naive Bayes Nearest Neighbor

This methodology was proposed by Boiman, Shechtman, and Irani at CVPR 2008. It differs from plain nearest neighbor in that, for each test image, we compute a score for every class from the distances between the image's descriptors and their nearest neighbors within that class, then assign the image to the class with the lowest score. Using this method, I saw a significant improvement (around 4%) over the traditional 1-NN approach, as shown in the results.
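Below is a minimal sketch of the NBNN decision rule, assuming SIFT descriptors have been pooled per class and that MATLAB's knnsearch (Statistics and Machine Learning Toolbox) is available; the function name and data layout are illustrative, not my exact submission code.

    % NBNN scoring sketch (illustrative). class_descriptors{c} is assumed to be
    % a D x Nc matrix of SIFT descriptors pooled from the training images of
    % class c; test_descriptors is D x M for one test image.
    function label = nbnn_classify(test_descriptors, class_descriptors)
        num_classes = numel(class_descriptors);
        scores = zeros(1, num_classes);
        for c = 1:num_classes
            % knnsearch expects observations in rows, hence the transposes.
            [~, dists] = knnsearch(class_descriptors{c}', test_descriptors');
            % Class score: total squared distance from each test descriptor
            % to its nearest neighbor within the class.
            scores(c) = sum(dists .^ 2);
        end
        % Assign the class with the smallest total distance.
        [~, label] = min(scores);
    end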

Different Vocabulary Sizes

I tried different vocabulary sizes with the bag of SIFT + SVM methodology. The accuracy and execution time of the program (including vocabulary generation) are reported below.

Vocabulary Size   Accuracy   Execution Time (seconds)
10                0.397       95
20                0.431      119
50                0.549      169
100               0.569      275
200               0.571      432
500               0.507      993
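For reference, a vocabulary of a given size can be built roughly as sketched below, assuming the VLFeat functions vl_dsift and vl_kmeans; the dense SIFT step size and variable names are illustrative assumptions.

    % Vocabulary construction sketch (illustrative). Assumes VLFeat's
    % vl_dsift and vl_kmeans and grayscale input images.
    function vocab = build_vocabulary(image_paths, vocab_size)
        all_descriptors = [];
        for i = 1:numel(image_paths)
            img = single(imread(image_paths{i}));
            % Dense SIFT on a coarse grid; a large step keeps clustering cheap.
            [~, descriptors] = vl_dsift(img, 'Step', 16, 'Fast');
            all_descriptors = [all_descriptors, single(descriptors)]; %#ok<AGROW>
        end
        % Cluster the sampled descriptors into vocab_size visual words; the
        % cluster centers are the vocabulary. This clustering dominates the
        % execution times reported above at large vocabulary sizes.
        vocab = vl_kmeans(all_descriptors, vocab_size);
    end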

Cross Validation

I tested cross validation by dividing the input into sets of 100 training images. I found that my accuracy degraded, especially for nearest neighbor, because there are fewer images in the training set against which a test image can be compared for the 1-NN distance. So I haven't included it in the main file, but have submitted a separate file, proj4_withCrossValidation.m, with this change.
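A hypothetical sketch of that split follows; nearest_neighbor_classify and the feature/label variables are assumed to follow the starter code's row-per-image conventions, not copied from my submission.

    % Cross-validation sketch (illustrative): repeatedly sample 100 training
    % images, classify the test set, and average the resulting accuracies.
    num_folds = 10;
    fold_size = 100;
    accuracies = zeros(1, num_folds);
    for f = 1:num_folds
        idx = randperm(size(train_feats, 1), fold_size);   % 100 random images
        predictions = nearest_neighbor_classify(train_feats(idx, :), ...
                                                train_labels(idx), test_feats);
        accuracies(f) = mean(strcmp(predictions, test_labels));
    end
    fprintf('Mean accuracy over %d folds: %.3f\n', num_folds, mean(accuracies));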

Tiny Images with Gaussians at Different Scales

I tried reducing images with Gaussians at different levels for the tiny images feature extraction. I reduced each image twice using the impyramid function and concatenated the tiny image vectors formed from all three versions (the original and the two reductions). I noticed a slight improvement over plain tiny images.
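A sketch of this multi-scale tiny image feature is shown below; the 16x16 tiny image size and the zero-mean, unit-length normalization are common baseline choices, assumed here rather than taken from my exact code.

    % Multi-scale tiny image sketch (illustrative). impyramid (Image
    % Processing Toolbox) Gaussian-smooths and downsamples by a factor of 2.
    function feat = multiscale_tiny_image(img)
        levels = {img, impyramid(img, 'reduce')};
        levels{3} = impyramid(levels{2}, 'reduce');   % two reductions total
        feat = [];
        for k = 1:numel(levels)
            tiny = imresize(levels{k}, [16 16]);      % assumed 16x16 tiny image
            v = double(tiny(:))';
            v = (v - mean(v)) / norm(v);              % zero mean, unit length
            feat = [feat, v]; %#ok<AGROW>
        end
    end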

Scene classification results visualization: bag of SIFT + 1-vs-all linear SVM


Accuracy (mean of diagonal of confusion matrix) is 0.579

The sample training image and true positive thumbnails from the original visualization are omitted here; each row retains the true labels of the sample false positives and the predicted labels of the sample false negatives.

Category      Accuracy   False positives (true label)   False negatives (predicted label)
Kitchen       0.540      Bedroom, Store                 Office, Office
Store         0.540      Kitchen, LivingRoom            Kitchen, Mountain
Bedroom       0.200      Kitchen, LivingRoom            Store, Office
LivingRoom    0.210      Bedroom, Industrial            Store, Kitchen
Office        0.870      LivingRoom, LivingRoom         LivingRoom, Kitchen
Industrial    0.410      Street, OpenCountry            Store, TallBuilding
Suburb        0.830      OpenCountry, InsideCity        OpenCountry, OpenCountry
InsideCity    0.490      LivingRoom, Street             TallBuilding, Street
TallBuilding  0.670      Street, Highway                Mountain, Industrial
Street        0.560      Mountain, TallBuilding         Industrial, Industrial
Highway       0.790      Mountain, Industrial           Coast, TallBuilding
OpenCountry   0.470      Mountain, Forest               Forest, Suburb
Coast         0.700      Suburb, OpenCountry            OpenCountry, OpenCountry
Mountain      0.580      TallBuilding, OpenCountry      Forest, Street
Forest        0.820      TallBuilding, Mountain         Industrial, Mountain

As shown here, bag of SIFT + 1-vs-all linear SVM gives the best results in my experiments. I also learned that Naive Bayes Nearest Neighbor, though simple to implement, shows a good improvement over the traditional 1-NN algorithm.