Fig 1. Confusion matrix obtained with combined SIFT and GIST descriptors. This is the best result I obtained in this project.
In this project, we perform scene recognition on a 15-scene database. We use a tiny-image representation and a bag-of-SIFT representation for feature extraction, and we test the effectiveness of both KNN and SVM for scene classification. After implementing these algorithms individually, we test their performance in three different combinations. The results below were obtained before applying the advanced steps in the extra part (vocabulary size: 200, K in KNN: 5, LAMBDA: 0.0001).
For this task, I generate the GIST descriptor of each image and append it to the normalized SIFT descriptor. All parameter settings stay the same. The GIST descriptor code comes from http://people.csail.mit.edu/torralba/code/spatialenvelope/ . Adding the GIST features boosts the classification accuracy to 70.8%, which is the best performance I obtained, so I show its confusion matrix in Fig 1. (The table of classifier results is shown in the latter part of this report.) To test the combination of SIFT and GIST features, change the setup to:
%code for this part is in get_bags_of_gists
FEATURE = 'sift and gist';
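The feature combination described above can be sketched as follows. This is a minimal illustration, not the code in the project files: the random matrices stand in for real bag-of-SIFT histograms and GIST descriptors, the GIST length of 512 and the L2 normalization are assumptions, and the variable names are hypothetical.

```matlab
% Toy sketch: append a GIST descriptor to a normalized bag-of-SIFT histogram.
vocab_size = 200;   % vocabulary size used in this report
gist_dim   = 512;   % ASSUMED GIST length; it depends on the GIST parameters
num_images = 4;     % toy number of images

rng(0);                                      % reproducible stand-in data
sift_hist = rand(num_images, vocab_size);    % stand-in for bag-of-SIFT histograms
gist_feat = rand(num_images, gist_dim);      % stand-in for GIST descriptors

% Normalize each histogram to unit L2 norm (ASSUMED choice of normalization).
row_norms = sqrt(sum(sift_hist .^ 2, 2));
sift_norm = bsxfun(@rdivide, sift_hist, row_norms);

% One row per image: [normalized SIFT histogram, GIST descriptor].
image_feats = [sift_norm, gist_feat];
disp(size(image_feats));
```

Each image then has a feature vector of length vocab_size + gist_dim, which can be passed to the classifier in place of the plain SIFT histogram.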
For this task, I experiment with vocabulary sizes of 10, 20, 50, 100, 200, and 400, using the SIFT descriptors and a 1-vs-all SVM for classification. The corresponding accuracy values are shown in Fig 2. The accuracy tends to level off as the vocabulary size grows. Since a larger vocab_size leads to longer processing time, it is important to select an appropriate vocab_size considering the tradeoff between accuracy and efficiency.
Fig 2. Accuracy of bag of SIFT + 1-vs-all linear SVM for different vocabulary sizes.
For this task, I train the SVM using a chi-squared kernel. Following the instructions at http://www.vlfeat.org/overview/svm.html, I first construct a chi-squared dataset from the training data and then feed the new dataset to vl_svmtrain. The code is provided below.
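For readers unfamiliar with the 1-vs-all setup, here is a small sketch of the decision rule. It assumes one linear classifier (weight vector and bias) per category and picks the category with the highest score. The weights, biases and feature vector are made-up toy values, not trained models from this project.

```matlab
% Toy sketch of 1-vs-all prediction with linear classifiers.
categories = {'Kitchen', 'Store', 'Bedroom'};   % three categories for brevity

% Hypothetical weights: one column per category, one row per feature dimension.
W = [ 1.0 -0.5  0.2;
     -0.3  0.8 -0.1];
b = [0.1 -0.2 0.0];        % hypothetical bias of each category's classifier

x = [0.5; 1.5];            % toy feature vector of one test image

% Score of the image under every category's classifier: w_i' * x + b_i.
scores = W' * x + b';

% The predicted category is the one whose classifier gives the highest score.
[~, best] = max(scores);
fprintf('Predicted category: %s\n', categories{best});
```

With these toy values the scores are 0.15, 0.75 and -0.05, so the second category is chosen.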
Confusion matrix and the table of classifier results
% Training the SVM with a chi-squared homogeneous kernel map
hom.kernel = 'KChi2';
hom.order = 2;
dataset = vl_svmdataset(train_image_feats', 'homkermap', hom);
[W, B] = vl_svmtrain(dataset, train_label', LAMBDA);
% Score of the j-th test image under the i-th category's classifier
% (the test feature goes through the same order-2 kernel map)
score = ws{i}' * vl_homkermap(test_image_feats(j,:)', 2) + bs{i};
Fig 3. Confusion matrix obtained with bag of SIFT and an SVM trained with the chi-squared kernel.
To get the best result, I compute the GIST descriptors and use them together with the normalized SIFT descriptors as image features. The code for generating these mixed features is in get_sift_and_gist.m. I use a linear SVM for classification. To reproduce the result, set the parameters to the following values in their respective files.
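As background, my understanding from the VLFeat documentation is that the 'KChi2' homogeneous kernel map approximates the additive chi-squared kernel k(x, y) = sum_i 2 x_i y_i / (x_i + y_i). The sketch below evaluates that exact kernel on two toy histograms. It is for illustration only, not part of the project code, and the helper name chi2_kernel is hypothetical.

```matlab
% Exact additive chi-squared kernel between two non-negative histograms.
% Bins where both histograms are zero contribute 0, so Inf is replaced by 1
% in the denominator to avoid 0/0.
chi2_kernel = @(x, y) sum(2 * x .* y ./ max(x + y, eps));

x = [0.2 0.5 0.3];    % toy histogram, sums to 1
y = [0.1 0.6 0.3];    % toy histogram, sums to 1

k_xy = chi2_kernel(x, y);   % similarity between the two histograms
k_xx = chi2_kernel(x, x);   % equals sum(x), i.e. 1 for a normalized histogram

fprintf('k(x, y) = %.4f, k(x, x) = %.4f\n', k_xy, k_xx);
```

The kernel is largest when the two histograms agree bin by bin, which is why it often suits histogram features such as bag of SIFT better than a plain dot product.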
%In proj4.m
FEATURE = 'sift and gist';
CLASSIFIER = 'support vector machine';
vocab_size = 200;
%In svm_classify.m
LAMBDA=0.0001;
Accuracy (mean of diagonal of confusion matrix) is 0.708
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label
---|---|---|---
Kitchen | 0.560 | LivingRoom, LivingRoom | LivingRoom, Industrial
Store | 0.610 | Bedroom, Forest | Industrial, Kitchen
Bedroom | 0.480 | LivingRoom, Kitchen | Store, LivingRoom
LivingRoom | 0.490 | Bedroom, Bedroom | Industrial, Street
Office | 0.890 | Bedroom, TallBuilding | LivingRoom, Bedroom
Industrial | 0.690 | TallBuilding, Bedroom | Suburb, Street
Suburb | 0.940 | Store, Mountain | InsideCity, Industrial
InsideCity | 0.570 | LivingRoom, Highway | Store, Street
TallBuilding | 0.780 | Industrial, Industrial | LivingRoom, LivingRoom
Street | 0.660 | Highway, Mountain | Industrial, Store
Highway | 0.800 | OpenCountry, Street | Coast, Mountain
OpenCountry | 0.620 | Mountain, Industrial | Coast, Highway
Coast | 0.800 | OpenCountry, Highway | OpenCountry, Mountain
Mountain | 0.820 | Kitchen, Coast | OpenCountry, Highway
Forest | 0.910 | Store, OpenCountry | Mountain, Mountain
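The overall accuracy of 0.708 is the mean of the per-category accuracies listed in the table. A quick check, with the values copied from the table in row order (the variable names are only for illustration):

```matlab
% Per-category accuracies from the results table (Kitchen ... Forest).
per_class_acc = [0.560 0.610 0.480 0.490 0.890 0.690 0.940 0.570 ...
                 0.780 0.660 0.800 0.620 0.800 0.820 0.910];

% These values are the diagonal of the row-normalized confusion matrix,
% so their mean is the reported accuracy.
mean_acc = mean(per_class_acc);
fprintf('Mean accuracy: %.3f\n', mean_acc);   % prints "Mean accuracy: 0.708"
```

Given a row-normalized confusion matrix C, the same number would come from mean(diag(C)).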