For the tiny image descriptor, I load each image and resize it to 16 by 16. The resized image is flattened into a 256-dimensional descriptor, which is then normalized.
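The original code appears to be MATLAB; the following is a minimal NumPy sketch of the same idea, assuming grayscale input and zero-mean, unit-length normalization (the write-up only says the descriptor is "normalized", so the exact normalization is an assumption):

```python
import numpy as np

def tiny_image_feature(img, size=16):
    """Downsample a grayscale image to size x size by block averaging,
    flatten it, and normalize (zero mean, unit length)."""
    h, w = img.shape
    # crop so the image divides evenly into size x size blocks
    img = img[: h - h % size, : w - w % size]
    bh, bw = img.shape[0] // size, img.shape[1] // size
    tiny = img.reshape(size, bh, size, bw).mean(axis=(1, 3))
    vec = tiny.flatten().astype(np.float64)  # 256-dim for 16x16
    vec -= vec.mean()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```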
To build the vocabulary, SIFT features are extracted from each training image and clustered with k-means; the cluster centers become the vocabulary. To build the bag-of-SIFT features, SIFT features are once again extracted from each image. Each feature is matched to the vocabulary word at minimum distance, and a histogram is built from the number of times each cluster in the vocabulary was used. This histogram becomes the new feature.
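A NumPy sketch of both steps (a plain k-means stands in for whatever library routine the original used; SIFT extraction itself is not reproduced, so `all_descriptors` is assumed to already hold the stacked descriptors):

```python
import numpy as np

def build_vocabulary(all_descriptors, k, iters=20, seed=0):
    """Cluster the pooled SIFT descriptors with k-means; the cluster
    centers become the visual vocabulary."""
    rng = np.random.default_rng(seed)
    centers = all_descriptors[rng.choice(len(all_descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = np.linalg.norm(all_descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = all_descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def bag_of_sifts(descriptors, vocab):
    """Assign each descriptor of one image to its nearest vocabulary
    word and return a normalized histogram of word counts."""
    d = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(np.float64)
    return hist / hist.sum()
```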
For nearest neighbor, I find the minimum distance between each test feature and the training features. Each test image is then assigned the category from train_labels belonging to its nearest training neighbor.
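A minimal sketch of this classifier, assuming Euclidean distance (the write-up does not name the metric) and row-per-image feature matrices:

```python
import numpy as np

def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """For each test feature, find the training feature at minimum
    distance and return that neighbor's label."""
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=2)
    nn = d.argmin(axis=1)
    return [train_labels[i] for i in nn]
```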
strcmp is used to find which labels match each category, and the matches are used to construct the binary labels needed to train a one-vs-all SVM. Each SVM is trained on the training features with the binary labels we previously created. Training the SVM gives us a weight vector W and an offset b. The formula W * test_feature + b is used to calculate the confidences for the matches, and the maximum of the confidences selects the most confident SVM.
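A NumPy sketch of the label construction and the confidence/argmax step. The SVM training itself (done in the original MATLAB code) is not reproduced; `W` and `b` are assumed to hold one trained weight vector and offset per category:

```python
import numpy as np

def binary_labels(train_labels, category):
    """The strcmp step: +1 where the label matches the category, -1 otherwise."""
    return np.where(np.array(train_labels) == category, 1.0, -1.0)

def svm_classify(W, b, test_feats, categories):
    """Given per-category weight vectors W (num_categories x dim) and
    offsets b, compute the confidences W * x + b and pick the category
    of the most confident SVM for each test feature."""
    conf = test_feats @ W.T + b  # (num_test, num_categories)
    return [categories[i] for i in conf.argmax(axis=1)]
```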
To combine the GIST and SIFT features, I concatenated each image's GIST feature with that same image's bag-of-SIFT feature. This gives a 604 by N feature matrix.
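The concatenation step, sketched in NumPy assuming both feature matrices are stored dimension-by-images (dim x N) as the reported 604 by N result suggests; the individual GIST and bag-of-SIFT dimensions are not stated, so the shapes in the sketch are placeholders:

```python
import numpy as np

def combine_features(gist_feats, sift_feats):
    """Stack each image's GIST descriptor on top of its bag-of-SIFT
    histogram, column-wise (both matrices are dim x num_images)."""
    return np.vstack([gist_feats, sift_feats])
```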
For tiny images the accuracy was 20%. For bag of SIFTs with nearest neighbor the accuracy was 48.6%. Bag of SIFTs with the SVM gave 62.5% accuracy. The combined GIST-SIFT feature performed marginally better, at 64.7% accuracy. Running different cluster numbers had an interesting effect.
The step size used for all of these vocabulary tests was 15 for the vocabulary and 10 for the bag of SIFTs. Interestingly, accuracy seemed to decrease when using more than 100 clusters.
Per-category results (image thumbnails omitted; only the accuracy and the labels attached to the misclassified examples survive):

| Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
|---|---|---|---|
| Kitchen | 0.550 | Office, Office | Store, Store |
| Store | 0.390 | InsideCity, Bedroom | LivingRoom, Highway |
| Bedroom | 0.380 | Kitchen, LivingRoom | LivingRoom, Kitchen |
| LivingRoom | 0.340 | Office, InsideCity | Store, Industrial |
| Office | 0.850 | LivingRoom, Bedroom | Kitchen, Kitchen |
| Industrial | 0.450 | Bedroom, Store | Street, Office |
| Suburb | 0.900 | OpenCountry, OpenCountry | OpenCountry, InsideCity |
| InsideCity | 0.570 | Kitchen, Street | LivingRoom, LivingRoom |
| TallBuilding | 0.700 | InsideCity, Store | Store, Mountain |
| Street | 0.540 | TallBuilding, OpenCountry | Store, Highway |
| Highway | 0.770 | Mountain, Industrial | Coast, InsideCity |
| OpenCountry | 0.490 | Mountain, Highway | Highway, Mountain |
| Coast | 0.750 | Highway, OpenCountry | OpenCountry, Mountain |
| Mountain | 0.770 | OpenCountry, Street | Suburb, OpenCountry |
| Forest | 0.930 | Mountain, OpenCountry | OpenCountry, Mountain |
There are several free variables in the implementation. Variables such as the step size of the SIFT descriptor and the lambda value of the SVM can drastically change the accuracy achieved. The best SIFT-SVM result shown above used a step of 10 for the vocabulary and a step of 5 for the bag of SIFTs. The lambda value was 0.000001.