The goal of this project is to classify scenes into one of 15 categories by training and testing on the 15 scene database using different image representations and classifiers.
The "tiny image" feature is one of the simplest possible image representations. I resize each image to 16x16, then normalize it to zero mean and unit length.
Next, I compute the L2 distance between every test image and every training image, find the nearest training image, and assign its label to the test image.
Accuracy is 0.224
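The tiny-image pipeline above can be sketched as follows. This is a minimal illustration, not the code used for the reported result: it assumes grayscale images stored as 2-D NumPy arrays, and it stands in block averaging for whatever resize routine was actually used. The function names `tiny_image` and `nearest_neighbor_labels` are hypothetical.

```python
import numpy as np

def tiny_image(img, size=16):
    """Shrink a 2-D grayscale array to size x size, then normalize it.

    Assumption of this sketch: downsampling is done by block averaging
    after cropping the image so both sides divide evenly by `size`.
    """
    h, w = img.shape
    img = img[: h - h % size, : w - w % size].astype(np.float64)
    bh, bw = img.shape[0] // size, img.shape[1] // size
    # Element [i, a, j, b] is pixel (i*bh + a, j*bw + b); average each block.
    small = img.reshape(size, bh, size, bw).mean(axis=(1, 3))
    v = small.ravel()
    v = v - v.mean()                      # zero mean
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v    # unit length

def nearest_neighbor_labels(train_feats, train_labels, test_feats):
    """Label each test feature with the label of its nearest training feature."""
    train_feats = np.asarray(train_feats, dtype=np.float64)
    test_feats = np.asarray(test_feats, dtype=np.float64)
    # Squared L2 distances via ||a||^2 - 2 a.b + ||b||^2.
    d2 = (
        (test_feats ** 2).sum(axis=1)[:, None]
        - 2.0 * test_feats @ train_feats.T
        + (train_feats ** 2).sum(axis=1)[None, :]
    )
    nearest = d2.argmin(axis=1)
    return [train_labels[i] for i in nearest]
```

Ranking by squared distance gives the same nearest neighbor as ranking by L2 distance, so the square root is skipped.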
Build a vocabulary of visual words by extracting SIFT features from the training set with step size=15, then clustering them into 200 visual words with kmeans.
Represent training and testing images as histograms of visual words. For each image we will densely sample many SIFT descriptors with step size=5. Instead of storing hundreds of SIFT descriptors, we simply count how many SIFT descriptors fall into each cluster in our visual word vocabulary. This is done by finding the nearest neighbor kmeans centroid for every SIFT feature.
Use a 1-NN classifier to label each test image.
Accuracy is 0.514
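The vocabulary and histogram steps above can be sketched roughly as below. SIFT extraction is left out: the sketch assumes descriptors are already available as rows of a NumPy array. The function names, the plain Lloyd-style k-means with random initialization, and the normalization of the histogram to sum to 1 are all assumptions made for illustration, not details taken from the original implementation.

```python
import numpy as np

def assign_to_words(descriptors, vocab):
    """Return the index of the nearest centroid for every descriptor."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    vocab = np.asarray(vocab, dtype=np.float64)
    d2 = (
        (descriptors ** 2).sum(axis=1)[:, None]
        - 2.0 * descriptors @ vocab.T
        + (vocab ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)

def build_vocabulary(descriptors, k, iters=20, seed=0):
    """Cluster descriptors into k visual words with a basic k-means loop."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Start from k distinct descriptors chosen at random.
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        words = assign_to_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[words == j]
            if len(members) > 0:          # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def bag_of_words(descriptors, vocab):
    """Count how many descriptors fall into each visual word."""
    words = assign_to_words(descriptors, vocab)
    hist = np.bincount(words, minlength=len(vocab)).astype(np.float64)
    total = hist.sum()
    # Normalizing makes images with different descriptor counts comparable.
    return hist / total if total > 0 else hist
```

Each image is then represented by a single vector of length `len(vocab)` (200 here), regardless of how many descriptors were sampled from it.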
Build a vocabulary of visual words by extracting SIFT features from the training set with step size=15, then clustering them into n visual words with kmeans.
Represent training and testing images as histograms of visual words. For each image we will densely sample many SIFT descriptors with step size=5. Instead of storing hundreds of SIFT descriptors, we simply count how many SIFT descriptors fall into each cluster in our visual word vocabulary. This is done by finding the nearest neighbor kmeans centroid for every SIFT feature.
Use a linear SVM to classify each test image (Lambda=1e-6).
Vocab_size | 10 | 20 | 50 | 100 | 200 | 400 | 1000 |
---|---|---|---|---|---|---|---|
Accuracy | .394 | .499 | .598 | .655 | .664 | .699 | .713 |
Accuracy is 0.713
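For a 15-way problem, a linear SVM is commonly applied one-vs-all: one binary classifier per category, with the highest-scoring classifier deciding the label. The sketch below shows that idea under stated assumptions. The solver (full-batch subgradient descent on the regularized hinge loss), the learning rate, the epoch count, and all function names are illustrative choices; the source only states that a linear SVM with Lambda=1e-6 was used.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-6, lr=0.1, epochs=200):
    """Binary linear SVM with labels y in {-1, +1}.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b)))
    by full-batch subgradient descent (an assumption of this sketch).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0            # samples that violate the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_one_vs_all(X, labels, **kwargs):
    """Train one binary SVM per class: that class versus all others."""
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    models = {}
    for c in sorted(set(labels.tolist())):
        y = np.where(labels == c, 1.0, -1.0)
        models[c] = train_linear_svm(X, y, **kwargs)
    return models

def predict_one_vs_all(models, X):
    """Pick, for each row of X, the class whose SVM gives the highest score."""
    X = np.asarray(X, dtype=np.float64)
    classes = list(models)
    scores = np.stack([X @ models[c][0] + models[c][1] for c in classes], axis=1)
    return [classes[i] for i in scores.argmax(axis=1)]
```

Here `X` would hold one bag-of-SIFT histogram per row. A small `lam` such as 1e-6 means the fit is driven almost entirely by the hinge loss rather than the regularizer.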
Partition each image evenly into patches, build one histogram for each patch, then concatenate the histograms.
Vocab_size | 100 | 200 | 400 | 500 | 1000 |
---|---|---|---|---|---|
Accuracy(2 levels) | .709 | .736 | .743 | .749 | .749 |
Accuracy(3 levels) | .733 | .749 | .744 | .751 | .745 |
Instead of building one histogram for each image, partition each image at every pyramid level, build 1+4+16+... histograms per image, and concatenate them.
Vocab_size | 100 | 200 | 400 | 500 | 1000 |
---|---|---|---|---|---|
Accuracy(2 levels) | .741 | .751 | .763 | .771 | .762 |
Accuracy(3 levels) | .751 | .757 | .757 | .749 | .741 |
Accuracy is 0.771
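A minimal sketch of the pyramid histogram, assuming each descriptor comes with its (x, y) location and has already been assigned a visual-word index. It assumes that "2 levels" means 1+4 cells and "3 levels" means 1+4+16 cells, and it applies no per-level weighting, since none is described above. The function name and parameters are hypothetical.

```python
import numpy as np

def spatial_pyramid_histogram(words, xs, ys, width, height, vocab_size, levels=2):
    """Concatenate per-cell visual-word histograms over `levels` pyramid levels.

    Level l splits the image into 2**l x 2**l cells, so the number of
    histograms per image is 1 + 4 + 16 + ...
    """
    words = np.asarray(words, dtype=int)
    xs = np.asarray(xs, dtype=np.float64)
    ys = np.asarray(ys, dtype=np.float64)
    parts = []
    for level in range(levels):
        cells = 2 ** level
        # Cell coordinates of each descriptor; clip so points on the far
        # edge of the image still land in the last cell.
        cx = np.minimum((xs * cells / width).astype(int), cells - 1)
        cy = np.minimum((ys * cells / height).astype(int), cells - 1)
        cell_id = cy * cells + cx
        # Fold (cell, word) into one bin index and count in a single pass.
        counts = np.bincount(
            cell_id * vocab_size + words,
            minlength=cells * cells * vocab_size,
        )
        parts.append(counts)
    hist = np.concatenate(parts).astype(np.float64)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

The feature length grows quickly with depth: with a vocabulary of 500 words, two levels give 500 × 5 = 2500 dimensions and three levels give 500 × 21 = 10500.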
Accuracy is 0.821
Because the algorithm runs too slowly, I had to sample SIFT features more sparsely, using stepSize 35, 40, and 50.
stepSize | 35 | 40 | 50 |
---|---|---|---|
Accuracy bag of SIFT+SVM | .447 | .401 | .348 |
Accuracy SIFT+NBNN | .518 | .514 | .383 |
Accuracy is 0.518
NBNN performs somewhat better than bag of SIFT+SVM at each stepSize tested. If I could sample features more densely, I think NBNN would give a much better result.
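For reference, the NBNN (Naive Bayes Nearest Neighbor) decision rule can be sketched as follows: every descriptor of the query image is matched to its nearest descriptor in each class's pooled training set, and the class with the smallest summed squared distance wins. The function name, the dictionary layout, and the brute-force search are assumptions of this sketch, not the implementation behind the numbers above.

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Classify one image from its local descriptors with NBNN.

    class_descriptors maps each label to an (n_c, dim) array holding the
    descriptors pooled from all training images of that class.
    """
    query = np.asarray(query_descriptors, dtype=np.float64)
    q2 = (query ** 2).sum(axis=1)[:, None]
    best_label, best_cost = None, np.inf
    for label, pool in class_descriptors.items():
        pool = np.asarray(pool, dtype=np.float64)
        # Squared distance from every query descriptor to every pooled one.
        d2 = q2 - 2.0 * query @ pool.T + (pool ** 2).sum(axis=1)[None, :]
        # Image-to-class distance: sum of nearest-neighbor distances.
        cost = d2.min(axis=1).sum()
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label
```

No vocabulary or quantization is involved, so the cost of a brute-force search like this grows with the number of query descriptors times the number of training descriptors per class. That is consistent with the need for a large stepSize noted above.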
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
---|---|---|---|
Kitchen | 0.730 | Bedroom, LivingRoom | Office, Industrial |
Store | 0.830 | Industrial, Industrial | InsideCity, InsideCity |
Bedroom | 0.590 | LivingRoom, LivingRoom | Industrial, TallBuilding |
LivingRoom | 0.570 | Bedroom, Store | Bedroom, Bedroom |
Office | 0.980 | Kitchen, LivingRoom | Kitchen, Bedroom |
Industrial | 0.820 | Bedroom, Bedroom | Suburb, Store |
Suburb | 1.000 | Store, OpenCountry | |
InsideCity | 0.930 | TallBuilding, Street | TallBuilding, Street |
TallBuilding | 0.870 | Industrial, Bedroom | Forest, Mountain |
Street | 0.810 | TallBuilding, TallBuilding | Highway, InsideCity |
Highway | 0.850 | Street, Street | Coast, OpenCountry |
OpenCountry | 0.590 | Coast, Mountain | Coast, Coast |
Coast | 0.890 | Highway, OpenCountry | OpenCountry, Mountain |
Mountain | 0.900 | OpenCountry, OpenCountry | OpenCountry, Forest |
Forest | 0.960 | OpenCountry, OpenCountry | Mountain, Mountain |