In this project I experimented with image recognition. I wrote and analyzed several different algorithms in this area, ranging from simple methods such as tiny images with nearest-neighbor classification to more advanced ones such as bags of quantized local features with linear classifiers learned by support vector machines. This report is divided into four parts:
Let's go through each of them:
K | Accuracy(%) |
---|---|
1 | 22.5 |
3 | 21.5 |
5 | 21.3 |
7 | 21.1 |
9 | 21.1 |
11 | 21.8 |
13 | 22.3 |
15 | 22.5 |
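The tiny-images + nearest-neighbor baseline above can be sketched in a few lines. The project code itself is MATLAB; this is an illustrative Python/NumPy version, with the 16x16 image size and the majority-vote KNN rule being common-practice assumptions rather than the project's exact settings:

```python
import numpy as np

def tiny_image_feature(img, size=16):
    """Shrink an image to size x size by block-averaging, flatten,
    then zero-mean and unit-normalize the resulting vector."""
    h, w = img.shape
    ys = np.arange(size + 1) * h // size
    xs = np.arange(size + 1) * w // size
    feat = np.empty((size, size))
    for i in range(size):
        for j in range(size):
            feat[i, j] = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
    feat = feat.ravel() - feat.mean()
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat

def knn_predict(train_feats, train_labels, test_feats, k=1):
    """Classify each test feature by majority vote over its k
    nearest training features (Euclidean distance)."""
    preds = []
    for f in test_feats:
        d = np.linalg.norm(train_feats - f, axis=1)
        nearest = np.asarray(train_labels)[np.argsort(d)[:k]]
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])
    return preds
```

Normalizing the tiny-image vectors (zero mean, unit length) makes the Euclidean distance insensitive to overall brightness and contrast, which helps this otherwise very crude descriptor.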
Parameter | Description |
---|---|
Step size | The smaller this is, the more densely we sample SIFT descriptors |
Features per image | Number of SIFT descriptors randomly sampled from each image |
Vocab size | Number of clusters formed in K-means |
K | Number of nearest neighbors in KNN |
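The vocabulary-building step behind these parameters can be sketched as follows. This is a toy NumPy version of k-means over SIFT descriptors plus the bag-of-words histogram step; the function names are illustrative, not the project's actual MATLAB code:

```python
import numpy as np

def build_vocabulary(descriptors, vocab_size, iters=20, seed=0):
    """Cluster descriptors with plain k-means; the cluster
    centers form the visual-word vocabulary."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), vocab_size, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for k in range(vocab_size):
            pts = descriptors[labels == k]
            if len(pts):
                centers[k] = pts.mean(axis=0)
    return centers

def bag_of_words(descriptors, vocab):
    """Histogram of nearest visual words, L1-normalized."""
    d = np.linalg.norm(descriptors[:, None, :] - vocab[None, :, :], axis=2)
    hist = np.bincount(d.argmin(axis=1), minlength=len(vocab)).astype(float)
    return hist / hist.sum()
```

Vocab size in the tables above corresponds to `vocab_size` here: more clusters give a finer quantization of descriptor space at the cost of longer clustering and assignment times.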
K | Accuracy(%) |
---|---|
1 | 50.4 |
3 | 50.3 |
5 | 51.9 |
7 | 51.8 |
9 | 51.6 |
11 | 50.6 |
15 | 50 |
K | Accuracy(%) |
---|---|
1 | 48.1 |
3 | 49.1 |
5 | 51.1 |
7 | 51.7 |
9 | 51.9 |
11 | 51.2 |
15 | 50.9 |
K | Accuracy(%) |
---|---|
1 | 43.7 |
3 | 43.1 |
5 | 44.3 |
7 | 44.4 |
9 | 43.2 |
11 | 43.8 |
15 | 43.7 |
Parameter | Description |
---|---|
Step size | The smaller this is, the more densely we sample SIFT descriptors |
Features per image | Number of SIFT descriptors randomly sampled from each image |
Vocab size | Number of clusters formed in K-means |
LAMBDA | Controls the strength of regularization applied to the model parameters during learning |
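To show where LAMBDA enters the objective, here is a toy subgradient-descent sketch of a binary linear SVM in Python. It is only an illustration of the regularized hinge loss, not the solver the project actually uses:

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, epochs=50, lr=0.1, seed=0):
    """Binary linear SVM trained by subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss; labels y must be +1/-1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:
                # point violates the margin: hinge subgradient + weight decay
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:
                # only the regularizer contributes
                w = (1 - lr * lam) * w
    return w, b
```

Large `lam` shrinks the weights aggressively (underfitting), tiny `lam` barely regularizes at all, which matches the sweet spot around 1e-4 in the tables below.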
LAMBDA | Accuracy(%) |
---|---|
0.00001 | 61.9 |
0.0001 | 62.4 |
0.001 | 56.7 |
0.01 | 41.9 |
0.1 | 43.9 |
1 | 30.7 |
10 | 37.3 |
LAMBDA | Accuracy(%) |
---|---|
0.00001 | 58.3 |
0.0001 | 60.3 |
0.001 | 54.9 |
0.01 | 46.7 |
0.1 | 43.9 |
1 | 38.2 |
10 | 39.1 |
LAMBDA | Accuracy(%) |
---|---|
0.00001 | 49.9 |
0.0001 | 50.5 |
0.001 | 48.2 |
0.01 | 41.5 |
0.1 | 36.7 |
1 | 37.5 |
10 | 38.4 |
Look for vocab_{size}.mat files in the code directory
I experimented with bag-of-SIFT features and SVM learners. The table shows that as the vocabulary size increases, performance improves up to a point, after which it degrades slightly while the runtime keeps increasing.
Vocab size | Accuracy(%) |
---|---|
50 | 55.3 |
100 | 60.3 |
200 | 61.4 |
300 | 62.1 |
400 | 61.7 |
500 | 60.4 |
1000 | 59.5 |
Code in run_cross_validation.m
I ran this 10 times. Each time I randomly took 100 samples per class for both the train and test sets, then ran my baseline method of bag of SIFTs with an SVM. The results are quite stable with my final tuned parameters.
Mean accuracy | 58.39 |
Standard deviation | 2.69 |
Run number | Accuracy(%) |
---|---|
1 | 55.6 |
2 | 59.3 |
3 | 62.6 |
4 | 58.3 |
5 | 61.8 |
6 | 60.7 |
7 | 55.2 |
8 | 56.9 |
9 | 55.6 |
10 | 57.9 |
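The resampling loop above can be sketched as follows; `run_pipeline` is a hypothetical stand-in for the bag-of-SIFT + SVM pipeline, and this Python version only mirrors the structure of run_cross_validation.m:

```python
import numpy as np

def subsample_split(labels, per_class, rng):
    """Randomly pick `per_class` example indices for each class."""
    labels = np.asarray(labels)
    picked = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        picked.extend(rng.choice(idx, per_class, replace=False))
    return np.array(picked)

def cross_validate(train_labels, test_labels, run_pipeline,
                   runs=10, per_class=100, seed=0):
    """Repeat: resample per_class train and test examples per class,
    run the classifier, record accuracy; report mean and std."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        tr = subsample_split(train_labels, per_class, rng)
        te = subsample_split(test_labels, per_class, rng)
        accs.append(run_pipeline(tr, te))
    return float(np.mean(accs)), float(np.std(accs))
```

Reporting the mean and standard deviation over independent resamplings, as in the table above, gives a sense of how much of the accuracy is luck of the draw.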
Code in run_validation_based_tuning.m
I experimented with my simple SVM classifier, tuning its regularization parameter on a validation set drawn from the training set itself, using 10-fold validation.
Final accuracy on the held-out test set: 60.3%
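The fold-based tuning can be sketched as below; the real code is in run_validation_based_tuning.m, and `train_and_eval` is a hypothetical stand-in for training the SVM on one fold split and scoring it:

```python
import numpy as np

def tune_lambda(n_train, lambdas, train_and_eval, folds=10, seed=0):
    """k-fold validation over a grid of LAMBDA values: for each
    lambda, average validation accuracy across folds, then keep
    the lambda with the highest mean accuracy."""
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(n_train), folds)
    results = {}
    for lam in lambdas:
        accs = []
        for i in range(folds):
            val = chunks[i]
            trn = np.concatenate([chunks[j] for j in range(folds) if j != i])
            accs.append(train_and_eval(trn, val, lam))
        results[lam] = (float(np.mean(accs)), float(np.std(accs)))
    best = max(results, key=lambda l: results[l][0])
    return best, results
```

Because the winning lambda is chosen on validation folds only, the final test-set number quoted above remains an unbiased estimate.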
LAMBDA | Mean accuracy(%) | Standard deviation |
---|---|---|
0.000001 | 52.9 | 4.25 |
0.00001 | 55.3 | 4.65 |
0.0001 | 57.1 | 4.26 |
0.001 | 55.7 | 4.05 |
0.01 | 41.4 | 3.75 |
0.1 | 39.5 | 3.78 |
1 | 30.6 | 3.68 |
10 | 35.6 | 2.64 |
Code in get_kernel_codebook.m
I used 'soft assignment' via kernel codebook encoding for the bag of words. In this method, assuming a Gaussian distribution at each cluster, a feature is assigned to several clusters, with a weight that falls off with its distance from each cluster center. It performed better than the baseline SVM+BoS (60%), though the improvement was not dramatic. I tried various values of sigma and found the following to work best.
Sigma = 210
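The soft-assignment histogram can be sketched as below. This is an illustrative Python version of the idea in get_kernel_codebook.m, not the file itself; the normalization choices are assumptions:

```python
import numpy as np

def kernel_codebook_histogram(descriptors, vocab, sigma=210.0):
    """Soft-assign each descriptor to ALL visual words with a
    Gaussian weight exp(-d^2 / (2 sigma^2)). Each descriptor's
    weights are normalized to sum to 1 before being accumulated,
    and the final histogram is L1-normalized."""
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    hist = w.sum(axis=0)
    return hist / hist.sum()
```

Compared with hard assignment, a descriptor lying between two cluster centers contributes to both bins instead of being forced into one, which is what makes the encoding more robust to quantization error.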
LAMBDA | Accuracy(%) |
---|---|
10 | 37.3 |
1 | 30.7 |
0.1 | 43.9 |
0.0001 | 49.7 |
0.00001 | 59.5 |
0.000001 | 64.2 |
0.0000001 | 65.3 |
Code in get_fisher_vectors.m
In addition to keeping a count of descriptors per word, Fisher vectors also encode information about how the descriptors are distributed around the cluster centers. As the results show, this improved accuracy significantly over our baseline SVM+BoS, which was at 60%.
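A simplified sketch of the encoding, assuming a diagonal-covariance GMM has already been fit to the descriptors; this illustrates the standard first- and second-order Fisher vector statistics rather than reproducing get_fisher_vectors.m:

```python
import numpy as np

def fisher_vector(desc, means, sigmas, weights):
    """Simplified Fisher vector for a diagonal-covariance GMM:
    concatenates, per mixture component, the responsibility-weighted
    first- and second-order residual statistics of the descriptors."""
    K, D = means.shape
    # log of the (unnormalized) per-component Gaussian densities
    logp = (-0.5 * (((desc[:, None, :] - means) / sigmas) ** 2).sum(axis=2)
            - np.log(sigmas).sum(axis=1) + np.log(weights))
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)  # posterior responsibilities
    N = len(desc)
    fv = []
    for k in range(K):
        diff = (desc - means[k]) / sigmas[k]
        g = gamma[:, k:k + 1]
        fv.append((g * diff).sum(axis=0) / (N * np.sqrt(weights[k])))
        fv.append((g * (diff ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * weights[k])))
    fv = np.concatenate(fv)
    # power- and L2-normalization, as is standard for Fisher vectors
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    n = np.linalg.norm(fv)
    return fv / n if n > 0 else fv
```

The resulting 2KD-dimensional vector is much richer than a K-bin word histogram, which is why far fewer clusters (10 to 50 in the tables below) suffice.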
Number of clusters | Accuracy(%) |
---|---|
10 | 72.0 |
30 | 74.3 |
50 | 76.1 |
Number of clusters | Accuracy(%) |
---|---|
10 | 70.3 |
30 | 72.1 |
50 | 75.3 |
Code in get_chisqr_classify.m and get_rbf_classify.m
Here, the chi-square kernel gave good results, around 63.1%, but the RBF kernel did not perform as well, coming in around 53.2%.
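The two kernels compared above can be sketched as Gram-matrix computations; this is an illustrative NumPy version (with an assumed `gamma` scaling), not the contents of get_chisqr_classify.m or get_rbf_classify.m:

```python
import numpy as np

def chi_square_kernel(A, B, gamma=1.0):
    """exp(-gamma * chi^2 distance) between rows of A and B;
    intended for non-negative histogram features like bags of words."""
    num = (A[:, None, :] - B[None, :, :]) ** 2
    den = A[:, None, :] + B[None, :, :] + 1e-12  # avoid division by zero
    return np.exp(-gamma * 0.5 * (num / den).sum(axis=2))

def rbf_kernel(A, B, gamma=1.0):
    """exp(-gamma * squared Euclidean distance) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)
```

The chi-square distance weights each bin's difference by the bin's magnitude, which suits normalized histograms better than plain Euclidean distance, consistent with chi-square beating RBF here.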
Code in get_spatial_pyramid.m
Here I analyze the effect of adding spatial information to the features by placing grids of increasing granularity over the image: each pyramid level subdivides the image into finer cells. At any fixed level, two points are said to match if they fall into the same cell of the grid. As the results show, accuracy improves at the higher (finer) pyramid levels.
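The grid construction can be sketched as below; this Python version illustrates the idea behind get_spatial_pyramid.m (concatenated per-cell histograms, with level L giving a 2^L x 2^L grid) rather than the file itself:

```python
import numpy as np

def spatial_pyramid(points, words, vocab_size, levels, width, height):
    """Concatenate bag-of-words histograms computed over the cells
    of grids of increasing granularity: level L splits the image
    into 2^L x 2^L cells. `points` holds (x, y) descriptor positions
    and `words` the visual-word index of each descriptor."""
    feats = []
    for L in range(levels + 1):
        cells = 2 ** L
        for cy in range(cells):
            for cx in range(cells):
                # descriptors whose position falls in this grid cell
                in_cell = ((points[:, 0] * cells // width == cx) &
                           (points[:, 1] * cells // height == cy))
                h = np.bincount(words[in_cell], minlength=vocab_size).astype(float)
                feats.append(h / max(h.sum(), 1.0))
    return np.concatenate(feats)
```

Level 0 is the plain (spatially blind) bag of words; each extra level adds histograms that record roughly where in the image each word occurred.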
Step size | Pyramid level | Accuracy(%) |
---|---|---|
16 | 0 | 55.1 |
16 | 1 | 58.8 |
16 | 2 | 62.6 |
16 | 3 | 63.9 |
8 | 3 | 69.7 |
4 | 3 | 70.9 |
Although I achieved an accuracy of 76.1% with a step size of 4, I selected a step size of 8 for displaying my results, as it is easier to reproduce given how long it takes to run. Accuracy = 72.5%, step size = 8, Fisher encoding.
Per-category results (the sample training, true-positive, false-positive, and false-negative images from the original results page cannot be reproduced here; only the accuracies and the labels attached to the misclassified examples are kept):

Category name | Accuracy | False positives (true label) | False negatives (predicted label) |
---|---|---|---|
Kitchen | 0.590 | LivingRoom, Bedroom | Industrial, Bedroom |
Store | 0.720 | LivingRoom, Industrial | Industrial, LivingRoom |
Bedroom | 0.470 | Kitchen, LivingRoom | Coast, Store |
LivingRoom | 0.490 | Bedroom, Bedroom | Bedroom, Store |
Office | 0.930 | LivingRoom, Kitchen | Bedroom, LivingRoom |
Industrial | 0.710 | Store, Store | Store, Kitchen |
Suburb | 0.990 | OpenCountry, InsideCity | InsideCity |
InsideCity | 0.710 | Street, Highway | TallBuilding, Kitchen |
TallBuilding | 0.790 | Industrial, Street | Street, Office |
Street | 0.670 | InsideCity, TallBuilding | Industrial, InsideCity |
Highway | 0.820 | OpenCountry, InsideCity | InsideCity, Coast |
OpenCountry | 0.530 | Mountain, Coast | Coast, Coast |
Coast | 0.780 | OpenCountry, OpenCountry | Mountain, Mountain |
Mountain | 0.810 | Coast, Industrial | Highway, Coast |
Forest | 0.860 | TallBuilding, OpenCountry | Street, OpenCountry |