Given a set of training and testing images, the recognition pipeline:
Bag of SIFT (BoS): the visual vocabulary is built by densely sampling SIFT descriptors (vl_dsift) and using k-means clustering (vl_kmeans) to quantize the SIFT descriptor space. With a vocabulary of 50 visual words and 100 SIFT descriptors detected in the input image, the BoS representation is therefore a histogram of 50 bins, each counting how many of the 100 descriptors were assigned to that visual word (i.e. cluster center). "Soft assignment" can be used to assign each descriptor to its p closest visual words, with contributions weighted by distance from the cluster centers. Afterwards, the histogram is normalized so that the image size does not affect the feature magnitude.
For Fisher encoding (vl_fisher), the visual vocabulary is obtained with a Gaussian Mixture Model (vl_gmm) constructed from densely sampled SIFT descriptors.
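The bag-of-SIFT histogram described above can be illustrated with a minimal sketch in plain Python. It is not the VLFeat code used in the pipeline: the function name `bag_of_words_histogram`, the inverse-distance weighting for soft assignment, and the L1 normalization are assumptions chosen for illustration, and the toy 2-D vectors stand in for 128-D SIFT descriptors.

```python
import math


def bag_of_words_histogram(descriptors, vocabulary, p=1):
    """Return an L1-normalized histogram with one bin per visual word.

    p = 1 gives hard assignment; p > 1 spreads each descriptor over its
    p closest visual words (soft assignment).
    """
    histogram = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        # Euclidean distance from this descriptor to every visual word.
        distances = sorted(
            (math.sqrt(sum((x - y) ** 2 for x, y in zip(descriptor, word))), index)
            for index, word in enumerate(vocabulary)
        )
        nearest = distances[:p]
        # Assumed weighting scheme: inverse distance, rescaled so that
        # each descriptor contributes a total weight of 1.
        weights = [1.0 / (distance + 1e-9) for distance, _ in nearest]
        total = sum(weights)
        for (_, index), weight in zip(nearest, weights):
            histogram[index] += weight / total
    # Normalize so the number of descriptors (image size) does not
    # affect the feature magnitude.
    norm = sum(histogram)
    return [value / norm for value in histogram] if norm > 0 else histogram


# Toy example: 3 visual words, 4 descriptors.
vocabulary = [[0, 0], [10, 0], [0, 10]]
descriptors = [[1, 0], [0, 1], [9, 0], [0, 9]]
hard = bag_of_words_histogram(descriptors, vocabulary, p=1)
soft = bag_of_words_histogram(descriptors, vocabulary, p=2)
```

With hard assignment, two of the four toy descriptors fall on the first word and one on each of the others, so the normalized histogram is `[0.5, 0.25, 0.25]`.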
The recognition pipeline was trained and tested on the 15 scene database from Lazebnik et al. 2006.
Testing accuracy is reported for the following pipeline combinations:
Recognition accuracy drops off for k > 3 nearest neighbors.
With s = 7 and large k (e.g. 50), the majority vote is "Building" for most categories.
When s < 7, high frequencies are lost and the tiny images become indistinguishable.
As s grows beyond 7, the images become less likely to match pixel-for-pixel.
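The observations about k refer to k-nearest-neighbor classification with a majority vote. The sketch below is a plain-Python illustration rather than the code used in the experiments: the function name `knn_predict`, the squared Euclidean distance, and the tie-breaking rule (prefer the closest neighbor) are assumptions, and the 2-D feature vectors are made up.

```python
from collections import Counter


def knn_predict(train_features, train_labels, query, k=1):
    """Predict a label for `query` by majority vote among its k nearest
    training features (squared Euclidean distance)."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_features[i], query)),
    )
    nearest_labels = [train_labels[i] for i in order[:k]]
    votes = Counter(nearest_labels)
    best = max(votes.values())
    # Assumed tie-break: among the most-voted labels, return the one
    # whose neighbor is closest to the query.
    for label in nearest_labels:
        if votes[label] == best:
            return label


# Made-up features: two "Kitchen" examples near the origin and three
# "Forest" examples far away.
features = [[0, 0], [1, 0], [5, 5], [6, 5], [5, 6]]
labels = ["Kitchen", "Kitchen", "Forest", "Forest", "Forest"]
query = [0.4, 0]
small_k = knn_predict(features, labels, query, k=3)
large_k = knn_predict(features, labels, query, k=5)
```

In this toy case `k=3` still returns "Kitchen", while `k=5` returns "Forest" because the vote is then dominated by the more numerous class rather than by the closest examples, which is the same kind of effect as a single label winning the vote for most categories when k is large.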
Increasing the vocabulary size improves accuracy up to a limit.
Peak accuracy is obtained with p between 2 and 4.
Peak accuracy (68.7%) is obtained with lambda equal to 1e-4.
Accuracy improves with p = 2.
Accuracy declines as p grows beyond 2.
Increasing the number of visual words improves accuracy up to a limit of around 78%.
Category name | Accuracy | False positives with true label | False negatives with wrong predicted label |
---|---|---|---|
Kitchen | 0.650 | Store, Bedroom | Store, Store |
Store | 0.850 | Industrial, Kitchen | Street, Kitchen |
Bedroom | 0.560 | LivingRoom, LivingRoom | Kitchen, Kitchen |
LivingRoom | 0.580 | Bedroom, InsideCity | Bedroom, Kitchen |
Office | 0.960 | Store, Kitchen | Bedroom, Bedroom |
Industrial | 0.700 | Highway, Bedroom | Store, Mountain |
Suburb | 0.990 | TallBuilding, Industrial | InsideCity |
InsideCity | 0.750 | Store, Kitchen | Suburb, OpenCountry |
TallBuilding | 0.760 | InsideCity, InsideCity | Street, LivingRoom |
Street | 0.800 | TallBuilding, InsideCity | Highway, InsideCity |
Highway | 0.880 | OpenCountry, Store | Bedroom, Industrial |
OpenCountry | 0.600 | Coast, Mountain | Coast, Forest |
Coast | 0.800 | OpenCountry, OpenCountry | Highway, OpenCountry |
Mountain | 0.860 | Forest, OpenCountry | Suburb, Forest |
Forest | 0.940 | OpenCountry, Highway | Mountain, Mountain |