This was a fairly straightforward pipeline to implement. To create the features, I simply resized each image to 16x16 pixels. For the classifier, I used a single nearest neighbor (the 1-NN algorithm). The accuracy was 19.1%. That is close to triple the roughly 6.7% expected from random guessing among the 15 categories, and within the range specified by the project guidelines. The whole pipeline runs in roughly 2 minutes. The confusion matrix is shown below. Judging from the matrix, the classifier tended to guess the outdoor categories more often than the others.
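Here is a minimal sketch of this pipeline. It assumes grayscale images stored as plain lists of rows and resizes them by block averaging. The writeup only says each image was "resized", so a library resize with interpolation would give slightly different features. The function names are hypothetical. Each 16x16 tiny image is flattened into a 256-value vector, and a test image takes the label of the single closest training vector under Euclidean distance.

```python
def tiny_image(image, size=16):
    """Downsample a grayscale image (list of rows) to size x size by block averaging, then flatten."""
    h, w = len(image), len(image[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size")
    feature = []
    for i in range(size):
        r0, r1 = i * h // size, (i + 1) * h // size  # rows covered by output row i
        for j in range(size):
            c0, c1 = j * w // size, (j + 1) * w // size  # columns covered by output column j
            block = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            feature.append(sum(block) / len(block))
    return feature


def nearest_neighbor_label(query, train_features, train_labels):
    """1-NN: return the label of the training feature closest to the query (Euclidean distance)."""
    best = min(
        range(len(train_features)),
        key=lambda k: sum((q - t) ** 2 for q, t in zip(query, train_features[k])),
    )
    return train_labels[best]


# Hypothetical toy data: uniform 32x32 "images", one dark and one bright.
dark = [[0.1] * 32 for _ in range(32)]
bright = [[0.9] * 32 for _ in range(32)]
train = [tiny_image(dark), tiny_image(bright)]
labels = ["dark", "bright"]
query = tiny_image([[0.2] * 32 for _ in range(32)])
print(nearest_neighbor_label(query, train, labels))  # dark
```

Tiny-image features are often also made zero-mean and unit-length before matching. The writeup doesn't mention doing so, so the sketch leaves that step out.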
In this section, I implemented the bag of words model. First, I implemented the vocabulary builder. It takes SIFT features from the training images (sampled with a step size of 3, for better accuracy) and clusters them to create the visual words. I kept the default vocabulary size of 200 words. Computing the bag of words for each image was probably the most complicated piece of the project. My code first extracts the SIFT features from an image. It then assigns each feature to the word it most closely resembles, which works much like the nearest neighbor classifier itself. Finally, it builds a normalized histogram of word frequencies. On my best run I used a step size of 16 here. Smaller step sizes took far too long to run, while larger step sizes were less accurate (for example, a step size of 100 led to an accuracy below 0.2). The step size of 16 produced an accuracy of 0.417, with a run time of around 35 minutes.
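The two steps described above can be sketched as follows. The sketch assumes k-means for the clustering, since the writeup only says the features were "clustered". It also normalizes each histogram to sum to one; L2 normalization is another common choice. SIFT extraction itself is left out: the functions take descriptors as plain lists of numbers, and all names are hypothetical rather than the project's actual code. A real run would use 128-dimensional SIFT descriptors and a vocabulary of 200 words.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def build_vocabulary(descriptors, vocab_size, iterations=20, seed=0):
    """Cluster training descriptors with plain k-means; the centers are the words."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, vocab_size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(vocab_size)]
        for d in descriptors:
            clusters[nearest_word(d, centers)].append(d)
        for k, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(column) / len(members) for column in zip(*members)]
    return centers


def bag_of_words(descriptors, vocabulary):
    """Histogram of nearest-word counts for one image, normalized to sum to 1."""
    histogram = [0.0] * len(vocabulary)
    for d in descriptors:
        histogram[nearest_word(d, vocabulary)] += 1.0
    total = sum(histogram)
    return [h / total for h in histogram] if total else histogram


# Hypothetical toy 2-D "descriptors" (real SIFT descriptors are 128-D).
train_descriptors = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
vocabulary = build_vocabulary(train_descriptors, vocab_size=2)
print(bag_of_words([[0.05, 0.1], [5.1, 5.0], [4.9, 5.1]], vocabulary))
```

The word assignment in `nearest_word` is the same nearest-neighbor search as 1-NN, only against cluster centers instead of training images. That search is also why small sampling step sizes get expensive: every extra descriptor costs one comparison per word.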
The new component in the third pipeline was the SVM classifier. Because a linear SVM is a binary classifier, I needed 15 of them, one per category, each giving a "yes" or "no" label for its category (a one-vs-all scheme). Through experimentation, I settled on a lambda of 0.000001. This classifier raised accuracy to 0.476, again with a run time of around 35 minutes. The results are listed at the bottom of this page. This pipeline also performed better on the outdoor images. For example, it classified Forest images correctly 83% of the time.
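The one-vs-all scheme described above can be sketched as follows. The writeup doesn't say which SVM solver was used, so this sketch swaps in a plain linear SVM. It is trained by full-batch subgradient descent on lambda / 2 * ||w||^2 plus the mean hinge loss. How lambda enters the project's actual solver is an assumption here. The function names, learning rate, and epoch count are hypothetical. Each category gets its own (w, b), trained with +1 targets for that category and -1 for all others. A test image goes to the category whose classifier returns the highest score.

```python
def train_linear_svm(features, targets, lam, epochs=1000, lr=0.1):
    """Linear SVM via full-batch subgradient descent.

    Minimizes lam / 2 * ||w||^2 + mean(max(0, 1 - y * (w . x + b)))
    for targets y in {+1, -1}.
    """
    n, dim = len(features), len(features[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(features, targets):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1:  # hinge term is active for this example
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_all(features, labels, lam):
    """Train one binary SVM per category: +1 for that category, -1 for the rest."""
    classifiers = {}
    for category in sorted(set(labels)):
        targets = [1.0 if label == category else -1.0 for label in labels]
        classifiers[category] = train_linear_svm(features, targets, lam)
    return classifiers


def predict(classifiers, x):
    """Pick the category whose classifier gives the highest score."""
    def score(category):
        w, b = classifiers[category]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(classifiers, key=score)


# Hypothetical toy data: linearly separable 2-D features with labels "A", "B", "C".
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
labels = ["A", "A", "B", "B", "C", "C"]
svms = train_one_vs_all(features, labels, lam=0.000001)
print([predict(svms, x) for x in features])
```

In the real pipeline, the features would be the 200-bin bag-of-words histograms and there would be 15 classifiers. A very small lambda like 0.000001 puts almost all the weight on fitting the training data rather than on keeping w small.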
I was disappointed that I wasn't able to get my metrics (run time and accuracy) within the parameters outlined in the project description. The run times were what held me back, since they kept me from experimenting with parameters as much as I would have liked (although I was able to do so to some extent). The accuracies, however, were directionally correct. I was still surprised that the performance was even this high, and I enjoyed this look at the foundations of combining computer vision and machine learning.
Category name | Accuracy | False positives (true label) | False negatives (wrong predicted label)
---|---|---|---
Kitchen | 0.150 | Industrial, Industrial | LivingRoom, Mountain
Store | 0.260 | LivingRoom, Industrial | Industrial, Suburb
Bedroom | 0.570 | Kitchen, Kitchen | LivingRoom, LivingRoom
LivingRoom | 0.260 | Bedroom, Bedroom | Bedroom, Office
Office | 0.640 | Bedroom, LivingRoom | LivingRoom, LivingRoom
Industrial | 0.260 | Kitchen, TallBuilding | Bedroom, Highway
Suburb | 0.550 | Coast, LivingRoom | Store, LivingRoom
InsideCity | 0.530 | Store, Kitchen | Office, Kitchen
TallBuilding | 0.560 | Store, InsideCity | InsideCity, Store
Street | 0.320 | TallBuilding, Highway | Kitchen, Suburb
Highway | 0.690 | Mountain, OpenCountry | Coast, Mountain
OpenCountry | 0.310 | Coast, Suburb | Street, Highway
Coast | 0.600 | OpenCountry, OpenCountry | OpenCountry, Suburb
Mountain | 0.610 | OpenCountry, OpenCountry | OpenCountry, OpenCountry
Forest | 0.830 | OpenCountry, Street | TallBuilding, Mountain
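As a quick consistency check, the mean of the 15 per-category accuracies in the results table reproduces the overall 0.476 reported for the SVM pipeline. This assumes each category has the same number of test images, in which case overall accuracy equals the mean of the per-category accuracies.

```python
# Per-category accuracies copied from the results table.
per_category_accuracy = {
    "Kitchen": 0.150, "Store": 0.260, "Bedroom": 0.570, "LivingRoom": 0.260,
    "Office": 0.640, "Industrial": 0.260, "Suburb": 0.550, "InsideCity": 0.530,
    "TallBuilding": 0.560, "Street": 0.320, "Highway": 0.690, "OpenCountry": 0.310,
    "Coast": 0.600, "Mountain": 0.610, "Forest": 0.830,
}
mean_accuracy = sum(per_category_accuracy.values()) / len(per_category_accuracy)
print(f"{mean_accuracy:.3f}")  # 0.476
```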