Project 4 / Scene Recognition with Bag of Words


The first part of the scene recognition pipeline that I worked on was generating vectorized tiny images to serve as features. This part was fairly straightforward: all I really did was resize each image, normalize it, and put the normalized values into a vector. The k-nearest-neighbors classifier was a bit more interesting, and I went with a single neighbor. I achieved an accuracy of 0.522 with KNN once the bag of SIFTs pipeline was completed, and an accuracy of 0.2 with just tiny images and nearest neighbor. The confusion matrices are shown on the left and right, respectively.
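The two pieces above can be sketched roughly as follows. This is a minimal numpy illustration, not the actual project code; the 16x16 tiny-image size and the use of plain Euclidean distance are assumptions on my part.

```python
import numpy as np

def tiny_image_feature(img, size=16):
    """Downsample a grayscale image to size x size by index sampling,
    zero-center it, unit-normalize it, and flatten it into a vector.
    (size=16 is an assumed parameter, not stated in the write-up.)"""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    tiny = img[np.ix_(rows, cols)].astype(float).ravel()
    tiny -= tiny.mean()
    norm = np.linalg.norm(tiny)
    return tiny / norm if norm > 0 else tiny

def nn_classify(train_feats, train_labels, test_feats):
    """1-nearest-neighbor: each test feature takes the label of the
    closest training feature under Euclidean distance."""
    dists = ((test_feats[:, None, :] - train_feats[None, :, :]) ** 2).sum(-1)
    return [train_labels[i] for i in dists.argmin(axis=1)]
```

With a single neighbor there is no voting step at all, which matches the "only one neighbor" choice described above.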

Build Vocabulary

Building the vocabulary was taking forever at first, so I decided to keep the 200-word parameter but randomly sample 800 of the 1500 images. That let me debug the rest of the pipeline appropriately, and I was later able to increase the number of words and stop sampling images randomly. Increasing the word count actually proved detrimental to the overall runtime of both the classification and bagging stages, and the slight gain in classification accuracy simply didn't justify it. One thing I had issues with for a while was concatenating the SIFT features appropriately. I didn't really understand their representation in the output, and I later realized I had to concatenate each new feature matrix to the old one horizontally so that the features could be clustered appropriately at the end. I ended up settling on a step size of 6, since speed wasn't as critical when just building the vocabulary, and I figured I wouldn't be undersampling either.
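A rough sketch of this stage, under my own assumptions: a real run would use a SIFT library (e.g. VLFeat's dense SIFT) and a production k-means, but the structure is sampling images, pooling all descriptors into one matrix, and clustering them into visual words. Here descriptors are stored one per row, so pooling is a vertical stack (the transpose of the horizontal concatenation described above for column-major descriptor matrices).

```python
import numpy as np

def build_vocab(descriptor_sets, vocab_size=200, iters=20, seed=0):
    """Pool per-image SIFT descriptors and cluster them into a visual
    vocabulary with a minimal Lloyd's k-means (illustrative only).
    descriptor_sets: list of (n_i, 128) arrays, one per sampled image."""
    rng = np.random.default_rng(seed)
    all_feats = np.vstack(descriptor_sets)  # one row per descriptor
    # initialize centers from randomly chosen descriptors
    init = rng.choice(len(all_feats), vocab_size, replace=False)
    centers = all_feats[init].astype(float)
    for _ in range(iters):
        # assign each descriptor to its nearest center
        dists = ((all_feats[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        # recompute each center as the mean of its assigned descriptors
        for k in range(vocab_size):
            members = all_feats[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers
```

Sampling fewer images (and a coarser SIFT step) only shrinks `descriptor_sets`, which is why it speeds up debugging without changing the rest of the pipeline.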

Bag of Sifts

I wasn't quite sure how to begin building the histogram, as I had completely forgotten that I could use the distance function to find the closest centroid in the vocabulary. I tried building a KD-tree, but that proved more confusing than I expected. Then I looked at my nearest-neighbors code, it suddenly clicked, and I basically adapted that inside my loop. I found that keeping 'fast' on wasn't very detrimental to the model's accuracy. There is a time deadline, so if I didn't keep it on I'd have to increase the step sizes; when I increased those with 'fast' disabled, my accuracy was far worse than with the original configuration. I used a step size of 9 for the "fast" run of the SVM to yield an accuracy of 61 percent. The same was done for KNN, which yielded an accuracy of around 49 percent.
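The histogram step that finally "clicked" is essentially nearest-neighbor assignment reused: each SIFT descriptor votes for its closest visual word, and the counts form the image's feature. A hedged numpy sketch (brute-force distances rather than a KD-tree, matching the approach described above):

```python
import numpy as np

def bag_of_sifts(descriptors, vocab):
    """Assign each SIFT descriptor (row of `descriptors`) to its nearest
    visual word (row of `vocab`) and return a normalized histogram of
    word counts, one bin per vocabulary word."""
    dists = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = dists.argmin(axis=1)                       # nearest centroid per descriptor
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Normalizing the histogram keeps images with different descriptor counts (e.g. from different step sizes) comparable.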


Support Vector Machine
This is the part of the project that took me the longest, as I made a fatal mistake on my SVM that held me back for hours. I coded the voting algorithm correctly, but the problem was the initial confidence I gave each SVM: I initialized it to 0, so the algorithm often classified an example as the first label in categories by default. After I fixed that error, my accuracy went from around 50 percent to 64 percent. I achieved the 64 percent accuracy with a lambda of 0.0001 in 8 minutes. The confusion matrix is shown on the results page.
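The bug above is easy to see in a one-vs-all prediction loop. This sketch assumes linear SVMs stored as (w, b) pairs (my own illustrative representation): when every per-class confidence w·x + b is negative, a running best initialized to 0 never gets beaten, so the first category wins by default; initializing to negative infinity fixes it.

```python
import numpy as np

def predict_one_vs_all(svms, x):
    """svms: dict mapping label -> (w, b) for a linear one-vs-all SVM.
    Returns the label whose classifier is most confident on x.
    best_conf starts at -inf; starting it at 0 (the bug described
    above) would fall back to the first label whenever all
    confidences are negative."""
    best_label, best_conf = None, -np.inf
    for label, (w, b) in svms.items():
        conf = float(np.dot(w, x) + b)   # signed distance-like score
        if conf > best_conf:
            best_label, best_conf = label, conf
    return best_label
```

In one-vs-all classification, all confidences being negative is common (the example belongs strongly to no single binary classifier), so the argmax must still be taken over them.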


As stated in the readme, I included the detailed results for the performance of all of the algorithms in the index.html pages of three separate, appropriately named folders in this same HTML directory. I would rather have included them here, but I was getting some really funky and frustrating JavaScript errors.