Project 4 / Scene Recognition with Bag of Words

I implemented scene recognition using two different methods of image representation, Tiny Images and Bag of SIFT Features, and two different methods of classification, Nearest Neighbor and Linear SVM.

Tiny Images and Nearest Neighbor

Tiny Images is a simple image representation method in which each image is resized to 16x16 pixels and the resulting matrix is flattened into a 1x256 vector.
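A minimal sketch of the tiny image feature, assuming a grayscale image stored as a NumPy array. To stay dependency-free it resizes by nearest-neighbor sampling of evenly spaced rows and columns; that resampling choice is an assumption, and a real pipeline would more likely call a proper image-resize routine. The function name tiny_image is hypothetical.

```python
import numpy as np

def tiny_image(img: np.ndarray, size: int = 16) -> np.ndarray:
    """Downsample a grayscale image to size x size and flatten it to a 1 x (size*size) row vector."""
    h, w = img.shape
    # Nearest-neighbor resampling (assumed): keep `size` evenly spaced rows and columns.
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    small = img[np.ix_(rows, cols)].astype(np.float64)
    # 16 x 16 = 256 values in a single row.
    return small.reshape(1, -1)

img = np.random.default_rng(0).random((240, 320))
print(tiny_image(img).shape)  # (1, 256)
```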

Nearest Neighbor is a classification method in which, for each test image, the closest training image in feature space is found and that training image's label is assigned to the test image.
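The 1-nearest-neighbor rule can be sketched in a few lines of NumPy. This assumes Euclidean distance between feature vectors (the text does not name a metric) and a hypothetical function name; it works for either tiny image vectors or bag-of-words histograms.

```python
import numpy as np

def nearest_neighbor_classify(train_feats, train_labels, test_feats):
    """Give each test row the label of its closest training row (Euclidean distance assumed)."""
    train = np.asarray(train_feats, dtype=np.float64)
    test = np.asarray(test_feats, dtype=np.float64)
    # Squared distances, shape (n_test, n_train), computed by broadcasting.
    diffs = test[:, None, :] - train[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Index of the closest training example for every test example.
    nearest = np.argmin(dists, axis=1)
    return [train_labels[i] for i in nearest]

train = [[0, 0], [10, 10]]
labels = ["forest", "street"]
print(nearest_neighbor_classify(train, labels, [[1, 1], [9, 8]]))  # ['forest', 'street']
```

The broadcasted distance matrix is simple but uses O(n_test x n_train x d) memory; for large datasets it would be computed in chunks.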

Using the combination of Tiny Images and Nearest Neighbor, I got an accuracy of 20.1%. The results can be found here.

Bag of SIFT Features and Nearest Neighbor

Bag of SIFT Features is a method of image representation. I first built a vocabulary of visual words by sampling SIFT features from the training images and clustering them with k-means; each cluster center corresponds to one word in the vocabulary. Once the vocabulary was built, I represented each image as a histogram counting how many of its SIFT features fall closest to each visual word, and used Nearest Neighbor on these histograms to classify the test images.
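The vocabulary and histogram steps can be sketched as follows, assuming the SIFT descriptors have already been extracted as (n, 128) arrays; descriptor extraction itself is out of scope here, so random arrays stand in for it. The k-means is plain Lloyd's algorithm written in NumPy, and normalizing each histogram to sum to 1 is an assumption, not something stated above. Function names are hypothetical.

```python
import numpy as np

def build_vocabulary(descriptors, vocab_size, n_iter=20, seed=0):
    """Cluster sampled descriptors with Lloyd's k-means; each center is one visual word."""
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=np.float64)
    # Initialize the centers with randomly chosen descriptors.
    centers = X[rng.choice(len(X), size=vocab_size, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every descriptor to its nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Move each center to the mean of its members; leave it in place if the cluster is empty.
        for k in range(vocab_size):
            members = X[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return centers

def bag_of_words(descriptors, vocab):
    """Histogram of nearest visual words for one image, normalized to sum to 1 (assumed)."""
    X = np.asarray(descriptors, dtype=np.float64)
    d = ((X[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = d.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(np.float64)
    return hist / hist.sum()

rng = np.random.default_rng(1)
sampled = rng.random((1000, 128))  # stand-in for SIFT descriptors sampled from training images
vocab = build_vocabulary(sampled, vocab_size=50)
print(bag_of_words(rng.random((300, 128)), vocab).shape)  # (50,)
```

The resulting histograms are the feature vectors handed to the classifier, so training and test images must be encoded with the same vocabulary.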

Using the combination of Bag of SIFT Features and Nearest Neighbor, I got an accuracy of 45.8%. I used a step size of 3, bin size of 8, lambda of 0.0001, and a sample size of 200. The results can be found here.

Bag of SIFT Features and Linear SVM

A linear SVM is a binary classifier: it determines whether an image belongs to one particular category, but it cannot by itself decide which of several categories an image belongs to. To handle multiple categories, I trained one linear SVM per category (a one-vs-all scheme) and assigned each image to the category whose SVM returned the highest confidence for it.
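A minimal one-vs-all sketch, assuming each binary SVM minimizes the L2-regularized hinge loss lam/2 * ||w||^2 + mean(max(0, 1 - y(w.x + b))) with full-batch subgradient descent. The solver, learning rate, epoch count and function names are all assumptions for illustration, not necessarily what produced the reported accuracy; only the regularization value lam = 0.0001 comes from the text. The SVM's raw score w.x + b plays the role of "confidence".

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-4, lr=0.1, epochs=200):
    """Fit a binary linear SVM (labels +1/-1) by subgradient descent on the regularized hinge loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # examples inside or on the wrong side of the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def one_vs_all_predict(train_feats, train_labels, test_feats, lam=1e-4):
    """Train one SVM per category (category = +1, all others = -1) and pick the highest-scoring one."""
    X = np.asarray(train_feats, dtype=np.float64)
    T = np.asarray(test_feats, dtype=np.float64)
    labels = np.asarray(train_labels)
    categories = np.unique(labels)
    scores = np.empty((len(T), len(categories)))
    for j, cat in enumerate(categories):
        y = np.where(labels == cat, 1.0, -1.0)
        w, b = train_linear_svm(X, y, lam=lam)
        scores[:, j] = T @ w + b  # confidence of this category's SVM
    return categories[scores.argmax(axis=1)]

X = [[5, 0], [6, 1], [5, -1], [0, 5], [1, 6], [-1, 5], [-5, -5], [-6, -5], [-5, -6]]
y = ["coast"] * 3 + ["forest"] * 3 + ["street"] * 3
print(one_vs_all_predict(X, y, [[5, 0.5], [0.5, 5], [-5, -4]]))  # ['coast' 'forest' 'street']
```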

Using the combination of Bag of SIFT Features and Linear SVM, I got an accuracy of 58.0%. I used a step size of 3, bin size of 8, lambda of 0.0001, and a sample size of 200. The results can be found here.