Project 4: Scene Recognition with Bag of Words

Introduction and Background

This project explores different feature encoding and classification methods for scene recognition in static images. Features within an image can be encoded with various types of descriptors. Three descriptor types are examined: (i) tiny images, (ii) bags of SIFT descriptors, and (iii) bags of Fisher vectors. These descriptors are then used with two classification methods: (i) nearest neighbor and (ii) support vector machine.

Feature Encoding

Tiny Images

One of the simplest methods of encoding features is to downsample every image to a fixed low resolution. This gives each image a compact descriptor at low memory cost. However, downsampling discards high-frequency information (texture and fine patterns), which can be essential to scene recognition. The pipeline results reported below show that tiny images do not provide accurate classification. They are moderately successful only for categories that look unlike the others (e.g. highways).
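
As a minimal sketch of the idea, the function below downsamples a grayscale image by block averaging. The function name, the 16 x 16 default size, and the optional zero-mean, unit-length normalization are assumptions made for illustration; they are not taken from the project code.

```python
def tiny_image(pixels, size=16, normalize=True):
    """Downsample a grayscale image (list of rows) to a flat size*size vector."""
    height, width = len(pixels), len(pixels[0])
    tiny = []
    for r in range(size):
        # Source rows that map onto output row r (at least one row).
        r0 = r * height // size
        r1 = max((r + 1) * height // size, r0 + 1)
        for c in range(size):
            # Source columns that map onto output column c (at least one column).
            c0 = c * width // size
            c1 = max((c + 1) * width // size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            tiny.append(sum(block) / len(block))
    if not normalize:
        return tiny
    # Assumed extra step: shift to zero mean and scale to unit length.
    mean = sum(tiny) / len(tiny)
    centered = [v - mean for v in tiny]
    norm = sum(v * v for v in centered) ** 0.5
    return [v / norm for v in centered] if norm > 0 else centered
```

Averaging each block is what removes the high-frequency detail mentioned above: any texture finer than one block collapses into a single value.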

SIFT Descriptors

Constructing a bag of feature descriptors first requires building a visual 'vocabulary' by running k-means clustering on SIFT features. This yields k clusters (visual words) with known centroid locations. Every new SIFT feature is then binned into the cluster of its nearest centroid under the L2 distance, and an image is described by the histogram of how many of its features fall into each bin.
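
The binning step can be sketched as follows, assuming the k-means centroids have already been computed. The function names are hypothetical, and normalizing the histogram so that images with different numbers of features remain comparable is an assumption rather than something stated above.

```python
def squared_l2(a, b):
    """Squared L2 distance; it has the same nearest centroid as the L2 distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bag_of_words_histogram(descriptors, centroids):
    """Count how many descriptors fall nearest to each vocabulary centroid."""
    counts = [0] * len(centroids)
    for d in descriptors:
        nearest = min(range(len(centroids)), key=lambda k: squared_l2(d, centroids[k]))
        counts[nearest] += 1
    total = sum(counts)
    # Assumed normalization: divide by the number of descriptors in the image.
    return [c / total for c in counts] if total else counts


# Toy 2-D example; real SIFT descriptors have 128 dimensions.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 1.0], [2.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
histogram = bag_of_words_histogram(descriptors, centroids)
```

The resulting histogram has one entry per visual word, so every image gets a descriptor of length k regardless of how many SIFT features it contains.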

Fisher Vectors

Fisher vectors can be used to construct bags of feature descriptors through a process similar to that used for bags of SIFT descriptors. A Fisher vector encodes the first- and second-order differences (deviations in mean and covariance) between the SIFT descriptors of an image and the components of a Gaussian mixture model (GMM). This provides a richer encoding, at the cost of increased computation.
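
One common formulation of this encoding is written out below. It assumes a GMM with K components and diagonal covariances; the symbols are introduced here for illustration: x_i are the N descriptors of an image, each of dimension D, and pi_k, mu_k, sigma_k are the mixing weight, mean, and standard deviation of component k. The project itself may use a different normalization.

```latex
% Soft assignment (posterior) of descriptor x_i to component k
q_{ik} = \frac{\pi_k \, \mathcal{N}(x_i ; \mu_k, \Sigma_k)}
              {\sum_{t=1}^{K} \pi_t \, \mathcal{N}(x_i ; \mu_t, \Sigma_t)}

% Mean deviation for dimension j and component k
u_{jk} = \frac{1}{N \sqrt{\pi_k}} \sum_{i=1}^{N} q_{ik} \,
         \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}}

% Covariance deviation for dimension j and component k
v_{jk} = \frac{1}{N \sqrt{2 \pi_k}} \sum_{i=1}^{N} q_{ik}
         \left[ \left( \frac{x_{ji} - \mu_{jk}}{\sigma_{jk}} \right)^{2} - 1 \right]
```

Stacking all u_{jk} and v_{jk} gives a vector of length 2DK. This is why the encoding is richer than a histogram of length k, and also why it costs more to compute.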

Classifiers

Nearest Neighbor

The nearest neighbor method assigns each test image a label by majority vote among its nearest neighbors in feature space. That is, if the majority of the neighboring training images carry a given label, the test image is classified with that label.
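
A minimal sketch of the voting rule, assuming squared L2 distance between feature vectors and a hypothetical parameter k for the number of neighbors consulted (k = 1 reduces to copying the label of the single closest training image):

```python
from collections import Counter


def knn_classify(test_feature, train_features, train_labels, k=1):
    """Label a test feature by majority vote among its k nearest training features."""
    # Pair each training label with its squared L2 distance to the test feature.
    distances = sorted(
        (sum((a - b) ** 2 for a, b in zip(test_feature, feature)), label)
        for feature, label in zip(train_features, train_labels)
    )
    # Count the labels of the k closest training features.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy 2-D features; the labels are category names from the results table.
train_features = [[0, 0], [0, 1], [5, 5], [6, 5], [5, 6]]
train_labels = ["Forest", "Forest", "Street", "Street", "Street"]
prediction = knn_classify([0.2, 0.3], train_features, train_labels, k=3)
```

Every test image is compared against every training image, so the cost grows with the size of the training set.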

Support Vector Machine

Support vector machines (SVMs) are supervised learning models that can be used for classification problems. This project explores only linear SVMs; nonlinear models exist and can perform better where a linear model is a poor approximation. Because the problem has multiple classes and a linear classifier is binary, one linear classifier is trained per label to separate 'label' from 'not-label'. Given a test image, the linear classifier that indicates the greatest confidence (the highest, most positive value) determines the image label.
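
The decision rule for combining the binary classifiers can be sketched as below. It assumes each trained classifier is stored as a (weights, bias) pair; the training step is omitted, and the weights in the example are made up for illustration rather than taken from a trained model.

```python
def one_vs_all_predict(feature, classifiers):
    """Return the label whose 'label' vs 'not-label' classifier scores highest.

    classifiers maps each label to the (weights, bias) of its linear classifier.
    """
    def score(label):
        weights, bias = classifiers[label]
        # Linear decision value w . x + b; larger means more confident.
        return sum(w * x for w, x in zip(weights, feature)) + bias

    return max(classifiers, key=score)


# Hypothetical classifiers over a 2-D feature, for illustration only.
classifiers = {
    "Coast": ([1.0, -1.0], 0.0),
    "Forest": ([-1.0, 1.0], 0.5),
    "Street": ([0.0, 0.0], -1.0),
}
label = one_vs_all_predict([2.0, 0.0], classifiers)
```

Taking the maximum still returns a label when every classifier gives a negative score; in that case the winner is the least negative score.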

Classifier Pipeline Performance

  1. Tiny images + Nearest neighbor: 20.3%
  2. SIFT + Nearest neighbor: 43.0%
  3. SIFT + Linear SVM: 69.1%

    Scene classification results visualization

    Accuracy (mean of diagonal of confusion matrix) is 0.691

    Category name   Accuracy   False positives with true label   False negatives with wrong predicted label
    Kitchen         0.690      Bedroom, Store                    Office, Bedroom
    Store           0.580      LivingRoom, InsideCity            Forest, LivingRoom
    Bedroom         0.430      Kitchen, InsideCity               Kitchen, Kitchen
    LivingRoom      0.330      InsideCity, InsideCity            Bedroom, Store
    Office          0.920      Bedroom, Kitchen                  Bedroom, Kitchen
    Industrial      0.540      Street, Kitchen                   Bedroom, LivingRoom
    Suburb          0.970      Mountain, Industrial              Street, InsideCity
    InsideCity      0.580      Coast, Kitchen                    Industrial, Street
    TallBuilding    0.760      InsideCity, Store                 Street, Industrial
    Street          0.680      InsideCity, InsideCity            InsideCity, InsideCity
    Highway         0.790      OpenCountry, Coast                Suburb, Coast
    OpenCountry     0.550      Mountain, Industrial              Coast, Suburb
    Coast           0.790      OpenCountry, OpenCountry          OpenCountry, Highway
    Mountain        0.820      OpenCountry, Highway              Highway, Street
    Forest          0.940      Mountain, Industrial              Store, Mountain
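
The overall accuracy above is described as the mean of the diagonal of the confusion matrix. A minimal sketch of that computation follows, assuming the matrix is row-normalized so that each diagonal entry is the accuracy for one category; the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, categories):
    """Row-normalized confusion matrix: rows are true labels, columns are predictions."""
    index = {category: i for i, category in enumerate(categories)}
    counts = [[0] * len(categories) for _ in categories]
    for true, predicted in zip(true_labels, predicted_labels):
        counts[index[true]][index[predicted]] += 1
    # Divide each row by its total so entry [i][i] is the accuracy of category i.
    return [
        [count / sum(row) if sum(row) else 0.0 for count in row]
        for row in counts
    ]


def mean_diagonal_accuracy(matrix):
    """Average of the per-category accuracies on the diagonal."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)


# Toy example with two categories.
categories = ["Coast", "Forest"]
true_labels = ["Coast", "Coast", "Coast", "Coast", "Forest", "Forest"]
predicted_labels = ["Coast", "Coast", "Coast", "Forest", "Forest", "Coast"]
matrix = confusion_matrix(true_labels, predicted_labels, categories)
accuracy = mean_diagonal_accuracy(matrix)
```

Applied to the results above, the fifteen per-category accuracies sum to 10.37, and 10.37 / 15 is approximately 0.691, which matches the reported overall accuracy.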