Relative Attributes

Marr Prize (Best Paper Award) Winner, ICCV 2011

"Who in the rainbow can draw the line where the violet tint ends and the orange tint begins? Distinctly we see the difference of the colors, but where exactly does the one first blendingly enter into the other? So with sanity and insanity."

-- Herman Melville, Billy Budd

Abstract

Human-nameable visual "attributes" can benefit various recognition tasks. However, existing techniques restrict these properties to categorical labels (for example, a person is 'smiling' or not, a scene is 'dry' or not), and thus fail to capture more general semantic relationships. We propose to model relative attributes. Given training data stating how object/scene categories relate according to different attributes, we learn a ranking function per attribute. The learned ranking functions predict the relative strength of each property in novel images. We then build a generative model over the joint space of attribute ranking outputs, and propose a novel form of zero-shot learning in which the supervisor relates the unseen object category to previously seen objects via attributes (for example, 'bears are furrier than giraffes'). We further show how the proposed relative attributes enable richer textual descriptions for new images, which in practice are more precise for human interpretation. We demonstrate the approach on datasets of faces and natural scenes, and show its clear advantages over traditional binary attribute prediction for these new tasks.

Motivation

Binary attributes are restrictive and can be unnatural. In the example images above, one can characterize the top-left and top-right images as natural and man-made, respectively, but how would you describe the top-center image? The only meaningful way to characterize it is relative to the other images: it is less natural than the image on the left, but more natural than the image on the right.

Proposal

In this work, we propose to model relative attributes. As opposed to predicting the presence of an attribute, a relative attribute indicates the strength of an attribute in an image with respect to other images. In addition to being more natural, relative attributes offer a richer mode of communication, thus allowing access to more detailed human supervision (and so potentially higher recognition accuracy), as well as the ability to generate more informative descriptions of novel images.

We devise an approach that learns a ranking function for each attribute, given relative ordering and similarity constraints on pairs of examples (or, more generally, a partial ordering on some examples). For a given image, the learned ranking function outputs a real-valued score indicating the relative strength of the attribute in that image.
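As a minimal sketch of how category-level statements can be turned into pairwise constraints, the hypothetical helper `pairs_from_category_ranks` below assigns each category an integer strength for a single attribute and emits ordered pairs (the first image should score higher) and similar pairs (equal strength). Treating images from the same category, or from equally ranked categories, as similar pairs is an assumption made for illustration, not necessarily the exact protocol used in the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def pairs_from_category_ranks(
    labels: List[str], category_rank: Dict[str, int]
) -> Tuple[List[Pair], List[Pair]]:
    """Turn a category-level ordering for ONE attribute into image pairs.

    labels[k] is the category of image k; category_rank maps a category to a
    hypothetical integer strength for this attribute (equal = similar).
    Returns (ordered, similar):
      ordered: (i, j) means image i should score HIGHER than image j
      similar: (i, j) means images i and j should score about the same
    Categories missing from category_rank are left unconstrained.
    """
    ordered: List[Pair] = []
    similar: List[Pair] = []
    for i, j in combinations(range(len(labels)), 2):
        ri = category_rank.get(labels[i])
        rj = category_rank.get(labels[j])
        if ri is None or rj is None:
            continue  # no supervision relates these two categories
        if ri > rj:
            ordered.append((i, j))
        elif rj > ri:
            ordered.append((j, i))
        else:
            similar.append((i, j))  # same strength, e.g. same category
    return ordered, similar
```

For example, ranking forest > coast > tallbuilding on "natural" makes two forest images a similar pair and every forest/coast image pair an ordered pair.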

We introduce novel forms of zero-shot learning and image description that exploit the relative attribute predictions.

Approach

Learning relative attributes: Each relative attribute is learnt via a learning-to-rank formulation, given comparative supervision, as shown below:

The figure below illustrates the distinction between a wide-margin ranking function (right), which enforces the desired ordering on the training points (1-6), and a wide-margin binary classifier (left), which only separates the two classes (+ and -) and does not necessarily preserve the desired ordering on the points:
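The sketch below illustrates the ranking side of this distinction for a single attribute. It assumes a linear ranking function r(x) = w·x and minimizes a squared-hinge, RankSVM-style objective: ordered pairs are pushed apart by a unit margin, similar pairs are penalized for any score difference, and ||w|| is regularized. For readability it uses plain full-batch gradient descent in pure Python; the released code instead builds on Olivier Chapelle's RankSVM implementation, and the function names, step size `lr`, and epoch count here are hypothetical choices (`lr` must be small relative to the feature scale for the updates to converge).

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_ranker(
    X: Sequence[Sequence[float]],
    ordered: Sequence[Tuple[int, int]],
    similar: Sequence[Tuple[int, int]],
    C: float = 1.0,
    lr: float = 0.01,
    epochs: int = 2000,
) -> Vector:
    """Learn w for one attribute by gradient descent on
        0.5*||w||^2 + C * ( sum_O max(0, 1 - w.(x_i - x_j))^2
                          + sum_S (w.(x_i - x_j))^2 )
    (i, j) in `ordered` means x_i should rank above x_j; pairs in
    `similar` should receive nearly equal scores.
    """
    dim = len(X[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularizer 0.5*||w||^2
        for i, j in ordered:
            d = [a - b for a, b in zip(X[i], X[j])]
            slack = 1.0 - dot(w, d)
            if slack > 0:  # margin violated: squared hinge is active
                for k in range(dim):
                    grad[k] -= 2.0 * C * slack * d[k]
        for i, j in similar:
            d = [a - b for a, b in zip(X[i], X[j])]
            diff = dot(w, d)  # any score difference is penalized
            for k in range(dim):
                grad[k] += 2.0 * C * diff * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def rank_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Real-valued attribute strength r(x) = w . x."""
    return dot(w, x)
```

Unlike a binary classifier's decision boundary, the learned w is trained only to respect the pairwise order, so its scores can be compared across any two images.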

Novel zero-shot learning: We study the following set-up:

N total categories: S seen categories (associated images are available) + U unseen categories (no images are available for these categories)
S seen categories are described relative to each other via attributes (not all pairs of categories need to be related for all attributes)
U unseen categories are described relative to (a subset of) seen categories in terms of (a subset of) attributes.

We first train a set of relative attributes using the supervision provided on the seen categories (these attributes can also be pre-trained from external data). We then build a generative model (a Gaussian) for each seen category from the responses of the relative attributes to that category's images. Finally, we infer the parameters of the generative models of the unseen categories from their relative descriptions with respect to the seen categories. A visualization of the simple approach we employ for this is shown below:
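The following is a minimal sketch of one simple rule in the spirit of that visualization, for a single attribute, assuming per-attribute (diagonal) Gaussians. If an unseen category is described as lying between two seen categories, its mean is placed at the midpoint of their means; if it is bounded on only one side, its mean is placed one average inter-category gap beyond that bound; its variance is the average of the seen-category variances. The helper names, the use of the tightest bound when several are given, and the fallbacks (a gap of 1.0 with a single seen category, the overall mean for an undescribed attribute) are hypothetical; see the paper for the exact procedure.

```python
from statistics import mean
from typing import Dict, List, Optional


def infer_unseen_mean(
    seen_means: Dict[str, float],
    stronger_than: List[str],
    weaker_than: List[str],
) -> float:
    """Mean attribute score for an unseen category (one attribute).

    stronger_than: seen categories the unseen one is described as MORE than.
    weaker_than:   seen categories it is described as LESS than.
    """
    ordered = sorted(seen_means.values())
    # average gap between consecutive seen-category means on this attribute
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    avg_gap = mean(gaps) if gaps else 1.0  # hypothetical fallback
    # tightest bounds implied by the relative description
    lo: Optional[float] = max((seen_means[c] for c in stronger_than), default=None)
    hi: Optional[float] = min((seen_means[c] for c in weaker_than), default=None)
    if lo is not None and hi is not None:
        return (lo + hi) / 2.0  # bracketed: midpoint of the two bounds
    if lo is not None:
        return lo + avg_gap  # only a lower bound: step one gap up
    if hi is not None:
        return hi - avg_gap  # only an upper bound: step one gap down
    return mean(seen_means.values())  # undescribed: hypothetical fallback


def infer_unseen_variance(seen_vars: Dict[str, float]) -> float:
    """Variance for the unseen category: average of the seen variances."""
    return mean(seen_vars.values())
```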

A test image is assigned to the category with the maximum likelihood.
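Under the same diagonal-Gaussian assumption (a simplification for this sketch; the category models could equally use full covariances), assigning a test image amounts to evaluating each category's log-likelihood on the image's vector of attribute scores and taking the argmax. `log_likelihood` and `classify` are hypothetical names.

```python
import math
from typing import Dict, Sequence, Tuple

# category -> (per-attribute means, per-attribute variances)
Model = Dict[str, Tuple[Sequence[float], Sequence[float]]]


def log_likelihood(
    r: Sequence[float], means: Sequence[float], variances: Sequence[float]
) -> float:
    """log N(r; means, diag(variances)) for a vector of attribute scores r."""
    total = 0.0
    for x, mu, var in zip(r, means, variances):
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total


def classify(r: Sequence[float], models: Model) -> str:
    """Return the category whose Gaussian gives r the highest likelihood."""
    return max(models, key=lambda c: log_likelihood(r, *models[c]))
```

Working in log space avoids underflow when many attributes are multiplied together, and seen and unseen categories are scored in exactly the same way.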

Automatically generating relative textual descriptions of images: Given an image I to be described, we evaluate all learnt ranking functions on I. For each attribute, we identify two reference images that lie on either side of I and are neither too far from nor too close to I. Image I is then described relative to these two reference images, as shown below:
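One way to make "neither too far from nor too close to I" concrete is sketched below for a single attribute: candidates on each side of I are sorted by their score distance from I, and the `offset`-th closest is chosen, so the nearest neighbor is skipped without preferring distant images. The `offset` parameter and the fallback to the farthest candidate when a side has too few images are hypothetical; the page does not specify the exact selection rule.

```python
from typing import List, Optional, Sequence, Tuple


def pick_references(
    score_img: float, ref_scores: Sequence[float], offset: int = 2
) -> Tuple[Optional[int], Optional[int]]:
    """Return (below, above) reference indices for one attribute.

    ref_scores[k] is the ranking score of candidate reference image k.
    A side with no candidates yields None.
    """
    below: List[int] = sorted(
        (i for i, s in enumerate(ref_scores) if s < score_img),
        key=lambda i: score_img - ref_scores[i],
    )
    above: List[int] = sorted(
        (i for i, s in enumerate(ref_scores) if s > score_img),
        key=lambda i: ref_scores[i] - score_img,
    )

    def choose(cands: List[int]) -> Optional[int]:
        if not cands:
            return None
        # offset-th closest, or the farthest available if the side is short
        return cands[min(offset, len(cands) - 1)]

    return choose(below), choose(above)
```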

As seen above, in addition to describing an image relative to other images, our approach can also describe an image relative to other categories, resulting in a purely textual description. Clearly, the relative descriptions are more precise and informative than the conventional binary description.
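For purely textual descriptions relative to categories, in the format of the example descriptions further down this page, a minimal sketch is shown below. It assumes each category is summarized by the mean ranking score of its training images and that the nearest category mean on each side of the image's score is used; both are assumptions for illustration, and `describe_relative` is a hypothetical name.

```python
from typing import Dict, List


def describe_relative(attr: str, score: float, category_means: Dict[str, float]) -> str:
    """E.g. 'more natural than insidecity, less natural than coast'."""
    below = [c for c, m in category_means.items() if m < score]
    above = [c for c, m in category_means.items() if m > score]
    parts: List[str] = []
    if below:  # closest category that is weaker on this attribute
        nearest_below = max(below, key=lambda c: category_means[c])
        parts.append(f"more {attr} than {nearest_below}")
    if above:  # closest category that is stronger on this attribute
        nearest_above = min(above, key=lambda c: category_means[c])
        parts.append(f"less {attr} than {nearest_above}")
    return ", ".join(parts)
```

When the image lies beyond every category on an attribute, only one side is reported, as in "more open than mountain" in the examples.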

Experiments and Results

We conduct experiments on two datasets:

(1) Outdoor Scene Recognition (OSR) containing 2688 images from 8 categories: coast C, forest F, highway H, inside-city I, mountain M, open-country O, street S and tall-building T. We use gist features to represent the images.

(2) A subset of the Public Figures Face Database (PubFig) containing 772 images from 8 categories: Alex Rodriguez A, Clive Owen C, Hugh Laurie H, Jared Leto J, Miley Cyrus M, Scarlett Johansson S, Viggo Mortensen V and Zac Efron Z. We use concatenated gist and color features to represent the images.

The attributes used for each dataset, along with their binary and relative annotations, are shown below:

Zero-shot learning:

We compare our proposed approach to two baselines. The first is Score-based Relative Attributes (SRA). This baseline is the same as our approach, except that it uses the scores of a binary classifier (binary attributes) instead of the scores of a ranking function; it helps evaluate the need for a ranking function to best model relative attributes. Our second baseline is the Direct Attribute Prediction (DAP) model introduced by Lampert et al. in CVPR 2009. This baseline helps evaluate the benefits of treating attributes relatively as opposed to categorically. We evaluate these approaches for varying numbers of unseen categories, varying amounts of data used to train the attributes, varying numbers of attributes used to describe the unseen categories, and varying levels of 'looseness' in the description of unseen categories. Details of the experimental set-up can be found in our paper. Results are shown below:

Auto-generated image descriptions:

To compare the quality of our relative image descriptions with their binary counterparts, we conducted a human study. For each image, we generated one description using our approach and another using the baseline binary attributes. We presented subjects with a description along with three images, one of which was the image being described. The subjects' task was to rank the three images by how likely each was to be the one being described. The more precise the description, the better the subjects' chances of identifying the correct image. An illustration of a task presented to subjects is shown below:

The results of the study are shown below. We see that subjects can identify the correct image more accurately using our proposed relative attributes, as compared to the binary attributes.

Example binary descriptions of images as well as descriptions relative to categories are shown below:

Binary description	Relative description
not natural; not open; perspective	more natural than tallbuilding, less natural than forest; more open than tallbuilding, less open than coast; more perspective than tallbuilding
not natural; not open; perspective	more natural than insidecity, less natural than highway; more open than street, less open than coast; more perspective than highway, less perspective than insidecity
natural; open; perspective	more natural than tallbuilding, less natural than mountain; more open than mountain; less perspective than opencountry
White; not Smiling; VisibleForehead	more White than AlexRodriguez; more Smiling than JaredLeto, less Smiling than ZacEfron; more VisibleForehead than JaredLeto, less VisibleForehead than MileyCyrus
White; not Smiling; not VisibleForehead	more White than AlexRodriguez, less White than MileyCyrus; less Smiling than HughLaurie; more VisibleForehead than ZacEfron, less VisibleForehead than MileyCyrus
not Young; BushyEyebrows; RoundFace	more Young than CliveOwen, less Young than ScarlettJohansson; more BushyEyebrows than ZacEfron, less BushyEyebrows than AlexRodriguez; more RoundFace than CliveOwen, less RoundFace than ZacEfron

Data

We provide the learnt relative attributes and their predictions for the two datasets used in our paper: Outdoor Scene Recognition (OSR) and a subset of the Public Figures Face Database (PubFig).

README

Download (v2)

Relative Face Attributes Dataset. It contains annotations for 29 relative attributes on 60 categories from the Public Figures Face Database (PubFig).

Code

We modified Olivier Chapelle's RankSVM implementation to train relative attributes with similarity constraints. Our modified code can be found here.

If you use our code, please cite the following paper:

D. Parikh and K. Grauman

Relative Attributes

International Conference on Computer Vision (ICCV), 2011.

Demos

Demos of various applications of relative attributes can be found here. A description of these applications can be found in the papers here.

Publications

D. Parikh and K. Grauman

Relative Attributes

International Conference on Computer Vision (ICCV), 2011. (Oral)

Marr Prize (Best Paper Award) Winner

[slides] [talk (video)] [poster] [relative descriptions demo]

Following are our other papers that use relative attributes:

A. Biswas and D. Parikh

Simultaneous Active Learning of Classifiers & Attributes via Relative Feedback

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013

[project page and data] [poster] [demo]

A. Parkash and D. Parikh

Attributes for Classifier Feedback

European Conference on Computer Vision (ECCV), 2012 (Oral)

[slides] [talk (video)] [project page and data] [demo]

A. Kovashka, D. Parikh and K. Grauman
WhittleSearch: Image Search with Relative Attribute Feedback
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
[project page] [poster] [demo]

D. Parikh, A. Kovashka, A. Parkash and K. Grauman
Relative Attributes for Enhanced Human-Machine Communication (Invited paper)
AAAI Conference on Artificial Intelligence (AAAI), 2012 (Oral)

Translations

Several translations of this page are available. The content is generated by independent sources, and has not been verified by us. Consult at your own discretion.

Romanian by science team.

Slovakian by Sciologness Team.

Russian by Alexander Nikiforov.

Indonesian by ChameleonJohn.com.

Polish by Grzegorz Adamski.

Czech by StudyCrumb.

Ukrainian by StudyBounty.

Portuguese by Artur Weber.

Estonian by Write My Paper For Me.

French by Avice Robitaille.

German by Philip Egger.

Dutch by ProThesisWriter.