Computer Vision Lab



Our Visual Humor (CVPR 2016) datasets can be found here.




Our Visual Question Answering (VQA) dataset (ICCV 2015) can be found here.



Fill-in-the-blank (FITB) and visual paraphrasing (VP) datasets and code from our "Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks" paper at CVPR 2015 can be found here.



CIDEr (Consensus-based Image Description Evaluation) code, as well as the PASCAL-50S and ABSTRACT-50S datasets from our "Consensus-based Image Description Evaluation" paper at CVPR 2015, can be found here.




Datasets and code from our "Zero-Shot Learning via Visual Abstraction" paper at ECCV 2014 can be found here.





Attribute Dominance Dataset: Our dataset containing attribute dominance annotations for face and animal images used in our "Attribute Dominance: What Pops Out?" paper at ICCV 2013.




Spoken Attribute Dataset: Dataset used in our "Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing" paper at ICCV 2013.




Abstract Scenes Dataset: Our dataset containing more than 10,000 abstract scenes and corresponding descriptions, used in the following two papers:

"Bringing Semantics Into Focus Using Visual Abstraction" paper at CVPR 2013 (Oral).

"Learning the Visual Interpretation of Sentences" paper at ICCV 2013.



Patches Dataset: We provide the patches used in our "The Role of Image Understanding in Contour Detection" paper at CVPR 2012. 




Relative Attributes Datasets: We provide the learnt relative attributes and their predictions for two datasets: Outdoor Scene Recognition (OSR: 6 attributes, 8 categories) and a subset of the Public Figures Face Database (PubFig: 11 attributes, 8 categories).   


Relative Face Attributes Dataset (29 attributes, 60 categories)


Relative Shoes Attributes Dataset (10 attributes, 10 categories)




Part Patch Dataset: A large number of local image patches classified by human subjects, viewing them in isolation, as containing a person's head, torso, arm, hand, leg or foot; and a large number of image windows classified by human subjects, again in isolation, as containing a person or not. The patches and windows come from high-resolution color, grayscale and normalized gradient images, as well as their low-resolution counterparts.