Computer Vision Lab

Our Visual Humor (CVPR 2016) datasets can be found here.

Our Visual Question Answering (VQA) dataset (ICCV 2015) can be found here.

Fill-in-the-blank (FITB) and visual paraphrasing (VP) datasets and code from our "Don't Just Listen, Use Your Imagination: Leveraging Visual Common Sense for Non-Visual Tasks" paper at CVPR 2015 can be found here.

CIDEr (Consensus-based Image Description Evaluation) code, as well as the PASCAL-50S and ABSTRACT-50S datasets from our "Consensus-based Image Description Evaluation" paper at CVPR 2015, can be found here.

Datasets and code from our "Zero-Shot Learning via Visual Abstraction" paper at ECCV 2014 can be found here.

Attribute Dominance Dataset: Our dataset containing attribute dominance annotations for face and animal images used in our "Attribute Dominance: What Pops Out?" paper at ICCV 2013.

Spoken Attribute Dataset: Dataset used in our "Spoken Attributes: Mixing Binary and Relative Attributes to Say the Right Thing" paper at ICCV 2013.

Abstract Scenes Dataset: Our dataset containing more than 10,000 abstract scenes and corresponding descriptions, used in the following two papers:

"Bringing Semantics Into Focus Using Visual Abstraction" paper at CVPR 2013 (Oral).

"Learning the Visual Interpretation of Sentences" paper at ICCV 2013.

Patches Dataset: We provide the patches used in our "The Role of Image Understanding in Contour Detection" paper at CVPR 2012. 

Relative Attributes Datasets: We provide the learnt relative attributes and their predictions for two datasets: Outdoor Scene Recognition (OSR: 6 attributes, 8 categories) and a subset of the Public Figures Face Database (PubFig: 11 attributes, 8 categories).   

Relative Face Attributes Dataset (29 attributes, 60 categories)

Relative Shoes Attributes Dataset (10 attributes, 10 categories)

Part Patch Dataset: A large number of local image patches classified by human subjects in isolation as containing a person's head, torso, arm, hand, leg or foot, and a large number of image windows classified by human subjects in isolation as containing a person or not. The patches and windows come from high-resolution color, gray-scale and normalized gradient images, as well as from their low-resolution counterparts.