An In Depth View of Saliency

Arridhana Ciptadi, Tucker Hermans, James M. Rehg

[Figure: input color image, input depth image, resulting saliency map, and resulting object segmentation]

Abstract

Visual saliency is a computational process that identifies important locations and structure in the visual field. Most current methods for saliency rely on cues such as color and texture while ignoring depth information, which is known to be an important saliency cue in the human cognitive system. We propose a novel computational model of visual saliency which incorporates depth information. We compare our approach to several state-of-the-art visual saliency methods and we introduce a method for saliency-based segmentation of generic objects. We demonstrate that by explicitly constructing 3D layout and shape features from depth measurements, we can obtain better performance than methods which treat the depth map as just another image channel. Our method requires no learning and can operate on scenes for which the system has no previous knowledge. We conduct object segmentation experiments on a new dataset of registered RGB-D images captured on a mobile-manipulator robot.

Dataset

The dataset associated with this work consists of 80 color and depth image pairs with associated pixel-level ground-truth segmentation masks. The data were collected using a Microsoft Kinect sensor mounted on a Willow Garage PR2 robot in the Georgia Tech Aware Home. The data are available as an archive here.

The archive is organized as follows, where * represents an index from 0 to 79:
input/color*.png : Color image captured from the Kinect
input/depth*.png : Raw depth image captured from the Kinect
input_cropped/depth*_crop_smooth.png : Cropped version of the depth image which has been smoothed to fill in holes
input_cropped/color*_crop.png : Color image cropped to align with cropped Kinect image
pointclouds/cloud*.pcl : Point cloud data corresponding to the depth map stored in the Point Cloud Data file format
pointclouds/cloud*.mat : Point cloud data corresponding to the depth map stored as a Matlab matrix; the channels are X, Y, Z
annotated/*.png : Binary images with white pixels corresponding to human labeled objects
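The per-index file layout above can be sketched in a small helper. This is a hypothetical convenience function, not part of the released archive, and it assumes the index is written without zero-padding (e.g. color0.png ... color79.png):

```python
import os

def dataset_paths(root, i):
    """Return the per-index file paths in the archive layout for index i (0-79).

    Assumes an unpadded integer index in the filenames, as suggested by the
    listing above; adjust if the released archive pads the index.
    """
    return {
        "color":      os.path.join(root, "input", f"color{i}.png"),
        "depth":      os.path.join(root, "input", f"depth{i}.png"),
        "depth_crop": os.path.join(root, "input_cropped", f"depth{i}_crop_smooth.png"),
        "color_crop": os.path.join(root, "input_cropped", f"color{i}_crop.png"),
        "cloud_pcl":  os.path.join(root, "pointclouds", f"cloud{i}.pcl"),
        "cloud_mat":  os.path.join(root, "pointclouds", f"cloud{i}.mat"),
        "annotation": os.path.join(root, "annotated", f"{i}.png"),
    }

paths = dataset_paths("/path/to/archive", 0)
```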

Note that all point clouds are stored in the coordinate system of the robot torso frame. This is a right-handed coordinate frame where X points forward from the robot, Y points to the left, and Z points upward.
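The right-handedness of the torso frame can be checked with a cross product: X (forward) crossed with Y (left) must yield Z (up). A minimal sketch, using plain tuples for vectors:

```python
# Verify the torso frame is right-handed: X (forward) x Y (left) = Z (up).
def cross(a, b):
    """Cross product of two 3-vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

x_forward = (1.0, 0.0, 0.0)
y_left    = (0.0, 1.0, 0.0)
z_up      = cross(x_forward, y_left)  # (0.0, 0.0, 1.0), i.e. upward
```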

We additionally provide results from our research in the archive here. For more information on the methods used please see our BMVC 2013 paper, "An In Depth View of Saliency," available here.

The results archive is organized as follows:
ciptadi/ciptadi_planes*.png : Saliency maps for our proposed method with planes only (best performing for segmentation on our dataset) [1]
ciptadi/ciptadi_normals*.png : Saliency maps for our proposed method with surface normals only [1]
ciptadi/ciptadi_normals_planes*.png : Saliency maps for our proposed method with surface normals and planes [1]
others/context_aware*.png : Saliency maps for our implementation of context-aware saliency [2]
others/context_aware_depth*.png : Saliency maps for our implementation of context-aware saliency with depth [1]
others/center_surround*.png : Saliency maps for our implementation of center-surround saliency [3]
others/center_surround_depth*.png : Saliency maps for our implementation of center-surround saliency with depth [4]
others/signature*.png : Saliency maps for our implementation of image signature saliency [5]
others/gbvs*.png : Saliency maps for our implementation of graph-based visual saliency [6]
others/spectral*.png : Saliency maps for our implementation of spectral residual visual saliency [7]
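One common way to score the saliency maps above against the binary annotation masks is to threshold the map and compute precision, recall, and a weighted F-measure. The sketch below uses flat Python lists as a stand-in for the grayscale saliency PNGs and the white-pixel masks; it is a generic evaluation measure (with the commonly used beta^2 = 0.3 weighting), not necessarily the exact protocol of the paper:

```python
# Hedged sketch: score a thresholded saliency map against a binary mask.
# `saliency` is a flat list of values in [0, 1]; `mask` is a flat list of
# 0/1 ground-truth labels of the same length.
def f_measure(saliency, mask, thresh=0.5, beta2=0.3):
    tp = fp = fn = 0
    for s, m in zip(saliency, mask):
        pred = s >= thresh
        if pred and m:
            tp += 1        # salient pixel correctly inside the object
        elif pred and not m:
            fp += 1        # salient pixel outside the object
        elif not pred and m:
            fn += 1        # object pixel missed by the map
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)
```

A perfect map (thresholded map identical to the mask) scores 1.0; a map with no pixels above threshold scores 0.0.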

If you use any of these data please cite our BMVC 2013 paper. The corresponding BibTeX entry is:

@inproceedings{ciptadi-bmvc2013,
  author    = {Arridhana Ciptadi and Tucker Hermans and James M. Rehg},
  title     = {{An In Depth View of Saliency}},
  booktitle = {{British Machine Vision Conference (BMVC)}},
  year      = {2013},
  month     = {September},
  address   = {Bristol, United Kingdom}
}

References

[1] A. Ciptadi, T. Hermans, and J. M. Rehg. "An In Depth View of Saliency." BMVC, 2013.
[2] S. Goferman, L. Zelnik-Manor, and A. Tal. "Context-Aware Saliency Detection." CVPR, 2010.
[3] L. Itti, C. Koch, and E. Niebur. "A model of saliency-based visual attention for rapid scene analysis." PAMI, 1998.
[4] N. Ouerhani and H. Hügli. "Computing visual attention from scene depth." ICPR, 2000.
[5] X. Hou, J. Harel, and C. Koch. "Image signature: Highlighting sparse salient regions." PAMI, 2012.
[6] J. Harel, C. Koch, and P. Perona. "Graph-based visual saliency." NIPS, 2006.
[7] X. Hou and L. Zhang. "Saliency detection: A spectral residual approach." CVPR, 2007.