Computer Vision

I took computer vision this fall (2011) from an ensemble cast: Dellaert, Balch, Bobick, Essa, and Rehg, with guest lectures from Grundmann and Schindler.

Topics we covered in class included fitting lines to data, image convolution and correlation, stereo-photo rectification, image segmentation with normalized cuts, Kalman filtering, Gaussian distributions, SIFT and HOG descriptors, and support vector machines, including the maximum-margin principle for SVM training.

Topics my teammates and I explored in project work: the Kinect camera + Robot Operating System (ROS), interpreting depth information, working with point clouds, "skinning" surface points to a skeleton, using the Point Cloud Library (PCL) to perform surface reconstruction, computing geodesics on a surface, gradient-based image descriptors, approximate nearest-neighbor search with min-hashing, and k-means clustering.

Final Project

In our final project, my teammates and I explored the techniques behind the ShadowDraw system, which provides educational feedback to artists-in-training by visually suggesting shapes and lines retrieved using a sketch-based image retrieval backend. In particular, we evaluated the design decisions made in the original system by varying the parameters of the image database, and we created and evaluated a variant on the image descriptor used in the original system.

We wrote code to compute the BiCE descriptor for an image patch and built a database with images from the Berkeley Segmentation Dataset. BiCE is loosely similar to the SIFT and HOG descriptors, but it uses a different binning scheme and thresholds gradients into one bit per bin. We also implemented a variation on the BiCE descriptor that we call BER, for “Binary Edge and Region,” which adds interior/exterior region bits to the existing edge bits. We sample from the patch on a regular grid with a fixed distance between sample pixels, and we define interior/exterior using the manual segmentations that come with our dataset.

To quickly retrieve a patch from a ShadowDraw dataset, we created an inverted file structure that indexes a database of image descriptors based on their Jaccard similarity (the number of set bits in the bitwise AND divided by the number of set bits in the bitwise OR). We use min-hashing to perform this approximate nearest-neighbor lookup.
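As a rough illustration of the similarity measure and hashing described above, here is a minimal Python sketch. It is not the project code: it assumes each descriptor is packed into a Python integer, and the function names (jaccard, make_permutations, minhash, estimate_jaccard) are hypothetical. It uses explicit random permutations of bit positions, which is the textbook form of min-hashing and only practical for short descriptors.

```python
import random

def jaccard(a, b):
    """Jaccard similarity of two bit descriptors packed into ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumption: two empty descriptors count as identical
    return bin(a & b).count("1") / union

def set_bits(d):
    """Positions of the bits that are set in descriptor d."""
    return [i for i in range(d.bit_length()) if (d >> i) & 1]

def make_permutations(n_bits, n_hashes, seed=0):
    """One random permutation of the bit positions per hash function."""
    rng = random.Random(seed)
    perms = []
    for _ in range(n_hashes):
        p = list(range(n_bits))
        rng.shuffle(p)
        perms.append(p)
    return perms

def minhash(d, perms):
    """Min-hash signature: smallest permuted index among the set bits."""
    bits = set_bits(d)
    # An empty descriptor gets the sentinel value len(p).
    return tuple(min((p[i] for i in bits), default=len(p)) for p in perms)

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching min-hashes, an estimate of Jaccard similarity."""
    matches = sum(x == y for x, y in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

For a single random permutation, two descriptors share the same min-hash with probability equal to their Jaccard similarity, so the fraction of matching hashes over many permutations approximates it.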

Adding region info to the descriptor increases its discriminative power, as seen in the following ROC curve. To take advantage of region information, the artist needs to be able to easily specify the interior and exterior of the shape they are sketching. There are tools that can accomplish this (for example, Sýkora et al.'s LazyBrush), but we haven't integrated one into our system.

We also evaluated the quality of min-hash retrieval by varying the number of hashes per entry (s) and the number of duplicate tables (h). The plot below shows that h=3 provides a good tradeoff between precision and recall. The s value has a smaller impact, so it can be reduced without much loss of quality when performance is a concern.
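To make the roles of s and h concrete, here is a hedged Python sketch of a min-hash inverted index. It rests on an assumption about the layout that the text above does not spell out: each of the h tables is keyed by a tuple of s min-hash values, and a query returns every entry that collides in at least one table. The class and function names are hypothetical, and this is not the project code.

```python
import random
from collections import defaultdict

class MinHashIndex:
    """h hash tables, each keyed by a sketch of s min-hash values."""

    def __init__(self, n_bits, s, h, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        # For each table, s independent random permutations of bit positions.
        self.perms = [
            [rng.sample(range(n_bits), n_bits) for _ in range(s)]
            for _ in range(h)
        ]
        self.tables = [defaultdict(set) for _ in range(h)]

    def _sketch(self, descriptor, table_perms):
        bits = [i for i in range(self.n_bits) if (descriptor >> i) & 1]
        # An empty descriptor maps to the sentinel value n_bits.
        return tuple(
            min((p[i] for i in bits), default=self.n_bits)
            for p in table_perms
        )

    def add(self, item_id, descriptor):
        for table, perms in zip(self.tables, self.perms):
            table[self._sketch(descriptor, perms)].add(item_id)

    def query(self, descriptor):
        """Ids of entries that share a full sketch in at least one table."""
        candidates = set()
        for table, perms in zip(self.tables, self.perms):
            candidates |= table.get(self._sketch(descriptor, perms), set())
        return candidates

def collision_probability(j, s, h):
    """Chance that two descriptors with Jaccard similarity j collide in at
    least one of h tables, assuming independent hashes."""
    return 1.0 - (1.0 - j ** s) ** h

# Usage: index one descriptor, then look it up again.
index = MinHashIndex(n_bits=16, s=2, h=3)
index.add("patch-0", 0b1011)
hits = index.query(0b1011)  # an identical descriptor always collides
```

Under this layout, raising s makes each table stricter (fewer false matches, but more misses), while raising h gives a near-duplicate more chances to collide (fewer misses, at the cost of more memory and more candidates to check).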