

Computer Vision

3D Object Localization using Finger Pointing Gestures

Advisor - Prof. Kristin Dana (Rutgers)

Feb 2009 to April 2009

The goal of the project was to recover the 3D position of an object from a 2D image of a person pointing at it from a distance. We segment the image using cascaded classifiers and skin-tone cues to locate the eye and the pointing finger, which together give the line-of-sight vector. We then search along this vector for a salient object using a pixel-matching algorithm. Using a calibrated stereo camera setup, we estimate the pixel disparity and recover the depth of the object. A Lego robot is then tasked with planning a path to the object given its 3D position.
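The final depth step can be sketched as follows: for a rectified stereo pair, depth follows from disparity as Z = f·B/d. The focal length and baseline values below are hypothetical illustrations, not the calibration used in the project.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical calibration: f = 700 px, baseline B = 0.12 m.
# A 42 px disparity then places the object 2.0 m from the cameras.
z = depth_from_disparity(42, focal_px=700, baseline_m=0.12)
```

Nearby objects produce large disparities and distant ones small disparities, which is why the estimate degrades with range.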

You can find the report here.

Object Recognition using Corner Descriptors

Advisor - Prof. Lawrence Rabiner (Rutgers)

Oct 2008 to Dec 2008

Consider the task of recognizing an object in a complex 2D scene. We first reduce the image to a set of salient corner points and a connectivity matrix. The corners are chosen such that a square yields four corner points and a circle yields equally spaced points along its circumference. These corners, together with the connectivity matrix, are then assigned attributes based on their distance and angle with respect to neighbouring corners.

These attributes are then matched against a database containing the attributes of each object alone, without any background. We find that the corner attributes are not perfectly accurate but nevertheless produce reasonable results. The main advantage is the much smaller number of corner points used for matching compared to the SIFT algorithm.
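The attribute computation can be sketched as below: for each corner, record the distance and angle to every corner it is connected to. The corner list and adjacency structure here are hypothetical stand-ins for the detector output described above.

```python
import math

def corner_attributes(corners, adjacency):
    """For each corner, list (distance, angle) to each connected neighbour.

    corners   -- list of (x, y) points
    adjacency -- dict mapping corner index -> indices of connected corners
    """
    attrs = {}
    for i, (xi, yi) in enumerate(corners):
        attrs[i] = []
        for j in adjacency[i]:
            xj, yj = corners[j]
            dist = math.hypot(xj - xi, yj - yi)       # edge length
            angle = math.atan2(yj - yi, xj - xi)      # edge orientation
            attrs[i].append((dist, angle))
    return attrs

# Unit square: each corner is connected to its two edge neighbours.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
adjacency = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
features = corner_attributes(square, adjacency)
```

Because the attributes are relative distances and angles, matching against a background-free database can be made tolerant to translation of the object in the scene.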

H.264 Video Compression for Mobile Video Applications

Advisor - Mr. Ajit Gupte (Texas Instruments)

May 2008 to July 2008

We explore reference-frame compression in video encoders using 2D orthogonal-transform-based compression techniques. In a typical video encoder loop, a reference-frame compression block compresses the reconstructed frames before sending them to DDR SDRAM. The reconstructed data is split into compressed reference data and error data. The compressed reference data is fetched repeatedly from DDR during motion estimation, while the error data is needed only during motion compensation, so a net saving in DDR bandwidth is achieved. Because motion estimation therefore operates on lossy data, we developed an efficient compression technique using Hadamard transforms that keeps the energy content of the error-data plane small.
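The transform at the heart of this scheme can be sketched as follows: a 2D Hadamard transform of a pixel block, computed as H·X·H with the recursively built (unnormalised) Hadamard matrix. This is a generic illustration of the transform's energy compaction, not the specific compression pipeline developed at TI.

```python
def hadamard(n):
    """Unnormalised Hadamard matrix H_n (n must be a power of two)."""
    if n == 1:
        return [[1]]
    h = hadamard(n // 2)
    top = [row + row for row in h]                     # [H  H]
    bottom = [row + [-v for v in row] for row in h]    # [H -H]
    return top + bottom

def hadamard_2d(block):
    """2D Hadamard transform of an n x n block: H * block * H."""
    n = len(block)
    H = hadamard(n)

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    return matmul(matmul(H, block), H)

# A flat 4x4 block: all energy compacts into the DC coefficient.
coeffs = hadamard_2d([[1, 1, 1, 1]] * 4)
```

For smooth image regions most of the energy lands in a few low-order coefficients, so the remaining coefficients can be quantised or dropped cheaply, which is what makes the transform attractive for reference-frame compression.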

Implementation of FPGA based Object Tracking Algorithm

Advisor - Prof. N. Venkateswaran (SVCE, India)

Jan 2008 to Apr 2008

Undergraduate Dissertation

In this project we implement image-processing algorithms for object recognition and tracking on an FPGA, taking advantage of the parallelism, low cost, and low power consumption that FPGAs (Spartan-3E) offer. Individual frames acquired from the target video are fed into the FPGA offline and passed through segmentation, thresholding, and filtering stages. The object is then tracked by comparing the background frame against the processed, updated frame containing the new location of the target. The results were positive: the FPGA implementation reliably tracked a moving object.
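The background-comparison step can be sketched in software as below: threshold the difference between the background frame and the current frame, then take the centroid of the changed pixels as the target location. The threshold value is a hypothetical choice, and the actual project realised this logic in FPGA fabric rather than Python.

```python
def track_object(background, frame, threshold=30):
    """Centroid (x, y) of pixels that differ from the background.

    background, frame -- 2D lists of greyscale values of equal size
    threshold         -- minimum absolute difference to count as "changed"
    Returns None when no pixel exceeds the threshold.
    """
    xs, ys = [], []
    for y, (brow, frow) in enumerate(zip(background, frame)):
        for x, (b, f) in enumerate(zip(brow, frow)):
            if abs(f - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (sum(xs) / len(xs), sum(ys) / len(ys))
```

On an FPGA the per-pixel difference-and-threshold operations are independent, which is exactly the kind of data parallelism the hardware exploits.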

Download the report here.

Higher-Order Gabor Spectra, A Mathematical Model for Signal Processing

Advisor - Prof. Nagarajan Venkateswaran (WARFT)

Aug 2006 to May 2008

This work was part of the two-year Research Training Program in Signal Processing at the Waran Research Foundation (WARFT) in India. You can find the wiki link here. The proposal describes a novel approach to computationally efficient image and speech feature extraction using the higher-order statistics of the Gabor transform - Gabor polyspectra. Computing the Gabor transform with the FFT costs O(N log N); to reduce this further, the Gabor coefficients are instead obtained through the Arithmetic Fourier Transform (AFT), which requires only O(N) real multiplications. The higher-order statistics of the available signal are mapped into a multidimensional space via the proposed Gabor transform, and a feature vector consisting of a set of dominant harmonics and their associated Gabor phase components is extracted.
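A minimal sketch of the feature-extraction idea is below: a Gaussian-windowed DFT (a simple Gabor-style analysis) followed by selection of the dominant harmonics with their magnitudes and phases. The window width and the plain O(N²) DFT are simplifying assumptions for illustration; the proposal itself uses the AFT and higher-order (polyspectral) statistics, which are not reproduced here.

```python
import cmath
import math

def gabor_coefficients(signal, window_sigma=4.0):
    """Gaussian-windowed DFT of a 1D signal (Gabor-style analysis).

    window_sigma is a hypothetical choice, not the value from the report.
    """
    n = len(signal)
    centre = (n - 1) / 2
    windowed = [s * math.exp(-((i - centre) ** 2) / (2 * window_sigma ** 2))
                for i, s in enumerate(signal)]
    return [sum(windowed[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

def dominant_harmonics(coeffs, top=3):
    """Feature vector: (harmonic index, magnitude, phase) of the largest bins."""
    ranked = sorted(range(len(coeffs)), key=lambda k: -abs(coeffs[k]))
    return [(k, abs(coeffs[k]), cmath.phase(coeffs[k])) for k in ranked[:top]]

# A pure cosine at harmonic 2: the dominant bins are 2 and its mirror, 14.
sig = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
features = dominant_harmonics(gabor_coefficients(sig), top=2)
```

Keeping only the dominant harmonics and their phases is what makes the representation compact compared to storing the full spectrum.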

The primary purpose of the framework is to formulate a database and the associated neural system required to model vision networks. The proposed system takes advantage of the fact that the operations of the human visual network closely resemble those of the Gabor elementary functions.

You can find the detailed report here.