My research interests are in the areas of Computer Vision, Machine Learning and Knowledge-Based AI. In particular, I am interested in activity recognition from videos and, more broadly, in video understanding. |
I am interested in exploiting context to improve activity recognition and video understanding. Humans draw on a tremendous amount of knowledge to make sense of what they perceive. So, along with context understanding, I am also very interested in knowledge representation techniques that can help build activity recognition systems that go beyond detecting simple short-term activities and generate long-term narratives describing the scene. |
|
This project aims to recognize anomalous activities in aerial videos. My work is part of the Persistent Stare Exploitation and Analysis System (PerSEAS) research program, which aims to develop software systems that can automatically and interactively discover actionable intelligence from airborne, wide area motion imagery (WAMI) in complex urban environments. |
|
This project aims to simplify plant species identification using visual recognition software on mobile devices such as the iPhone. This work is part of an ongoing collaboration with researchers at Columbia University, the University of Maryland and the Smithsonian's National Museum of Natural History. My major contribution to this project was the server's database integration and management. I also worked on stress-testing the backend server to improve its performance and scalability. The free iPhone app can be downloaded from the App Store. Here is the project webpage and here is a video explaining the app's usage. |
|
The project involves face verification in uncontrolled settings with non-cooperative subjects. The method is based on attribute (binary) classifiers that are trained to recognize the degree of various visual attributes such as gender, race and age. Here is the project page. I was part of this research at Columbia University from December 2009 to May 2010. I mainly worked on boosting to improve the classifiers' performance. |
|
The face representation is based on a Gabor wavelet transform. The features are extracted using a carefully chosen symmetrical Gabor wavelet matrix, and a multilayer perceptron is used for classification. On small datasets, the designed system is insensitive to small changes in head pose and to homogeneous or step illumination changes, and is robust against facial hair and glasses. This was my undergraduate thesis, supervised by Dr. C. N. S. Ganesh Murthy, Principal Scientist at Mercedes-Benz Research and Development, Bangalore, India. Here is the project report
|
|
Special Problems (CS 8903): Prof. Irfan Essa |
Prep - Doctoral Qualifiers (CS 7999): N/A |
|
Knowledge-Based AI (CS 7637): Prof. Ashok Goel |
Numerical Linear Algebra (MATH 6643): Prof. Silas Alben |
Special Problems (CS 8903): Prof. Irfan Essa |
|
Special Problems (CS 8903): Prof. Irfan Essa |
|
Machine Learning (CS 7641): Prof. Charles Isbell |
Special Problems (CS 8903): Prof. Irfan Essa |
|
Computer Vision (CS 7495): Prof. Jim Rehg |
Grad Studies (CS 7001): Prof. Gregory Abowd and Prof. Nick Feamster |
Special Problems (CS 8903): Prof. Irfan Essa |
|
Operating Systems (COMS W4118): Prof. Junfeng Yang |
Projects in Computer Science (COMS E6901): Prof. Peter Belhumeur |
Research Assistantship (COMS E9910): Prof. Peter Belhumeur |
|
Analysis of Algorithms (COMS W4231): Prof. Clifford Stein |
Biometrics (COMS W4737): Prof. Peter Belhumeur |
Projects in Computer Science (COMS E6901): Prof. Peter Belhumeur |
|
Programming Languages and Translators (COMS W4115): Prof. Alfred Aho |
Computational Aspects of Robotics (COMS W4733): Prof. Peter Allen |
Visual Interfaces to Computers (COMS W4735): Prof. John Kender |
Machine Learning (COMS 4771): Prof. Tony Jebara |
|
The goal of this project was to develop a system that automatically geo-tags an image by comparing it with a large collection of geo-tagged images (Google Street View images, in our case). SIFT descriptors are computed for the images, and matching is done using a k-d tree. The implementation is based on the work of Schindler et al. (CVPR 2007) and Zamir et al. (ECCV 2010). This project was done as part of the 'Computer Vision' course at Georgia Tech (instructor: Prof. Jim M. Rehg).
Here is the project presentation
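The matching step can be illustrated with a minimal sketch. It is not the project's code: it assumes toy 2-D vectors in place of 128-D SIFT descriptors, exact nearest-neighbor search, and a simple majority vote over location labels. The names `build_kdtree`, `nearest` and `geotag` are hypothetical.

```python
from collections import Counter

def build_kdtree(points, depth=0):
    """Build a k-d tree from a list of (vector, location_label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])          # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                    # median point becomes the node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Return (squared_distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    (vec, label), axis, left, right = node
    d = sq_dist(vec, query)
    if best is None or d < best[0]:
        best = (d, label)
    diff = query[axis] - vec[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    if diff ** 2 < best[0]:                   # the far side may still hold a closer point
        best = nearest(far, query, best)
    return best

def geotag(tree, query_descriptors):
    """Each query descriptor votes for the location of its nearest neighbor."""
    votes = Counter(nearest(tree, q)[1] for q in query_descriptors)
    return votes.most_common(1)[0][0]

# Toy database: descriptor -> location label (assumed data, for illustration only).
database = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((9.0, 1.0), "C")]
tree = build_kdtree(database)
print(geotag(tree, [(5.1, 5.0), (4.8, 5.1), (0.1, 0.1)]))
```

With high-dimensional descriptors such as SIFT, an exact k-d tree search loses much of its advantage, so practical systems usually rely on an approximate nearest-neighbor search.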
|
|
The goal of this project is to learn about the close relationship between learning and problem solving. We explore this relationship by considering several problems from the Raven's test of intelligence (Raven's matrices), and we develop techniques to solve them using both propositional and visual reasoning. This project was done as part of the 'Knowledge-Based AI' course at Georgia Tech (instructor: Prof. Ashok K. Goel).
Here are the project reports: Solving the Raven's matrices using Propositional Reasoning
|
|
The SN*W Programming Language is a special-purpose declarative language designed for genetic programming. It lets programmers easily harness the power of Genetic Algorithms (GA). A SN*W program is a simple description of an organism structure along with simple methods for construction, mutation, selection and recombination. The SN*W compiler translates these events into a full environmental simulation. The language was developed by five of us as part of the Programming Languages and Translators course at Columbia under the guidance of Prof. Alfred V. Aho.
Here is the complete SN*W Report (includes the Reference Manual and Tutorial)
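To show how the four methods fit together, here is a minimal genetic algorithm sketch in Python. It is not SN*W code and does not reflect SN*W syntax or the compiler's output. The `evolve` loop, the bit-string organism and the count-the-ones fitness function are all assumptions chosen for illustration.

```python
import random

def evolve(construct, fitness, mutate, recombine,
           pop_size=20, generations=30, seed=0):
    """Generic GA loop: construct a population, then select, recombine, mutate."""
    rng = random.Random(seed)
    population = [construct(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]      # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(recombine(a, b, rng), rng))
        population = parents + children
    return max(population, key=fitness)

# Hypothetical organism: a bit string whose fitness is its number of ones.
LENGTH = 16

def construct(rng):
    return [rng.randint(0, 1) for _ in range(LENGTH)]

def fitness(organism):
    return sum(organism)

def mutate(organism, rng):
    i = rng.randrange(len(organism))              # flip one random bit
    return organism[:i] + [1 - organism[i]] + organism[i + 1:]

def recombine(a, b, rng):
    cut = rng.randrange(1, len(a))                # single-point crossover
    return a[:cut] + b[cut:]

best = evolve(construct, fitness, mutate, recombine)
print(best, fitness(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.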
|
|
The goal of this project was to develop a face recognition system that could recognize people based on side-profile images. The system was designed to be invariant to head tilt and pose. An iPhone application was developed to showcase the real-time capabilities of the system. The user takes a profile picture of a person using an iPhone and uploads it to the server (a Ruby on Rails application). The server performs the recognition and sends back the results, which are displayed on the iPhone UI. The entire request-process-response loop takes no longer than 3.5 seconds on average. This project was done as part of the Biometrics course at Columbia (instructor: Prof. Peter N. Belhumeur).
Here is the project report
|
|
The goal of this project was to take a sequence of visual images and determine from them whether the user has placed some body part(s) in a predetermined sequence of locations and/or poses. If the sequence of images matches the predetermined sequence, the user's access is 'APPROVED'; otherwise it is 'DENIED'. An arbitrary predetermined sequence of hand gestures was used, in which the user displays a combination of numbers using their fingers, followed by a specific hand rotation and closure of the fist. The 'Visual Lock' is unlocked only if the hand gestures are the same as the predetermined sequence. The recognition sequence can be changed to handle any (controlled) hand gestures. This project was done as part of the Visual Interfaces to Computers course at Columbia (instructor: Prof. John R. Kender).
Here is the project report
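The final approve/deny decision can be sketched as follows. This assumes that a separate recognizer has already assigned a gesture label to each frame, which is the hard part and is not shown. The label names, `collapse` and `check_access` are hypothetical.

```python
from itertools import groupby

def collapse(frame_labels):
    """Merge runs of identical per-frame labels; None marks 'no gesture seen'."""
    return [label for label, _ in groupby(frame_labels) if label is not None]

def check_access(frame_labels, secret_sequence):
    """Approve only if the observed gestures equal the predetermined sequence."""
    observed = collapse(frame_labels)
    return "APPROVED" if observed == list(secret_sequence) else "DENIED"

# Assumed secret: two fingers, five fingers, a hand rotation, then a closed fist.
SECRET = ["two", "five", "rotate", "fist"]
frames = ["two", "two", None, "five", "five", "five", "rotate", "fist", "fist"]
print(check_access(frames, SECRET))   # prints APPROVED
```

Collapsing consecutive duplicates lets a gesture held across many frames count once, so the comparison does not depend on how long the user holds each pose.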
|
|
The goal of this project was to develop a 'Columbia Map Assistant' that would describe the location of a visitor to the Columbia campus and give the visitor directions from one building to another. The first main task was to use the given map to encode the buildings' shapes, to determine their spatial relationships to each other and to filter out any relationships that are unnecessary because they can be easily inferred. The second main task was to use these descriptions to generate a natural language description that unambiguously indicates how to reach the goal from the source. This project was done as part of the Visual Interfaces to Computers course at Columbia (instructor: Prof. John R. Kender).
Here is the project report
|
|
The goal of this project was to write and analyze algorithms that explore different ways of deciding the degree of similarity among actual images. A set of images of fruits and vegetables, along with a few random objects (distractors), was used. The algorithm performs a color-based match and a texture-based match and then uses the total match to decide the similarity among the images. This kind of algorithm is useful for retrieving images based on their visual content rather than on associated labels or other metadata. This project was done as part of the Visual Interfaces to Computers course at Columbia (instructor: Prof. John R. Kender).
Here is the project report
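One way to combine a color match and a texture match is sketched below. The project's actual features and weights are not stated here, so everything in this sketch is an assumption: images are nested lists of RGB tuples, color is a coarse RGB histogram, texture is a histogram of horizontal gray-level differences, and the two scores are weighted equally.

```python
from collections import Counter

def color_hist(img, bins=4):
    """Normalized histogram of coarsely quantized RGB values."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step)
                     for row in img for (r, g, b) in row)
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}

def texture_hist(img, bins=4):
    """Normalized histogram of horizontal gray-level differences."""
    step = 256 // bins
    counts = Counter()
    for row in img:
        gray = [sum(px) // 3 for px in row]
        for a, b in zip(gray, gray[1:]):
            counts[abs(a - b) // step] += 1
    total = sum(counts.values()) or 1         # guard against one-pixel-wide images
    return {k: v / total for k, v in counts.items()}

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def similarity(img_a, img_b, w_color=0.5):
    """Total match as a weighted sum of the color and texture matches."""
    color = intersection(color_hist(img_a), color_hist(img_b))
    texture = intersection(texture_hist(img_a), texture_hist(img_b))
    return w_color * color + (1 - w_color) * texture

# Tiny synthetic 4x4 images (assumed data, for illustration only).
red = [[(255, 0, 0)] * 4 for _ in range(4)]
green = [[(0, 255, 0)] * 4 for _ in range(4)]
striped = [[(255, 0, 0), (0, 0, 0)] * 2 for _ in range(4)]
print(similarity(red, red), similarity(red, green), similarity(red, striped))
```

In this toy setup the uniform green image shares texture but not color with the red one, while the striped image shares half of its color and none of its texture, which shows how the two cues contribute separately to the total.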
|
Email: |
vinay [at] gatech.edu |
Address: |
304C, College of Computing Building
|