I like to think of my research as being divided into phases, each marked by its own unique experiences, opportunities and development of new interests. I am currently in the third phase of my research work at Georgia Tech. The past two phases were spent at IBM India Research Lab and Indian Institute of Technology, Madras.
Georgia Institute of Technology (2005- )
I started the PhD program at the College of Computing at Georgia Institute of Technology in Fall 2005. I work with Prof. Ling Liu in the following areas:
Location-Based Services
Location Privacy
Data Management
Database Systems
I am here to learn, and I hope the coursework at Georgia Tech will also help guide my research. Exploring unfamiliar areas (to a certain extent) is on my agenda too! I am currently working on the following projects:
SLIM: A Framework for Multi-Source Streaming Location-based Information Monitoring
Spatial Alarms
PrivacyGrid: Location Privacy Framework
Decentralized Sphere: A Source Specific Approach for Crawling, Indexing and Searching the World Wide Web
IBM India Research Lab (2003-2005)
I spent two years working at IBM India Research Lab in New Delhi, India. Located in the heart of the capital of India, it is one of IBM's eight research labs in the world. It was here that I worked with some of the most innovative minds in the country on cutting-edge problems. India Research Lab finally helped me decide what I want to do with my life, and the academic environment at the workplace was an important factor in compelling me to attend graduate school!
I worked on two major projects at IBM IRL.
BioPatentMiner (with Dr. Sougata Mukherjea)
Before undertaking new biomedical research, identifying concepts that have already been patented is essential. A traditional keyword-based search on patent databases may not be sufficient to retrieve all the relevant information, especially for the biomedical domain. BioPatentMiner is a system that facilitates information retrieval and knowledge discovery from biomedical patents. The system first identifies biological terms and relations from the patents and then integrates the information from the patents with knowledge from biomedical ontologies to create a Semantic Web. Besides keyword search and queries linking the properties specified by one or more RDF triples, the system can discover semantic associations between the Web resources. The system also determines the importance of the resources to rank the results of a search and prevent information overload while determining the semantic associations.
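To make the idea of a semantic association concrete, here is a minimal Python sketch that treats a handful of RDF-style triples as a graph and searches for the shortest chain of relations linking two resources. It is only an illustration under stated assumptions, not the BioPatentMiner implementation: the triples, the resource names and the helper functions `build_graph` and `find_association` are all made up for this example, and the real system also ranks resources by importance, which is omitted here.

```python
from collections import deque

# Hypothetical toy triples (subject, predicate, object); not data from the system.
TRIPLES = [
    ("patent:P1", "mentions", "gene:BRCA1"),
    ("gene:BRCA1", "associatedWith", "disease:BreastCancer"),
    ("patent:P2", "mentions", "drug:Tamoxifen"),
    ("drug:Tamoxifen", "treats", "disease:BreastCancer"),
]


def build_graph(triples):
    """Index triples as an adjacency list that can be walked in both directions."""
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(subj, []).append((pred, obj))
        # Record the inverse edge so a path may follow a relation backwards.
        graph.setdefault(obj, []).append((pred + "^-1", subj))
    return graph


def find_association(triples, start, goal):
    """Breadth-first search for the shortest chain of (predicate, node) steps."""
    graph = build_graph(triples)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for pred, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(pred, nxt)]))
    return None  # the two resources are not connected


path = find_association(TRIPLES, "patent:P1", "patent:P2")
for pred, node in path:
    print(pred, "->", node)
```

In this toy data the two patents share no keyword, yet they are linked through the disease that the gene is associated with and the drug treats, which is the kind of connection a plain keyword search would miss.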
SCORE (Symbiotic Context Oriented Information REtrieval) (with Dr. Prasan Roy and Dr. Mukesh Mohania)
Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. In existing information integration solutions, the application needs to formulate the SQL logic to retrieve the needed structured data on one hand, and identify a set of keywords to retrieve the related unstructured data on the other. SCORE proposes a novel approach wherein the application specifies its information needs using only a SQL query on the structured data, and this query is automatically “translated” into a set of keywords that can be used to retrieve relevant unstructured data.
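As a rough illustration of what "translating" a SQL query into keywords could look like, the Python sketch below runs a query against an in-memory SQLite database and picks the most frequent terms from the text values in its result. This is one simple heuristic chosen for the example, not the SCORE algorithm itself; the `tickets` table, its contents, the stopword list and the function `keywords_from_query` are all hypothetical.

```python
import re
import sqlite3
from collections import Counter

# Small illustrative stopword list (an assumption, not part of SCORE).
STOPWORDS = {"the", "a", "of", "and", "for", "to", "in", "on", "with", "is"}


def keywords_from_query(conn, sql, top_k=3):
    """Run `sql` and return the most frequent terms in its text result values."""
    counts = Counter()
    for row in conn.execute(sql):
        for value in row:
            if isinstance(value, str):
                for term in re.findall(r"[a-z0-9]+", value.lower()):
                    if term not in STOPWORDS:
                        counts[term] += 1
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical structured data standing in for an enterprise database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (customer TEXT, product TEXT, issue TEXT)")
conn.executemany(
    "INSERT INTO tickets VALUES (?, ?, ?)",
    [
        ("Acme", "laser printer", "paper jam"),
        ("Acme", "laser printer", "toner leak"),
        ("Acme", "scanner", "paper jam"),
        ("Globex", "router", "firmware crash"),
    ],
)

keywords = keywords_from_query(
    conn, "SELECT product, issue FROM tickets WHERE customer = 'Acme'"
)
print(keywords)
```

The returned keywords could then be handed to any keyword search engine over the unstructured documents, so the application itself only ever writes the SQL query.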
NOTE: SCORE was rechristened as BUSTER due to naming conflicts with another project at IBM.
Indian Institute of Technology, Madras (1999-2003)
I got my Bachelor's degree at Indian Institute of Technology Madras. It was here that I discovered that I love analyzing and experimenting much more than attending classes! IIT Madras provided a lot of opportunities to do innovative projects and always supported my desire to do projects, even at other institutes. Still, I would never underestimate the importance of what I learnt in the classes here; otherwise there would be no point in attending graduate school!
I list here a few of the projects that I did at IIT Madras and other institutes during my undergraduate studies.
Bachelor's Dissertation Project (Data Compression)
Complete implementation of an 'Image Compression System using Embedded Zero-Tree Coding', under the guidance of Prof. R. Aravind at Indian Institute of Technology Madras.
Image compression is essential for storage and transmission of large amounts of data. It works by removing statistical redundancies and exploiting the limited sensitivity of the human visual system. A compression system involves transformation of the image sample values, followed by quantization and coding of the quantized values. This project employed the wavelet transform technique. The full-frame nature of the transform decorrelates the image across a larger scale and eliminates blocking artifacts at high compression ratios. Scalar quantization maps the sample values to quantization indices, which can then be encoded as part of the compressed bitstream. The Embedded Zero-Tree encoder, followed by arithmetic coding, permits progressive encoding: the image is compressed into a bitstream of increasing accuracy, so that as more bits are decoded, the reconstructed image contains more detail. The implementation supports rate control and region-of-interest coding, which are two of its defining features.
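The transform, quantize and progressively refine pipeline described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the dissertation code: it uses a one-dimensional Haar wavelet on a short made-up signal instead of a 2D image, a dead-zone scalar quantizer whose step is halved on each pass, and it leaves out the zero-tree and arithmetic coding stages entirely. All function names and the initial step size are assumptions made for the example.

```python
def haar_forward(signal):
    """Multi-level 1D Haar transform; the length must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = avg + diff  # coarse averages first, then details
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward by merging averages and details level by level."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for a, d in zip(out[:n], out[n:2 * n]):
            merged += [a + d, a - d]
        out[:2 * n] = merged
        n *= 2
    return out


def quantize(coeffs, step):
    """Dead-zone scalar quantizer: map each coefficient to a signed integer index."""
    return [int(abs(c) // step) * (1 if c >= 0 else -1) for c in coeffs]


def dequantize(indices, step):
    """Reconstruct each non-zero index at the midpoint of its quantization bin."""
    return [0.0 if q == 0 else (abs(q) + 0.5) * step * (1 if q > 0 else -1)
            for q in indices]


def progressive_decode(signal, passes=4):
    """Return (step, max reconstruction error) for successively finer steps."""
    coeffs = haar_forward(signal)
    step = max(abs(c) for c in coeffs) / 2  # assumed starting step (non-zero signal)
    results = []
    for _ in range(passes):
        approx = haar_inverse(dequantize(quantize(coeffs, step), step))
        err = max(abs(a - b) for a, b in zip(signal, approx))
        results.append((step, err))
        step /= 2  # each pass spends more bits on finer detail
    return results


signal = [8, 6, 4, 2, 10, 12, 20, 18]  # made-up sample values
results = progressive_decode(signal)
for step, err in results:
    print("step", step, "max error", err)
```

Each pass corresponds to decoding more of the bitstream: the coarse passes recover only the overall shape of the signal, while the finer ones bring back the small differences between neighbouring samples.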
Summer Internship (Data Mining)
Worked on the project titled 'Survey and Analysis of Two Way Clustering Algorithms' under the guidance of Mr. Vivek Jain at IBM India Research Lab.
With an enormous amount of data stored in databases and data warehouses, it is increasingly important to develop powerful tools for analyzing such data and mining interesting knowledge from it. Data mining is the process of inferring knowledge from such large volumes of data, and clustering (or classification) is a major component of it. Put simply, in classification / clustering we analyze a set of data and generate a set of grouping rules that can be used to classify future data. In this project we discussed four existing clustering algorithms, analyzing their advantages and disadvantages, with experiments on synthetic datasets as well as on real data. We further discussed two new algorithms, ran experiments on them and compared their results with those obtained for the existing algorithms.
Matrix clustering is a data mining method that extracts a dense sub-matrix from a large sparse binary matrix. It can be applied to a web access log by representing the relationship between users and web pages in a binary matrix. The result of matrix clustering is a set of users and a set of pages related to each other. Matrix clustering can be applied to predicting user behavior, web personalization, mass customization, image segmentation, co-clustering documents and words, and clustering genes and samples. We implemented six algorithms: the Ping-Pong algorithm, Simultaneous Clustering and Attribute Discrimination, FCCM, Bipartite Spectral Graph Partitioning and two new algorithms.
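To give a feel for what extracting a dense sub-matrix means, here is a small Python sketch that starts from one seed user and alternates between choosing pages and choosing users until the selection stops changing. It is a simplified alternation written for this page, loosely in the spirit of row/column "ping-pong" schemes but not a faithful version of any of the six algorithms listed above; the function `dense_submatrix`, the `threshold` parameter and the toy access log are assumptions.

```python
def dense_submatrix(matrix, seed_row, threshold=0.5, max_iters=10):
    """Alternate between column and row selection until the block stabilises.

    Returns (rows, cols) of the selected sub-matrix, or ([], []) if the seed
    row leads to an empty selection.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows, cols = {seed_row}, set()
    for _ in range(max_iters):
        # Keep columns that are 1 in at least `threshold` of the chosen rows.
        new_cols = {c for c in range(n_cols)
                    if sum(matrix[r][c] for r in rows) >= threshold * len(rows)}
        if not new_cols:
            return [], []
        # Keep rows that are 1 in at least `threshold` of the chosen columns.
        new_rows = {r for r in range(n_rows)
                    if sum(matrix[r][c] for c in new_cols) >= threshold * len(new_cols)}
        if not new_rows:
            return [], []
        if new_rows == rows and new_cols == cols:
            break  # selection no longer changes
        rows, cols = new_rows, new_cols
    return sorted(rows), sorted(cols)


# Hypothetical web access log: rows are users, columns are pages, 1 = visited.
ACCESS = [
    [1, 1, 0, 0, 1],  # user 0
    [1, 1, 0, 0, 0],  # user 1
    [0, 0, 1, 1, 0],  # user 2
    [1, 1, 0, 0, 1],  # user 3
    [0, 0, 0, 1, 0],  # user 4
]

users, pages = dense_submatrix(ACCESS, seed_row=0)
print("users", users, "pages", pages)
```

A higher `threshold` yields a smaller but denser block, while a lower one admits users and pages that are only loosely related to the seed.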