College of Computing
Georgia Institute of Technology
Atlanta, Georgia 30332-0280
The Database Research Group in the College of Computing at Georgia Tech is dedicated
to conducting research on all aspects of database-related problems in the development
of large-scale applications. The group is headed by Prof. Sham Navathe and includes
Professors Ed Omiecinski and Leo Mark. It has a laboratory of its own equipped with
about a dozen Sun SPARC workstations (including a SPARC 10 and a SPARC 20), several
Windows NT workstations, and some Macintosh machines. The database server machine
has several relational and object-oriented database management systems including
ORACLE, SYBASE, INFORMIX Universal Server (to be received), ObjectStore, Versant,
and Ode. Experimental systems and tools are constantly under development by graduate
students. Currently there are about five doctoral students, one post-doctoral research
associate (to arrive), and several master's students.
The following is a list of some of the ongoing projects:
Mobile Intermittently Connected Databases: This project (in conjunction
with Synchrologic, Inc. of Atlanta) is investigating issues related to consistency
of data, propagation of updates, transaction processing, conflict resolution, etc.,
for a client-server architecture in which the clients are mobile and are only intermittently
connected to the server. Multicasting protocols are being evaluated for improving
efficiency so that the architecture may be scaled up to thousands of clients in typical
applications such as sales force automation.
Data Mining Algorithms: In this ongoing project we have developed efficient
algorithms for discovering association rules that capture interesting relationships
in existing large volumes of raw transaction data. Preliminary work has also been
done on detecting negative associations (or lack of relationship) among certain types
of data. The work is applicable to large transaction volumes occurring in supermarkets,
banks, insurance companies, telephone service, etc. The algorithms are currently
being applied to mining associations and similarities among images.
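To make the idea of an association rule concrete, the following is a minimal sketch in Python. It is not the group's algorithm: it is a brute-force enumeration over item pairs, and the function names, thresholds, and sample baskets are all hypothetical. A rule "A -> B" is kept when the pair {A, B} appears in enough transactions (support) and B appears in a large enough fraction of the transactions that contain A (confidence).

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def association_rules(transactions, min_support, min_confidence):
    """Brute-force single-item rules (lhs -> rhs); illustrative only."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support({lhs}, transactions)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical supermarket baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = association_rules(baskets, min_support=0.5, min_confidence=0.6)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Enumerating every pair is only practical for small item sets; efficient algorithms for large transaction volumes prune candidates instead of testing all combinations.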
Visualization and User Interface Construction for Large Document Databases:
We have implemented a prototype system to improve the performance of users who wish
to efficiently search a large document space without using any keywords. The interface
uses techniques for informing the user visually about the relevance of highly ranked
documents vis-à-vis the request. A thesaurus is employed to allow the user to select
related and unrelated terms. Positive and negative feedback windows let the user
refine the original request by selecting or rejecting documents or parts of documents
as well as thesaurus words. Extensive user studies were conducted to establish the
usefulness of this approach and have shown that visual interfaces improve user
performance in document retrieval.
Mitochondrial Genome Database: As our contribution to the human genome
initiative, in this project we are building a genome database containing information
specific to the mitochondrial chromosome, which has a ring-like structure with 16,569
base pairs. As a joint effort with the Molecular Genetics department at Emory (and
Prof. Doug Wallace), we are creating a web site (http://www.gen.emory.edu/mitomap.html)
where queries can be posed related to specific information about genes, genetic
defects related to disease, gene-gene interactions and functional information about
genes. The site will help scientists worldwide in obtaining and contributing human
mitochondrial genome information in one place.
Database with a Time Dimension: In a database, if the transaction time
of each update is recorded and the new transaction is appended to the database, we
term it a "transaction time" database. This project is examining a variety of
issues for such databases including efficient storage, query processing with incremental
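The append-only behavior of a transaction time database can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's design: the class and method names are hypothetical, and a logical counter stands in for the recorded transaction time. Updates never overwrite earlier data, so the state as of any past transaction time can be reconstructed.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: Any
    tx_time: int  # transaction time at which this version was recorded


class TransactionTimeStore:
    """Hypothetical append-only store: updates are appended, never overwritten."""

    def __init__(self) -> None:
        self._log: List[Version] = []
        self._clock = 0  # assumption: a logical counter plays the role of the clock

    def update(self, key: str, value: Any) -> int:
        """Append a new version and return its transaction time."""
        self._clock += 1
        self._log.append(Version(key, value, self._clock))
        return self._clock

    def as_of(self, key: str, tx_time: int) -> Optional[Any]:
        """Return the value of key as recorded at or before tx_time."""
        result = None
        for version in self._log:  # the log is ordered by transaction time
            if version.tx_time > tx_time:
                break
            if version.key == key:
                result = version.value
        return result


store = TransactionTimeStore()
t1 = store.update("salary", 100)
t2 = store.update("salary", 120)
print(store.as_of("salary", t1), store.as_of("salary", t2))
```

The linear scan in `as_of` is the simplest possible lookup; efficient storage and query processing over such a growing log are exactly the kinds of issues a real system has to address.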
Security Modeling and Query Processing in Heterogeneous
Databases: Many schemes for security enforcement in databases have been
proposed, including Discretionary Access Control (as in SQL authorization schemes) and
Mandatory Access Control (MAC), as used in government, with levels of security for users. In
this project we have defined a common representation of security models at the conceptual
ER-like model level, and at a formal level, that will help in accessing data across
heterogeneous schemes. Methods for query processing and indexing of these databases
are being investigated.
Distributed Data Intensive Systems: The Distributed Data Intensive Systems Lab (DISL) is a new research lab at the Georgia Institute of Technology. Currently, DISL has two faculty members and four Ph.D. students.
DISL conducts research in distributed computing and data-intensive environments. Distributed data-intensive systems, especially Internet-scale systems, raise more complex issues in reliability,
extensibility, concurrency, availability, and efficiency than traditional centralized systems. We are interested in theories and techniques that make it easier for software to handle these problems, such
as specialization and quality of service. In particular, we focus on improving the responsiveness and enhancing the scalability and extensibility of distributed query and computing services.
Besides the above, some of the recently completed projects include:
HIPED - Heterogeneous Intelligent Processing for Engineering Design - In this
project supported by the DARPA Intelligent Integration of Information (I3) program,
a front end design assistant system called interactive Kritik was integrated with
back end access to a variety of heterogeneous databases.
Parallel DB Reorganization - examined the problem of reorganizing the allocation
of data to disks in a shared nothing parallel database environment; also considered
efficient index reorganization.
Index Construction and Utilization for Query Processing in Document Databases
- considered the problem of constructing an efficient indexing scheme in a document
database and its utilization in parallel query processing.
Organization and performance improvement of video server databases - considered
different partitioning and merging techniques for combining data from multiple video
streams and evaluated them for relative performance improvement.
Hypermedia Modeling - considered the problem of browsing and query processing
in databases which are equivalent to a network of interconnected hypermedia nodes.
An algebra for the model and algorithms for efficient evaluation of constraints were
developed. The work has application to the modeling of information on the web.
Automatic Metadata Management - considered the problem of maintaining a metadata
repository of constraints when database instances are updated. Subsequent query processing
would use the updated constraints. Detailed data structures were proposed to capture constraints
in a graph, and algorithms were proposed to utilize this semantic constraint graph
for query processing.
We are very keen to undertake collaborative projects with industry and government agencies
where we can apply our research strengths to meaningful real-life application problems.
For further information, contact:
phone: (404) 894-8358
fax: (404) 894-9442
Prof. Shamkant B. Navathe
Head, Database Research Group