CS 4420 DATABASE PROJECT
Spring 2008
ABOUT THE TEAM:
This is a group project with 3 students per team. If the class size is not divisible by three, we will have one or two 2-person teams. Graduate students cannot team up with undergraduate students. Pick your team by
ABOUT THE IMPLEMENTATION:
You may choose either C++ or Java as the implementation language. The project must be implemented on some flavor of Unix/Linux.
REQUIREMENT SPECIFICATION:
Undergraduate students have to implement only Phase 1 and Phase 2, while graduate students have to implement all three phases.
CODE:
B+ Trees for Indexing - Java Version
ABOUT THE PROJECT:
The overall objective of this project is to give you a better understanding of some of the internals of a database management system. To do this you will build a simple single-user database system that executes a restricted form of SQL. By "simple" we mean that many features of a real relational DBMS will be simplified to make this manageable as a course project.
The project will be implemented in three phases (two phases for undergraduates). The first phase involves writing the Storage Manager component of a database system. The Storage Manager is responsible for providing file structures such as sequential files and indexed files to store data, providing access to the files (insert and fetch; we will not worry about delete), and managing the main-memory buffer pool. In our case, the Storage Manager will use the basic services provided by Unix (e.g., lseek, read, write) to carry out low-level data transfer between the disk and your buffer area. The first two phases will also include the construction of a system catalog, which records the names of relations (i.e., files), the columns in each relation, their data types, etc.
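To make the Storage Manager description above more concrete, here is a minimal Java sketch of page-level file access with a small buffer pool. It is only an illustration under stated assumptions, not the required design: the class names PageFile and BufferPool, the 4096-byte page size, and the LRU write-back replacement policy are all hypothetical choices that the project description does not specify. Java's RandomAccessFile seek/readFully/write calls stand in for the Unix lseek/read/write services (a C++ team would call those directly).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical fixed-size page file: page n lives at byte offset n * PAGE_SIZE. */
class PageFile {
    static final int PAGE_SIZE = 4096; // assumed page size, not taken from the project spec
    private final RandomAccessFile file;

    PageFile(String path) {
        try {
            this.file = new RandomAccessFile(path, "rw");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one page; a page that was never written comes back zero-filled. */
    byte[] readPage(int pageNo) {
        byte[] page = new byte[PAGE_SIZE];
        long offset = (long) pageNo * PAGE_SIZE;
        try {
            if (offset + PAGE_SIZE <= file.length()) {
                file.seek(offset);      // plays the role of lseek
                file.readFully(page);   // plays the role of read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page;
    }

    /** Writes exactly one page at its offset. */
    void writePage(int pageNo, byte[] data) {
        if (data.length != PAGE_SIZE) {
            throw new IllegalArgumentException("page must be " + PAGE_SIZE + " bytes");
        }
        try {
            file.seek((long) pageNo * PAGE_SIZE);
            file.write(data);           // plays the role of write
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        try {
            file.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical buffer pool: keeps up to 'capacity' pages in memory, evicts the least recently used. */
class BufferPool {
    private static final class Frame {
        final byte[] data;
        boolean dirty;
        Frame(byte[] data) { this.data = data; }
    }

    private final PageFile file;
    private final Map<Integer, Frame> frames;
    int misses = 0; // how many requests had to go to disk

    BufferPool(PageFile file, int capacity) {
        this.file = file;
        // accessOrder = true orders entries from least to most recently used.
        this.frames = new LinkedHashMap<Integer, Frame>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, Frame> eldest) {
                if (size() <= capacity) {
                    return false;
                }
                if (eldest.getValue().dirty) {
                    // Write a modified page back before dropping it.
                    file.writePage(eldest.getKey(), eldest.getValue().data);
                }
                return true;
            }
        };
    }

    /** Returns the in-memory copy of a page, loading it from disk on a miss. */
    byte[] getPage(int pageNo) {
        Frame frame = frames.get(pageNo);
        if (frame == null) {
            misses++;
            frame = new Frame(file.readPage(pageNo));
            frames.put(pageNo, frame);
        }
        return frame.data;
    }

    /** Callers mark a page dirty after modifying the array returned by getPage. */
    void markDirty(int pageNo) {
        Frame frame = frames.get(pageNo);
        if (frame != null) {
            frame.dirty = true;
        }
    }

    /** Writes every dirty page back to the file. */
    void flushAll() {
        for (Map.Entry<Integer, Frame> e : frames.entrySet()) {
            if (e.getValue().dirty) {
                file.writePage(e.getKey(), e.getValue().data);
                e.getValue().dirty = false;
            }
        }
    }
}
```

A caller would open a PageFile, wrap it in a BufferPool, request pages with getPage, call markDirty after changing a page, and call flushAll before closing the file. A real Storage Manager would also need page pinning and a record layout inside each page, which this sketch leaves out.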
The second phase involves writing the Query Engine for our database system. The Query Engine consists of several components: a scanner/parser/validator, a query optimizer, an access plan generator, and a runtime database processor. The scanner/parser/validator identifies the tokens in the text of the query, checks the query syntax, validates relation and attribute names, and produces a query tree. The query tree is passed to the query optimizer, which uses some basic rules and optimization strategies to rewrite the query tree so that it can be executed more efficiently. The optimized query tree is passed to the access plan generator, which provides the algorithms for executing the specific relational algebra operations in the order specified in the tree. The access plan generator uses statistics contained in the system catalog to determine the specific algorithms and access strategies to use (e.g., file scan or index scan). Once the access plan has been determined, the runtime database processor executes this query-specific access plan to produce the result of the query.
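As an illustration of how the access plan generator described above might use catalog statistics to pick between a file scan and an index scan, here is a small Java sketch for a single equality selection. Everything in it is an assumption made for the example: the class and field names are hypothetical, and the cost formulas (pages read for a file scan; index height plus one page access per matching tuple for an unclustered index) are a simplified textbook-style model, not something the project description prescribes.

```java
/** Hypothetical access-path chooser for an equality selection such as "WHERE attr = value". */
final class AccessPlanner {
    enum AccessPath { FILE_SCAN, INDEX_SCAN }

    /** Per-relation statistics that a system catalog might store (assumed fields). */
    static final class RelationStats {
        final long numPages;
        final long numTuples;
        RelationStats(long numPages, long numTuples) {
            this.numPages = numPages;
            this.numTuples = numTuples;
        }
    }

    /** Per-index statistics that a system catalog might store (assumed fields). */
    static final class IndexStats {
        final int height;        // levels from root to leaf
        final long distinctKeys; // number of distinct values of the indexed attribute
        IndexStats(int height, long distinctKeys) {
            this.height = height;
            this.distinctKeys = distinctKeys;
        }
    }

    /** A file scan reads every page of the relation. */
    static long fileScanCost(RelationStats rel) {
        return rel.numPages;
    }

    /**
     * Assumed cost of an unclustered index lookup: walk the tree, then fetch
     * one page per matching tuple, with values assumed uniformly distributed.
     */
    static long indexScanCost(RelationStats rel, IndexStats index) {
        long matchingTuples = (rel.numTuples + index.distinctKeys - 1) / index.distinctKeys;
        return index.height + matchingTuples;
    }

    /** Picks the cheaper path; pass null when the attribute has no index. */
    static AccessPath choose(RelationStats rel, IndexStats indexOrNull) {
        if (indexOrNull == null) {
            return AccessPath.FILE_SCAN;
        }
        return indexScanCost(rel, indexOrNull) < fileScanCost(rel)
                ? AccessPath.INDEX_SCAN
                : AccessPath.FILE_SCAN;
    }
}
```

Under this model, a relation of 1000 pages and 50000 tuples with an index of height 3 over 10000 distinct values gives an index scan cost of 3 + 5 = 8 against a file scan cost of 1000, so the index scan is chosen. With only 10 distinct values the estimate becomes 3 + 5000, and the file scan is cheaper.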
The third phase (only for graduate students) involves writing the Transaction Management component of our database system. Transaction Management need not include the recovery feature. The exact details of this phase will be posted soon.
These phases will also involve implementing an Innovative Feature in your database system. The Innovative Feature should be something that is not specified in the project description but that you think would be useful to a database system user. The feature should not be trivially simple to implement and must be approved by the TA.
Some examples of the possible features are -
You have to finalize the feature you want to implement before the end of Phase I (code deliverables). Talk to the TA if you have problems choosing the feature.
IMPORTANT: Please note that we may change the project requirements at any time during the semester. This is part of the project description. The changes will not be too drastic; their only purpose is to make sure that you understand the course material and the project description.
The project has three milestones.
The deliverables for all three milestones should be sent to us at quocminh@cc.gatech.edu and raghavendra.tk@gatech.edu by any one of the team members.
Phase I (code deliverables)
Submit your commented code for Phase I by
Phase II (code deliverables)
Submit your commented code for Phase II by
Phase III (code deliverables) - only for Graduate students
Submit your commented code for Phase III by
A detailed description of the deliverables for each of the milestones will be made available online at the course web site.