The overall objective of this project is to give you a better understanding about some of the internals of a database management system. To do this you will build a simple single-user database system that will execute a restricted form of SQL. By simple here, we mean that many features of a real relational DBMS will be simplified to make this project manageable as a course project.
The project will be implemented in two phases. The first phase will involve writing the Storage Manager component of a database system. The Storage Manager is responsible for providing different file structures such as sequential files and indexed files to store data, providing access to the files (insert and fetch, we will not worry about delete) and providing/managing the main memory buffer pool. For our case, the Storage Manager will use the basic services provided by Unix (e.g., lseek, read, write) for carrying out low level data transfer between disk and the buffer area. In phase I, we will also include the construction of a system catalog which will include the names of relations (i.e., files), columns in each relation, data type, etc. The second phase will involve writing the Query Engine for our database system. The Query Engine consists of several components: a scanner/parser/validater, a query optimizer, an access plan generator and a runtime database processor. The scanner/parser/validater identifies the tokens in the text of the query, checks the query syntax and validates relation and attribute names and produces a query tree. The query tree is passed to the query optimizer which uses some basic rules and optimization strategies to rewrite the query tree so that it can be executed in a more efficient manner. The optimized query tree is passed to the access plan generator that provides the algorithms for executing the specific relational algebra operations in the order specified in the tree. The access plan generator uses statistics contained in the system catalog for determining the specific algorithms and access strategies to use (e.g., file scan or index scan). Once the access plan has been determined, the runtime database processor will execute this query-specific access plan to produce the result for the query.
The project has two milestones:
PHASE I:
The Storage Manager will be responsible for handling
The file structure for the relations will be a sorted sequential file. The records will be given as input in the necessary order so no sorting will be required. In addition, a file will be ordered on the first attribute (i.e., column) value in ascending order.
The system catalog will reside on disk as will the files storing the relations. The catalog will contain the following information:
Note: A relation will typically have many columns and the columns will be listed in order within a tuple. In this project we suggest that you consider at most two indexes on individual columns for each relation. You will find additional catalog information by carefully reading the requirements of PHASE II.
The file structure for a relation will be an ordered (sorted) sequential
file.
We will access a particular block (i.e., a page) by its offset within the
file.
The file will be initially sorted by the first column, so a clustered
index
may be created for this column. We may also create one secondary index on
some other column.
Both B tree and B+ tree code will be provided. Examining the code will
give you some insight into
how to create/insert into the data file.
After you create the data file for the relation you can
extract the necessary information that will be inserted
into the index file.
The basic operations on the data file will be
The basic operations on the index file will be
Another component associated with the Storage Manager is the Buffer Manager. The Buffer Manager is responsible for maintaining a main memory pool of blocks (either data, index or catalog). You may view the buffer pool as a cache. However, in this project, the Buffer Manager will only be concerned with data blocks. When a data block is needed the Buffer Manager is called. It will check if the main memory buffer pool contains the desired block. If so the main memory address is returned. If the desired block is not in the buffer pool, a physical I/O (i.e., a READ) must be issued to retrieve the block from disk. The retrieved block will be stored in a free location in the buffer pool and its main memory address will be returned. If there is no free block, the Buffer Manager will use a Least Recently Used buffer replacement algorithm to find a block in the buffer that will be replaced by the new incoming block. Usually, the Buffer Manager maintains an internal hash table or index table for fast lookup in the buffer pool. This table associates a buffer location with the disk blocks (identified by file ID and relative block number within the file) currently in the buffer. The Buffer Manager will also maintain some statistics about the number of logical block accesses (the number of requests for the data block) and the number of physical block accesses for data (the number of disk fetches performed). Every time the Buffer Manager gets a request for a block, it increments the logical block access counter and if the requested block is not in the buffer, then it will also increment the physical block access counter.
The Storage Manager will accommodate two basic access paths: Table Scan and Index Scan. For a Table Scan, the Storage Manager will be called to open the associated file. Then the Storage Manager will be called repeatedly to return each block of the file in physical order. For an Index Scan, the Storage Manager will be called to open the associated index file. Then the Storage Manager will be called to find the address associated with a given key value. The block address will be returned. The block may be an address of a data block or an address of a block containing pointers to data blocks that contain records with the same key value. Finally the Storage Manager is called to open the associated data file (if not already open) and to read the necessary data block. The value returned by the Storage Manager is the main memory address of a block in the buffer pool. As previously mentioned, the requested block may be found in the buffer, thus eliminating a disk access.
The block size we will use is 512 bytes. The record size for the relations will be determined by the number of columns (4 bytes per column). There will be no more than 10 columns in any table. Records should not be split between blocks. We will have a buffer pool of 25 blocks.
For PHASE I,
your system should accept as input the following database language
commands:
| CREATE TABLE
|
| CREATE INDEX |
| LOAD TABLE |
| LOAD INDEX |
| LOAD RSTATS |
| LOAD ISTATS |
| PRINT TABLE |
| PRINT INDEX |
| PRINT CATALOG; |
| PRINT BUFFERSTATS; |
| RESET BUFFERSTATS; |
Note: The commands are terminated by a semicolon and command keywords are case-sensitive (i.e., capital letters). Square brackets in the syntax specifications indicate command components that are optional. For the CREATE TABLE command, relation and attribute names are limited to 4 characters each and must begin with a letter. Every relation name must be unique and within a given relation, every attribute name must be unique. A relation will have a maximum of 10 attributes. To simplify things, all attributes will be restricted to 4 bytes in length and two data types Integer or String. All the rows will be inserted into a table at one time via the LOAD TABLE utility. For the CREATE INDEX command, the associated catalog information for the index is created. At most two indexes may be created for a single relation on different attributes. The default is that duplicate values may occur for the indexed attribute. If duplicates cannot occur, then NO DUPLICATES should be specified in the CREATE INDEX statement. The LOAD INDEX utility will create the index file on disk. It will be responsible for reading the relation and storing the key value/address pairs in the index file. The LOAD RSTATS utility is responsible for putting the statistical information about the tables and columns into the catalog. The LOAD ISTATS utility is responsible for putting the statistical information about the indexes into the catalog. The three PRINT commands are also utility functions but are only needed to help verify that your PHASE I is working. The PRINT TABLE command will print the contents of a relation, one row per line. The PRINT INDEX command will print the key values in the leaf nodes of the index along with their data block addresses. The PRINT CATALOG command will print the contents of the system catalog. The PRINT BUFFERSTATS command will print the contents of the two block access counters. The RESET BUFFERSTATS command will reset the contents of the two block access counters to 0.
B+ Tree Information:
The B+ Tree index file will reside on disk. Each B+ Tree node uses one disk block. The first block of the B+ Tree index file stores a header block. The header stores the block number (disk address) of the root node in the first 4 bytes and the block number of the next available block. The remainder of the first block in the B+ tree index file is unused.
Blocks are allocated for the index starting with the second block in the file. When a new block is needed, the file will be extended by one block. All key values and pointers are 4 byte integers. The remaining 4 bytes in a node is used to hold a flag, which indicates whether the node is a leaf node or a non-leaf node, and a count of the number of key values stored in the node. For leaf nodes, the pointer to the next leaf node in the sequential chain is stored in Ptrs[0]. A value of -1 indicates the end of the chain. The pointer for Keys[k] is in Ptrs[k+1]. If duplicates are allowed, then the a pointer in a leaf node will point to a bucket containing pointers to blocks in the data file.
IMPORTANT: As a milestone of the project, we require that each project team produce a Phase I report documenting (1) the set of relations and indexes created; (2) the set of functions provided by your Storage Manager; and (3) the list of simplifications made.
Due date: Tuesday of the 8th Week, which is February 25 2003.
PHASE II:
The Query Engine will consist of 4 basic components:
Inherent to this step is the ability to estimate the size of the relations that result from various operations. We will use the notion of filter factors (also referred to as selectivity factor) to accomplish this. The filter factor is defined for different predicate types and can be combined to determine the filter factor of predicates combined by OR and AND. Although our queries will only have predicates combined by AND. The filter factor of a predicate P, denoted by FF(P), is defined as the fraction of rows from a table resulting from the predicate restrictions. The filter factor of a predicate is estimated by making a number of statistical assumptions. Typical assumptions used in most of RDBMSs are the following two: individual column values are uniformly distributed and values from any two columns are independently distributed.
Necessary statistical information includes the following:
| TABLES | |
| Statistic Name | Description |
| CARD | # rows in table |
| NPAGES | # pages holding rows |
| COLUMNS | |
| Statistic Name | Description |
| COLCARD | # distinct values |
| HIGHKEY | highest value |
| LOWKEY | lowest value |
| INDEXES | |
| Statistic Name | Description |
| NLEAF | # leaf pages |
| NBUCKET | # buckets |
| KEYCARD | # distinct values in key |
The basic filter factor formulas are as follows:
| Predicate Type | Filter Factor (FF) Formula |
| C1 |
|
| C1 |
|
| C1 |
|
| Pred1 AND Pred2 |
|
Formulas for estimating the join size of
are shown
below.
If the join attribute,
, is a key for relation S, then each tuple of R
will join with
at most one tuple of S, so we can estimate the join size as
tuples.
For other situations we will use the following:
To determine the most efficient access plan, the statistics and filter
factors will be
used to estimate the I/O cost of the query.
In general, to determine the number of I/Os needed, the filter factor is
multiplied by
the number of rows or pages in the table.
To estimate the number of I/Os needed for a table scan, it is simply the
NPAGES
statistic.
To estimate the number of I/Os for an index scan, it is dependent on
whether the
index is clustered (i.e., the data is clustered on the indexed attribute).
We will ignore the cost of accessing internal nodes in the index.
If the index is clustered then we simply have
.
If the index is not clustered, then we have
.
We can estimate
as
, where
is the maximum number of pointers per node and the factor of
is
included since we assume that the leaf nodes will be only half full.
If duplicates are allowed then we also have to include the access cost to
the
buckets (i.e., pages containing pointers to records with the same key
value).
So, we need to add
.
We can estimate
as
.
Note: At most one index will be used in processing a SELECT operation.
To estimate the size of the result of a unary operation, we can use
as the number of resulting tuples. We divide this by the
blocking factor (i.e., number of tuples per block) to determine the size in
blocks.
Design of the first component
The query language command looks like the following:
| SELECT
|
| FROM
|
Note: Each
in the SELECT clause is specified as either
or
. The SELECT clause must have 1 or more attributes listed.
Each
in the FROM clause must name a relation in the database
and a relation may not appear more than once in the FROM clause. Also,
at least 1 relation and at most 3 relations can appear in the FROM clause.
For each
in the SELECT clause, if
is specified, then it must be one of the
relations
in the FROM clause. IF
is not specified, then there must be
exactly one
relation in the FROM clause with an attribute named
. The same
rule applies
for the WHERE clause.
Each
in the WHERE clause is specified as either
or
.
Each
in the WHERE clause is specified as
or
or it is a constant value (an integer).
Each
in the WHERE clause is one of the comparison operators,
if
is a constant, otherwise
can only be
(i.e., representing an equijoin).
The WHERE clause is optional, but if it is present it must have at least
one comparison term (i.e.,
) and at most 5
comparison terms, separated by AND.
Using our database language, we could define the following tables and
indexes:
| CREATE TABLE r1 (A, B, C); |
| CREATE TABLE r2 (C, F, G, H); |
| CREATE TABLE r3 (A, D, E); |
| CREATE INDEX r1AX ON r1(A) NO DUPLICATES; |
| CREATE INDEX r2CX ON r2(C) NO DUPLICATES; |
| CREATE INDEX r2FX ON r2(F); |
| CREATE INDEX r3AX ON r3(A) NO DUPLICATES; |
and associated queries
| SELECT |
| SELECT |
| SELECT |
| SELECT |
| AND |
We will assume that the ordering of relations in the FROM clause infers a particular join ordering for those relations.
Consider the relations: R1(A,E,X), R2(B,F,G,W) and R3(C,H,Y,Z), and the query:
| SELECT A,B,C | |
| FROM R1,R2,R3 | |
| WHERE R1.E = R2.F AND R2.G = R3.H; |
The FROM clause has the particular order R1,R2,R3 which we assume to mean R1 can join with R2 and R2 can join with R3.
The query tree produced by SCAN/PARSE/VALIDATE is shown in Figure 1.
Design of the second component
The second part of PHASE II, i.e., QUERY OPTIMIZER, will optimize the tree
by doing the following in order:
Because of the restriction we placed on the FROM clause, there are only 2
different
JOIN orderings to consider, which are
and
.
After doing the first three steps, the tree appears as shown in Figure 2
(assuming that the estimated size of (
) is less than the
estimated
size of (
) and that the size of R1 is less than the size of
R2.
We will assume the the left child of a JOIN node is the Outer relation for
the join operation, which should be the smaller of the two.
We should note that choosing the order of JOINs and the outer relation for a JOIN involve using the statistics from the catalog and the formulas for estimating sizes of results of relational operations.
After doing the fourth step, we have the final optimized tree as shown in Figure 3.
When we JOIN two tables, we will assume that there is only one column from each relation which is used in the equi-join operation.
Design of the third component
In part 3, QUERY CODE GENERATOR, we will only consider one join algorithm,
i.e., the NESTED-BLOCK algorithm, where we fill the buffer with a number of
blocks from the Outer relation and one block from the inner relation.
Depending on how much of the outer relation we fit in the buffer will
determine
how many times we will have to read the entire inner relation.
NOTE: In this project, we will not consider using an index for the
inner table access in the
join algorithm.
For the optimized query tree shown in Figure 4,
The Query Code Generator produces the
Access Plan Table as shown in Table 1.
To get a better idea as to how the access plan was determined,
let's examine step (3) in detail.
For this example, let's assume the following:
relation r2 has 1000 rows, each row requires 16 bytes of storage, the block
size
is 512 bytes. So, 32 rows will fit in a block, giving a total of 32 blocks
for r2.
The secondary index on attribute F allows for
duplicate values, so buckets containing pointers to data blocks will be
used in the index.
A bucket
contains only pointers (4 bytes) to data blocks and one pointer to an
additional
bucket. So, with a block size of 512, there will be 127 pointers in a
bucket.
Since we assume a uniform data distribution in our table and
since attribute F has 50 values, each value will occur in 20 rows
(i.e.,
).
Hence, one bucket can store
all the pointers to data blocks for a given value of F.
So, using the secondary index for the selection operator (
)
will cost 1 I/O for the bucket access plus 20 I/Os
for the data block accesses, giving a total of 23 I/Os.
The alternative access method is a table scan for relation
, which
would cost
32 I/Os for the data block accesses. So, using the index is more
efficient.
Design of the fourth component
The Runtime Database Processor executes the functions for the statements in the access plan and prints the final result (i.e., table t6).