DATABASE QUALIFIER EXAM
Note: Please attempt 8 (eight) out of 9 (nine) questions. If you have to make assumptions, please state them clearly and proceed. If the question is unclear, just state how you interpreted it so the answer can be properly evaluated.
1. Data Modeling, Mapping, Languages
Imagine that you were going to implement an (extended) Entity-Relationship database management system. Ignore the query language and query optimization. Also, ignore physical sorting, indexing, etc. Instead, concentrate on the implementation of the data structures.
Which data structures would you use for:
Given the data structures above, now consider each of the modification, insert, delete and update, to data in the database. Please describe the semantics of each operation on each data structure. I.e., how would you insert a new entity? How would you insert an entity, which is already in a supertype into a subtype? How would you delete an entity from a supertype? Etc.
2. Relational DB Design
Consider the overall process of relational database design as performed in practice as well as the way it is approached in theory. Answer the following keeping both in mind.
3. Query Processing
Consider the following query:
SELECT DISTINCT Ename
FROM EMPLOYEE
WHERE Title = 'programmer' AND Dept = 'production"
Assume the following:
10% of employees are programmers,
5% of employees are programmers who work for the production
department,
20% of employees work for the production department,
there are 10 departments,
the EMPLOYEE relation has 1,000 pages with 10 tuples per page,
there is a 51 page buffer that can be used to process the query.
Find the best execution plan for the following cases:
(a) The only index is on Title and it is an unclustered 2-level B+
tree.
(b) The only index is a composite index on Dept,Title,Ename and it is
a clustered B+ tree with 2 levels.
(c) The only index is a composite index on Dept,Ename,Title and it is
a clustered B+ tree with 3 levels.
Make sure you discuss the different options and their I/O costs for processing the query in each of the three cases.
Also, be sure to specify any assumptions you make.
4. Physical Storage Structures
Stream Databases is the latest fad in databases.
A stream database may store, e.g. a click stream, transactions from a set of cash registers in a supermarket chain, time-series data from a scientific experiment, transactions in the stock market, or traffic at an ISP host.
Most likely, the stream will be stored, or logged, sequentially by simply appending each new event in the stream to the end of the log. Most likely, the events in the stream will have monotonically increasing time-stamps that represent the times events were logged. Many applications need to access the log of events to find the events that were logged at a particular point of time in the past or during a particular period of time in the past. In order to make the access to the log as efficient as possible for these applications, you are asked to analyze the many different indexing structures you know, including sparse and dense indices, single-level indices, B-trees, and hash structures.
For each one of the index structures, please describe all advantages and disadvantages of using this index structure to index stream databases. Please address at least the following issues for each structure: efficiency of logging, size of index, and efficiency of lookup.
5. Transactions and Concurrency Control
Consider the following schedule of transactions T1 and T2, operating on database objects x, y, z:
Events b and e are the beginning and the end of the schedule S, respectively, and c is the commit. Assume that two phase locking protocol is used. Please answer the following questions, all asking where certain events may occur in this schedule. You may answer each question by giving the two events in the schedule S that must precede and follow the event in question. For example, transaction T1 must lock x between b and w1(x).
6. Database Recovery
To restore a database to a consistent state after a failure has occurred, a log and some combination of UNDO and REDO procedures are typically used. We can assume for this question that an UNDO-REDO recovery method is used where committed transactions are REDOne and uncommitted transactions are UNDOne.
In the case of physical logging, a before-image and after-image of the updated data item is stored in the log, i.e., a physical copy of the data item as it existed in the database before and after update.
In the case of logical logging, a log record would contain the name of an UNDO-REDO function and its parameters. For example, in the case of inserting a new record in a table, we might have the following REDO log record:
<INSERT, tablename, record>.
The associated UNDO record would look like
<DELETE, tablename, record>.
The motivation for logical logging is to reduce the amount of log data written and the associated I/O operations for the log. Even for a simple record insertion into a table, several changes may happen such as: moving records within a page, inserting entries into indexes for the Employee table and possibly firing trigger conditions that might change existing data values.
Answer the following with regard to a logical logging scheme:
(a) Are logical operations idempotent? Explain with an example. Also, if they are not, how can this be overcome.
(b) Can we assume that the logical operations are done atomically? Explain. Also, if they are not atomic, show what problem may be encountered during recovery and explain how we can overcome this problem.
7. Database Architectures
There have been several architectures proposed for databases:
More recently we have seen use of mobile database architectures, web-based databases and peer-to-peer databases.
8. Distributed Databases
A retail company is distributed with offices in New York, Atlanta, Chicago, and Los Angeles. It wants to keep its ORDER data (order#, date, total_amount, customer#, warehouseId, ) and its CUSTOMER data (customer#, customer_info, last_order_date, ) with some optimal stategy for minimizing overall cost of processing inquires, orders, payments, etc. [NOTE: You are required to answer questions 1) and 4). But you only need to answer one of the two questions 2) and 3). ]
9. DB Technology and Applications
Database technology is undergoing a lot of evolution in terms of dealing with new applications enabled by the web and in e-commerce, entertainment and publishing, scientific applications in biology, GIS etc.