Database Fall 2002 Qualifier Exam

Answer any 7 questions. The question on DB security is mandatory for the student working in the security area. If you have to make any assumptions, state them clearly and proceed.

Questions 1 to 3: The following three questions are all related to the ER- and EER-to-Relational Mapping and storage of relational databases. Consider in detail all the four options for mapping a set of subtypes that form a specialization of an entity-type into a corresponding set of relations. (Note: see Pages 295-297 in Elmasri/Navathe Step 8 – there we are generating relations to model the subtypes of an entity type; the four different options generate different sets of relations.)

Question 1 (Mapping and Constraints)

For each one of the four options, please discuss the advantages and disadvantages in terms of how easy it is to enforce the integrity constraints of the specialization in the corresponding relations.  Make sure you consider both disjoint and overlapping subtypes, as well as mandatory and optional participation of entities in the subtype(s).

Question 2 (Normalization)

For each one of the four options, please discuss advantages and disadvantages relative to normalized database design.  Consider whether the resulting relations are in a desirable normal form without any undesirable redundancy. In other words, discuss what potential problems can arise that would violate certain normal forms. Give appropriate examples.

Question 3 (Physical Design)

For each one of the four options, please discuss advantages and disadvantages relative to efficiency aspects of physical database design. You may want to analyze both space consumption and query cost.  Consider queries involving SELECT/PROJECT/JOIN operations. You may want to think about how the concept of inheritance in the EER Model would be mimicked in the Relational Model.
 
Question 4 (Query Processing)

Consider three relations R(A,B), S(B,C), and T(C,D and the following
SQL query:Select * From R, S, T Where R.B = S.B and S.C = T.C and R.A > 5 and T.D > 10

1) Draw a relational algebra execution tree for this query and adding it into the following three commonly used heuristics: Push Selection down, favor joins over cross-products, and Use a left deep tree.

2) Given an example scenario where it is more efficient not to push one of the selection conditions down. Show the “better” (but equivalent) relational algebra tree and list the characteristics of the scenario.

3 Given an example scenario where it is more efficient not to favor a join over a cross-product for this query.

4) Given an example scenario where it is more efficient not to use a left-deep tree for this query: Show the ``better” but equivalent relational algebra and list of the characteristics of the scenario.

Question 5 (Concurrency Control)

a. Define informally what a recoverable schedule is. What is the main property that must hold?

b. If two-phase locking is used, when can write locks be released in a recoverable schedule? When is the earliest time that a read lock can be released? Please explain.

c. Let the memory-commit of a transaction T be the point when T ’s commit record is written to the in-memory log buffer. Let the disk-commit of T be the point when T ’s log record is flushed to disk. With so-called group commit, T ’s write locks are released after the in-memory commit. The client that submitted T is not notified of its commit until after the disk commits. Explain why a group commit system does not generate recoverable schedules. Explain why this is not a problem.
 
Question 6 (Distributed Databases )

Consider the following table in our distributed database:

FLIGHT_TOTALS(FlightNumber, Date, PassengerCount)

The Flight_Totals table keeps track of the total number of passengers that have reservations on a particular flight (designated by FlightNumber, which is the key).  So, when a customer makes a reservation, the Flight_Totals table must be updated.

The Flight_Totals table is partitioned into two fragments by FlightNumber, whose values range from 1 to 1000.  The fragment Flight_Totals_1 contains flight information where 1 <= FlightNumber <= 500 and the fragment Flight_Totals_2 contains flight information with  501 <= FlightNumber <= 1000.  Both fragments are of the same size.

The following three applications access the Flight_Totals table:

App1:  SELECT *
FROM Flight_Totals;

App2:  SELECT *
FROM Flight_Totals
WHERE Flight_Number >= 1 AND FlightNumber <= 500;

Then the application does some processing and determines how the data is to be updated/deleted/inserted and returns the updated data (we can assume that the amount of data stays constant)

App3:  SELECT *
FROM Flight_Totals
WHERE Flight_Number >= 501 AND FlightNumber <= 1000;

Then the application does some processing and determines how the data is to be updated/deleted/inserted and returns the updated data (we can assume that the amount of data stays constant)

There are three sites in our network: Site 1, Site 2 and Site 3.  Suppose we are following a Data Shipping approach where data is sent to a given site for processing.  Application App1 is executed at Site 1, App2 at Site 2 and App3 at Site 3.  Application App1 is executed the same number of times as App2, while App2 is executed three times as often as App3.
 
The cost of reading a non-local fragment is 1 and the cost of writing a non-local fragment is an additional 1 unit (i.e., read + write).  The cost of reading or writing a local fragment is 0 units.  The cost of storing a fragment at a site is not a factor.

How should we allocate copies of the fragments to sites so that the processing costs are minimal?  Justify your answer.

Question 7: ( Database Recovery )

Shadow paging is a database recovery technique that can be classified as a NO-UNDO/NO-REDO scheme.  Shadow paging was originally proposed in the late
1970's for recovery in a single-user database environment.  For this question, you will need to propose an extension to the original shadow paging scheme that can be used in a multi-user database environment.

Some of the historical disadvantages associated with applying shadow paging
to a multi-user environment include the high cost of reading the page table, the high cost of writing the shadow page table, and  the loss of locality (clustering) of data pages. Your multi-user shadow paging scheme should address these disadvantages.  So, you should explain/justify why these are no longer disadvantages for your shadow paging approach.

Since recovery is tied to concurrency control, you can assume that the database system employs two-phase locking at the page level.

As a hint for this question, you might consider how technology has advanced
since the 1970's.

Question 8 (Technology and Applications)

In the last decade, the database technology has started accommodating products and functionalities that are geared to support certain applications. Consider Data Warehousing Systems, OLAP tools (that work with multidimensional data) , Data Mining Tools, and tools for searching information on the web. For any  three out of these categories, answer the following:

- what are the typical products if any, that you can think of under this category
- what is the functionality peculiar to this family of products
- what are the database challenges in providing  these functionalities
- what is the current state of research with respect to the challenging problems for this category
 
Question 9 (Database Security)

A. Consider the concept of multi-level security models for databases.
- explain what this concept means and implies in terms of its implementation
- describe in any two MAC (mandatory access control models) support it : in particular, talk about how they store data and how they process queries
- How is polyinstantiation related to this? What does it do and what goals does it achieve?

B. List any two outstanding important problems faced in making databases 100% secure. Discuss what measures have been proposed in research and practice to deal with the problems. What further work is necessary to work toward the ultimate goal?