DATABASE QUALIFIER – FALL 2003

Please answer 6 of 8 questions


DATA MODELS

(a) Give one or two concrete examples /scenarios where relational model will be inadequate to model the data. Point out clearly why the relational model is inadequate and suggest what other models you will consider and why.

(b) Construct an example with at least four relations using foreign key/primary key relationships (not used in any textbook) and draw a corresponding conceptual design using ER or enhanced ER (EER) model. Argue in what sense the conceptual model is equivalent and how it improves the understanding about the data involved.

(c) What do the OODBMSs have to offer in terms of “advanced modeling” features? What is the relationship between the OOPLs (object oriented programming languages) and OODBMSs ? In your opinion why have O-O data models not been very successful?

DATABASE TECHNOLOGIES AND APPLICATIONS

A. Consider two challenges for DBMSs: (a) large number of data types including unstructured data and (b) an ability to deal with multiple database platforms and models to bring about integration of databases.
 
- Discuss the general approach and strategy taken by database systems/products vendors (DBMS companies like Oracle, IBM and Microsoft ) to meet above challenges by pointing out the features/functions they already offer or will offer in the near future  and the difficulties involved.

- Are there technology solutions offered outside the purview of individual DBMSs to “solve” the database integration problem as stated in (b) above? If so, point out what type of tools or technology solutions exist, and to what extent and how they bring about the integration. Give an example of the integration scenario.

B. Write a paragraph showing your understanding of one of the following product categories (comment on the typical functionality of the products, relationship to database technology, the major difficulties and ongoing research issues): (a) search engines (b) data mining/OLAP –(online analytical processing) tools.

 RELATIONAL DATABASE DESIGN

Enhanced Entity-Relationship (EER) Diagrams represent a number of dependencies that are functional. For example, the dependency from an entity type to one of its single-valued attributes is functional, and a 1:N relationship type is functional in one direction.

a) Enumerate all the types of dependencies in EER Diagrams which are functional.

b) Now consider the EER to Relational mapping algorithm.  For each dependency identified above, show how it is enforced as a functional dependency in the Relational Model.


DATABASE RECOVERY

Consider the redo/undo database recovery scheme with checkpointing where a steal/no-force buffer management policy and write-ahead logging is used.  The concurrency control scheme is the standard two phase locking.  

For this question compare and contrast the following two recovery schemes. In your answer, you should consider whether they both work correctly, whether the recovery should be faster with one scheme, and whether it is possible with either scheme to permit new transaction processing before the database recovery is complete (e.g., with scheme 1 can the DBMS start processing new transactions after pass 1 or pass 2).  

Scheme 1:
pass 1: The log is scanned backward to the most recent checkpoint to
             determine transactions which were active at the time of the
             crash (i.e., uncommited).
pass 2: The log is scanned forward from the most recent checkpoint
             and all updates of commited and uncommited transactions are
             performed.
pass 3: The log is scanned backward to roll back all transactions that
             were active at the time of the crash.


Scheme 2:
pass 1: The log is scanned backward  and all active (uncommited)
              transactions are rolled back.  
pass 2: The log is scanned forward from the most recent checkpoint and
              all updates of commited transactions (since the checkpoint) are
              performed.  


 DISTRIBUTED DATABASES  

a) Give three reasons why replication is of interest in a distributed database system.

b) Describe lazy-master replication. Be sure to explain what happens when updates are performed to disconnected copies and explain how conflicts are  resolved.

c) How well does lazy-master replication scale as the number of replicas is increased?

d) Suppose that the Employees relation is stored in Chicago and that Employee tuples with salary <= 100,000 are replicated at New York. Consider the following three options for lock management:
- all locks  managed at a single site, say, Atlanta;   
- primary copy with Chicago being the primary for Employees;
- fully distributed.
For each of these lock management options, explain what locks are set (and at which site) for the following queries. Also state from which site the page is read.   

A query submitted at Dallas wants to read a page of Employees tuples with salary <= 50,000.

A query submitted at Chicago wants to read a page of Employees tuples with salary <= 50,000.

A query submitted at New York wants to read a page of Employees tuples with salary <= 50,000.


PHYSICAL STORAGE STRUCTURES AND PERFORMANCE

Two algorithms for scheduling the read/write head of a magnetic disk are:
- First Come First Serve (FCFS): Requests are served in the order they are made.
- Shortest Seek Time First (SSTF): After serving one request, the head is scheduled to serve the request that requires the shortest seek time.

a) Explain why SSTF may be preferred over FCFS.

b) There is a serious problem that can occur with SSTF that cannot occur with FCFS. What is this problem and why does it occur? How can you fix it?

c) A third algorithm is called the elevator algorithm. In the elevator algorithm, the disk head first sweeps in one direction, servicing all of the requests for cylinders it passes over. Then the head sweeps in the other direction, again servicing all of the requests for cylinders it passes over. The process repeats, with the head sweeping back and forth like an elevator going up and down. Does the elevator algorithm have the same problem as in part b)? Why or why not?


 QUERY PROCESSING AND OPTIMIZATION QUESTION:

Consider the following SQL query over relation HS(S#, A) and SC(S#, C#, G):

Select * from HS where HS.A = (select AVG(SC.G)
                               from SC
                               where SC.S# = HS.S#)

Draw two possible query plans, illustrated as operator trees. You may specify logical or physical plans, whichever you prefer. The main goal is to show two quite different execution strategies for this query, and discuss pros and cons of each.


CONCURRENCY CONTROL QUESTION:

The following two transactions are executed against a data base with two objects, x and y. Initially, x = y = 0. Transaction T1 is a nested transaction, containing subtransactions T11 and T12. These subtransactions are executed in parallel, as indicated by the parallel statement. The resulting execution is serializable.

Transaction T1 {
    x <- 1;
    Parallel {
        Transaction T11 {
            x <- 2;
            read(x, y);
            y <- 3 }
        Transaction T12 {
            x <- 4;
            y <- 5 } }
    x <- 6 }

Transaction T2 {
    Read(x, y);
    x <- 7;
    y <- 8 }

(a) List all possible x, y values that may be read by T11. Briefly explain how each of these values may be obtained..
(b) List all possible x, y values that may be read by T2. Briefly explain how each of these values may be obtained.
(c) Discuss why a database system that implements nested transactions may be preferable to one that implements only flat transactions.