DATABASE QUALIFIER – FALL 2003
Please answer 6 of 8 questions
DATA MODELS
(a) Give one or two concrete examples /scenarios where relational model
will be inadequate to model the data. Point out clearly why the
relational model is inadequate and suggest what other models you will
consider and why.
(b) Construct an example with at least four relations using foreign
key/primary key relationships (not used in any textbook) and draw a
corresponding conceptual design using ER or enhanced ER (EER) model.
Argue in what sense the conceptual model is equivalent and how it
improves the understanding about the data involved.
(c) What do the OODBMSs have to offer in terms of “advanced modeling”
features? What is the relationship between the OOPLs (object oriented
programming languages) and OODBMSs ? In your opinion why have O-O data
models not been very successful?
DATABASE TECHNOLOGIES AND APPLICATIONS
A. Consider two challenges for DBMSs: (a) large number of data types
including unstructured data and (b) an ability to deal with multiple
database platforms and models to bring about integration of databases.
- Discuss the general approach and strategy taken by database
systems/products vendors (DBMS companies like Oracle, IBM and Microsoft
) to meet above challenges by pointing out the features/functions they
already offer or will offer in the near future and the
difficulties involved.
- Are there technology solutions offered outside the purview of
individual DBMSs to “solve” the database integration problem as stated
in (b) above? If so, point out what type of tools or technology
solutions exist, and to what extent and how they bring about the
integration. Give an example of the integration scenario.
B. Write a paragraph showing your understanding of one of the following
product categories (comment on the typical functionality of the
products, relationship to database technology, the major difficulties
and ongoing research issues): (a) search engines (b) data mining/OLAP
–(online analytical processing) tools.
RELATIONAL DATABASE DESIGN
Enhanced Entity-Relationship (EER) Diagrams represent a number of
dependencies that are functional. For example, the dependency from an
entity type to one of its single-valued attributes is functional, and a
1:N relationship type is functional in one direction.
a) Enumerate all the types of dependencies in EER Diagrams which are
functional.
b) Now consider the EER to Relational mapping algorithm. For each
dependency identified above, show how it is enforced as a functional
dependency in the Relational Model.
DATABASE RECOVERY
Consider the redo/undo database recovery scheme with checkpointing
where a steal/no-force buffer management policy and write-ahead logging
is used. The concurrency control scheme is the standard two phase
locking.
For this question compare and contrast the following two recovery
schemes. In your answer, you should consider whether they both work
correctly, whether the recovery should be faster with one scheme, and
whether it is possible with either scheme to permit new transaction
processing before the database recovery is complete (e.g., with scheme
1 can the DBMS start processing new transactions after pass 1 or pass
2).
Scheme 1:
pass 1: The log is scanned backward to the most recent checkpoint to
determine transactions which were active at the time of the
crash (i.e., uncommited).
pass 2: The log is scanned forward from the most recent checkpoint
and all updates of commited and uncommited transactions are
performed.
pass 3: The log is scanned backward to roll back all transactions that
were active at the time of the crash.
Scheme 2:
pass 1: The log is scanned backward and all active (uncommited)
transactions are rolled back.
pass 2: The log is scanned forward from the most recent checkpoint and
all updates of commited transactions (since the checkpoint) are
performed.
DISTRIBUTED DATABASES
a) Give three reasons why replication is of interest in a distributed
database system.
b) Describe lazy-master replication. Be sure to explain what happens
when updates are performed to disconnected copies and explain how
conflicts are resolved.
c) How well does lazy-master replication scale as the number of
replicas is increased?
d) Suppose that the Employees relation is stored in Chicago and that
Employee tuples with salary <= 100,000 are replicated at New York.
Consider the following three options for lock management:
- all locks managed at a single site, say, Atlanta;
- primary copy with Chicago being the primary for Employees;
- fully distributed.
For each of these lock management options, explain what locks are set
(and at which site) for the following queries. Also state from which
site the page is read.
A query submitted at Dallas wants to read a page of Employees tuples
with salary <= 50,000.
A query submitted at Chicago wants to read a page of Employees tuples
with salary <= 50,000.
A query submitted at New York wants to read a page of Employees tuples
with salary <= 50,000.
PHYSICAL STORAGE STRUCTURES AND PERFORMANCE
Two algorithms for scheduling the read/write head of a magnetic disk
are:
- First Come First Serve (FCFS): Requests are served in the order they
are made.
- Shortest Seek Time First (SSTF): After serving one request, the head
is scheduled to serve the request that requires the shortest seek time.
a) Explain why SSTF may be preferred over FCFS.
b) There is a serious problem that can occur with SSTF that cannot
occur with FCFS. What is this problem and why does it occur? How can
you fix it?
c) A third algorithm is called the elevator algorithm. In the elevator
algorithm, the disk head first sweeps in one direction, servicing all
of the requests for cylinders it passes over. Then the head sweeps in
the other direction, again servicing all of the requests for cylinders
it passes over. The process repeats, with the head sweeping back and
forth like an elevator going up and down. Does the elevator algorithm
have the same problem as in part b)? Why or why not?
QUERY PROCESSING AND OPTIMIZATION QUESTION:
Consider the following SQL query over relation HS(S#, A) and SC(S#, C#,
G):
Select * from HS where HS.A = (select AVG(SC.G)
from
SC
where SC.S# = HS.S#)
Draw two possible query plans, illustrated as operator trees. You may
specify logical or physical plans, whichever you prefer. The main goal
is to show two quite different execution strategies for this query, and
discuss pros and cons of each.
CONCURRENCY CONTROL QUESTION:
The following two transactions are executed against a data base with
two objects, x and y. Initially, x = y = 0. Transaction T1 is a nested
transaction, containing subtransactions T11 and T12. These
subtransactions are executed in parallel, as indicated by the parallel
statement. The resulting execution is serializable.
Transaction T1 {
x <- 1;
Parallel {
Transaction T11 {
x <- 2;
read(x, y);
y <- 3 }
Transaction T12 {
x <- 4;
y <- 5 } }
x <- 6 }
Transaction T2 {
Read(x, y);
x <- 7;
y <- 8 }
(a) List all possible x, y values that may be read by T11. Briefly
explain how each of these values may be obtained..
(b) List all possible x, y values that may be read by T2. Briefly
explain how each of these values may be obtained.
(c) Discuss why a database system that implements nested transactions
may be preferable to one that implements only flat transactions.