Showing posts with label DB. Show all posts
Showing posts with label DB. Show all posts

Sunday, 25 December 2016

DBMS :

Normalization

What is Normalization?
Normalization is the process of efficiently organizing
data in a database. There are two goals of the
normalization process: eliminating redundant data
(for example, storing the same data in more than one
table) and ensuring data dependencies make sense
(only storing related data in a table). Both of these
are worthy goals as they reduce the amount of space
a database consumes and ensure that data is
logically stored.

The Normal Forms:
The database community has developed a series of
guidelines for ensuring that databases are
normalized. These are referred to as normal forms
and are numbered from one (the lowest form of
normalization, referred to as first normal form or 1NF)
through five (fifth normal form or 5NF). In practical
applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form
is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are
guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet
practical business requirements. However, when variations take place, it's extremely important
to evaluate any possible ramifications they could have on your system and account for possible
inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
For more details, read Put t ing your Dat abase in First Normal Form

Second Normal Form (2NF)
Second normal form (2NF) further addresses the
concept of removing duplicative data:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique
column or set of columns (the primary key).
of a table and place them in separate tables.
Create relationships between these new tables and
their predecessors through the use of foreign keys.

Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds
one more requirement:
Meet all the requirements of the third normal form.
Every determinant must be a candidate key.

Fourth Normal Form (4 NF)
Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must
first fulfill all the criteria of a 1NF database.

Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows

DBMS

DBMS Notes Series-1

Hello Friends!

Let's discuss questions on DBMS from previous year papers. To begin with, some of the multiple choice type questions are listed below:


Q. Location transparency allows :

 I. Users to treat the data as if it is done at one location.
 II. Programmers to treat the data as if it is at one location.
 III. Managers to treat the data as if it is at one location.
 Which one of the following is correct ?
 (A) I, II and III (B) I and II only (C) II and III only (D) II only

The right answer is (A). 

Reason: Location transparency means the location of data must not matter to the person who accesses/manipulates the data. This is a feature of distributed databases, which applies to every kind of database user. According to a definition on Wikipedia, "The location of a resource doesn't matter to either the software developers or the end-users. This creates the illusion that the entire system is located in a single computer, which greatly simplifies software development."

The I and II are database users. The III is a component of distributed databases. Database Manager components are responsible for providing seamless data access to users without regards to its location. Hence, this covers all 3 choices.


Q. Which of the following is correct?

I. Two phase locking is an optimistic protocol.
II. Two phase locking is pessimistic protocol
III. Time stamping is an optimistic protocol.
IV. Time stamping is pessimistic protocol.

(A) I and III (B) II and IV (C) I and IV (D) II and III


The right answer is (D).

Reason: Optimistic Vs. Pessimistic approach: The optimistic concurrency control approach doesn't actually lock anything. It is based on the assumption that conflicts of database operations are very less. Means, when when oner transaction is executing, other transactions will not access the same data item being accessed by the executing one. It lets transactions run to completion and only checks for conflicts when they are about to commit. Thus, a transaction is executed without any restrictions until it is committed.

The pessimistic approach believes that some other transaction might try to access the same piece of data. So, in order to prevent any conflict, a transaction will first acquire all the required locks, then perform all the operations. It has two phases:

1. Growing Phase, where a transaction must first acquire all the locks.
2. Shrinking Phase, where a transaction releases all the locks one-by-one.(It cannot issue lock requests here.)

Q. Data warehousing refers to
(A) storing data offline at a separate site (B) backing up data regularly
(C) is related to data mining (D) uses tape as opposed to disk


The right answer is (C)
Reason: NOT A because: Not all data of Data warehouse stored offline as it depends on nature and usage of data.
NOT B because: A data warehouse typically stores a lot of historical data, that is often not subject to change. Data that does not change only needs to be backed up once.
NOT D because: Data may be stored using a proper mix of disks, tapes, or near-line storage.

Common data warehouse models include a data warehouse that is subject oriented, time variant, non-volatile, and integrated.


Q. The "PROJECT' operator of a relational algebra creates a new table that has always
(A) More columns than columns in original table
(B) More rows than original table
(C) Same number of rows as the original table
(D) Same number of columns as the original table

The right answer is (A) 
Reason: The number of tuples in the result of PROJECT  <list> (R) is always less or equal to the number of tuples in R.

Now, a few questions with descriptive answers:


Q. Show that 2-phase locking ensures serializability?


A. In databases and transaction processing two-phase locking, (2PL) is a concurrency control method that guarantees serializability. A transaction is said to follow the two-phase locking protocol if all locking operations (read_lock, write_lock) precede the first unlock operation in the transaction. Such a transaction can be divided into two phases:


Phase 1: Growing Phase

i)  transaction may obtain locks
ii)  transaction may not release locks

Phase 2: Shrinking Phase

i)  transaction may release locks
ii)  transaction may not obtain locks

If lock conversion is allowed, then upgrading of locks (from read-locked to write-locked) must be done during the expanding phase, and downgrading of locks (from write-locked to read-locked) must be done in the shrinking phase. Hence, a read_lock(X) operation that downgrades an already held write lock on X can appear only in the shrinking phase.


The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points  (i.e. the point where a transaction acquired its final lock). Two-phase locking does not ensure freedom from deadlocks.


Q. How to determine candidate key(s) for a relation?

A. Finding candidate keys is just as simple as applying some algorithm here and there.

In the first example, there are five attributes:
W H O S E
WH -> S
HOS -> E
Steps:
1. Find the attributes that are neither on the left and right side
> (none)
2. Find attributes that are only on the right side
> E
3. Find attributes that are only on the left side
> WHO
4. Combine the attributes on step 1 and 3
> since step 1 has no attributes, it’s just WHO
5. Test if the closures of attributes on step 4 are all the attributes
> in this case, it is true. Because with WH we can get S, and by HOS, we can get E.
So we have only one candidate key that is WHO.

Q. What are steps of a Database design?


A. Major Steps in Database Design are:
  1. Requirements Analysis: Talk to the potential users! Understand what data is to be stored, and what operations and requirements are desired.
  2. Conceptual Database Design: Develop a high-level description of the data and constraints (we will use the ER data model)
  3. Logical Database Design: Convert the conceptual model to a schema in the chosen data model of the DBMS. For a relational database, this means converting the conceptual to a relational schema (logical schema).
  4. Schema Refinement: Look for potential problems in the original choice of schema and try to redesign.
  5. Physical Database Design: Direct the DBMS into choice of underlying data layout (e.g., indexes and clustering) in hopes of optimizing the performance.
  6. Applications and Security Design: It defines how the underlying database will interact with surrounding applications.
Q. Differentiate b/w Specialisation and Generalisation in ER Model?

A. Specialisation:

Top-down design process; we designate subgroupings within an entity set that are distinctive from other entities in the set.
These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.
Depicted by a triangle component labeled ISA (E.g. customer “is a” person).
Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.

Generalisation:

A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.
Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.
The terms specialization and generalization are used interchangeably.

DPD

Dependency Preservation Decomposition

December 2010 - Question No 17
The dependency preservation decomposition is a property to decompose database schema D, in which each functional dependency X → Y specified in F,

(A) appeared directly in one of the relation schemas Ri in the decomposed D.
(B) could be inferred from dependencies that appear in some Ri.
(C) both (A) and (B)
(D) None of these

Explanation:- The question itself requires a bit of explanation. It is not enough if you just know what is the right answer but you must also know why it is the right answer. The explanation would be a bit lengthy. Let us first dissect the question and explain some terms in terms of DBMS.
Decomposition - This means replacing a relation with a collection of smaller relations.

Relation - Relation is known as Table.

Relation Schema - This is known as Table definition. Relation Schema for a "student" relation can be shown in the following way:
Student(FirstName,LastName,DOB,Gender,Course,Regno,Address)

Definition of Dependency preservation decomposition:-
Each FD specified in F either appears directly in one of the relations in the decomposition, or be inferred from FDs that appear in some relation.

Let us consider an example for Dependency preservation

Let R be a relation R(A B C D)
Let there be 3 functional dependencies.
FD1: A->B
FD2: B->C
FD3: C->D
Let the relation R be decomposed into two more relations.
R1(A B C)  :  R2(C D)
Let us first consider the relation R1(A B C). Here between A and B the functional dependency FD1 is preserved. Between B and C, FD2 is preserved.
Let us now consider the second relation R2(C D). Between C and D the FD, FD3 is preserved. So in the two relations R1 and R2, all the 3 functional dependencies are preserved.

Let us consider an example for Non-dependency preservation

Let R be a relation R(A B C D) Let there be again 3 functional dependencies.
FD1:A->B
FD2:B->C
FD3:C->D
Let the relation be decomposed into two more relations>
R1(A C D) R2(B C)
Let us first consider the relation R1(A C D). There is no FD between A and C. There is a FD3 between C and D.
Now let us consider the second relation R2(B C). There is FD2 between B and C.
So, the two relations only support only FD's FD2 and FD3. FD1 is not supported. So these relations does not preserve dependency.
Generally there are three desirable properties of a decomposition.

  1. Lossless
  2. Dependency preservation
  3. Minimal redundancy
The above question was based on dependency preservation decomposition. This example has been taken from the dependency preservation presentation by Jason Allen. The explanation is quite good there.

SUMMARY:-

The dependency preservation decomposition is a property to be considered for decomposing a relation into two or more smaller relations. The functional dependency X->Y specified in F can appear directly in one of the relation schemas Ri in the decomposed D or it could be inferred from dependencies that appear in some Ri. So the answer for this question is C.

Ans:-C

DBMS

DBMS



In DBMS, there could be a question on different steps in normalization and what is achieved at the end of every step in it.You need to be knowing the following things very clearly and without any ambiguity.The question would be based on these terms.

A table is in 1NF if there and no duplicate rows in the table. Each cell is single-valued.

A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key. A table is in 2NF if it is in 1NF and if it has no partial dependencies

A table is in 3NF if it is in 2NF and if it has no transitive dependencies

A table is in BCNF if it is in 3NF and if every determinant is a candiate key.

A table is in 4NF if it is in BCNF and it it has no multi-valued dependencies.

A table is in 5NF if it is in 4NF and it has no join dependency.

Superkey,Candidate key,Primary key

A superkey is any set of attributes such that the values of the attributes(taken together)uniquely identify one entity in the entity set.

A candidate key is a minimal superkey.

A primary key is one of the candidate keys, designated by the Database designer.

Thursday, 4 February 2016

DBMS

Database System Concepts
Fifth Edition
Avi Silberschatz
Henry F. Korth
S. Sudarshan

line separator

We provide a set of slides to accompany each chapter. Click on the links below to download the slides in the format of your choice: Powerpoint and PDF.

Copyright Note

The slides and figures below are copyright Silberschatz, Korth. Sudarshan, 2005. The slides and figures are authorized for personal use, and for use in conjunction with a course for which Database System Concepts is the prescribed text. Instructors are free to modify the slides to their taste, as long as the modified slides acknowledge the source and the fact that they have been modified. Paper copies of the slides may be sold strictly at the price of reproduction, to students of courses where the book is the prescribed text. Any use that differs from the above, and any for profit sale of the slides (in any form) requires the consent of the copyright owners; contact Avi Silberschatz (avi@cs.yale.edu) to obtain the copyright owners consent. line separator
NOTE: links below are to cached local copies at IIT Bombay. May not be in sync with the copies at db-book.com. NEW: pdfs are also cached locally
line separator
Chapter Formats Last Updated
1. Introduction ppt, pdf July 23, 2008
Part 1: Relational Databases
2. Relational Model ppt, pdf July 25, 2008
3. SQL ppt, pdf August 6, 2008
4. Advanced SQL ppt, pdf August 10, 2008
5. Other Relational Languages ppt, pdf September 2006
Part 2: Database Design
6. Database Design: The Entity-Relationship Approach ppt, pdf September 2006
7. Relational Database Design ppt, pdf October 2006
8. Application Design ppt, pdf August 9, 2005
Part 3: Object-Based Databases and XML
9. Object-Based Databases ppt, pdf September 20, 2005
10. XML ppt, pdf July, 2006
Part 4: Data Storage and Querying
11. Storage and File Structure ppt, pdf Aug 2006
12. Indexing and Hashing ppt, pdf Aug 2006
13. Query Processing ppt, pdf Aug 2006
14. Query Optimization ppt, pdf Aug 2006
Part 5: Transaction Management
15. Transactions ppt, pdf September 2006
16. Concurrency Control ppt, pdf October 5, 2006
17. Recovery System ppt, pdf October 5, 2006
Part 6: Data Mining and Analysis
18. Data Analysis and Mining ppt, pdf August 27, 2005
19. Information Retrieval ppt, pdf September 2, 2005
Part 7: System Architecture
20. Database System Architectures ppt, pdf October 5, 2006
21. Parallel Databases ppt, pdf August 22, 2005
22. Distributed Databases ppt, pdf August 22, 2005
Part 7: Other Topics
23. Advanced Application Development ppt, pdf Preliminary version
24. Advanced Data Types ppt, pdf Preliminary version
25. Advanced Transaction Processing ppt, pdf Preliminary version
Part 8: Case Studies
26. PostgreSQL ppt, pdf Not available
27. Oracle ppt, pdf Not available
28. IBM DB2 Universal Database ppt, pdf Not available
29. Microsoft SQL Server ppt, pdf Not available
Appendices
A. Network Model ppt, pdf August 23, 2005
B. Hierarchical Model ppt, pdf August 23, 2005
C. Advanced Relational Database Design ppt, pdf August 23, 2005