# NORMAL FORMS Database Management System

# NORMAL FORMS Database Management System

**NORMAL FORMS**

**NORMAL FORMS Database Management System :** The primary objective of normalization is to avoid some of the anomalies that we discussed at the beginning of the chapter. Some of the points regarding normal forms are:

- In the process of creating normal form Relations are decomposed to normalize.
- Following two things are mainly taken care of while decomposition
**:** - Loss-less join property
- Dependency preserving property

- Normal forms are based on functional dependencies.
- Data is normalized for following purpose :
- Minimize redundancies
- Minimize anomalies
- Ensuring data is stored in the correct table

Eliminating need for restructuring database when data is added.

Initially Codd (1972) presented three normal forms (1NF, 2NF and 3NF) all based on functional dependencies among the attributes of a relation. Later Boyce and Codd proposed another normal form called the Boyce-Codd normal form (BCNF). The fourth and fifth normal forms are based on multivalued and join dependencies and were proposed later.

The following figure illustrate various normal forms:

Outcome and operations of each normal form:

**Simple Definitions**

**ONF**: non-first normal form**1NF**: R is in 1NF if all domain values are atomic**2NF**: R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the key**3NF**: R is in 3NF if R is 2NF and every non-key attribute is non-transitively dependent on the key**BCNF**: R is in BCNF iff every determinant is a candidate key.**Determinant**: an attribute on which some other attribute is fully functionally dependent.

**First Normal Form (1NF)**

**First Normal Form requires that: **

- Domain of an attribute must include only atomic (simple, indivisible) Values
- Value of any attribute in a tuple must be a single value from domain of that attribute.

A table satisfying the properties of a relation is said to be in first normal form. As discussed in an earlier chapter, a relation cannot have multivalued or composite attributes. This is what the 1NF requires.

*A relation is in 1NF if and only if all underlying domains contain atomic values only For Example Consider The Following relation:*

The above relation is not in 1NF because there are multiple values in color field.

After Converting it to 1NF it will look like:

The first normal form deals only with the basic structure of the relation and does not resolve the problems of redundant information ortheanomalies discussed earlier. All relations discussed in these notes are in 1NF.

For another example consider the following example relation:

*student (sno, sname, dob)*

Add Some other attributes So it has anomalies and is not in 2NF

The attribute dob is the date of birth and the primary key of the relation is sno with the functional dependencies **sno ->**** sname** and **sno ->**** dob**. The relation is in 1NF as long as dob is considered an atomic value and not consisting of three components *(day, month, year)*. The above relation of course suffers from all the anomalies that we have discussed earlier and needs to be normalized.

**First Normal Form (1NF)**

A Schema R is said to be in first normal form (1 NF) if all its attributes have atomic domains only i.e. domains comprising of indivisible values. In other words, for a Relation Schemato be in 1 NF, all its attributes should be simple and single-valued.

Let R1(A, B, C,D,E) be a schema, such that all its attributes A.E have atomic domains. Let the set of FDs that holds on R1 be F: {AB **->** C, A **->** D, D **->** E}.

By the Armstrong’s Rule of Transitivity A->D, D**->**E implies A**->**E. Thus, all legal relation on the schema R1 will also satisfy A **->** E.

Since, all attributes of R1 have atomic domains, it is in 1 NF.

AB forms candidate key of this Schema, since{AB}+ = ABCDE. Thus, A and Bare prime attributes of R and C, D and E are non-prime attributes.

The Non-prime attribute C is determined by the full candidate key i.e. AB **->** C. So, the non-prime attribute C is said to be fully functionally dependent on the candidate of its schema.

However, the non-prime attributes D and E are determined by A alone, which is part of candidate key. Such a dependency is called partial functional dependency; and it causes certain update anomalies.

Thus, a **Partial Functional Dependency** refers to a situation, wherein a non-prime attribute of a schema is determined by a proper sub-set of its candidate keys.

Thus, the following schema is in 1 NF

R1 (A, B, C, D, E) Primary Key {A,B}

AB **-> **C, A **->** D, D **->** E

Example:

Consider a Schema SP1 (S#, P#, Sname, Scity, Status, Pname Qty)

Where S# : Supplier Id number (Unique)

P# : Part id Number (Unique)

Sname : Supplier Name

Scity : Supplier City

Then, R is in BCNF (Boyce-Codd Normal Form)

What if the Schema is not in BCNF ?

Decompose (split) the schema into two pieces.

**Careful :** you want the decomposition to be lossless Boyce-Codd Normal Form (BCNF) is an extension of 3 NF in the case with two or more candidate keys which are composite and overlapping (that is, they have at leastonefield in Common). If these Conditions are not fulfilled, 3 NF and BCNF are equivalent. A table is BCNF if, and only if its only determinants are candidate keys.

In the following table, both {Supplier ID, Part ID}, as well as {SupplierName, Part ID}, are candidate keys. The table is not BCNF since it contains two determinants (SupplierID and SupplierName) which are not candidate keys. (SupplierID and SupplierName are determinants, since they determine each other.)

{ SupplierID, Part ID., SupplierName, Quantity }

However, either of the following decompositions is BCNF:

{ Supplier ID, SupplierName }

( SupplierID, Part ID, Quantity }

Or

{ SupplierName, SupplierID }

{ SupplierName, PartID, Quantity }

To achieve BCNF, remove the determinants which are not candidate keys.

Boyce-Codd normal form (or BCNF) requires that there are no non-trivial functional dependencies of attributes on Something else than a Superset of a candidate key. At this stage, all attributes are dependent on a key, a whole key and nothing but a key (excluding trivial dependencies, like A->A). A relation is said to be in the BCNF if and only if it is in the 3 NF and every non-trivial, left-irreducible functional dependency has a candidate key as its determinant. In more informal terms, a relation is in BCNF if it is in 3NF and the only determinants are the candidate keys.

- When a relation has more than one candidate key, anomalies may result eventhough the relation is in 3 NF.
- 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.e., Composite Candidate keys with at least one attribute in Common.
- BCNF is based on the concept of a determinant.
- A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.
- A relation is in BCNF is, and only if, every determinant is a candidate key.

Consider the following relation and determinants.

*R (a,b,c,d)*

*a,c -> b,d*

*a,d -> b*

Here, the first determinant suggests that the primary key of R could be changed from a,b to a, c. If this change was done all of the non-key attributes present in R could still be determined, and therefore this change is legal. However, the second determinant indicates that a, d determines b, but ad could not be the key of Rasad does not determine all of the non key attributes of R (it does not determine c). We would say that the first determinate is a candidate key, but the second determinant is not a candidate key, and thus this relation is not in BCNF (but is in 3rd normal form).

- Time -> appNo
- Time is present, and so is appNo, so relevant. Is this a candidate key. If it was then we could rewrite DB as:
*DB (Patno, appo, time, doctor)*- This will not work, as you need both time and Patno together to form a unique key. Thus this determinate is not a candidate key, and therefore DB is not in BCNF. We need to fix this.
- BCNF: rewrite to

*DB ( Patno, time, doctor)*

*R1( PataQ, PatName)*

*R2 ( time, appNo)*

time is enough to work out the appointment number of a patient. Now BCNF is satisfied, and the final relations shown are in BCNF.

**Example 1b: DB ( Patno, PatName,appNo, time, doctor)**

- 1NF Eliminate repeating groups.

**None:**

**DB( Patno, PatName, appNo, time, doctor)**

- 2NF Eliminate partial key dependencies

*DB (Patno, time, doctor)*

*R1(Patno, PatName)*

*R2 (time, appNo)*

- 3NF Eliminate transitive dependencies

*None: So just as 2NF*

- BCNFEvery determinant is a candidate key

*DB(Patno, time, doctor)*

*R1 (Patno, PatName)*

*R2(Lime,appNo)*

*Go through all determinates where ALL of the left hand attributes are present in a relation and at least ONE of the right hand attributes are also present in the relation.**Patno -> PatName*

Patno is present in DB, but not PatName, so not relevant.

- Patno, appNo -> Time, doctor

Notall LHS present, so not relevant.

- Time -> appNo

Time is present, and so is appNo, so relevant. This is a candidate key. However, Time is currently the key for R2, so satisfies the rules for BCNF.

- BCNF: as 3NF

*DB (Patno, time, doctor)*

*R1 (Patno, PatName)*

*R2 (time, appNo)*

**Summary: Example 1**

This example has demonstrated three things:

- BCNF is stronger than 3 NF, relations that are in 3 NF are not necessarily in BCNF
- BCNF is needed in certain situations to obtain full understanding of the data model there are several routes to take to arrive at the same set of relations in BCNF.
- Unfortunately there are no rules as to which route will be the easiest one to take.