Database Normal Forms

23 Apr 2022 Note and Database

Normalization[1] is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation may cause insertion, deletion and update anomalies. So, it helps to minimize the redundancy in relations. Normal forms are used to eliminate or reduce redundancy in database tables.

规范化: 一个低一级的关系模式通过模式分解可以转化为若干个高一级范式的关系模式的集合,这个过程叫做规范化。

我们用范式进行数据库规范化,以减少或消除冗余。

For BDNF:

5NF ⊂ 4NF ⊂ BCNF ⊂ 3NF ⊂ 2NF ⊂ 1NF


1NF

First Normal Form(1NF): 属于第一范式关系的所有属性都不可再分,即数据项不可分。

A relation is in first normal form if it does not contain any composite or multi-valued attribute.

ID   Name   Courses
------------------
1    A      c1, c2
2    E      c3
3    M      C2, c3

In the above table Course is a multi-valued attribute so it is not in 1NF.

Below Table is in 1NF as there is no multi-valued attribute

ID   Name   Course
------------------
1    A       c1
1    A       c2
2    E       c3
3    M       c2
3    M       c3

2NF

Second Normal Form(2NF): 若某关系R属于第一范式,且每一个非主属性完全函数依赖于任何一个候选码,则关系R属于第二范式。

To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency.

(A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table.)

此处我们需要理解非主属性、候选码和完全函数依赖的概念。

候选码: 若关系中的某一属性组的值能唯一地标识一个元组,而其子集不能,则称该属性组为候选码。若一个关系中有多个候选码,则选定其中一个为主码。

主码或候选码都简称为码。

主属性: 所有候选码的属性称为主属性。不包含在任何候选码中的属性称为非主属性或非码属性。

e.g. 学生表中[Title: 学号 姓名 年龄 性别 ],学号和姓名(假设唯一)为该关系的主属性,年龄和性别就是非主属性。

函数依赖: 设R(U)是属性集U上的关系模式,X、Y是U的子集。若对于R(U)的任意一个可能的关系r,r中不可能存在两个元组在X上的属性值相等,而在Y上的属性值不等,则称Y函数依赖于X或X函数确定Y。

完全函数依赖: 设R(U)是属性集U上的关系模式,X、Y是U的子集。如果Y函数依赖于X,且对于X的任何一个真子集X’,都有Y不函数依赖于X’,则称Y对X完全函数依赖。记作:如果Y函数依赖于X,但Y不完全函数依赖于X,则称Y对X部分函数依赖

理解[2]: 第二范式是指每个表必须有一个(常有且仅有一个)数据项作为关键字或主键(primary key),其他数据项与关键字或者主键一一对应,即其他数据项完全依赖于关键字或主键。由此可知单主属性的关系均属于第二范式。

STUD_NO COURSE_NO COURSE_FEE 1 C1 1000 2 C2 1500 1 C4 2000 4 C3 1000 4 C1 1000 2 C5 2000 {Note that, there are many courses having the same course fee. }

Here, COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO; COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO;

Hence, COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key {STUD_NO, COURSE_NO} ; But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency and so this relation is not in 2NF.

To convert the above relation to 2NF, we need to split the table into two tables such as : Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO, COURSE_FEE

    Table 1                                    Table 2
STUD_NO            COURSE_NO          COURSE_NO                COURSE_FEE     
1                 C1                  C1                        1000
2                 C2                  C2                        1500
1                 C4                  C3                        1000
4                 C3                  C4                        2000
4                 C1                  C5                        2000      

另,可能有 2 PK 的情况:

Consider following functional dependencies in relation R (A, B , C, D ) AB -> C [A and B together determine C] BC -> D [B and C together determine D] In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of AB doesn’t determine any non-prime attribute.

Def:

2NF

A relation R with set of functional dependencies Σ is in 2NF if and only if for every functional dependency X → {A} ∈ Σ+:

• X → {A} is trivial or

• A is a prime attribute (A ∈ candidate key) or

• X is not a proper subset of candidate key.

3NF

Third Normal Form(3NF): 非主属性既不传递依赖于码,也不部分依赖于码。

A relation is in third normal form, if there is no transitive dependency for non-prime attributes as well as it is in second normal form.

在 2NF 的基础上消除了传递依赖。

Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive dependency.

Example 1 – In relation STUDENT given in Table 4, FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}

Candidate Key: {STUD_NO}

For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal form.

To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as:

STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)

Superkey:(A superkey is a combination of columns that uniquely identifies any row within a relational database management system (RDBMS) table. A candidate key is a closely related concept where the superkey is reduced to the minimum number of columns required to uniquely identify each row.)

Def:

A relation R with set of functional dependencies Σ is in 3NF if and only if for every functional dependency X → {A} ∈ Σ+:

• X → {A} is trivial or

• A is a prime attribute or

• X is a superkey

BCNF

Boyce-Codd Normal Form(BCNF): 关系模式R<U,F>中,若每一个决定因素都包含码,则R<U,F>属于BCFN。

A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key.

A relation is in BCNF iff in every non-trivial functional dependency X –> Y, X is a super key.

Def:

A relation R with set of functional dependencies Σ is in BCNF if and only if for every functional dependency X → {A} ∈ Σ+:

• X → {A} is trivial or

• X is a superkey.

Key Points –

  1. BCNF is free from redundancy.
  2. If a relation is in BCNF, then 3NF is also satisfied.
  3. If all attributes of relation are prime attribute, then the relation is always in 3NF.
  4. A relation in a Relational Database is always and at least in 1NF form.
  5. Every Binary Relation ( a Relation with only 2 attributes ) is always in BCNF.
  6. If a Relation has only singleton candidate keys( i.e. every candidate key consists of only 1 attribute), then the Relation is always in 2NF( because no Partial functional dependency possible).
  7. Sometimes going for BCNF form may not preserve functional dependency. In that case go for BCNF only if the lost FD(s) is not required, else normalize till 3NF only.
  8. There are many more Normal forms that exist after BCNF, like 4NF and more. But in real world database systems it’s generally not required to go beyond BCNF.

E.g., Checking Process

https://www.hackerrank.com/challenges/database-normalization-7/forum

1NF is “atomic” (no duplicate columns or multiple values, like customer 1, customer 2, etc.). - check

2NF is “dependency” (all non-key columns are dependant on primary key, which can be verified via determinant list). - check

3NF is “non-transitive” (no columns are related to another column via an intermediary, A->B + B->C = A->C, which can also be verified via determinant list).

BCNF is “if and only if for every one of its dependencies X → Y, at least one of the following conditions hold: X → Y is a trivial functional dependency (Y ⊆ X) or X is a superkey for schema R”.

The table that people have objections to being in BCNF is the customer one. For this table, we have the following dependencies:

1) memberno -> name,addr 2) name,addr -> memberno

1) Even though name and address are dependent on memberno, this is what is known as a TRIVIAL FUNCTIONAL DEPENDENCY. ID#’s are the most-common example of this. Basically, think of it as a “functional” alias. Therefore, it doesn’t violate this condition.

2) Even though someone can have multiple addresses (and therefore multiple entries), the combination of name + addr will ALWAYS give you just one entry (even if there are multiple entries for the same person). Therefore, name + addr is a superkey.

So BCNF is a check.

4NF

4NF: 限制关系模式的属性之间不允许有非平凡且非函数依赖的多值依赖。

理解: 当一个表中的非主属性互相独立时(3NF),这些非主属性不应该有多值,若有多值就违反了4NF。

General

A candidate key should define all the attributes of any relation.

Reference

[1] Normal Forms in DBMS - https://www.geeksforgeeks.org/normal-forms-in-dbms/

[2] 「小九的博客」- https://blog.csdn.net/weixin_43433032/article/details/89293663

Comments