Normalization

Overview

Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables while maintaining data relationships.

Normalization follows a series of normal forms (1NF, 2NF, 3NF, BCNF, etc.), each with specific rules to eliminate different types of data redundancy and anomalies.

Why Normalize?

Normalization addresses several problems in database design:

1. Data Redundancy

Storing the same data multiple times wastes space and can lead to inconsistencies.

2. Update Anomalies

When the same data exists in multiple places, updates must be made in all locations, which is error-prone.

3. Insertion Anomalies

Unable to insert certain data without other unrelated data.

4. Deletion Anomalies

Deleting certain data may unintentionally delete other important data.

Example: Unnormalized Data

Unnormalized Student-Enrollment Table
┌────────────┬──────────────┬─────────────┬──────────────┬──────────┐
│ StudentID  │ StudentName  │ CourseID    │ CourseName   │ Grade    │
├────────────┼──────────────┼─────────────┼──────────────┼──────────┤
│ 101        │ John Smith   │ CS101       │ Database     │ A        │
│ 101        │ John Smith   │ MATH201     │ Calculus     │ B        │
│ 102        │ Jane Doe     │ CS101       │ Database     │ A        │
│ 102        │ Jane Doe     │ PHYS301     │ Physics      │ A        │
└────────────┴──────────────┴─────────────┴──────────────┴──────────┘

Problems:
- StudentName repeated for each course
- CourseName repeated for each student
- Data redundancy and update anomalies

Normal Forms

First Normal Form (1NF)

A table is in 1NF if:

Violating 1NF: Multivalued Attribute
NOT in 1NF:
┌────────────┬──────────────┬─────────────────┐
│ StudentID  │ StudentName  │ Courses         │
├────────────┼──────────────┼─────────────────┤
│ 101        │ John Smith   │ CS101, MATH201  │ ← Multiple values!
└────────────┴──────────────┴─────────────────┘

In 1NF:
┌────────────┬──────────────┬─────────────┐
│ StudentID  │ StudentName  │ CourseID    │
├────────────┼──────────────┼─────────────┤
│ 101        │ John Smith   │ CS101       │
│ 101        │ John Smith   │ MATH201     │
└────────────┴──────────────┴─────────────┘

Second Normal Form (2NF)

A table is in 2NF if:

Partial Dependency Example
NOT in 2NF (Composite Key: StudentID + CourseID):
┌────────────┬──────────────┬─────────────┬──────────────┬──────────┐
│ StudentID  │ StudentName  │ CourseID    │ CourseName   │ Grade    │
├────────────┼──────────────┼─────────────┼──────────────┼──────────┤
│ 101        │ John Smith   │ CS101       │ Database     │ A        │
└────────────┴──────────────┴─────────────┴──────────────┴──────────┘

Problem:
- StudentName depends only on StudentID (partial dependency)
- CourseName depends only on CourseID (partial dependency)
- Grade depends on both StudentID + CourseID (full dependency)

Solution: Split into multiple tables

Third Normal Form (3NF)

A table is in 3NF if:

Transitive Dependency Example
NOT in 3NF:
┌────────────┬──────────────┬──────────────┬──────────────┐
│ StudentID  │ StudentName  │ DepartmentID │ DeptName     │
├────────────┼──────────────┼──────────────┼──────────────┤
│ 101        │ John Smith   │ CS           │ Computer Sci │
└────────────┴──────────────┴──────────────┴──────────────┘

Problem:
- DeptName depends on DepartmentID (transitive dependency)
- DeptName doesn't directly depend on StudentID

Solution: Move DeptName to separate Department table

Boyce-Codd Normal Form (BCNF)

A table is in BCNF if:

BCNF is a stricter version of 3NF that handles overlapping candidate keys.

Normalization Example

Let's normalize the Student-Enrollment table:

Step-by-Step Normalization
Original (Unnormalized):
Students_Courses(StudentID, StudentName, CourseID, CourseName, Grade)

Step 1: 1NF - Already atomic values ✓

Step 2: 2NF - Split to remove partial dependencies:
- Students(StudentID, StudentName)          [PK: StudentID]
- Courses(CourseID, CourseName)             [PK: CourseID]
- Enrollments(StudentID, CourseID, Grade)   [PK: StudentID, CourseID]
                                           [FK: StudentID → Students]
                                           [FK: CourseID → Courses]

Step 3: 3NF - Check for transitive dependencies:
- No transitive dependencies ✓

Final normalized structure:
Normalized Tables
Students Table:
┌────────────┬──────────────┐
│ StudentID  │ StudentName  │
├────────────┼──────────────┤
│ 101        │ John Smith   │
│ 102        │ Jane Doe     │
└────────────┴──────────────┘

Courses Table:
┌─────────────┬──────────────┐
│ CourseID    │ CourseName   │
├─────────────┼──────────────┤
│ CS101       │ Database     │
│ MATH201     │ Calculus     │
└─────────────┴──────────────┘

Enrollments Table:
┌────────────┬─────────────┬──────────┐
│ StudentID  │ CourseID    │ Grade    │
├────────────┼─────────────┼──────────┤
│ 101        │ CS101       │ A        │
│ 101        │ MATH201     │ B        │
│ 102        │ CS101       │ A        │
└────────────┴─────────────┴──────────┘

Benefits of Normalization

When to Denormalize

Sometimes, denormalization (reducing normalization) may be beneficial for:

Important: Denormalization should be done carefully and with clear justification.

Functional Dependencies

Understanding functional dependencies is crucial for normalization:

Normalization Best Practices

Next Steps

Learn about applying normalization in Database Design, or explore maintaining data integrity with Constraints & Integrity.