Normalization
Overview
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. It involves decomposing tables into smaller, well-structured tables while maintaining data relationships.
Normalization follows a series of normal forms (1NF, 2NF, 3NF, BCNF, etc.), each with specific rules to eliminate different types of data redundancy and anomalies.
Why Normalize?
Normalization addresses several problems in database design:
1. Data Redundancy
Storing the same data multiple times wastes space and can lead to inconsistencies.
2. Update Anomalies
When the same data exists in multiple places, updates must be made in all locations, which is error-prone.
3. Insertion Anomalies
Unable to insert certain data without other unrelated data.
4. Deletion Anomalies
Deleting certain data may unintentionally delete other important data.
Example: Unnormalized Data
┌────────────┬──────────────┬─────────────┬──────────────┬──────────┐
│ StudentID │ StudentName │ CourseID │ CourseName │ Grade │
├────────────┼──────────────┼─────────────┼──────────────┼──────────┤
│ 101 │ John Smith │ CS101 │ Database │ A │
│ 101 │ John Smith │ MATH201 │ Calculus │ B │
│ 102 │ Jane Doe │ CS101 │ Database │ A │
│ 102 │ Jane Doe │ PHYS301 │ Physics │ A │
└────────────┴──────────────┴─────────────┴──────────────┴──────────┘
Problems:
- StudentName repeated for each course
- CourseName repeated for each student
- Data redundancy and update anomalies
Normal Forms
First Normal Form (1NF)
A table is in 1NF if:
- All attributes contain atomic (indivisible) values
- Each attribute has a unique name
- All values in a column are of the same type
- The order of rows and columns doesn't matter
NOT in 1NF:
┌────────────┬──────────────┬─────────────────┐
│ StudentID │ StudentName │ Courses │
├────────────┼──────────────┼─────────────────┤
│ 101 │ John Smith │ CS101, MATH201 │ ← Multiple values!
└────────────┴──────────────┴─────────────────┘
In 1NF:
┌────────────┬──────────────┬─────────────┐
│ StudentID │ StudentName │ CourseID │
├────────────┼──────────────┼─────────────┤
│ 101 │ John Smith │ CS101 │
│ 101 │ John Smith │ MATH201 │
└────────────┴──────────────┴─────────────┘
Second Normal Form (2NF)
A table is in 2NF if:
- It is in 1NF
- All non-key attributes are fully functionally dependent on the primary key
- No partial dependency exists (i.e., attributes depend on part of a composite key)
NOT in 2NF (Composite Key: StudentID + CourseID):
┌────────────┬──────────────┬─────────────┬──────────────┬──────────┐
│ StudentID │ StudentName │ CourseID │ CourseName │ Grade │
├────────────┼──────────────┼─────────────┼──────────────┼──────────┤
│ 101 │ John Smith │ CS101 │ Database │ A │
└────────────┴──────────────┴─────────────┴──────────────┴──────────┘
Problem:
- StudentName depends only on StudentID (partial dependency)
- CourseName depends only on CourseID (partial dependency)
- Grade depends on both StudentID + CourseID (full dependency)
Solution: Split into multiple tables
Third Normal Form (3NF)
A table is in 3NF if:
- It is in 2NF
- No transitive dependency exists (non-key attribute depends on another non-key attribute)
NOT in 3NF:
┌────────────┬──────────────┬──────────────┬──────────────┐
│ StudentID │ StudentName │ DepartmentID │ DeptName │
├────────────┼──────────────┼──────────────┼──────────────┤
│ 101 │ John Smith │ CS │ Computer Sci │
└────────────┴──────────────┴──────────────┴──────────────┘
Problem:
- DeptName depends on DepartmentID (transitive dependency)
- DeptName doesn't directly depend on StudentID
Solution: Move DeptName to separate Department table
Boyce-Codd Normal Form (BCNF)
A table is in BCNF if:
- It is in 3NF
- Every determinant is a candidate key
BCNF is a stricter version of 3NF that handles overlapping candidate keys.
Normalization Example
Let's normalize the Student-Enrollment table:
Original (Unnormalized):
Students_Courses(StudentID, StudentName, CourseID, CourseName, Grade)
Step 1: 1NF - Already atomic values ✓
Step 2: 2NF - Split to remove partial dependencies:
- Students(StudentID, StudentName) [PK: StudentID]
- Courses(CourseID, CourseName) [PK: CourseID]
- Enrollments(StudentID, CourseID, Grade) [PK: StudentID, CourseID]
[FK: StudentID → Students]
[FK: CourseID → Courses]
Step 3: 3NF - Check for transitive dependencies:
- No transitive dependencies ✓
Final normalized structure:
Students Table:
┌────────────┬──────────────┐
│ StudentID │ StudentName │
├────────────┼──────────────┤
│ 101 │ John Smith │
│ 102 │ Jane Doe │
└────────────┴──────────────┘
Courses Table:
┌─────────────┬──────────────┐
│ CourseID │ CourseName │
├─────────────┼──────────────┤
│ CS101 │ Database │
│ MATH201 │ Calculus │
└─────────────┴──────────────┘
Enrollments Table:
┌────────────┬─────────────┬──────────┐
│ StudentID │ CourseID │ Grade │
├────────────┼─────────────┼──────────┤
│ 101 │ CS101 │ A │
│ 101 │ MATH201 │ B │
│ 102 │ CS101 │ A │
└────────────┴─────────────┴──────────┘
Benefits of Normalization
- Reduced redundancy: Data stored only once
- Improved data integrity: Updates made in one place
- Easier maintenance: Changes easier to implement
- Efficient storage: Less disk space required
- Flexibility: Easier to modify structure
When to Denormalize
Sometimes, denormalization (reducing normalization) may be beneficial for:
- Performance: Reducing JOINs for frequently accessed data
- Reporting: Pre-aggregated data for analytics
- Read-heavy workloads: Trading some redundancy for speed
Important: Denormalization should be done carefully and with clear justification.
Functional Dependencies
Understanding functional dependencies is crucial for normalization:
- Functional Dependency: If A determines B, then B is functionally dependent on A (A → B)
- Full Functional Dependency: B depends on the entire A, not a subset
- Partial Dependency: B depends on part of A
- Transitive Dependency: If A → B and B → C, then A → C (transitively)
Normalization Best Practices
- Start with 3NF: Usually sufficient for most applications
- Understand dependencies: Identify functional dependencies first
- Consider performance: Balance normalization with query performance
- Document decisions: Keep track of normalization choices
- Test thoroughly: Verify normalized structure works correctly
Next Steps
Learn about applying normalization in Database Design, or explore maintaining data integrity with Constraints & Integrity.