Distributed Database Management Systems

Overview

A Distributed Database Management System (DDBMS) manages a database that is distributed across multiple computer sites within a computer network. Unlike a centralized database system, where all data is stored at a single location, a distributed database allows data to be stored and accessed from multiple locations while appearing as a single logical database.

The Evolution of Distributed Database Management Systems

Distributed database systems evolved from centralized database systems as organizations grew and needed to:

DDBMS Advantages and Disadvantages

Advantages

Disadvantages

Distributed Processing vs Distributed Databases

It's important to distinguish between distributed processing and distributed databases:

Distributed Processing

Distributed processing shares the database's logical processing among multiple sites connected to a network. The database itself remains centralized, but processing is distributed.

Distributed Database

A distributed database is a logically related database that is stored in two or more physically independent sites. Both data and processing are distributed.

Characteristics of Distributed Database Management Systems

A true DDBMS must have the following characteristics:

DDBMS Components

A distributed database system includes:

Levels of Data and Process Distribution

Single-Site Processing, Single-Site Data (SPSD)

All processing is done on a single CPU or host computer, and all data are stored on the host computer's local disk. This is the simplest type of system - essentially a centralized system.

Multiple-Site Processing, Single-Site Data (MPSD)

Multiple processes run on different computers sharing a single data repository. The data are stored on a single computer but can be accessed from multiple computers. This is typical of client/server systems.

Multiple-Site Processing, Multiple-Site Data (MPMD)

Fully distributed database system with data and processing distributed across multiple sites. This is a true DDBMS where each site can process local transactions and participate in global transactions.

Distributed Database Transparency Features

Transparency features hide the complexity of distribution from users and applications:

Distribution Transparency

Users see the database as a single, logical entity. The system handles:

Transaction Transparency

Ensures that distributed transactions maintain ACID properties across all sites:

Distributed Requests

A single SQL statement can access data at multiple sites:

Distributed Request Example
SELECT * 
FROM CUSTOMER_AT_SITE1 C, INVOICE_AT_SITE2 I
WHERE C.CUS_CODE = I.CUS_CODE;

Distributed Transactions

Multiple SQL statements that access data at multiple sites:

Distributed Transaction
BEGIN TRANSACTION;
    UPDATE ACCOUNT_AT_SITE1 SET BALANCE = BALANCE - 100;
    UPDATE ACCOUNT_AT_SITE2 SET BALANCE = BALANCE + 100;
COMMIT;

Two-Phase Commit Protocol

The two-phase commit protocol ensures atomicity in distributed transactions:

  1. Phase 1 - Preparation: All sites prepare to commit. Each site locks resources and writes transaction log entries.
  2. Phase 2 - Final Commit: If all sites agree, the transaction is committed. If any site fails, the transaction is rolled back at all sites.

Distributed Concurrency Control

Distributed systems require sophisticated concurrency control mechanisms:

Performance and Failure Transparency

Performance Transparency

The distributed database performs as if it were a centralized database. The system optimizes:

Failure Transparency

The system continues operating even if one or more sites fail:

Distributed Database Design

Data Fragmentation

Breaking a database into logical units (fragments) that can be stored at different sites:

Horizontal Fragmentation

Divides a table into subsets of rows. Each fragment contains rows that meet a specific condition:

Horizontal Fragmentation Example
-- Fragment 1: Customers in Region 1
SELECT * FROM CUSTOMER WHERE REGION = 1;

-- Fragment 2: Customers in Region 2
SELECT * FROM CUSTOMER WHERE REGION = 2;

Vertical Fragmentation

Divides a table into subsets of columns. Each fragment contains specific attributes:

Vertical Fragmentation Example
-- Fragment 1: Employee basic info
SELECT EMP_ID, EMP_NAME, EMP_DEPT FROM EMPLOYEE;

-- Fragment 2: Employee salary info
SELECT EMP_ID, EMP_SALARY, EMP_BONUS FROM EMPLOYEE;

Mixed Fragmentation

Combines horizontal and vertical fragmentation techniques.

Data Replication

Storing copies of data at multiple sites to improve availability and performance:

Replication Strategies:

Data Allocation

Determining where to place data fragments:

Allocation decisions consider:

The CAP Theorem

The CAP theorem states that a distributed system can guarantee at most two of the following three properties:

In practice, distributed systems must choose which two properties to prioritize, as network partitions are inevitable in distributed environments.

C.J. Date's 12 Commandments for Distributed Databases

E.F. Codd's rules for relational databases were extended by C.J. Date for distributed databases:

  1. Local Autonomy: Local sites are independent
  2. No Central Site: No reliance on a central site for operations
  3. Continuous Operation: System never stops for upgrades or reconfigurations
  4. Location Independence: Users don't need to know where data is stored
  5. Fragmentation Independence: Users don't need to know data is fragmented
  6. Replication Independence: Users don't need to know data is replicated
  7. Distributed Query Processing: System optimizes distributed queries
  8. Distributed Transaction Management: System manages distributed transactions
  9. Hardware Independence: System works with different hardware
  10. Operating System Independence: System works with different operating systems
  11. Network Independence: System works with different network protocols
  12. DBMS Independence: System works with different DBMSs

Best Practices

Next Steps

Learn about analyzing business data with Business Intelligence & Data Warehouses, or explore modern data challenges with Big Data & NoSQL.