Business Intelligence & Data Warehouses

Overview

Business Intelligence (BI) is a comprehensive, cohesive, and integrated set of tools and processes used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information to support business decision making.

BI is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom. BI has the potential to positively affect a company's culture by creating continuous business performance improvement through active decision support at all levels in an organization.

The Need for Data Analysis

Organizations need to track daily transactions and analyze data to answer strategic questions such as:

Decision makers can no longer wait days for reports - they need quick decisions to remain competitive. The key is having the right data at the right time to support the decision-making process.

Business Intelligence

Business Intelligence Framework

BI provides a framework for:

Business Intelligence Architecture

The BI architecture includes six basic components:

  1. ETL Tools: Data extraction, transformation, and loading tools that collect, filter, integrate, and aggregate internal and external data
  2. Data Store: Optimized for decision support, typically a data warehouse or data mart
  3. Query and Reporting: Tools for data selection, retrieval, and report creation
  4. Data Visualization: Presents data in meaningful ways (summary reports, maps, graphs, dashboards)
  5. Data Mining: Automated discovery of patterns and relationships in data
  6. OLAP Tools: Online Analytical Processing for multidimensional data analysis

Business Intelligence Benefits

Operational Data vs Decision Support Data

Operational Data

Data used in day-to-day operations:

Decision Support Data

Data optimized for analysis and decision making:

The Data Warehouse

A data warehouse is a specialized database that stores data in a format optimized for decision support. It contains historical data obtained from operational databases as well as data from other external sources.

Characteristics of Data Warehouses

Twelve Rules That Define a Data Warehouse

  1. Data warehouse and operational environments are separated
  2. Data warehouse data is integrated
  3. Data warehouse contains historical data
  4. Data warehouse data is snapshot data captured at a point in time
  5. Data warehouse data is subject-oriented
  6. Data warehouse data is primarily read-only
  7. Data warehouse development life cycle differs from operational systems
  8. Data warehouse contains data with different levels of detail
  9. Data warehouse environment is characterized by read-only transactions
  10. Data warehouse environment has system of records
  11. Data warehouse environment has metadata
  12. Data warehouse contains a chargeback mechanism

Data Marts

A data mart is a small, single-subject data warehouse subset that provides decision support to a small group of people. Data marts can be:

Star Schemas

A star schema is a data modeling technique used in data warehouses. It consists of:

Facts

Facts are numeric measurements (values) that represent a specific business aspect or activity. Examples include:

Dimensions

Dimensions are qualifying characteristics that provide additional perspectives to a given fact. Examples include:

Attributes

Attributes are characteristics of a dimension. For example, in a Time dimension, attributes might include:

Attribute Hierarchies

Attribute hierarchies provide top-down data organization. For example:

Attribute Hierarchy Example
Time Dimension:
Year
  └─ Quarter
      └─ Month
          └─ Week
              └─ Day

Location Dimension:
Country
  └─ Region
      └─ State
          └─ City

Star Schema Representation

In a star schema, facts are placed in a central fact table, and dimensions are placed in dimension tables. The fact table is typically very large, while dimension tables are smaller.

Star Schema Structure
Fact Table (SALES):
- Time_ID (FK)
- Product_ID (FK)
- Location_ID (FK)
- Customer_ID (FK)
- Sales_Amount
- Quantity_Sold
- Cost

Dimension Tables:
- TIME (Time_ID, Date, Month, Quarter, Year)
- PRODUCT (Product_ID, Product_Name, Category)
- LOCATION (Location_ID, City, State, Region)
- CUSTOMER (Customer_ID, Customer_Name, Segment)

Online Analytical Processing (OLAP)

Online Analytical Processing (OLAP) is a set of tools that provide advanced data analysis for retrieving, processing, and modeling data from the data warehouse.

Multidimensional Data Analysis Techniques

OLAP Architecture

OLAP systems can be implemented using different architectures:

Relational OLAP (ROLAP)

Uses relational databases to store multidimensional data. Data is stored in star schema structures. Advantages include scalability and ability to handle large data volumes.

Multidimensional OLAP (MOLAP)

Uses multidimensional data structures (cubes) to store data. Advantages include fast query performance and intuitive data representation.

Hybrid OLAP (HOLAP)

Combines ROLAP and MOLAP approaches, using the best features of both.

ROLAP vs MOLAP

Aspect ROLAP MOLAP
Storage Relational database Multidimensional cube
Scalability High Limited
Query Performance Good Excellent
Data Volume Very large Small to medium

SQL Extensions for OLAP

ROLLUP Extension

The ROLLUP extension generates subtotals and grand totals:

ROLLUP Example
SELECT Region, State, SUM(Sales_Amount) AS Total_Sales
FROM SALES
GROUP BY ROLLUP(Region, State);

-- Generates:
-- - Sales by Region and State
-- - Sales by Region (subtotal)
-- - Grand total

CUBE Extension

The CUBE extension generates all possible combinations of grouping columns:

CUBE Example
SELECT Product, Region, SUM(Sales_Amount) AS Total_Sales
FROM SALES
GROUP BY CUBE(Product, Region);

-- Generates all combinations:
-- - Sales by Product and Region
-- - Sales by Product
-- - Sales by Region
-- - Grand total

Materialized Views

Materialized views store pre-computed query results to improve query performance:

Materialized View Example
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT 
    YEAR(Sale_Date) AS Year,
    MONTH(Sale_Date) AS Month,
    SUM(Sales_Amount) AS Total_Sales
FROM SALES
GROUP BY YEAR(Sale_Date), MONTH(Sale_Date);

-- Query uses pre-computed results
SELECT * FROM monthly_sales;

Best Practices

Next Steps

Explore modern data challenges with Big Data & NoSQL, or learn about connecting databases to applications with Database Connectivity.