GROUP BY

Overview

The GROUP BY clause groups rows that have the same values in specified columns into summary rows. It's used with aggregate functions (COUNT, SUM, AVG, MIN, MAX) to perform calculations on each group.

GROUP BY is essential for creating summary reports, analyzing data by categories, and generating statistics for groups of records.

Basic Syntax

GROUP BY Syntax
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1;

How GROUP BY Works

GROUP BY divides the result set into groups of rows that have matching values in the specified column(s), then aggregate functions are applied to each group:

GROUP BY Process
Original Data:
┌──────────┬───────┐
│ category │ price │
├──────────┼───────┤
│ Electronics │ 100 │
│ Clothing    │ 50  │
│ Electronics │ 200 │
│ Books       │ 20  │
│ Electronics │ 150 │
└──────────┴───────┘

GROUP BY category:
- Electronics group: 100, 200, 150
- Clothing group: 50
- Books group: 20

Then apply AVG:
- Electronics: 150
- Clothing: 50
- Books: 20

Examples

Example 1: Count by Category

Products per category
SELECT 
    category,
    COUNT(*) AS product_count
FROM Products
GROUP BY category;

Example 2: Multiple Aggregates

Category statistics
SELECT 
    category,
    COUNT(*) AS product_count,
    AVG(price) AS average_price,
    MIN(price) AS min_price,
    MAX(price) AS max_price,
    SUM(stock_quantity) AS total_stock
FROM Products
GROUP BY category;

Example 3: GROUP BY Multiple Columns

Group by multiple columns
SELECT 
    category,
    supplier_id,
    COUNT(*) AS product_count
FROM Products
GROUP BY category, supplier_id;

Example 4: GROUP BY with WHERE

Filter before grouping
SELECT 
    category,
    COUNT(*) AS product_count
FROM Products
WHERE price > 100
GROUP BY category;

Example 5: GROUP BY with JOIN

Group joined tables
SELECT 
    c.name AS customer_name,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

GROUP BY Rules

Important: When using GROUP BY, all non-aggregated columns in SELECT must appear in GROUP BY:

Correct GROUP BY
-- ✅ CORRECT: All non-aggregated columns in GROUP BY
SELECT 
    category,
    supplier_id,
    COUNT(*) AS product_count
FROM Products
GROUP BY category, supplier_id;
Incorrect GROUP BY
-- ❌ ERROR: supplier_id not in GROUP BY
SELECT 
    category,
    supplier_id,
    COUNT(*) AS product_count
FROM Products
GROUP BY category;  -- Error!

GROUP BY with ORDER BY

Sort grouped results:

Sort groups
SELECT 
    category,
    COUNT(*) AS product_count
FROM Products
GROUP BY category
ORDER BY product_count DESC;

GROUP BY vs DISTINCT

Understanding when to use each:

Comparison
-- DISTINCT: Just unique values
SELECT DISTINCT category FROM Products;

-- GROUP BY: Unique values + aggregates
SELECT category, COUNT(*) FROM Products GROUP BY category;

-- GROUP BY without aggregates = similar to DISTINCT
SELECT category FROM Products GROUP BY category;

Common Patterns

Pattern 1: Customer Order Summary

Sales summary by customer
SELECT 
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent,
    AVG(total_amount) AS average_order
FROM Orders
GROUP BY customer_id;

Pattern 2: Daily Sales Report

Sales by date
SELECT 
    DATE(order_date) AS order_day,
    COUNT(*) AS order_count,
    SUM(total_amount) AS daily_revenue
FROM Orders
GROUP BY DATE(order_date)
ORDER BY order_day;

Performance Considerations

Best Practices

Common Mistakes

Next Steps

Learn about filtering groups with HAVING clause, or explore conditional expressions with CASE.