GROUP BY
Overview
The GROUP BY clause groups rows that have the same values in specified columns into summary rows. It's used with aggregate functions (COUNT, SUM, AVG, MIN, MAX) to perform calculations on each group.
GROUP BY is essential for creating summary reports, analyzing data by categories, and generating statistics for groups of records.
Basic Syntax
GROUP BY Syntax
SELECT column1, aggregate_function(column2)
FROM table_name
WHERE condition
GROUP BY column1;
How GROUP BY Works
GROUP BY divides the result set into groups of rows that have matching values in the specified column(s), then aggregate functions are applied to each group:
GROUP BY Process
Original Data:
┌──────────┬───────┐
│ category │ price │
├──────────┼───────┤
│ Electronics │ 100 │
│ Clothing │ 50 │
│ Electronics │ 200 │
│ Books │ 20 │
│ Electronics │ 150 │
└──────────┴───────┘
GROUP BY category:
- Electronics group: 100, 200, 150
- Clothing group: 50
- Books group: 20
Then apply AVG:
- Electronics: 150
- Clothing: 50
- Books: 20
Examples
Example 1: Count by Category
Products per category
SELECT
category,
COUNT(*) AS product_count
FROM Products
GROUP BY category;
Example 2: Multiple Aggregates
Category statistics
SELECT
category,
COUNT(*) AS product_count,
AVG(price) AS average_price,
MIN(price) AS min_price,
MAX(price) AS max_price,
SUM(stock_quantity) AS total_stock
FROM Products
GROUP BY category;
Example 3: GROUP BY Multiple Columns
Group by multiple columns
SELECT
category,
supplier_id,
COUNT(*) AS product_count
FROM Products
GROUP BY category, supplier_id;
Example 4: GROUP BY with WHERE
Filter before grouping
SELECT
category,
COUNT(*) AS product_count
FROM Products
WHERE price > 100
GROUP BY category;
Example 5: GROUP BY with JOIN
Group joined tables
SELECT
c.name AS customer_name,
COUNT(o.order_id) AS order_count,
SUM(o.total_amount) AS total_spent
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
GROUP BY Rules
Important: When using GROUP BY, all non-aggregated columns in SELECT must appear in GROUP BY:
Correct GROUP BY
-- ✅ CORRECT: All non-aggregated columns in GROUP BY
SELECT
category,
supplier_id,
COUNT(*) AS product_count
FROM Products
GROUP BY category, supplier_id;
Incorrect GROUP BY
-- ❌ ERROR: supplier_id not in GROUP BY
SELECT
category,
supplier_id,
COUNT(*) AS product_count
FROM Products
GROUP BY category; -- Error!
GROUP BY with ORDER BY
Sort grouped results:
Sort groups
SELECT
category,
COUNT(*) AS product_count
FROM Products
GROUP BY category
ORDER BY product_count DESC;
GROUP BY vs DISTINCT
Understanding when to use each:
Comparison
-- DISTINCT: Just unique values
SELECT DISTINCT category FROM Products;
-- GROUP BY: Unique values + aggregates
SELECT category, COUNT(*) FROM Products GROUP BY category;
-- GROUP BY without aggregates = similar to DISTINCT
SELECT category FROM Products GROUP BY category;
Common Patterns
Pattern 1: Customer Order Summary
Sales summary by customer
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS total_spent,
AVG(total_amount) AS average_order
FROM Orders
GROUP BY customer_id;
Pattern 2: Daily Sales Report
Sales by date
SELECT
DATE(order_date) AS order_day,
COUNT(*) AS order_count,
SUM(total_amount) AS daily_revenue
FROM Orders
GROUP BY DATE(order_date)
ORDER BY order_day;
Performance Considerations
- Index GROUP BY columns: Significantly improves performance
- Filter first: Use WHERE to reduce rows before grouping
- Limit groups: Use HAVING or WHERE to limit results
- Avoid unnecessary grouping: Only group when you need aggregates
Best Practices
- Include all non-aggregates in GROUP BY: Required by SQL standard
- Use meaningful aliases: Name aggregate columns descriptively
- Index grouping columns: Critical for performance
- Combine with HAVING: Filter groups after aggregation
- Test logic: Verify grouping produces expected results
Common Mistakes
- Missing columns in GROUP BY: Must include all non-aggregated columns
- Using WHERE for groups: Use HAVING for group conditions
- Forgetting aggregates: GROUP BY without aggregates may not be needed
- Performance issues: Not indexing GROUP BY columns
Next Steps
Learn about filtering groups with HAVING clause, or explore conditional expressions with CASE.