SELECT DISTINCT
Overview
The SELECT DISTINCT statement is used to return only distinct (unique) values. It eliminates duplicate rows from the result set.
DISTINCT is useful when you want to see unique values in a column or combination of columns, without duplicate entries.
Basic Syntax
The DISTINCT keyword is placed immediately after SELECT:
SELECT DISTINCT column1, column2, ...
FROM table_name;
Simple Example
Let's see how DISTINCT works with sample data:
Orders Table:
┌──────────┬─────────────┬────────────┐
│ order_id │ customer_id │ city │
├──────────┼─────────────┼────────────┤
│ 1 │ 101 │ New York │
│ 2 │ 102 │ Los Angeles│
│ 3 │ 103 │ New York │
│ 4 │ 104 │ Chicago │
│ 5 │ 105 │ New York │
└──────────┴─────────────┴────────────┘
Without DISTINCT
SELECT city FROM Orders;
┌────────────┐
│ city │
├────────────┤
│ New York │
│ Los Angeles│
│ New York │
│ Chicago │
│ New York │
└────────────┘
With DISTINCT
SELECT DISTINCT city FROM Orders;
┌────────────┐
│ city │
├────────────┤
│ New York │
│ Los Angeles│
│ Chicago │
└────────────┘
DISTINCT with Multiple Columns
When using DISTINCT with multiple columns, it returns unique combinations of those columns:
SELECT DISTINCT city, state
FROM Customers;
This returns unique combinations of city and state. For example:
┌────────────┬───────────┐
│ city │ state │
├────────────┼───────────┤
│ New York │ NY │
│ Los Angeles│ CA │
│ Chicago │ IL │
│ New York │ NY │ ← Only shown once (duplicate)
└────────────┴───────────┘
DISTINCT vs GROUP BY
DISTINCT and GROUP BY can produce similar results, but they serve different purposes:
SELECT DISTINCT category
FROM Products;
SELECT category, COUNT(*) AS product_count
FROM Products
GROUP BY category;
Common Examples
Example 1: Unique Customer Cities
SELECT DISTINCT city
FROM Customers
ORDER BY city;
Example 2: Unique Product Categories
SELECT DISTINCT category
FROM Products
WHERE active = 1;
Example 3: Unique Combinations
SELECT DISTINCT customer_id, product_id
FROM Orders;
DISTINCT with WHERE
You can combine DISTINCT with WHERE to filter before finding unique values:
SELECT DISTINCT city
FROM Customers
WHERE country = 'USA'
ORDER BY city;
DISTINCT with COUNT
You can count distinct values using COUNT(DISTINCT column):
-- Count how many unique cities exist
SELECT COUNT(DISTINCT city) AS unique_cities
FROM Customers;
-- Count unique customers per city
SELECT city, COUNT(DISTINCT customer_id) AS unique_customers
FROM Orders
GROUP BY city;
Performance Considerations
- DISTINCT can be slow: The database must sort or hash data to find duplicates
- Use indexes: Indexes on the selected columns can improve DISTINCT performance
- Consider GROUP BY: For aggregations, GROUP BY might be more efficient
- Avoid unnecessary DISTINCT: Don't use DISTINCT if duplicates aren't possible
When to Use DISTINCT
- Getting a list of unique values from a column
- Creating dropdown lists or filters in applications
- Data exploration to understand unique values
- Finding unique combinations of columns
- Eliminating accidental duplicates from queries
When NOT to Use DISTINCT
- When you need all rows, including duplicates
- When duplicates aren't possible (e.g., selecting primary keys)
- As a quick fix for duplicate data issues (fix the root cause instead)
- When you need aggregate functions (use GROUP BY instead)
Best Practices
- Use sparingly: DISTINCT has performance overhead
- Specify columns: Use DISTINCT only on columns that need uniqueness
- Combine with WHERE: Filter first to reduce data before DISTINCT
- Index considerations: Ensure proper indexes for better performance
- Document usage: If using DISTINCT to fix issues, document why
Next Steps
Continue learning with WHERE clause for filtering, or explore GROUP BY for grouping and aggregation.