Database Indexes
Overview
Database Indexes are data structures that improve the speed of data retrieval operations on a database table. An index provides a fast way to look up data based on column values without scanning the entire table.
Indexes are crucial for database performance, especially on large tables. However, they come with trade-offs: improved read performance at the cost of additional storage and slower write operations.
What is an Index?
An index is a database object that contains an ordered list of keys (column values) and pointers to the corresponding rows in the table. Think of it like an index in a book - it helps you find information quickly without reading every page.
Without Index (Full Table Scan):
Search for StudentID = 100:
- Check row 1: No
- Check row 2: No
- Check row 3: No
- ... Check all 1,000,000 rows
- Time: O(n) - Slow!
With Index (Index Lookup):
Search for StudentID = 100:
- Look in index: Found at position 50,234
- Go directly to row 50,234
- Time: O(log n) - Fast!
Types of Indexes
1. Primary Index
Automatically created when you define a PRIMARY KEY. Data is physically ordered by the primary key.
2. Secondary Index
Indexes on non-primary key columns. Created explicitly to improve query performance on other columns.
3. Clustered Index
Determines the physical order of data in the table. A table can have only one clustered index (typically the primary key).
Physical Storage (Clustered by StudentID):
Row 1: StudentID = 101, Name = "Alice"
Row 2: StudentID = 102, Name = "Bob"
Row 3: StudentID = 103, Name = "Charlie"
- Data physically sorted by StudentID
- Fast for range queries on StudentID
4. Non-Clustered Index
Creates a separate structure pointing to data rows. A table can have multiple non-clustered indexes.
Index Structure (Non-clustered on Email):
Email Index:
alice@example.com → Row Pointer → Physical Row
bob@example.com → Row Pointer → Physical Row
charlie@example.com → Row Pointer → Physical Row
- Separate structure from data
- Multiple indexes possible
- Requires additional lookup
5. Composite Index
Index on multiple columns. Order of columns matters for performance.
CREATE INDEX idx_name_category
ON Products (Category, ProductName);
-- Helps with queries like:
SELECT * FROM Products
WHERE Category = 'Electronics' AND ProductName LIKE 'Laptop%';
-- Index order matters:
- Category first (more selective)
- ProductName second
6. Unique Index
Enforces uniqueness of values in indexed columns.
7. Full-Text Index
Special index for searching text content efficiently.
When to Create Indexes
Create indexes on columns that are:
- Frequently queried: Columns in WHERE clauses
- Used in JOINs: Foreign keys and join columns
- Used for sorting: Columns in ORDER BY
- Used for grouping: Columns in GROUP BY
- Primary keys: Automatically indexed
- Foreign keys: Should be indexed for JOIN performance
Index Structure
Most databases use B-tree (Balanced Tree) indexes:
B-tree Index (Simplified):
[50]
/ \
[20, 30] [70, 90]
/ | \ / | \
[10][25][40] [60][80][100]
Properties:
- Self-balancing tree
- Logarithmic search time
- Efficient for range queries
- Supports sorted traversal
Index Selection Strategy
1. Identify Query Patterns
Analyze which queries are most common and slow:
Common Query:
SELECT * FROM Customers WHERE Email = 'user@example.com';
Solution:
CREATE INDEX idx_email ON Customers(Email);
Result:
- Fast lookup by email
- Eliminates full table scan
2. Consider Selectivity
Selectivity measures how unique values are. High selectivity (unique values) benefits more from indexing.
- High Selectivity: Many unique values (e.g., Email, ID) - Great for indexes
- Low Selectivity: Few unique values (e.g., Gender, Status) - Less beneficial
3. Composite Index Order
In composite indexes, put the most selective column first:
-- Good: Most selective first
CREATE INDEX idx_category_name
ON Products (Category, ProductName);
-- Why?
- Category filters to smaller set first
- Then ProductName refines within category
-- Bad: Less selective first
CREATE INDEX idx_name_category
ON Products (ProductName, Category);
- ProductName less selective
- Index less efficient
Index Trade-offs
Understanding the costs and benefits:
| Aspect | Impact |
|---|---|
| Read Performance | ✓ Significantly improved |
| Write Performance | ✗ Slightly slower (must update index) |
| Storage Space | ✗ Additional space required |
| Maintenance | ✗ Index must be maintained on updates |
Best Practices
- Index foreign keys: Critical for JOIN performance
- Index frequently filtered columns: WHERE clause columns
- Don't over-index: Too many indexes slow writes
- Monitor index usage: Remove unused indexes
- Rebuild periodically: Fragmented indexes degrade performance
- Consider covering indexes: Include all needed columns
Index Maintenance
Indexes require maintenance to stay efficient:
- Rebuilding: Recreate index structure (improves fragmentation)
- Reorganizing: Defragment index (lighter operation)
- Updating statistics: Help query optimizer choose best index
- Monitoring usage: Identify unused indexes
Common Mistakes
- Too many indexes: Slows down INSERT/UPDATE/DELETE
- Indexing low-selectivity columns: Little performance benefit
- Forgetting to index foreign keys: Poor JOIN performance
- Wrong composite index order: Less efficient queries
- Not monitoring index usage: Keeping unused indexes
Next Steps
Learn about managing data changes with Transactions, or explore handling multiple users with Concurrency Control.