Home / Database / Database Indexes

Database Indexes

Overview

Database Indexes are data structures that improve the speed of data retrieval operations on a database table. An index provides a fast way to look up data based on column values without scanning the entire table.

Indexes are crucial for database performance, especially on large tables. However, they come with trade-offs: improved read performance at the cost of additional storage and slower write operations.

What is an Index?

An index is a database object that contains an ordered list of keys (column values) and pointers to the corresponding rows in the table. Think of it like an index in a book - it helps you find information quickly without reading every page.

Index Concept

Without Index (Full Table Scan):
Search for StudentID = 100:
- Check row 1: No
- Check row 2: No
- Check row 3: No
- ... Check all 1,000,000 rows
- Time: O(n) - Slow!

With Index (Index Lookup):
Search for StudentID = 100:
- Look in index: Found at position 50,234
- Go directly to row 50,234
- Time: O(log n) - Fast!

Types of Indexes

1. Primary Index

Automatically created when you define a PRIMARY KEY. Data is physically ordered by the primary key.

2. Secondary Index

Indexes on non-primary key columns. Created explicitly to improve query performance on other columns.

3. Clustered Index

Determines the physical order of data in the table. A table can have only one clustered index (typically the primary key).

Clustered Index

Physical Storage (Clustered by StudentID):
Row 1: StudentID = 101, Name = "Alice"
Row 2: StudentID = 102, Name = "Bob"
Row 3: StudentID = 103, Name = "Charlie"
- Data physically sorted by StudentID
- Fast for range queries on StudentID

4. Non-Clustered Index

Creates a separate structure pointing to data rows. A table can have multiple non-clustered indexes.

Non-Clustered Index

Index Structure (Non-clustered on Email):
Email Index:
  alice@example.com → Row Pointer → Physical Row
  bob@example.com   → Row Pointer → Physical Row
  charlie@example.com → Row Pointer → Physical Row

- Separate structure from data
- Multiple indexes possible
- Requires additional lookup

5. Composite Index

Index on multiple columns. Order of columns matters for performance.

Composite Index Example

CREATE INDEX idx_name_category
ON Products (Category, ProductName);

-- Helps with queries like:
SELECT * FROM Products
WHERE Category = 'Electronics' AND ProductName LIKE 'Laptop%';

-- Index order matters:
- Category first (more selective)
- ProductName second

6. Unique Index

Enforces uniqueness of values in indexed columns.

7. Full-Text Index

Special index for searching text content efficiently.

When to Create Indexes

Create indexes on columns that are:

Frequently queried: Columns in WHERE clauses
Used in JOINs: Foreign keys and join columns
Used for sorting: Columns in ORDER BY
Used for grouping: Columns in GROUP BY
Primary keys: Automatically indexed
Foreign keys: Should be indexed for JOIN performance

Index Structure

Most databases use B-tree (Balanced Tree) indexes:

B-tree Index Structure

B-tree Index (Simplified):
                    [50]
                  /      \
            [20, 30]     [70, 90]
           /   |   \    /   |   \
        [10][25][40]  [60][80][100]

Properties:
- Self-balancing tree
- Logarithmic search time
- Efficient for range queries
- Supports sorted traversal

Index Selection Strategy

1. Identify Query Patterns

Analyze which queries are most common and slow:

Query Analysis

Common Query:
SELECT * FROM Customers WHERE Email = 'user@example.com';

Solution:
CREATE INDEX idx_email ON Customers(Email);

Result:
- Fast lookup by email
- Eliminates full table scan

2. Consider Selectivity

Selectivity measures how unique values are. High selectivity (unique values) benefits more from indexing.

High Selectivity: Many unique values (e.g., Email, ID) - Great for indexes
Low Selectivity: Few unique values (e.g., Gender, Status) - Less beneficial

3. Composite Index Order

In composite indexes, put the most selective column first:

Composite Index Order

-- Good: Most selective first
CREATE INDEX idx_category_name
ON Products (Category, ProductName);

-- Why?
- Category filters to smaller set first
- Then ProductName refines within category

-- Bad: Less selective first
CREATE INDEX idx_name_category
ON Products (ProductName, Category);
- ProductName less selective
- Index less efficient

Index Trade-offs

Understanding the costs and benefits:

Aspect	Impact
Read Performance	✓ Significantly improved
Write Performance	✗ Slightly slower (must update index)
Storage Space	✗ Additional space required
Maintenance	✗ Index must be maintained on updates

Best Practices

Index foreign keys: Critical for JOIN performance
Index frequently filtered columns: WHERE clause columns
Don't over-index: Too many indexes slow writes
Monitor index usage: Remove unused indexes
Rebuild periodically: Fragmented indexes degrade performance
Consider covering indexes: Include all needed columns

Index Maintenance

Indexes require maintenance to stay efficient:

Rebuilding: Recreate index structure (improves fragmentation)
Reorganizing: Defragment index (lighter operation)
Updating statistics: Help query optimizer choose best index
Monitoring usage: Identify unused indexes

Common Mistakes

Too many indexes: Slows down INSERT/UPDATE/DELETE
Indexing low-selectivity columns: Little performance benefit
Forgetting to index foreign keys: Poor JOIN performance
Wrong composite index order: Less efficient queries
Not monitoring index usage: Keeping unused indexes

Next Steps

Learn about managing data changes with Transactions, or explore handling multiple users with Concurrency Control.

← Constraints Transactions →