Database Indexes

Overview

Database Indexes are data structures that improve the speed of data retrieval operations on a database table. An index provides a fast way to look up data based on column values without scanning the entire table.

Indexes are crucial for database performance, especially on large tables. However, they come with trade-offs: improved read performance at the cost of additional storage and slower write operations.

What is an Index?

An index is a database object that contains an ordered list of keys (column values) and pointers to the corresponding rows in the table. Think of it like an index in a book - it helps you find information quickly without reading every page.

Index Concept
Without Index (Full Table Scan):
Search for StudentID = 100:
- Check row 1: No
- Check row 2: No
- Check row 3: No
- ... Check all 1,000,000 rows
- Time: O(n) - Slow!

With Index (Index Lookup):
Search for StudentID = 100:
- Look in index: Found at position 50,234
- Go directly to row 50,234
- Time: O(log n) - Fast!

Types of Indexes

1. Primary Index

Automatically created when you define a PRIMARY KEY. Data is physically ordered by the primary key.

2. Secondary Index

Indexes on non-primary key columns. Created explicitly to improve query performance on other columns.

3. Clustered Index

Determines the physical order of data in the table. A table can have only one clustered index (typically the primary key).

Clustered Index
Physical Storage (Clustered by StudentID):
Row 1: StudentID = 101, Name = "Alice"
Row 2: StudentID = 102, Name = "Bob"
Row 3: StudentID = 103, Name = "Charlie"
- Data physically sorted by StudentID
- Fast for range queries on StudentID

4. Non-Clustered Index

Creates a separate structure pointing to data rows. A table can have multiple non-clustered indexes.

Non-Clustered Index
Index Structure (Non-clustered on Email):
Email Index:
  alice@example.com → Row Pointer → Physical Row
  bob@example.com   → Row Pointer → Physical Row
  charlie@example.com → Row Pointer → Physical Row

- Separate structure from data
- Multiple indexes possible
- Requires additional lookup

5. Composite Index

Index on multiple columns. Order of columns matters for performance.

Composite Index Example
CREATE INDEX idx_name_category
ON Products (Category, ProductName);

-- Helps with queries like:
SELECT * FROM Products
WHERE Category = 'Electronics' AND ProductName LIKE 'Laptop%';

-- Index order matters:
- Category first (more selective)
- ProductName second

6. Unique Index

Enforces uniqueness of values in indexed columns.

7. Full-Text Index

Special index for searching text content efficiently.

When to Create Indexes

Create indexes on columns that are:

Index Structure

Most databases use B-tree (Balanced Tree) indexes:

B-tree Index Structure
B-tree Index (Simplified):
                    [50]
                  /      \
            [20, 30]     [70, 90]
           /   |   \    /   |   \
        [10][25][40]  [60][80][100]

Properties:
- Self-balancing tree
- Logarithmic search time
- Efficient for range queries
- Supports sorted traversal

Index Selection Strategy

1. Identify Query Patterns

Analyze which queries are most common and slow:

Query Analysis
Common Query:
SELECT * FROM Customers WHERE Email = 'user@example.com';

Solution:
CREATE INDEX idx_email ON Customers(Email);

Result:
- Fast lookup by email
- Eliminates full table scan

2. Consider Selectivity

Selectivity measures how unique values are. High selectivity (unique values) benefits more from indexing.

3. Composite Index Order

In composite indexes, put the most selective column first:

Composite Index Order
-- Good: Most selective first
CREATE INDEX idx_category_name
ON Products (Category, ProductName);

-- Why?
- Category filters to smaller set first
- Then ProductName refines within category

-- Bad: Less selective first
CREATE INDEX idx_name_category
ON Products (ProductName, Category);
- ProductName less selective
- Index less efficient

Index Trade-offs

Understanding the costs and benefits:

Aspect Impact
Read Performance ✓ Significantly improved
Write Performance ✗ Slightly slower (must update index)
Storage Space ✗ Additional space required
Maintenance ✗ Index must be maintained on updates

Best Practices

Index Maintenance

Indexes require maintenance to stay efficient:

Common Mistakes

Next Steps

Learn about managing data changes with Transactions, or explore handling multiple users with Concurrency Control.