Analyze query patterns and recommend optimal database indexes
Analyzes query workloads to identify missing indexes, detect unused indexes, and recommend optimal indexing strategies for database performance.
/plugin marketplace add jeremylongshore/claude-code-plugins-plus-skills
/plugin install database-migration-manager@claude-code-plugins-plus

Analyze query workloads, identify missing indexes, detect unused indexes, and recommend optimal indexing strategies with automated index impact analysis and maintenance scheduling for production databases.
Use /index-advisor when you need to:
DON'T use this when:
This command implements workload-based index analysis because:
Alternative considered: Static schema analysis
Alternative considered: Manual EXPLAIN analysis
Before running this command:
1. Capture real production queries from logs or pg_stat_statements.
2. Run EXPLAIN ANALYZE to identify sequential scans and suboptimal query plans.
3. Identify missing indexes, composite index opportunities, and covering indexes.
4. Estimate query performance improvements with hypothetical indexes (see the HypoPG sketch after this list).
5. Create recommended indexes and track query performance improvements.
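Step 4 can be done without building anything on disk. A minimal sketch using the HypoPG extension (assumed to be installed; the table and column names are illustrative):

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Register a hypothetical index; nothing is materialized on disk
SELECT * FROM hypopg_create_index('CREATE INDEX ON orders (customer_id)');

-- Hypothetical indexes are visible to plain EXPLAIN (not EXPLAIN ANALYZE)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Discard all hypothetical indexes when done
SELECT hypopg_reset();
```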
The command generates:
- analysis/missing_indexes.sql - CREATE INDEX statements for missing indexes
- analysis/unused_indexes.sql - DROP INDEX statements for unused indexes
- reports/index_impact_report.html - Visual impact analysis with before/after metrics
- monitoring/index_health.sql - Queries to monitor index bloat and usage
- maintenance/reindex_schedule.sh - Automated index maintenance script

-- Enable pg_stat_statements extension for query tracking
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Configure pg_stat_statements (the library must also be listed in shared_preload_libraries)
ALTER SYSTEM SET pg_stat_statements.track = 'all';
ALTER SYSTEM SET pg_stat_statements.max = 10000; -- takes effect only after a server restart
SELECT pg_reload_conf();
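If you want measurements scoped to a clean window (say, one day of production traffic), reset the accumulated counters before you start collecting:

```sql
SELECT pg_stat_statements_reset();
```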
-- View most expensive queries without proper indexes
CREATE OR REPLACE VIEW slow_queries_needing_indexes AS
SELECT
    queryid,
    LEFT(query, 100) AS query_snippet,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time,
    stddev_exec_time,
    ROUND((100.0 * total_exec_time / SUM(total_exec_time) OVER ())::numeric, 2) AS pct_total_time
FROM pg_stat_statements
WHERE query NOT LIKE '%pg_stat%'
  AND mean_exec_time > 100 -- Queries averaging >100ms
ORDER BY total_exec_time DESC
LIMIT 50;
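A quick look at the worst offenders once the view exists:

```sql
SELECT query_snippet, calls, mean_exec_time, pct_total_time
FROM slow_queries_needing_indexes
LIMIT 10;
```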
-- Identify missing indexes by analyzing sequential scans
CREATE OR REPLACE VIEW tables_needing_indexes AS
SELECT
    schemaname,
    relname AS tablename,
    seq_scan AS sequential_scans,
    seq_tup_read AS rows_read_sequentially,
    idx_scan AS index_scans,
    idx_tup_fetch AS rows_fetched_via_index,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    CASE
        WHEN seq_scan > 0 THEN ROUND(100.0 * seq_scan / (seq_scan + COALESCE(idx_scan, 0)), 2)
        ELSE 0
    END AS pct_sequential_scans
FROM pg_stat_user_tables
WHERE seq_scan > 1000     -- Tables with >1000 sequential scans
  AND seq_tup_read > 10000 -- Reading >10k rows sequentially
ORDER BY seq_tup_read DESC;
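Before acting on a flagged table, confirm the sequential scan with EXPLAIN ANALYZE; the table and predicate below are illustrative:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
-- A "Seq Scan on orders" node with many filtered rows confirms the problem;
-- after indexing, expect an Index Scan or Bitmap Heap Scan instead.
```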
-- Detect unused indexes consuming storage
CREATE OR REPLACE VIEW unused_indexes AS
SELECT
    s.schemaname,
    s.relname AS tablename,
    s.indexrelname AS indexname,
    s.idx_scan AS index_scans,
    s.idx_tup_read AS tuples_read,
    s.idx_tup_fetch AS tuples_fetched,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    pg_get_indexdef(s.indexrelid) AS index_definition
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0     -- Never used
  AND NOT i.indisprimary -- Exclude primary keys
  AND NOT i.indisunique  -- Exclude unique-constraint indexes
ORDER BY pg_relation_size(s.indexrelid) DESC;
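Treat the view's output as candidates, not verdicts: idx_scan counters reset when statistics are reset and do not reflect usage on standbys. Once verified, a drop that does not block writers looks like this (the index name is hypothetical):

```sql
-- Cannot run inside a transaction block
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_status;
```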
-- Flag oversized indexes (a size-ratio heuristic) and recommend REINDEX
CREATE OR REPLACE VIEW index_bloat_analysis AS
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    ROUND(100.0 * pg_relation_size(indexrelid) / NULLIF(pg_relation_size(relid), 0), 2) AS index_to_table_ratio,
    CASE
        WHEN pg_relation_size(indexrelid) > pg_relation_size(relid) * 0.3
            THEN 'High bloat - consider REINDEX'
        WHEN pg_relation_size(indexrelid) > pg_relation_size(relid) * 0.15
            THEN 'Moderate bloat - monitor'
        ELSE 'Healthy'
    END AS bloat_status
FROM pg_stat_user_indexes
WHERE pg_relation_size(indexrelid) > 100 * 1024 * 1024 -- >100MB
ORDER BY pg_relation_size(indexrelid) DESC;
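Indexes the view flags as bloated can be rebuilt in place; on PostgreSQL 12+ this works without blocking reads or writes (the index name is hypothetical):

```sql
REINDEX INDEX CONCURRENTLY idx_orders_customer_id;
```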
# scripts/index_advisor.py - Comprehensive Index Analysis Tool
import logging
import re
from collections import defaultdict
from dataclasses import dataclass, asdict
from typing import List, Dict, Tuple

import psycopg2
from psycopg2.extras import DictCursor

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class IndexRecommendation:
    """Represents an index recommendation with impact analysis."""
    table_name: str
    recommended_index: str
    reason: str
    affected_queries: List[str]
    estimated_speedup: str
    storage_cost_mb: float
    priority: str  # 'high', 'medium', 'low'

    def to_dict(self) -> dict:
        return asdict(self)


class PostgreSQLIndexAdvisor:
    """Analyze queries and recommend optimal indexes."""

    def __init__(self, connection_string: str):
        self.conn_string = connection_string

    def connect(self):
        return psycopg2.connect(self.conn_string, cursor_factory=DictCursor)

    def analyze_slow_queries(self, min_duration_ms: int = 100) -> List[Dict]:
        """Identify slow queries from pg_stat_statements."""
        conn = self.connect()
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT
                        queryid,
                        LEFT(query, 200) AS query,
                        calls,
                        ROUND(total_exec_time::numeric, 2) AS total_time_ms,
                        ROUND(mean_exec_time::numeric, 2) AS mean_time_ms,
                        ROUND(max_exec_time::numeric, 2) AS max_time_ms
                    FROM pg_stat_statements
                    WHERE mean_exec_time > %s
                      AND query NOT LIKE '%%pg_stat%%'
                      AND query NOT LIKE '%%information_schema%%'
                    ORDER BY total_exec_time DESC
                    LIMIT 100;
                """, (min_duration_ms,))
                return [dict(row) for row in cur.fetchall()]
        finally:
            conn.close()

    def extract_where_columns(self, query: str) -> List[Tuple[str, str]]:
        """Extract table and column names from WHERE/JOIN clauses.

        Regex extraction is a heuristic: it only catches the qualified
        'table.column' form and simple JOIN ... ON patterns.
        """
        columns = []
        # Pattern: WHERE table.column = value
        where_pattern = r'WHERE\s+(\w+)\.(\w+)\s*[=<>]'
        for match in re.finditer(where_pattern, query, re.IGNORECASE):
            columns.append((match.group(1), match.group(2)))
        # Pattern: JOIN table alias ON t1.col = t2.col
        join_pattern = r'JOIN\s+(\w+)\s+\w+\s+ON\s+\w+\.(\w+)\s*=\s*\w+\.\w+'
        for match in re.finditer(join_pattern, query, re.IGNORECASE):
            columns.append((match.group(1), match.group(2)))
        return columns

    def check_existing_indexes(self, table: str, column: str) -> bool:
        """Check whether an index already covers table.column (name-based heuristic)."""
        conn = self.connect()
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT COUNT(*) > 0 AS has_index
                    FROM pg_indexes
                    WHERE tablename = %s
                      AND indexdef LIKE %s;
                """, (table, f'%{column}%'))
                result = cur.fetchone()
                return result['has_index'] if result else False
        finally:
            conn.close()

    def generate_recommendations(self) -> List[IndexRecommendation]:
        """Generate comprehensive index recommendations."""
        recommendations = []
        # Get slow queries
        slow_queries = self.analyze_slow_queries()
        logger.info(f"Analyzing {len(slow_queries)} slow queries...")
        # Track columns needing indexes
        column_usage = defaultdict(lambda: {'count': 0, 'queries': [], 'total_time': 0.0})
        for query_info in slow_queries:
            query = query_info['query']
            total_time = float(query_info['total_time_ms'])
            # Extract WHERE/JOIN columns and keep only those without an index
            for table, column in self.extract_where_columns(query):
                if not self.check_existing_indexes(table, column):
                    key = f"{table}.{column}"
                    column_usage[key]['count'] += 1
                    column_usage[key]['queries'].append(query[:100])
                    column_usage[key]['total_time'] += total_time
        # Generate recommendations
        for key, usage in column_usage.items():
            table, column = key.split('.')
            # Rough speedup estimates based on cumulative query time
            if usage['total_time'] > 10000:  # >10 seconds total
                priority, speedup = 'high', '5-10x faster'
            elif usage['total_time'] > 1000:  # >1 second total
                priority, speedup = 'medium', '3-5x faster'
            else:
                priority, speedup = 'low', '2-3x faster'
            # Estimate storage cost (rough approximation)
            storage_cost = self.estimate_index_size(table)
            recommendations.append(IndexRecommendation(
                table_name=table,
                recommended_index=f"CREATE INDEX idx_{table}_{column} ON {table}({column});",
                reason=f"Used in {usage['count']} slow queries totaling {usage['total_time']:.0f}ms",
                affected_queries=usage['queries'][:5],  # Top 5 queries
                estimated_speedup=speedup,
                storage_cost_mb=storage_cost,
                priority=priority,
            ))
        # Sort by priority, then by number of affected queries
        recommendations.sort(
            key=lambda r: (
                {'high': 0, 'medium': 1, 'low': 2}[r.priority],
                -len(r.affected_queries),
            )
        )
        return recommendations

    def estimate_index_size(self, table: str) -> float:
        """Estimate index size in MB based on table size."""
        conn = self.connect()
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT pg_relation_size(%s) / (1024.0 * 1024.0) AS size_mb
                    FROM pg_class
                    WHERE relname = %s;
                """, (table, table))
                result = cur.fetchone()
                if result:
                    # A single-column b-tree is typically 20-30% of table size
                    return round(result['size_mb'] * 0.25, 2)
                return 10.0  # Default estimate
        except Exception as e:
            logger.warning(f"Could not estimate size for {table}: {e}")
            return 10.0
        finally:
            conn.close()

    def find_unused_indexes(self) -> List[Dict]:
        """Identify indexes that are never used."""
        conn = self.connect()
        try:
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT
                        s.schemaname,
                        s.relname AS tablename,
                        s.indexrelname AS indexname,
                        pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
                        pg_get_indexdef(s.indexrelid) AS definition
                    FROM pg_stat_user_indexes s
                    JOIN pg_index i ON i.indexrelid = s.indexrelid
                    WHERE s.idx_scan = 0
                      AND NOT i.indisprimary
                      AND NOT i.indisunique
                      AND pg_relation_size(s.indexrelid) > 1024 * 1024  -- >1MB
                    ORDER BY pg_relation_size(s.indexrelid) DESC;
                """)
                return [dict(row) for row in cur.fetchall()]
        finally:
            conn.close()

    def generate_report(self):
        """Generate comprehensive index analysis report."""
        logger.info("=== Index Analysis Report ===\n")
        # Recommendations
        recommendations = self.generate_recommendations()
        if recommendations:
            logger.info(f"Found {len(recommendations)} index recommendations:\n")
            for i, rec in enumerate(recommendations, 1):
                logger.info(f"{i}. [{rec.priority.upper()}] {rec.table_name}")
                logger.info(f"   Recommendation: {rec.recommended_index}")
                logger.info(f"   Reason: {rec.reason}")
                logger.info(f"   Expected speedup: {rec.estimated_speedup}")
                logger.info(f"   Storage cost: ~{rec.storage_cost_mb}MB")
                logger.info(f"   Affected queries: {len(rec.affected_queries)}")
                logger.info("")
        # Unused indexes
        unused = self.find_unused_indexes()
        if unused:
            logger.info(f"\n=== Unused Indexes ({len(unused)}) ===\n")
            for idx in unused:
                logger.info(f"DROP INDEX {idx['schemaname']}.{idx['indexname']};")
                logger.info(f"  -- Table: {idx['tablename']}, Size: {idx['index_size']}")
        logger.info("\n=== Summary ===")
        logger.info(f"Missing indexes: {len(recommendations)}")
        logger.info(f"Unused indexes: {len(unused)}")
        savings = sum(self._parse_size(idx['index_size']) for idx in unused)
        logger.info(f"Potential storage savings: {savings:.2f}MB")

    def _parse_size(self, size_str: str) -> float:
        """Parse pg_size_pretty output (bytes, kB, MB, GB, TB) to MB."""
        value, _, unit = size_str.partition(' ')
        multipliers = {
            'bytes': 1 / (1024.0 * 1024.0),
            'kB': 1 / 1024.0,
            'MB': 1.0,
            'GB': 1024.0,
            'TB': 1024.0 * 1024.0,
        }
        return float(value) * multipliers.get(unit, 0.0)


# Usage
if __name__ == "__main__":
    advisor = PostgreSQLIndexAdvisor(
        "postgresql://user:password@localhost:5432/mydb"
    )
    advisor.generate_report()
-- Enable performance schema for query analysis
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'statement/%';
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE '%statements%';
-- Identify slow queries needing indexes
CREATE OR REPLACE VIEW slow_queries_analysis AS
SELECT
    DIGEST_TEXT AS query,
    COUNT_STAR AS executions,
    ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_time_ms,
    ROUND(MAX_TIMER_WAIT / 1000000000, 2) AS max_time_ms,
    ROUND(SUM_TIMER_WAIT / 1000000000, 2) AS total_time_ms,
    SUM_ROWS_EXAMINED AS total_rows_examined,
    SUM_ROWS_SENT AS total_rows_sent,
    ROUND(SUM_ROWS_EXAMINED / COUNT_STAR, 0) AS avg_rows_examined
FROM performance_schema.events_statements_summary_by_digest
WHERE DIGEST_TEXT IS NOT NULL
  AND SCHEMA_NAME NOT IN ('mysql', 'performance_schema', 'information_schema')
  AND AVG_TIMER_WAIT > 100000000000 -- >100ms average (timer values are in picoseconds)
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 50;
-- Find tables with full table scans
SELECT
    object_schema AS database_name,
    object_name AS table_name,
    count_read AS select_count,
    count_fetch AS rows_fetched,
    ROUND(count_fetch / NULLIF(count_read, 0), 2) AS avg_rows_per_select
FROM performance_schema.table_io_waits_summary_by_table
WHERE object_schema NOT IN ('mysql', 'performance_schema', 'information_schema')
  AND count_read > 1000
ORDER BY count_fetch DESC;
-- Identify duplicate indexes
SELECT
    t1.TABLE_SCHEMA AS database_name,
    t1.TABLE_NAME AS table_name,
    t1.INDEX_NAME AS index1,
    t2.INDEX_NAME AS index2,
    GROUP_CONCAT(t1.COLUMN_NAME ORDER BY t1.SEQ_IN_INDEX) AS columns
FROM information_schema.STATISTICS t1
JOIN information_schema.STATISTICS t2
    ON t1.TABLE_SCHEMA = t2.TABLE_SCHEMA
    AND t1.TABLE_NAME = t2.TABLE_NAME
    AND t1.INDEX_NAME < t2.INDEX_NAME
    AND t1.COLUMN_NAME = t2.COLUMN_NAME
    AND t1.SEQ_IN_INDEX = t2.SEQ_IN_INDEX
WHERE t1.TABLE_SCHEMA NOT IN ('mysql', 'performance_schema', 'information_schema')
GROUP BY t1.TABLE_SCHEMA, t1.TABLE_NAME, t1.INDEX_NAME, t2.INDEX_NAME
HAVING COUNT(*) = (
    SELECT COUNT(*)
    FROM information_schema.STATISTICS
    WHERE TABLE_SCHEMA = t1.TABLE_SCHEMA
      AND TABLE_NAME = t1.TABLE_NAME
      AND INDEX_NAME = t1.INDEX_NAME
);
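When this query reports that index1's columns are a leading prefix of index2, the narrower index can usually be dropped. A sketch with placeholder names taken from the query output:

```sql
ALTER TABLE orders
  DROP INDEX idx_orders_customer, -- redundant prefix of a wider composite index
  ALGORITHM=INPLACE, LOCK=NONE;
```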
// scripts/mysql-index-advisor.js
const mysql = require('mysql2/promise');

class MySQLIndexAdvisor {
  constructor(config) {
    this.pool = mysql.createPool({
      host: config.host,
      user: config.user,
      password: config.password,
      database: config.database,
      waitForConnections: true,
      connectionLimit: 10
    });
  }

  async analyzeTableIndexes(tableName) {
    const [rows] = await this.pool.query(`
      SELECT
        COLUMN_NAME,
        CARDINALITY,
        INDEX_NAME,
        SEQ_IN_INDEX,
        NON_UNIQUE
      FROM information_schema.STATISTICS
      WHERE TABLE_SCHEMA = DATABASE()
        AND TABLE_NAME = ?
      ORDER BY INDEX_NAME, SEQ_IN_INDEX
    `, [tableName]);
    return rows;
  }

  async findMissingIndexes() {
    // Analyze slow queries from the performance schema
    const [slowQueries] = await this.pool.query(`
      SELECT
        DIGEST_TEXT AS query,
        COUNT_STAR AS executions,
        ROUND(AVG_TIMER_WAIT / 1000000000, 2) AS avg_time_ms,
        SUM_ROWS_EXAMINED AS total_rows_examined
      FROM performance_schema.events_statements_summary_by_digest
      WHERE DIGEST_TEXT IS NOT NULL
        AND AVG_TIMER_WAIT > 100000000000 -- >100ms (timer values are picoseconds)
        AND SCHEMA_NAME = DATABASE()
      ORDER BY AVG_TIMER_WAIT DESC
      LIMIT 20
    `);

    const recommendations = [];
    for (const query of slowQueries) {
      // Extract the first WHERE condition (heuristic; digests normalize literals to '?')
      const whereMatch = query.query.match(/WHERE\s+(\w+)\s*[=<>]/i);
      if (whereMatch) {
        const column = whereMatch[1];
        recommendations.push({
          table: 'unknown', // TODO: extract the table name from the query
          column: column,
          query: query.query.substring(0, 100),
          avgTime: query.avg_time_ms,
          recommendation: `CREATE INDEX idx_${column} ON table_name(${column});`
        });
      }
    }
    return recommendations;
  }

  async generateReport() {
    console.log('=== MySQL Index Analysis Report ===\n');
    const recommendations = await this.findMissingIndexes();
    console.log(`Found ${recommendations.length} potential index improvements:\n`);
    recommendations.forEach((rec, i) => {
      console.log(`${i + 1}. ${rec.recommendation}`);
      console.log(`   Average query time: ${rec.avgTime}ms`);
      console.log(`   Query: ${rec.query}...`);
      console.log('');
    });
  }

  async close() {
    await this.pool.end();
  }
}

// Usage
(async () => {
  const advisor = new MySQLIndexAdvisor({
    host: 'localhost',
    user: 'root',
    password: 'password',
    database: 'mydb'
  });
  await advisor.generateReport();
  await advisor.close();
})();
| Error | Cause | Solution |
|---|---|---|
| "Index too large" | Index exceeds max key length | Use partial index or hash index for long columns |
| "Duplicate key violation" | Creating unique index on non-unique data | Check for duplicates before creating unique index |
| "Out of disk space" | Index creation requires temporary storage | Free up disk space or use CONCURRENTLY option |
| "Lock timeout" | Index creation blocking queries | Use CREATE INDEX CONCURRENTLY (PostgreSQL) or ALGORITHM=INPLACE (MySQL) |
| "Statistics out of date" | Old cardinality estimates | Run ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL) |
Index Types
Index Options
- CONCURRENTLY: Create without blocking writes (PostgreSQL)
- ALGORITHM=INPLACE: Online index creation (MySQL)
- INCLUDE columns: Covering index (PostgreSQL 11+)
- WHERE clause: Partial index for filtered queries (see the sketch below)
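A sketch of the last two options, with illustrative names:

```sql
-- Covering index: INCLUDE lets index-only scans skip heap fetches (PostgreSQL 11+)
CREATE INDEX idx_orders_customer ON orders (customer_id) INCLUDE (created_at);

-- Partial index: only rows matching the WHERE predicate are indexed
CREATE INDEX idx_orders_pending ON orders (created_at) WHERE status = 'pending';
```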
DO:

DON'T:
- /sql-query-optimizer - Rewrite queries for better performance
- /database-partition-manager - Partition large tables for faster queries
- /database-health-monitor - Monitor index bloat and maintenance needs
- /database-backup-automator - Schedule REINDEX during maintenance windows