Skill

query-optimization

Analyze and optimize database queries: EXPLAIN ANALYZE plan reading, index strategies (B-tree, GIN, GiST, partial, composite), N+1 detection in ORM code, connection pooling, and query plan regression testing.

From sovereign-architect

Install

Run in your terminal

npx claudepluginhub javimontano/mao-sovereign-architect

Tool Access

This skill is limited to using the following tools:

ReadWriteEditGlobGrepBashAgent

Supporting Assets

View in Repository

agents/query-optimization-agent.md

evals/evals.json

examples/sample-output.md

prompts/use-case-prompts.md

references/body-of-knowledge.md

references/knowledge-graph.mmd

references/state-of-the-art.md

Skill Content

Similar Skills

cqrs-implementation

Implements CQRS patterns with Python templates for command/query separation, event-sourcing, and scalable read/write models. Use for optimizing queries or independent scaling.

backend-development

33.0k

architecture-patterns

1 file

Implements Clean Architecture, Hexagonal Architecture (ports/adapters), and Domain-Driven Design for backend services. For microservice design, monolith refactoring to bounded contexts, and dependency debugging.

backend-development

33.0k

api-design-principles

4 files

Provides REST and GraphQL API design principles including resource hierarchies, HTTP methods, versioning strategies, pagination, and filtering patterns for new APIs, reviews, or standards.

backend-development

33.0k

Stats

Stars0

Forks0

Last CommitMar 28, 2026

Actions

View Source View Plugin View on GitHub View README

Query Optimization

"The fastest query is the one you never have to run — the second fastest is the one with a covering index."

TL;DR

Five-step procedure to systematically identify, analyze, and optimize slow queries: capturing slow query logs, reading EXPLAIN ANALYZE output, choosing the right index type, detecting N+1 patterns in ORM code, and preventing regressions.

Step 1 — Capture Slow Queries

Enable Slow Query Logging (PostgreSQL)

-- postgresql.conf or ALTER SYSTEM
ALTER SYSTEM SET log_min_duration_statement = '100';  -- Log queries > 100ms
ALTER SYSTEM SET log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h ';
SELECT pg_reload_conf();

-- pg_stat_statements — aggregate query stats
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by total time
SELECT
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  calls,
  round(stddev_exec_time::numeric, 2) AS stddev_ms,
  rows,
  left(query, 100) AS query_snippet
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Identify Problematic Queries in Application Code

// Prisma — enable query logging
const db = new PrismaClient({
  log: [
    { level: 'query', emit: 'event' },
    { level: 'warn', emit: 'stdout' },
  ],
});

// Log all queries in development
db.$on('query', (e) => {
  if (e.duration > 100) {
    console.warn(`Slow query (${e.duration}ms):`, e.query);
  }
});

Step 2 — Read EXPLAIN ANALYZE Output

-- Always use EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.email
ORDER BY order_count DESC
LIMIT 20;

Reading the Plan — Key Signals

Signal	Meaning	Fix
`Seq Scan` on large table	No usable index	Add index on filter column
`cost=0.00..50000.00` high cost	Planner estimate is expensive	Index or rewrite
`rows=1 actual rows=100000`	Bad row estimate	ANALYZE table, update statistics
`Hash Join` vs `Nested Loop`	Hash better for large sets; Nested Loop better for small	Usually correct, verify row estimates
`Buffers: shared hit=0 read=50000`	All reads from disk	Increase `shared_buffers`, add index
`Sort Method: external merge Disk`	Sort spills to disk	Increase `work_mem`, add index for ORDER BY

Example: Identifying a Missing Index

-- Bad plan: Seq Scan on 10M row table
Seq Scan on orders  (cost=0.00..285432.00 rows=1234 width=8)
                    (actual time=0.052..2847.123 rows=1234 loops=1)
  Filter: (user_id = 42 AND status = 'pending')
  Rows Removed by Filter: 9998766

-- Fix: composite index
CREATE INDEX CONCURRENTLY idx_orders_user_status
  ON orders(user_id, status)
  WHERE status IN ('pending', 'processing');  -- Partial index reduces size

-- After index: Index Scan
Index Scan using idx_orders_user_status on orders
  (cost=0.56..12.34 rows=8 width=8)
  (actual time=0.023..0.041 rows=8 loops=1)
  Index Cond: (user_id = 42 AND status = 'pending')

Step 3 — Index Strategy Selection

Index Type Decision Framework

Use Case	Index Type	Example
Equality / range on scalar	B-tree (default)	`WHERE user_id = 42`, `WHERE created_at > ...`
Full-text search	GIN	`WHERE to_tsvector('english', body) @@ query`
JSONB field search	GIN	`WHERE metadata @> '{"type": "premium"}'`
Array contains	GIN	`WHERE tags @> ARRAY['typescript']`
Geospatial / PostGIS	GiST	`WHERE location <-> point '(0,0)' < 1`
Case-insensitive text	B-tree + `lower()` expression	`lower(email)`
Partial (subset of rows)	Partial B-tree	`WHERE deleted_at IS NULL`
Multi-column	Composite B-tree	`(user_id, created_at DESC)`

Composite Index Column Order Rule

-- Selectivity rule: most selective first, then range/sort
-- Query: WHERE user_id = ? AND status = ? ORDER BY created_at DESC

-- Correct order: equality columns first, range/sort column last
CREATE INDEX idx_orders_composite ON orders(user_id, status, created_at DESC);

-- Wrong: range column in the middle breaks index use for rest
-- CREATE INDEX bad_idx ON orders(user_id, created_at, status);

Covering Index (Index-Only Scans)

-- Include all selected columns to avoid heap fetches
CREATE INDEX idx_users_email_covering
  ON users(email)
  INCLUDE (id, name, created_at);

-- Query becomes Index Only Scan — zero heap access
SELECT id, name, created_at FROM users WHERE email = 'user@example.com';

Step 4 — N+1 Detection & Resolution

Detecting N+1 in ORM Code

// BAD — N+1: 1 query for users + N queries for each user's posts
const users = await db.user.findMany({ take: 100 });
for (const user of users) {
  const posts = await db.post.findMany({ where: { userId: user.id } });
  // 101 total queries
}

// GOOD — Eager loading with Prisma include
const users = await db.user.findMany({
  take: 100,
  include: {
    posts: {
      select: { id: true, title: true, publishedAt: true },
      where: { published: true },
      orderBy: { publishedAt: 'desc' },
      take: 5,
    },
  },
});
// 2 queries total: users + a JOIN or batched post fetch

Detecting N+1 with Query Counter

// middleware/query-counter.ts — development only
let queryCount = 0;
db.$use(async (params, next) => {
  queryCount++;
  const result = await next(params);
  return result;
});

// In tests
beforeEach(() => { queryCount = 0; });
afterEach(() => {
  if (queryCount > 10) {
    console.warn(`Potential N+1: ${queryCount} queries in this test`);
  }
});

Step 5 — Connection Pooling & Prevention

PgBouncer / Prisma Accelerate Setup

// prisma/schema.prisma
datasource db {
  provider  = "postgresql"
  url       = env("DATABASE_URL")
  // PgBouncer requires this for prepared statements
  directUrl = env("DIRECT_DATABASE_URL")
}

# .env
DATABASE_URL=postgresql://user:pass@pgbouncer-host:6432/mydb?pgbouncer=true
DIRECT_DATABASE_URL=postgresql://user:pass@postgres-host:5432/mydb

Query Plan Regression in CI

-- Store baseline plan for critical queries
-- Compare in CI to catch plan regressions
EXPLAIN (FORMAT JSON, ANALYZE)
SELECT ...
-- Parse JSON output, fail if "Seq Scan" appears on tables > 10k rows

Quality Criteria

No Seq Scan on tables > 10k rows — All filter columns have indexes.
EXPLAIN ANALYZE run before optimizing — No blind index creation; plan confirms the bottleneck.
N+1 patterns eliminated in ORM — Eager loading or DataLoader used for relationship traversal.
Composite index column order matches query — Equality columns before range/sort columns.
pg_stat_statements enabled — Slow query visibility in production.
ANALYZE run after bulk inserts — Planner statistics kept current for large data loads.
Connection pool configured — PgBouncer or application pool prevents connection storms.
Partial indexes for soft deletes — WHERE deleted_at IS NULL on most queries; index matches.

Anti-Patterns

Adding an index on every column — Index maintenance overhead slows writes; each index adds ~10-30% overhead per INSERT/UPDATE. Index only what's queried.
Creating indexes inside transactions — CREATE INDEX CONCURRENTLY cannot run inside a transaction; causes error or falls back to locking.
SELECT * in production queries — Fetches columns not needed; defeats covering indexes; increases network transfer.
ORM .findAll() without pagination — Loading entire tables into memory when the user sees 20 rows.
LIKE '%search%' on large text columns — Leading wildcard prevents B-tree use; use full-text search with GIN index and tsvector instead.
Ignoring row count estimates in EXPLAIN — A plan with rows=1 actual rows=50000 is a statistics problem that will return to different bad plans after VACUUM.
Not using EXPLAIN ANALYZE after adding an index — The planner might still choose a Seq Scan if statistics are stale or the table is small enough.