From harness-claude
Rewrites SQL queries for better execution plans via structural changes like sargable predicates, EXISTS over IN, CTE inlining, and LATERAL joins. Use for slow indexed queries, suboptimal plans, or ORM patterns.
npx claudepluginhub intense-visions/harness-engineering --plugin harness-claudeThis skill uses the workspace's default tool permissions.
> Structural query transformations that help the planner choose better execution plans without changing results.
Optimizes slow SQL queries for PostgreSQL and MySQL by analyzing execution plans, eliminating anti-patterns like subqueries and SELECT *, and suggesting rewrites with indexes.
Optimizes PostgreSQL and MySQL queries using EXPLAIN ANALYZE to identify sequential scans, inefficient joins, cardinality issues, and rewrite for indexes and low latency.
Analyzes SQL queries with EXPLAIN/ANALYZE, detects N+1 issues, missing indexes, inefficient JOINs; suggests rewrites, indexes, caching for PostgreSQL, MySQL, SQLite.
Share bugs, ideas, or general feedback.
Structural query transformations that help the planner choose better execution plans without changing results.
1. Sargable predicates -- predicates that can use indexes.
The indexed column must appear bare on one side of the operator, untransformed by functions or casts:
-- BAD (non-sargable): function wrapping prevents index use
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
SELECT * FROM orders WHERE EXTRACT(month FROM created_at) = 3;
SELECT * FROM users WHERE col::text = '5';
-- GOOD (sargable): column is bare, index can be used
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
SELECT * FROM orders
WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';
2. EXISTS vs IN
-- IN materializes the full subquery result:
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE region = 'us');
-- EXISTS short-circuits at first match:
SELECT * FROM orders o
WHERE EXISTS (SELECT 1 FROM customers c WHERE c.id = o.customer_id AND c.region = 'us');
EXISTS stops scanning once it finds one match. IN builds the full result set. For large subquery results, EXISTS is usually faster. The planner may optimize both identically, but not always.
3. CTEs vs subqueries
Before PostgreSQL 12, CTEs were always materialized (optimization fence). PostgreSQL 12+ can inline CTEs automatically:
-- PostgreSQL 12+: planner may inline this CTE
WITH active AS (
SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM active WHERE created_at > '2024-01-01';
-- Force materialization (useful when the CTE result is reused many times):
WITH active AS MATERIALIZED (
SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM active WHERE created_at > '2024-01-01';
-- Force inlining:
WITH active AS NOT MATERIALIZED (
SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM active WHERE created_at > '2024-01-01';
4. LATERAL joins replace correlated subqueries for better planner visibility:
-- BAD: correlated subquery in SELECT (executed per row)
SELECT o.*,
(SELECT count(*) FROM order_items oi WHERE oi.order_id = o.id) AS item_count
FROM orders o;
-- GOOD: LATERAL join (planner can optimize the join strategy)
SELECT o.*, li.cnt AS item_count
FROM orders o
LEFT JOIN LATERAL (
SELECT count(*) AS cnt FROM order_items WHERE order_id = o.id
) li ON true;
Rewriting a non-sargable query:
Original slow query:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE EXTRACT(month FROM created_at) = 3
AND EXTRACT(year FROM created_at) = 2024;
Seq Scan on orders (cost=0.00..298765.00 rows=50000 width=52)
(actual time=0.031..2341.567 rows=48500 loops=1)
Filter: (EXTRACT(month FROM created_at) = 3 AND EXTRACT(year FROM created_at) = 2024)
Rows Removed by Filter: 9951500
Execution Time: 2345.123 ms
The index on created_at is ignored because the column is wrapped in EXTRACT().
Rewritten sargable version:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';
Index Scan using idx_orders_created_at on orders
(cost=0.43..4521.23 rows=48200 width=52)
(actual time=0.025..34.567 rows=48500 loops=1)
Index Cond: (created_at >= '2024-03-01' AND created_at < '2024-04-01')
Execution Time: 35.891 ms
Query time dropped from 2.3 seconds to 36 milliseconds -- a 65x improvement.
Rewriting a correlated subquery to LATERAL join:
-- Before: correlated subquery (2.1s)
SELECT o.id, o.total,
(SELECT count(*) FROM order_items WHERE order_id = o.id)
FROM orders o WHERE o.status = 'active';
-- After: LATERAL join (180ms)
SELECT o.id, o.total, li.cnt
FROM orders o
LEFT JOIN LATERAL (
SELECT count(*) AS cnt FROM order_items WHERE order_id = o.id
) li ON true
WHERE o.status = 'active';
Wrapping indexed columns in functions or casts. WHERE col::text = '5' or WHERE UPPER(name) = 'ALICE' prevents index use. Use expression indexes or rewrite the predicate.
OR chains instead of IN lists. WHERE status = 'a' OR status = 'b' OR status = 'c' is harder for the planner to optimize than WHERE status IN ('a', 'b', 'c').
SELECT DISTINCT to mask a bad join. If DISTINCT is needed only because the join produces duplicates, fix the join (usually by changing to EXISTS or adding a proper GROUP BY) instead of deduplicating after the fact.
CTEs for "readability" when they prevent optimization (pre-PG12). On PostgreSQL 11 and earlier, every CTE is an optimization fence. The planner cannot push filters from the outer query into the CTE.
OFFSET for pagination on large tables. OFFSET 100000 LIMIT 20 must scan and discard 100K rows. Use keyset pagination instead:
-- BAD: OFFSET-based (slow for deep pages)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- GOOD: keyset pagination (constant speed regardless of page depth)
SELECT * FROM orders WHERE id > 100000 ORDER BY id LIMIT 20;
enable_* GUC flags for testing plan alternatives (development only, never in production):
SET enable_seqscan = off; -- force index usage to test
SET enable_nestloop = off; -- force hash or merge join
pg_hint_plan extension provides query hints when the planner cannot be convinced:
/*+ IndexScan(orders idx_orders_status) */ SELECT * FROM orders WHERE status = 'active';
EXPLAIN (ANALYZE, SETTINGS) in PostgreSQL 15+ shows non-default settings affecting the plan, helping identify configuration-driven plan choices.
Keyset pagination for efficient deep pagination:
-- First page:
SELECT * FROM orders ORDER BY created_at, id LIMIT 20;
-- Next page (using last row's values):
SELECT * FROM orders
WHERE (created_at, id) > ('2024-01-15 10:30:00', 12345)
ORDER BY created_at, id LIMIT 20;
Keyset pagination runs in constant time regardless of page depth, unlike OFFSET which degrades linearly.
Materialized views for expensive aggregations that are read frequently but can tolerate staleness:
CREATE MATERIALIZED VIEW monthly_sales AS
SELECT date_trunc('month', created_at) AS month, sum(total) AS revenue
FROM orders GROUP BY 1;
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
UNION ALL vs UNION. UNION implies DISTINCT, requiring a sort or hash to deduplicate. Use UNION ALL when duplicates are acceptable or impossible.
Join elimination. PostgreSQL can eliminate unused LEFT JOINs when the joined table's columns are not referenced in SELECT, WHERE, or ORDER BY. This optimization applies only when the join has a UNIQUE or PRIMARY KEY constraint.
MySQL CTEs (8.0+) are derived tables that the optimizer may merge or materialize automatically. MySQL has no MATERIALIZED / NOT MATERIALIZED keywords -- you cannot control CTE materialization.
MySQL lacks LATERAL joins. Use correlated subqueries instead. For complex cases, rewrite as derived tables with join conditions.
MySQL IN-to-EXISTS optimization. MySQL's optimizer automatically converts IN (subquery) to EXISTS in many cases, making the performance difference less pronounced than in PostgreSQL.
MySQL STRAIGHT_JOIN forces join order, equivalent to PostgreSQL's approach of setting join_collapse_limit = 1. Use when the optimizer consistently chooses the wrong join order.
E-commerce API with 200ms p99 latency target. The top slow query used WHERE LOWER(email) = lower(?) on a 50M users table. An expression index CREATE INDEX ON users (lower(email)) combined with the sargable predicate WHERE lower(email) = lower(?) dropped the query from 1.2 seconds to 2ms. A second optimization replaced OFFSET 10000 LIMIT 20 pagination in the order listing endpoint with keyset pagination using WHERE id > ? ORDER BY id LIMIT 20, reducing paginated queries from 400ms to 4ms regardless of page depth.