From snowflake-skills
Optimizes Snowflake SQL queries for performance by fixing anti-patterns like functions on filter/join columns while preserving identical results. Activates on optimization requests, slow query mentions, or performance reviews.
npx claudepluginhub altimateai/data-engineering-skills --plugin dbt-skills

This skill uses the workspace's default tool permissions.
Return ONLY the optimized SQL query. No markdown formatting, no explanations, no bullet points - just pure SQL that can be executed directly in Snowflake.
The optimized query MUST return IDENTICAL results to the original.
Before returning ANY optimization, verify:
- Keep ORDER BY exactly as written.
- If the original has LIMIT N, keep LIMIT N. If no LIMIT, do NOT add one.

If you cannot guarantee identical results, return the original query unchanged.
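
The identical-results rule can be checked mechanically on a small sample before a rewrite is accepted. The following is a minimal sketch, not part of the skill itself: it uses Python's built-in `sqlite3` as a stand-in for a Snowflake connection, and the `same_results` helper and the `orders` table are hypothetical names introduced for illustration.

```python
import sqlite3


def same_results(conn, original_sql, optimized_sql):
    """Return True if both queries yield the same column names and the same rows in the same order."""
    cur_a = conn.execute(original_sql)
    cols_a = [d[0] for d in cur_a.description]
    rows_a = cur_a.fetchall()
    cur_b = conn.execute(optimized_sql)
    cols_b = [d[0] for d in cur_b.description]
    rows_b = cur_b.fetchall()
    return cols_a == cols_b and rows_a == rows_b


# Hypothetical sample data; SQLite stands in for Snowflake here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 30), (2, 10), (3, 20)])

original = "SELECT id, amount FROM orders ORDER BY amount DESC"
with_limit = "SELECT id, amount FROM orders ORDER BY amount DESC LIMIT 2"
reordered = "SELECT id, amount FROM orders ORDER BY amount ASC"

print(same_results(conn, original, original))    # True
print(same_results(conn, original, with_limit))  # False: a LIMIT was added
print(same_results(conn, original, reordered))   # False: ORDER BY was changed
```

A passing check on sample data does not prove equivalence for all data, so it complements the rules above rather than replacing them.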
Problem: Functions on columns in the WHERE clause can prevent Snowflake from pruning micro-partitions, so the query scans more data than it needs to.
| Original | Optimized | Why Safe |
|---|---|---|
| `WHERE DATE(ts) = '2024-01-01'` | `WHERE ts >= '2024-01-01' AND ts < '2024-01-02'` | Equivalent range |
| `WHERE YEAR(dt) = 2024` | `WHERE dt >= '2024-01-01' AND dt < '2025-01-01'` | Equivalent range |
| `WHERE MONTH(dt) = 3 AND YEAR(dt) = 2024` | `WHERE dt >= '2024-03-01' AND dt < '2024-04-01'` | Equivalent range |
| `WHERE DATE(ts) >= '2024-01-01' AND DATE(ts) < '2024-02-01'` | `WHERE ts >= '2024-01-01' AND ts < '2024-02-01'` | Same boundaries |
| `WHERE YEAR(dt) BETWEEN 1995 AND 1996` | `WHERE dt >= '1995-01-01' AND dt < '1997-01-01'` | Equivalent range |

Do not rewrite these patterns:

| Pattern | Why Not |
|---|---|
| `WHERE YEAR(dt) IN (SELECT year FROM ...)` | Dynamic values, cannot precompute range |
| `WHERE DATE(ts) = DATE(other_col)` | Comparing two columns, both need function |
| `WHERE EXTRACT(DOW FROM dt) = 1` | Day-of-week has no contiguous range |
| `WHERE DATE_TRUNC('month', dt) = '2024-01-01'` in GROUP BY | Needed for grouping logic |
| `SELECT YEAR(dt) AS yr ... GROUP BY YEAR(dt)` | Function in SELECT/GROUP BY is fine, only filter matters |
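
The boundary behaviour of the range rewrite is easiest to see on rows that sit right at midnight. This is a small sketch under stated assumptions: SQLite (via Python's `sqlite3`) stands in for Snowflake, timestamps are stored as ISO-8601 text so comparisons are string comparisons rather than true timestamp comparisons, and the `events` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2023-12-31 23:59:59"),  # just before the day starts
        (2, "2024-01-01 00:00:00"),  # first instant of the day
        (3, "2024-01-01 23:59:59"),  # last second of the day
        (4, "2024-01-02 00:00:00"),  # first instant of the next day
    ],
)

# Function wrapped around the filter column.
wrapped = "SELECT id FROM events WHERE DATE(ts) = '2024-01-01' ORDER BY id"
# Half-open range on the bare column: inclusive start, exclusive end.
ranged = (
    "SELECT id FROM events "
    "WHERE ts >= '2024-01-01' AND ts < '2024-01-02' ORDER BY id"
)

wrapped_ids = [row[0] for row in conn.execute(wrapped)]
ranged_ids = [row[0] for row in conn.execute(ranged)]

print(wrapped_ids)  # [2, 3]
print(ranged_ids)   # [2, 3]
```

The exclusive upper bound is what keeps row 4 out; an inclusive `<= '2024-01-02'` would not be equivalent.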
Problem: Functions on JOIN columns add per-row computation and can prevent the optimizer from choosing the most efficient join strategy.
| Original | Optimized | Why Safe |
|---|---|---|
| `ON CAST(a.id AS VARCHAR) = CAST(b.id AS VARCHAR)` | `ON a.id = b.id` | If both are same type (e.g., INTEGER) |
| `ON UPPER(a.code) = UPPER(b.code)` | `ON a.code = b.code` | If data is already consistently cased |
| `ON TRIM(a.name) = TRIM(b.name)` | `ON a.name = b.name` | If data has no leading/trailing spaces |

Do not rewrite these patterns:

| Pattern | Why Not |
|---|---|
| `ON CAST(a.id AS VARCHAR) = b.string_id` | Types genuinely differ, CAST required |
| `ON DATE(a.timestamp) = b.date_col` | Different granularity, DATE() required |
| `ON UPPER(a.code) = b.code` | If b.code might have different case |
| `ON a.id = b.id + 1` | Arithmetic transformation, cannot remove |
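
Each "Why Safe" condition for join rewrites is a claim about the data, so it can be tested before a function is dropped. The sketch below checks one sufficient condition for removing `UPPER()`: every value on both sides is already upper case. It is illustrative only: SQLite stands in for Snowflake, and `consistently_cased` plus the tables `a` and `b` are hypothetical.

```python
import sqlite3


def consistently_cased(conn, table, column):
    """True when every non-NULL value in table.column already equals its upper-cased form."""
    # table and column are trusted identifiers in this sketch; never pass user input here.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} <> UPPER({column})"
    return conn.execute(sql).fetchone()[0] == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (code TEXT)")
conn.execute("CREATE TABLE b (code TEXT)")
conn.executemany("INSERT INTO a VALUES (?)", [("US",), ("DE",)])
conn.executemany("INSERT INTO b VALUES (?)", [("US",), ("de",)])  # 'de' is lower case

safe_to_drop_upper = consistently_cased(conn, "a", "code") and consistently_cased(
    conn, "b", "code"
)

with_upper = conn.execute(
    "SELECT COUNT(*) FROM a JOIN b ON UPPER(a.code) = UPPER(b.code)"
).fetchone()[0]
without_upper = conn.execute(
    "SELECT COUNT(*) FROM a JOIN b ON a.code = b.code"
).fetchone()[0]

print(safe_to_drop_upper)  # False
print(with_upper)          # 2
print(without_upper)       # 1
```

Here the check fails and the two joins return different row counts, so the original `UPPER()` join has to stay.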
Problem: NOT IN has poor performance and unexpected NULL behavior.
| Original | Optimized | Why Safe |
|---|---|---|
| `WHERE id NOT IN (SELECT id FROM t WHERE ...)` | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id AND ...)` | Equivalent when neither `t.id` nor `main.id` can be NULL |
| `WHERE id NOT IN (SELECT id FROM t)` where both `id` columns have a NOT NULL constraint | `WHERE NOT EXISTS (SELECT 1 FROM t WHERE t.id = main.id)` | NOT NULL on both columns guarantees equivalence |

Do not rewrite these patterns:

| Pattern | Why Not |
|---|---|
| `WHERE id NOT IN (SELECT nullable_col FROM t)` | If subquery returns NULL, NOT IN returns no rows; NOT EXISTS doesn't |
| `WHERE (a, b) NOT IN (SELECT x, y FROM t)` | Multi-column NOT IN has complex NULL semantics |
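Key Rule: Only convert NOT IN to NOT EXISTS if you can verify the subquery column cannot be NULL. The outer column must be non-NULL as well: when the subquery returns rows, a NULL outer value is filtered out by NOT IN but kept by NOT EXISTS.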
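
The NULL behaviour behind this rule follows from SQL's three-valued logic and can be reproduced on a few rows. The following sketch assumes SQLite (via Python's `sqlite3`) as a stand-in for Snowflake; the `customers` and `blocked` tables and the `ids` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER)")
conn.execute("CREATE TABLE blocked (id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO blocked VALUES (?)", [(1,), (None,)])

not_in = "SELECT id FROM customers WHERE id NOT IN (SELECT id FROM blocked) ORDER BY id"
not_exists = (
    "SELECT id FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM blocked b WHERE b.id = c.id) ORDER BY id"
)


def ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


# Case 1: the subquery column contains a NULL, so the two forms disagree.
nullable_not_in = ids(not_in)          # []
nullable_not_exists = ids(not_exists)  # [2, 3]

# Case 2: no NULLs on either side, so the two forms agree.
conn.execute("DELETE FROM blocked WHERE id IS NULL")
clean_not_in = ids(not_in)             # [2, 3]
clean_not_exists = ids(not_exists)     # [2, 3]

# Case 3: a NULL on the outer side also breaks equivalence.
conn.execute("INSERT INTO customers VALUES (NULL)")
outer_not_in = ids(not_in)             # [2, 3]
outer_not_exists = ids(not_exists)     # [None, 2, 3] (SQLite sorts NULL first)
```

Only case 2 is safe to rewrite, which is why the conversion depends on verifying nullability rather than on the shape of the query.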
Problem: Same subquery executed multiple times causes redundant scans.
| Original | Optimized |
|---|---|
| Subquery appears 2+ times identically | Extract to CTE, reference CTE multiple times |
| Same aggregation used in multiple places | Compute once in CTE |

Do not rewrite these patterns:

| Pattern | Why Not |
|---|---|
| Correlated subquery (references outer table) | Each execution is different, cannot cache |
| Subqueries with different filters | Not actually the same subquery |
| Subquery in SELECT that depends on current row | Correlation prevents extraction |
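
A repeated, uncorrelated subquery and its CTE form can be compared directly. This is a minimal sketch with SQLite standing in for Snowflake; the `sales` table and the `region_totals` CTE name are hypothetical, and the example says nothing about how Snowflake actually plans either query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 30), ("west", 5), ("north", 50)],
)

# The same aggregation subquery is written out twice.
repeated = """
SELECT region, total
FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) r
WHERE total > (
    SELECT AVG(total)
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region) r2
)
ORDER BY region
"""

# The aggregation is defined once in a CTE and referenced twice.
with_cte = """
WITH region_totals AS (
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region
)
SELECT region, total
FROM region_totals
WHERE total > (SELECT AVG(total) FROM region_totals)
ORDER BY region
"""

repeated_rows = conn.execute(repeated).fetchall()
cte_rows = conn.execute(with_cte).fetchall()

print(repeated_rows)  # [('east', 40), ('north', 50)]
print(cte_rows)       # [('east', 40), ('north', 50)]
```

Both subqueries here are textually identical and uncorrelated, which is the condition the tables above require before extraction.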
Problem: Comma-separated tables in FROM clause are harder to read and optimize.
Convert `FROM a, b, c WHERE a.id = b.id AND b.id = c.id` to explicit `JOIN` syntax.
This is always safe - just restructuring, no semantic change.
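
The conversion can be illustrated on three tiny tables, with the join predicates moved from WHERE into ON clauses. As before, this is a sketch that uses SQLite in place of Snowflake, and the tables `a`, `b`, and `c` with their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER)")
conn.execute("CREATE TABLE b (id INTEGER, y TEXT)")
conn.execute("CREATE TABLE c (id INTEGER, z TEXT)")
conn.executemany("INSERT INTO a VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "b1"), (2, "b2")])
conn.executemany("INSERT INTO c VALUES (?, ?)", [(2, "c2"), (3, "c3")])

# Comma-separated tables with the join conditions in WHERE.
implicit = (
    "SELECT a.id, b.y, c.z FROM a, b, c "
    "WHERE a.id = b.id AND b.id = c.id ORDER BY a.id"
)
# Explicit inner joins with the same conditions in ON.
explicit = (
    "SELECT a.id, b.y, c.z FROM a "
    "JOIN b ON a.id = b.id "
    "JOIN c ON b.id = c.id ORDER BY a.id"
)

implicit_rows = conn.execute(implicit).fetchall()
explicit_rows = conn.execute(explicit).fetchall()

print(implicit_rows)  # [(2, 'b2', 'c2')]
print(explicit_rows)  # [(2, 'b2', 'c2')]
```

Any non-join predicates in the original WHERE clause stay in WHERE; only the conditions that relate two tables move into ON.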
SUM(SUM(x)) OVER(...) or similar nested aggregates