Skill

explain-plan

Visualize and analyze Apache DataFusion query execution plans. Shows logical and physical plans, detects bottlenecks such as full scans, sorts, joins, and repartitions, and suggests optimizations. Supports EXPLAIN ANALYZE.

Bash

database

performance

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/datafusion-skills:explain-plan

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Bash

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are helping the user understand and optimize query execution plans in Apache DataFusion.

SKILL.md

144 lines · ~1.1k tokens

Stats

Stars: 12

Maintenance: Good

Last commit: Mar 21, 2026


Step 1 — Check datafusion-cli is installed

command -v datafusion-cli

If not found, delegate to /datafusion-skills:install-datafusion.
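
A minimal sketch of how the check might be scripted so later steps can branch on the result. The variable name DATAFUSION_CLI_FOUND is illustrative and not part of the skill.

```shell
# Record whether datafusion-cli is on PATH (DATAFUSION_CLI_FOUND is a hypothetical name).
if command -v datafusion-cli >/dev/null 2>&1; then
  DATAFUSION_CLI_FOUND=1
  echo "datafusion-cli found at: $(command -v datafusion-cli)"
else
  DATAFUSION_CLI_FOUND=0
  echo "datafusion-cli not found on PATH; install it before continuing" >&2
fi
```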

Step 2 — Resolve state

STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
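
To make the PROJECT_ID mapping concrete, the sketch below wraps the same tr call in a helper. The function name project_id and the example path are hypothetical.

```shell
# Turn an absolute project path into a flat directory name by replacing
# every "/" with "-", as the PROJECT_ID line above does.
project_id() {
  printf '%s' "$1" | tr '/' '-'
}

# Example path is made up for illustration.
project_id "/home/alice/work/sales"   # prints -home-alice-work-sales
echo
```

Note that the two test -f lines run in order, so when both a project-local and a home-directory state.sql exist, the home-directory one wins.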

Step 3 — Determine the mode

If --analyze is present → use EXPLAIN ANALYZE (actually runs the query, shows real metrics)
Otherwise → use EXPLAIN (shows the plan without execution)

Extract the SQL query (remove --analyze flag if present).

If the input is natural language, generate SQL first (see /datafusion-skills:query for SQL generation guidelines).
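
One way the flag handling could look, as a sketch only. The function name parse_explain_args and the MODE variable are assumptions for illustration; the skill itself does not prescribe how the arguments are parsed.

```shell
# Split the skill arguments into an EXPLAIN mode and the SQL text.
# Sets MODE and SQL as globals (both names are hypothetical).
parse_explain_args() {
  MODE="EXPLAIN"
  SQL=""
  for arg in "$@"; do
    case "$arg" in
      --analyze) MODE="EXPLAIN ANALYZE" ;;   # switch mode, drop the flag
      *) SQL="${SQL:+$SQL }$arg" ;;          # everything else is SQL text
    esac
  done
}

parse_explain_args --analyze "SELECT count(*) FROM orders"
echo "$MODE $SQL;"   # prints: EXPLAIN ANALYZE SELECT count(*) FROM orders;
```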

Step 4 — Run EXPLAIN

Physical plan (default — shows the execution plan as a visual tree):

datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN $SQL;
"

Verbose plan (full optimizer trace — initial logical plan, each optimization pass, initial physical plan, final physical plan with stats and schema):

datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN VERBOSE $SQL;
"

With actual metrics (only when --analyze is set; runs the query and reports per-operator row counts, timing, memory, and spill stats):

datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN ANALYZE $SQL;
"

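The three invocations above differ only in the EXPLAIN keyword, so the argument list can be assembled in one place. The sketch below is a dry run: it prints the arguments instead of executing datafusion-cli. The helper name explain_args is hypothetical.

```shell
# Print the datafusion-cli argument list, one argument per line, without running it.
explain_args() {
  mode=$1   # EXPLAIN, EXPLAIN VERBOSE, or EXPLAIN ANALYZE
  sql=$2
  set -- datafusion-cli
  # Only pass --file when Step 2 found a state file.
  if [ -n "${STATE_DIR:-}" ]; then
    set -- "$@" --file "$STATE_DIR/state.sql"
  fi
  set -- "$@" -c "$mode $sql;"
  printf '%s\n' "$@"
}

STATE_DIR=".datafusion-skills"
explain_args "EXPLAIN VERBOSE" "SELECT 1"
```

With STATE_DIR empty, the --file pair is omitted, which mirrors the ${STATE_DIR:+...} expansion used in the commands above.
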
Step 5 — Analyze the plan

Parse the execution plan output and provide insights:

Key things to look for:

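Full table scans → Look for a scan with no pushed-down predicates: TableScan in the logical plan, or DataSourceExec in the physical plan (ParquetExec or CsvExec in older releases)
- Suggest adding WHERE clauses or partitioning
- Check if filter pushdown is happening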
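Sort operations → SortExec or SortPreservingMergeExec
- SortExec is expensive for large datasets; SortPreservingMergeExec only merges partitions that are already sorted and is usually much cheaper
- Suggest pre-sorting data or using sorted Parquet files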
Hash joins vs merge joins → HashJoinExec vs SortMergeJoinExec
- Hash joins need memory for the build side (the left input of HashJoinExec)
- Suggest making the smaller table the build side
Repartitioning → RepartitionExec
- Shows data shuffling between partitions
- Can be expensive for large datasets
Projection pushdown → Check if only needed columns are read
- DataFusion should push projections down to the scan
Predicate pushdown → Check if filters are pushed to the scan level
- Look for partial_filters or full_filters on TableScan nodes in the logical plan, or predicate= on the scan operator in the physical plan
Coalesce partitions → CoalescePartitionsExec
- Merging partitions back to single partition
- Expected at the top of the plan
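
A quick first pass over the checklist above can be done by counting operator names in the plan text. This is a sketch, not a parser: it only counts lines that mention each operator, and the helper name flag_operators is hypothetical. The sample input is hand-written and simplified, not real datafusion-cli output.

```shell
# Read a plan on stdin and report how many lines mention each operator of interest.
flag_operators() {
  plan=$(cat)
  for op in SortExec SortPreservingMergeExec HashJoinExec SortMergeJoinExec \
            RepartitionExec CoalescePartitionsExec; do
    # grep -c exits non-zero when the count is 0, so guard it with "|| true".
    n=$(printf '%s\n' "$plan" | grep -c "$op" || true)
    if [ "$n" -gt 0 ]; then
      echo "$op: $n"
    fi
  done
}

# Hand-written sample lines for illustration only.
flag_operators <<'EOF'
SortPreservingMergeExec: [total DESC]
  SortExec: expr=[total DESC]
    RepartitionExec: partitioning=Hash
EOF
```

In practice the output of Step 4 would be piped in instead of the sample, and the counts used only as a hint for where to read the plan more closely.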

For EXPLAIN ANALYZE, additionally check:

Row counts at each stage → identify data amplification or reduction
Execution time per operator → find the bottleneck
Memory usage → identify memory-intensive operations
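
To compare row counts between stages, the per-operator metrics can be pulled out of the EXPLAIN ANALYZE text. The sketch below assumes each operator line carries a metrics list containing output_rows=<integer>; if the installed version prints row counts in another form, the pattern needs adjusting. The helper name rows_per_operator is hypothetical and the sample lines are hand-written.

```shell
# Print "<Operator> <output_rows>" for every plan line that has both.
rows_per_operator() {
  awk '
    match($0, /[A-Za-z]+Exec/) {
      op = substr($0, RSTART, RLENGTH)
      # "output_rows=" is 12 characters long; skip it to get the number.
      if (match($0, /output_rows=[0-9]+/)) {
        print op, substr($0, RSTART + 12, RLENGTH - 12)
      }
    }'
}

# Hand-written sample lines for illustration only.
rows_per_operator <<'EOF'
  FilterExec: amount > 100, metrics=[output_rows=5, elapsed_compute=1ms]
    DataSourceExec: file_groups=1, metrics=[output_rows=1000, elapsed_compute=1ms]
EOF
```

A large drop between a scan and the filter directly above it, as in the sample, suggests the filter was not pushed into the scan.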

Step 6 — Present findings

Structure the analysis as:

Query Plan Summary

Brief description of what the plan does.

Plan Visualization

Present the plan as an indented tree (already DataFusion's default output format).

Performance Analysis

Bottlenecks: Operations that are likely slowest
Optimizations applied: Filter pushdown, projection pushdown, etc.
Opportunities: Suggestions for improving performance

Recommendations

Actionable suggestions, such as:

Sort data on commonly filtered columns so file statistics can prune more (DataFusion has no CREATE INDEX statement)
Rewrite the query to enable better pushdown
Adjust DataFusion configuration options
Use partitioned data layout

Step 7 — Suggest configuration tuning

If relevant, suggest DataFusion configuration changes:

-- Set target partitions (defaults to the number of CPU cores; raise it for more parallelism)
SET datafusion.execution.target_partitions = 8;

-- Increase batch size for throughput (default is 8192)
SET datafusion.execution.batch_size = 16384;

-- Toggle individual optimizer rules (this one is enabled by default)
SET datafusion.optimizer.enable_round_robin_repartition = true;

To explore these settings, try /datafusion-skills:datafusion-docs configuration options.
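
To see whether a setting actually changes the plan, the SET statement and the EXPLAIN can be run in the same session. The sketch below only generates the SQL script; the helper name with_setting and the query are hypothetical.

```shell
# Emit a two-statement SQL script: apply one setting, then EXPLAIN ANALYZE the query.
with_setting() {
  # $1 = option name, $2 = value, $3 = SQL query
  printf 'SET %s = %s;\nEXPLAIN ANALYZE %s;\n' "$1" "$2" "$3"
}

with_setting datafusion.execution.target_partitions 8 "SELECT count(*) FROM orders"
```

Redirecting the output to a file and passing it with datafusion-cli --file runs both statements in one session, so the setting is in effect when the plan is produced.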
