From datafusion-skills
Visualize and analyze Apache DataFusion query execution plans. Shows logical/physical plans, detects bottlenecks like full scans/sorts/joins/repartitions, suggests optimizations. Supports EXPLAIN ANALYZE.
npx claudepluginhub datafusion-contrib/datafusion-skills --plugin datafusion-skillsThis skill is limited to using the following tools:
You are helping the user understand and optimize query execution plans in Apache DataFusion.
Analyzes SQL queries for missing indexes, N+1 patterns, suboptimal joins, full table scans. Interprets EXPLAIN plans, detects anti-patterns, recommends indexes, rewrites queries for PostgreSQL, MySQL, SQLite.
Analyzes slow queries in PostgreSQL, MySQL, MongoDB using EXPLAIN plans, buffer stats, index usage to detect sequential scans, missing indexes, and recommend optimizations.
Optimize slow database queries — analyze execution plans, add indexes, rewrite queries. Use when asked about "slow query", "optimize SQL", "query performance", or "explain this query".
Share bugs, ideas, or general feedback.
You are helping the user understand and optimize query execution plans in Apache DataFusion.
Input: $@
command -v datafusion-cli
If not found, delegate to /datafusion-skills:install-datafusion.
STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"
--analyze is present → use EXPLAIN ANALYZE (actually runs the query, shows real metrics)EXPLAIN (shows the plan without execution)Extract the SQL query (remove --analyze flag if present).
If the input is natural language, generate SQL first (see /datafusion-skills:query for SQL generation guidelines).
Physical plan (default — shows the execution plan as a visual tree):
datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN $SQL;
"
Verbose plan (full optimizer trace — initial logical plan, each optimization pass, initial physical plan, final physical plan with stats and schema):
datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN VERBOSE $SQL;
"
With actual metrics (if --analyze) (runs the query, reports per-operator row counts, timing, memory, spill stats):
datafusion-cli ${STATE_DIR:+--file "$STATE_DIR/state.sql"} -c "
EXPLAIN ANALYZE $SQL;
"
Parse the execution plan output and provide insights:
Full table scans → Look for TableScan without pushdown predicates
WHERE clauses or partitioningSort operations → SortExec or SortPreservingMergeExec
Hash joins vs merge joins → HashJoinExec vs SortMergeJoinExec
Repartitioning → RepartitionExec
Projection pushdown → Check if only needed columns are read
Predicate pushdown → Check if filters are pushed to the scan level
predicate in TableScan nodesCoalesce partitions → CoalescePartitionsExec
Structure the analysis as:
Brief description of what the plan does.
Present the plan as an indented tree (already DataFusion's default output format).
Actionable suggestions, such as:
If relevant, suggest DataFusion configuration changes:
-- Increase target partitions for more parallelism
SET datafusion.execution.target_partitions = 8;
-- Increase batch size for throughput
SET datafusion.execution.batch_size = 16384;
-- Enable/disable optimizations
SET datafusion.optimizer.enable_round_robin_repartition = true;
To explore these settings, try
/datafusion-skills:datafusion-docs configuration options.