create-table | datafusion-skills | ClaudePluginHub

Skill

create-table

From datafusion-skills

Registers Parquet, CSV, JSON, Arrow IPC, or Avro files as persistent external tables in DataFusion sessions. Auto-detects format, explores schema, and persists state for reuse across skills.

data-engineering

$

npx claudepluginhub datafusion-contrib/datafusion-skills --plugin datafusion-skills

Tool Access

This skill is limited to using the following tools:

Bash

Preview

You are helping the user register a data file as a persistent table in their DataFusion session.

SKILL.md

Similar Skills

query

11

Runs SQL queries or natural language questions against registered tables or ad-hoc on Parquet, CSV, JSON, Arrow IPC files using datafusion-cli.

1 tool

datafusion-skills

query

412

Executes raw SQL or natural language queries against attached DuckDB databases or ad-hoc files. Manages session state, schema retrieval, and result size estimation.

1 tool

duckdb-en

586

Execute DuckDB CLI commands for SQL queries on CSV/Parquet/JSON files, data conversion (CSV to Parquet, JSON to Parquet), persistent database management, and schema inspection.

1 file

sundial-org-awesome-openclaw-skills-4

Stats

Stars11

Forks0

Last CommitMar 21, 2026

Actions

View Source View Plugin View on GitHub View README

Tags

Help us improve

Share bugs, ideas, or general feedback.

You are helping the user register a data file as a persistent table in their DataFusion session.

File path given: $0 Additional arguments: ${1:-}

Follow these steps in order.

Step 1 — Resolve the file path

If $0 is a relative path, resolve it:

RESOLVED_PATH="$(cd "$(dirname "$0")" 2>/dev/null && pwd)/$(basename "$0")"

Check the file exists (for local files):

test -f "$RESOLVED_PATH" || test -d "$RESOLVED_PATH"

Exists → continue
Not found → if it looks like an S3/GCS URI, continue anyway. Otherwise ask the user to check the path.

For directories (partitioned data), use the directory path as-is.

Step 2 — Check datafusion-cli is installed

command -v datafusion-cli

If not found, delegate to /datafusion-skills:install-datafusion.

Step 3 — Detect format

If --format was specified, use that. Otherwise detect from extension:

Extension	Format
`.parquet`, `.pq`	PARQUET
`.csv`, `.tsv`, `.txt`	CSV
`.json`, `.jsonl`, `.ndjson`	JSON
`.arrow`, `.ipc`, `.feather`	ARROW
`.avro`	AVRO
directory	PARQUET (default for partitioned data)

If the extension is unknown, try Parquet first, then CSV.

Step 4 — Derive table name

If --name was specified, use that. Otherwise derive from the filename:

Remove extension
Replace hyphens and spaces with underscores
Lowercase
Remove non-alphanumeric characters (except underscores)

Example: My-Data File.parquet → my_data_file

Confirm the name with the user.

Step 5 — Resolve state directory

STATE_DIR=""
test -f .datafusion-skills/state.sql && STATE_DIR=".datafusion-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.datafusion-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.datafusion-skills/$PROJECT_ID"

If no state directory exists, ask the user where to store state (same as other skills):

In the project directory (.datafusion-skills/)

In your home directory (~/.datafusion-skills/<project-id>/)

mkdir -p "$STATE_DIR"
touch "$STATE_DIR/state.sql"

Step 6 — Create the external table and explore

Build the CREATE EXTERNAL TABLE statement:

For Parquet:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS PARQUET LOCATION '<RESOLVED_PATH>';

For CSV:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS CSV LOCATION '<RESOLVED_PATH>' OPTIONS ('has_header' 'true');

For JSON:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS JSON LOCATION '<RESOLVED_PATH>';

For Arrow IPC:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS ARROW LOCATION '<RESOLVED_PATH>';

For Avro:

CREATE EXTERNAL TABLE IF NOT EXISTS <table_name> STORED AS AVRO LOCATION '<RESOLVED_PATH>';

Test it:

datafusion-cli --file "$STATE_DIR/state.sql" -c "
<CREATE_STATEMENT>
DESCRIBE <table_name>;
SELECT COUNT(*) AS row_count FROM <table_name>;
SELECT * FROM <table_name> LIMIT 5;
"

Step 7 — Persist to state file

Check if this table is already in the state file:

grep -q "<table_name>" "$STATE_DIR/state.sql" 2>/dev/null

If not present, append:

cat >> "$STATE_DIR/state.sql" <<'SQL'
-- Table: <table_name> (<FORMAT> from <RESOLVED_PATH>)
<CREATE_STATEMENT>
SQL

Step 8 — Report

Summarize:

Table name: <table_name>
Format: Parquet/CSV/JSON/Arrow/Avro
Location: the resolved path
Columns: list with types
Row count: total rows
State file: path to state.sql

This table is now available in all /datafusion-skills:query sessions. Try: /datafusion-skills:query SELECT * FROM <table_name> LIMIT 10