From duckdb-skills
Executes raw SQL or natural language queries against attached DuckDB databases or ad-hoc files. Manages session state, schema retrieval, and result size estimation.
npx claudepluginhub duckdb/duckdb-skills --plugin duckdb-skillsThis skill is limited to using the following tools:
You are helping the user query data using DuckDB.
Execute DuckDB CLI commands for SQL queries on CSV/Parquet/JSON files, data conversion (CSV to Parquet, JSON to Parquet), persistent database management, and schema inspection.
Guides DuckDB SQL writing, analytical queries, CSV/Parquet/JSON ingestion, ETL pipelines, client APIs for Python/Node/Rust/Java/Go, extensions, configuration, and performance tuning.
Runs SQL queries or natural language questions against registered tables or ad-hoc on Parquet, CSV, JSON, Arrow IPC files using datafusion-cli.
Share bugs, ideas, or general feedback.
You are helping the user query data using DuckDB.
Input: $@
Follow these steps in order.
Look for an existing state file in either location:
STATE_DIR=""
test -f .duckdb-skills/state.sql && STATE_DIR=".duckdb-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.duckdb-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.duckdb-skills/$PROJECT_ID"
If found, verify the databases it references are still accessible:
duckdb -init "$STATE_DIR/state.sql" -c "SHOW DATABASES;"
Now determine the mode:
--file flag is present, or the SQL references file paths/literals (e.g. FROM 'data.csv'), or STATE_DIR is empty.STATE_DIR is set and the input references table names, is natural language, or is SQL without file references.If no state file exists and no file is referenced, fall back to ad-hoc mode against :memory: — the user must reference files directly in their SQL.
If the state file exists but any ATTACH in it fails, warn the user and fall back to ad-hoc mode.
command -v duckdb
If not found, delegate to /duckdb-skills:install-duckdb and then continue.
If the input is natural language (not valid SQL), generate SQL using the Friendly SQL reference below.
In session mode, first retrieve the schema to inform query generation:
duckdb -init "$STATE_DIR/state.sql" -csv -c "
SELECT table_name FROM duckdb_tables() ORDER BY table_name;
"
Then for relevant tables:
duckdb -init "$STATE_DIR/state.sql" -csv -c "DESCRIBE <table_name>;"
Use the schema context and the Friendly SQL reference to generate the most appropriate query.
Before executing, estimate whether the query could produce a very large result that would consume excessive tokens when returned to this conversation.
Session mode — check row counts for the tables involved:
duckdb -init "$STATE_DIR/state.sql" -csv -c "
SELECT table_name, estimated_size, column_count
FROM duckdb_tables()
WHERE table_name IN ('<table1>', '<table2>');
"
Ad-hoc mode — probe the source:
duckdb :memory: -csv -c "
SET allowed_paths=['FILE_PATH'];
SET enable_external_access=false;
SET allow_persistent_secrets=false;
SET lock_configuration=true;
SELECT count() AS row_count FROM 'FILE_PATH';
"
Evaluate:
LIMIT, count(), or other aggregation that bounds the output -> safe, proceed.LIMIT 1000 or an aggregation to keep the output manageable."
Ask for confirmation before running as-is.Skip this step for queries that are intrinsically bounded (e.g. DESCRIBE, SUMMARIZE, aggregations, count()).
Ad-hoc mode (sandboxed — only the referenced file is accessible):
duckdb :memory: -csv <<'SQL'
SET allowed_paths=['FILE_PATH'];
SET enable_external_access=false;
SET allow_persistent_secrets=false;
SET lock_configuration=true;
<QUERY>;
SQL
Replace FILE_PATH with the actual file path extracted from the query or --file argument.
If multiple files are referenced, include all paths in the allowed_paths list.
Session mode (user-trusted database):
duckdb -init "$STATE_DIR/state.sql" -csv -c "<QUERY>"
For multi-line queries, use a heredoc with -init:
duckdb -init "$STATE_DIR/state.sql" -csv <<'SQL'
<QUERY>;
SQL
Always use heredocs (<<'SQL') for multi-line queries to avoid shell quoting issues.
Extension "X" not loaded): delegate to /duckdb-skills:install-duckdb <ext>, then retry.FROM duckdb_tables() and suggest corrections.find "$PWD" -name "<filename>" 2>/dev/null to locate the file and suggest the corrected path./duckdb-skills:duckdb-docs <error message or relevant keywords> to search the documentation for guidance, then apply the fix and retry.Show the query output to the user. If the result has more than 100 rows, note the truncation and suggest adding LIMIT to the query.
For natural language questions, also provide a brief interpretation of the results.
When generating SQL, prefer these idiomatic DuckDB constructs:
FROM table WHERE x > 10 (implicit SELECT *)LIMIT 10% returns a percentage of rowsSELECT x: 42 instead of SELECT 42 AS xcount(*)SELECT i+1 AS j, j+2 AS kcount() FILTER (WHERE x > 10) for conditional aggregationmax(col, 3) returns top 3 as a list; also arg_max(arg, val, n), min_by(arg, val, n)getvariable('x')FROM 'file.csv', FROM 'data.parquet'FROM 'data/part-*.parquet' reads multiple files'hello'.upper() or col.trim().lower()[x*2 FOR x IN list_col]col[1:3], negative indexing col[-1]SELECT s.* FROM (SELECT {'a': 1, 'b': 2} AS s)[1, 2, 3]format('{}->{}', a, b) for string formattingDROP TABLE IF EXISTS first