From duckdb-skills

Converts data files between formats like CSV, Parquet, JSON, Excel, and GeoJSON using DuckDB. Supports remote inputs and binary outputs Claude cannot generate natively.

`npx claudepluginhub duckdb/duckdb-skills --plugin duckdb-skills`
You are helping the user convert a data file from one format to another using DuckDB.
Input file: $0
Output file: ${1:-}
- Input: `$0`. If it's a bare filename (no `/`), resolve it to a full path with `find "$PWD" -name "$0" -not -path '*/.git/*' 2>/dev/null | head -1`.
- Output: If `$1` is provided, use it as the output path. Otherwise, default to the same stem as the input with a `.parquet` extension (e.g., `data.csv` → `data.parquet`).
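The resolution logic above can be sketched as a small bash snippet (file names are illustrative):

```shell
#!/usr/bin/env bash
# Resolve the input path: bare filenames are searched for under $PWD.
input="$1"
if [[ "$input" != */* ]]; then
  resolved=$(find "$PWD" -name "$input" -not -path '*/.git/*' 2>/dev/null | head -1)
  [[ -n "$resolved" ]] && input="$resolved"
fi

# Default the output to the input's stem with a .parquet extension.
output="${2:-${input%.*}.parquet}"
echo "$input -> $output"
```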
Infer the output format from the output file extension:
| Extension | Format clause |
|---|---|
| `.parquet`, `.pq` | (default, no clause needed) |
| `.csv` | `(FORMAT csv, HEADER)` |
| `.tsv` | `(FORMAT csv, HEADER, DELIMITER '\t')` |
| `.json` | `(FORMAT json, ARRAY true)` |
| `.jsonl`, `.ndjson` | `(FORMAT json, ARRAY false)` |
| `.xlsx` | `(FORMAT xlsx)` — requires `INSTALL excel; LOAD excel;` |
| `.geojson` | `(FORMAT GDAL, DRIVER 'GeoJSON')` — requires `LOAD spatial;` |
| `.gpkg` | `(FORMAT GDAL, DRIVER 'GPKG')` — requires `LOAD spatial;` |
| `.shp` | `(FORMAT GDAL, DRIVER 'ESRI Shapefile')` — requires `LOAD spatial;` |
Run a single DuckDB command, prepending extension loads as needed based on both the input and output formats:

```shell
duckdb -c "
<EXTENSION_LOADS>
COPY (FROM '<INPUT_PATH>') TO '<OUTPUT_PATH>' <FORMAT_CLAUSE>;
"
```
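For example, a local CSV-to-JSONL conversion needs no extension loads, only the format clause from the table above (file names are illustrative):

```shell
# Convert data.csv to newline-delimited JSON (one object per row).
duckdb -c "
COPY (FROM 'data.csv') TO 'data.jsonl' (FORMAT json, ARRAY false);
"
```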
For remote inputs (`s3://`, `https://`, etc.), prepend the same protocol setup as `read-file`:

| Protocol | Prepend |
|---|---|
| `s3://` | `LOAD httpfs; CREATE SECRET (TYPE S3, PROVIDER credential_chain);` |
| `gs://` / `gcs://` | `LOAD httpfs; CREATE SECRET (TYPE GCS, PROVIDER credential_chain);` |
| `https://` / `http://` | `LOAD httpfs;` |
If the user mentions partitioning (e.g., "partition by year"), add `PARTITION_BY (col)` to the format clause. This only works with Parquet and CSV output.

If the user mentions compression (e.g., "use zstd"), add `COMPRESSION zstd` for Parquet output.
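Combined, a partitioned, zstd-compressed Parquet write looks like this (the column name `year` and file names are illustrative):

```shell
# Writes a Hive-partitioned directory tree: events/year=<value>/*.parquet
duckdb -c "
COPY (FROM 'events.csv')
TO 'events' (FORMAT parquet, PARTITION_BY (year), COMPRESSION zstd);
"
```

Note that with `PARTITION_BY` the output path is a directory, not a single file.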
On success, report the output with `ls -lh <OUTPUT_PATH>`.

On failure:
- `duckdb: command not found` → delegate to `/duckdb-skills:install-duckdb`
- Other errors → run `/duckdb-skills:read-file` first to inspect the input