Transform and export data using DuckDB SQL. Read CSV/Parquet/JSON/Excel/databases, apply SQL transformations (joins, aggregations, PIVOT/UNPIVOT, sampling), and optionally write results to files. Use when the user wants to: (1) Clean, filter, or transform data, (2) Join multiple data sources, (3) Convert between formats (CSV→Parquet, etc.), (4) Create partitioned datasets, (5) Sample large datasets, (6) Export query results. Prefer this over in-context reasoning for datasets with thousands of rows or complex transformations.
/plugin marketplace add richard-gyiko/data-wrangler-plugin
/plugin install data-wrangler@data-wrangler-marketplace

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Files: SECRETS.md, TRANSFORMS.md, examples/secrets.yaml, scripts/query_duckdb.py

Transform and export data using DuckDB SQL.
IMPORTANT - Windows Shell Escaping: cd to the skill directory first.

cd "<skill_directory>" && echo "{\"query\": \"SELECT * FROM 'D:/path/to/file.csv'\"}" | uv run scripts/query_duckdb.py
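If escaping the inline JSON gets unwieldy, one workaround (a sketch, relying only on the script reading the request from stdin as in the piped example above; request.json is a hypothetical file containing the request body) is to pipe a saved request file instead:

cd "<skill_directory>" && type request.json | uv run scripts/query_duckdb.py

In PowerShell, use Get-Content request.json in place of type.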
Get schema, statistics, and sample in one call. Use before writing queries to understand data structure.
{"mode": "explore", "path": "D:/data/sales.csv"}
Response:
{
"file": "D:/data/sales.csv",
"format": "csv",
"row_count": 15234,
"columns": [
{"name": "order_id", "type": "BIGINT", "null_count": 0, "null_percent": 0.0},
{"name": "customer", "type": "VARCHAR", "null_count": 45, "null_percent": 0.3}
],
"sample": "| order_id | customer | ... |\\n|----------|----------|-----|\\n| 1001 | Alice | ... |"
}
Options:
- sample_rows: Number of sample rows (default: 10, max: 100)
- sources: For database tables (same as query mode)

Return results directly to Claude for analysis.
{"query": "SELECT * FROM 'data.csv' LIMIT 10"}
{
"query": "SELECT s.*, p.category FROM sales s JOIN products p ON s.product_id = p.id",
"sources": [
{"type": "file", "alias": "sales", "path": "/data/sales.parquet"},
{"type": "file", "alias": "products", "path": "/data/products.csv"}
]
}
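Each entry in sources is exposed to the SQL as a table named by its alias (here, sales and products).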
Export query results to files. Add an output object to the request to write results to disk instead of returning them.
{
"query": "SELECT * FROM 'raw.csv' WHERE status = 'active'",
"output": {
"path": "D:/output/filtered.parquet",
"format": "parquet"
}
}
{
"query": "SELECT *, YEAR(date) as year, MONTH(date) as month FROM 'events.csv'",
"output": {
"path": "D:/output/events/",
"format": "parquet",
"options": {
"compression": "zstd",
"partition_by": ["year", "month"],
"overwrite": true
}
}
}
| Format | Options |
|---|---|
| parquet | compression (zstd/snappy/gzip/lz4), partition_by, row_group_size |
| csv | header (default: true), delimiter, compression, partition_by |
| json | array (true = JSON array, false = newline-delimited) |
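As a concrete sketch using the option names from the table above (the paths and the region column are placeholders):

{
"query": "SELECT * FROM 'D:/data/sales.csv' WHERE region = 'EU'",
"output": {
"path": "D:/output/sales_eu.csv.gz",
"format": "csv",
"options": {"delimiter": ";", "compression": "gzip", "header": true}
}
}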
The response includes verification info, so no follow-up queries are needed:
{
"success": true,
"output_path": "D:/output/events/",
"format": "parquet",
"rows_written": 15234,
"files_created": ["D:/output/events/year=2023/data_0.parquet", "..."],
"total_size_bytes": 5678901,
"duration_ms": 1234
}
By default, existing files are not overwritten. Set options.overwrite: true to allow overwriting.
{
"query": "SQL statement",
"sources": [...],
"output": {"path": "...", "format": "..."},
"options": {"max_rows": 200, "format": "markdown"},
"secrets_file": "path/to/secrets.yaml"
}
- max_rows: Maximum rows to return (default: 200)
- max_bytes: Maximum response size (default: 200000)
- format: markdown (default), json, records, or csv

The markdown format renders results as a table:

| column1 | column2 |
|---|---|
| value1 | value2 |
The json format returns a structured object:
{
"schema": [{"name": "col1", "type": "INTEGER"}],
"rows": [[1, "value"]],
"truncated": false,
"warnings": [],
"error": null
}
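To get this JSON shape instead of the default markdown table, set the response options on the request (a sketch; the path is a placeholder):

{
"query": "SELECT * FROM 'data.csv'",
"options": {"format": "json", "max_rows": 50}
}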
{"type": "file", "alias": "data", "path": "/path/to/file.csv"}
Glob patterns: {"path": "/logs/**/*.parquet"}
Custom delimiter: {"path": "/data/file.csv", "delimiter": "|"}
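A glob source is then queried through its alias like any other table (a sketch; the column name ts is hypothetical):

{
"query": "SELECT COUNT(*) AS events, MAX(ts) AS latest FROM logs",
"sources": [{"type": "file", "alias": "logs", "path": "/logs/**/*.parquet"}]
}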
{
"type": "postgres", "alias": "users",
"host": "host", "port": 5432, "database": "db",
"user": "user", "password": "pass",
"schema": "public", "table": "users"
}
{
"type": "mysql", "alias": "orders",
"host": "host", "port": 3306, "database": "db",
"user": "user", "password": "pass", "table": "orders"
}
{"type": "sqlite", "alias": "data", "path": "/path/to/db.sqlite", "table": "tablename"}
{
"type": "s3", "alias": "logs",
"url": "s3://bucket/path/*.parquet",
"aws_region": "us-east-1",
"aws_access_key_id": "...", "aws_secret_access_key": "..."
}
See TRANSFORMS.md for advanced patterns, including:
- Sampling: USING SAMPLE n ROWS or SAMPLE 10%
- Dynamic columns: EXCLUDE, REPLACE, COLUMNS('pattern')

-- PIVOT: Convert rows to columns
PIVOT sales ON quarter USING SUM(revenue) GROUP BY region
-- UNPIVOT: Convert columns to rows
UNPIVOT data ON q1, q2, q3, q4 INTO NAME quarter VALUE amount
-- Sampling: Random 10% with reproducible seed
SELECT * FROM large_table USING SAMPLE 10% REPEATABLE(42)
-- Dynamic columns: Exclude sensitive, transform email
SELECT * EXCLUDE (ssn) REPLACE (LOWER(email) AS email) FROM users
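COLUMNS('pattern') selects columns by regular expression; a minimal sketch (the file and column names are hypothetical, not taken from TRANSFORMS.md):

-- Regex column selection: keep only the q1..q4 columns
SELECT COLUMNS('q[1-4]') FROM 'quarterly.csv'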
Recommended workflow:
1. Check the schema: DESCRIBE SELECT * FROM 'file.csv'
2. Preview the data: SELECT * FROM 'file.csv' LIMIT 5
3. Add output to write results

Troubleshooting:
- error is non-null: Check column names, verify paths
- truncated is true: Use more aggregation or filters
- Existing output files: set options.overwrite: true

Store credentials securely in YAML. See SECRETS.md for complete documentation.
{
"query": "SELECT * FROM customers LIMIT 10",
"secrets_file": "D:/path/to/secrets.yaml",
"sources": [{
"type": "postgres", "alias": "customers",
"secret": "my_postgres", "table": "customers"
}]
}
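Here the secret field names an entry defined in the secrets file, so connection details are not inlined in the request.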
Supported: PostgreSQL, MySQL, S3, GCS, Azure, R2, HTTP, HuggingFace, Iceberg, DuckLake.