From duckdb-skills
Explore S3-compatible storage (S3, R2, GCS, MinIO) using DuckDB: lists files/sizes, previews Parquet/CSV/JSON schemas/samples/row counts without downloading.
npx claudepluginhub duckdb/duckdb-skills --plugin duckdb-skillsThis skill is limited to using the following tools:
You are helping the user explore data on remote object storage using DuckDB.
Reads and explores Parquet, CSV, JSON, Arrow IPC, Avro files locally, from S3/GCS using datafusion-cli for schema inspection, row counts, and data previews.
Manages Cloudflare R2 S3-compatible object storage in Workers: bucket creation, bindings, uploads/downloads, CORS, presigned URLs, multipart uploads. Fixes R2_ERROR and CORS issues.
Reads data files (CSV, JSON, Parquet, Avro, Excel, spatial, SQLite) or remote S3/HTTPS URLs using DuckDB. Activates for file references, 'what's in this file' queries, or dataset previews.
Share bugs, ideas, or general feedback.
You are helping the user explore data on remote object storage using DuckDB.
URL: $0
Question: ${1:-list and describe what's there}
Based on the URL or user context, prepend the appropriate secret configuration:
| Provider | URL patterns | Secret setup |
|---|---|---|
| AWS S3 | s3:// | CREATE SECRET (TYPE S3, PROVIDER credential_chain); |
| Cloudflare R2 | r2://, s3:// with R2 endpoint | CREATE SECRET (TYPE R2, PROVIDER credential_chain); |
| GCS | gs://, gcs:// | CREATE SECRET (TYPE GCS, PROVIDER credential_chain); |
| MinIO / custom | s3:// with custom endpoint | CREATE SECRET (TYPE S3, KEY_ID '...', SECRET '...', ENDPOINT '...', USE_SSL true); |
For R2, if the user provides an account ID, the endpoint is <account_id>.r2.cloudflarestorage.com. R2 URLs like r2://bucket/path should be rewritten to s3://bucket/path with the R2 secret.
For public buckets (e.g., Overture Maps, AWS open data), no secret is needed — skip this step.
Always prepend:
LOAD httpfs;
If the URL looks like a directory or bucket (no file extension, or ends with /), list its contents with sizes:
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT filename, (size / 1024 / 1024)::DECIMAL(10,1) AS size_mb, last_modified
FROM read_blob('<URL>/*')
ORDER BY filename
LIMIT 50;
"
Note: only select filename, size, last_modified — never select content, which would download the actual files.
If the URL points to a specific file or glob pattern (has a file extension or contains *), preview it:
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
DESCRIBE FROM '<URL>';
SELECT count(*) AS row_count FROM '<URL>';
FROM '<URL>' LIMIT 20;
"
For Parquet files, get row counts and sizes from metadata (no data download):
duckdb -c "
LOAD httpfs;
<SECRET_SETUP>
SELECT file_name,
sum(row_group_num_rows) AS total_rows,
(sum(row_group_compressed_bytes) / 1024 / 1024)::DECIMAL(10,1) AS compressed_mb
FROM parquet_metadata('<URL>')
GROUP BY file_name;
"
Using the listing, schema, or sample data, answer:
${1:-list and describe what's there}
If the user asks an analytical question (e.g., "how many rows match X"), write and run the appropriate SQL query. DuckDB pushes predicates down into Parquet on S3, so filtering is efficient even on large remote datasets.
duckdb: command not found → delegate to /duckdb-skills:install-duckdbaws configure, environment variables, or provide explicit key/secret