Guides qsv-based CSV wrangling with standard workflow order, tool selection matrix for tasks like filtering/joining/aggregating, selection syntax, and pipeline patterns for cleaning/profiling.
From qsv-data-wrangling:

```
npx claudepluginhub dathere/qsv --plugin qsv-data-wrangling
```

This skill uses the workspace's default tool permissions.
Always follow this sequence when processing CSV data:
1. `qsv_get_working_dir` and `qsv_set_working_dir` to sync the working directory
2. `index` (enables fast random access for subsequent commands)
3. `sniff` (detect format, encoding, delimiter) -> `headers` -> `count`
4. `stats --cardinality --stats-jsonl` (creates the cache used by smart commands)
5. `slice --len 5` (preview rows), `frequency --frequency-jsonl` (value distributions, with cache for reuse)
6. `validate` (against a JSON Schema), `stats` (verify results)
7. Export: `tojsonl`, `table`, `qsv_to_parquet`
8. `describegpt --all` (AI-generated Data Dictionary, Description & Tags)

| Task | Best Tool | Alternative | When to Use Alternative |
|---|---|---|---|
| Select columns | select | sqlp | Need computed columns |
| Filter rows | search | sqlp | Complex WHERE conditions |
| Sort data | sort | sqlp | Need ORDER BY with LIMIT |
| Remove duplicates | dedup | sqlp | Need GROUP BY dedup |
| Join two files | joinp | join | join for memory-constrained |
| Aggregate/GROUP BY | sqlp | frequency | frequency for simple counts; --frequency-jsonl creates cache |
| Column stats | stats | moarstats | moarstats for extended stats |
| Find/replace | replace | sqlp | sqlp for conditional replace |
| Reshape wide->long | transpose --long | - | DuckDB UNPIVOT (external) for complex reshaping |
| Reshape long->wide | pivotp | sqlp | Complex pivots |
| Concatenate files | cat rows | cat rowskey | Different column orders |
| Sample rows | sample | slice | slice for positional ranges |
| Document dataset | describegpt | — | AI-generated Data Dictionary, Description & Tags |
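A minimal sketch of the matrix's first rows, assuming `qsv` is on PATH; `sample.csv` and its columns are invented for illustration:

```shell
# Invented sample data for illustration only.
printf 'city,state,pop\nBoston,MA,650000\nAustin,TX,960000\nBoston,MA,650000\n' > sample.csv

# Guard so the sketch degrades gracefully where qsv is not installed.
if command -v qsv >/dev/null 2>&1; then
  qsv select city,pop sample.csv        # select columns
  qsv search -s state 'TX' sample.csv   # filter rows by regex on one column
  qsv dedup sample.csv                  # remove duplicate rows
else
  echo "qsv not installed; commands shown for illustration only"
fi
```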
Used by select, search, sort, dedup, frequency, and other commands:
| Syntax | Meaning | Example |
|---|---|---|
| `name` | Column by name | `select "City"` |
| `1` | Column by 1-based index | `select 1` |
| `1,3,5` | Multiple columns | `select 1,3,5` |
| `1-5` | Range (inclusive) | `select 1-5` |
| `!col` | Exclude column | `select '!SSN'` |
| `!1-3` | Exclude range | `select '!1-3'` |
| `/regex/` | Match column names | `select '/^price/'` |
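The selection forms above can be exercised end to end; `items.csv` and its column names are invented for illustration, and `qsv` is assumed to be on PATH:

```shell
# Invented sample data for illustration only.
printf 'id,price_usd,price_eur,SSN\n1,9.99,8.99,123-45-6789\n' > items.csv

if command -v qsv >/dev/null 2>&1; then
  qsv select 1-2 items.csv          # inclusive 1-based index range
  qsv select '!SSN' items.csv       # exclude a column by name
  qsv select '/^price/' items.csv   # regex match over column names
else
  echo "qsv not installed; commands shown for illustration only"
fi
```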
sniff -> index -> safenames -> fixlengths -> sqlp (TRIM) -> dedup -> validate
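A concrete sketch of this cleaning pipeline, assuming `qsv` is on PATH; `dirty.csv` is invented data, and the `_t_1` table alias for the first sqlp input is an assumption about the sqlp default:

```shell
# Invented messy data: padded values and a duplicate row.
printf 'name,qty\n a ,1\nb,2\nb,2\n' > dirty.csv

if command -v qsv >/dev/null 2>&1; then
  qsv safenames dirty.csv -o step1.csv    # sanitize header names
  qsv fixlengths step1.csv -o step2.csv   # normalize ragged row lengths
  # TRIM whitespace via SQL; _t_1 is assumed to alias the first input.
  qsv sqlp step2.csv 'SELECT TRIM(name) AS name, qty FROM _t_1' -o step3.csv
  qsv dedup step3.csv -o clean.csv        # drop the duplicate row
else
  echo "qsv not installed; commands shown for illustration only"
fi
```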
sniff -> index -> stats --cardinality --stats-jsonl -> read .stats.csv -> frequency (on key columns) -> sqlp (GROUP BY queries)
Before writing SQL: read .stats.csv to learn column types, cardinality, nullcount, min/max, sort order. Run frequency on columns you'll GROUP BY or filter on. Use this to write precise WHERE clauses, correct type casts, and avoid unnecessary COALESCE.
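The profiling-before-SQL habit above can be sketched as follows, assuming `qsv` is on PATH; `sales.csv` is invented data and `_t_1` is an assumed sqlp table alias:

```shell
# Invented sample data for illustration only.
printf 'region,sales\nEast,100\nWest,50\nEast,70\n' > sales.csv

if command -v qsv >/dev/null 2>&1; then
  qsv index sales.csv
  qsv stats sales.csv --cardinality --stats-jsonl   # build the stats cache first
  qsv frequency -s region sales.csv                 # distribution of the GROUP BY key
  qsv sqlp sales.csv 'SELECT region, SUM(sales) AS total FROM _t_1 GROUP BY region'
else
  echo "qsv not installed; commands shown for illustration only"
fi
```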
For repeated SQL queries on large CSV (> 10MB), consider converting to Parquet: sniff -> index -> stats -> to_parquet -> sqlp (using read_parquet()). Note: sqlp can query CSV of any size directly.
index (both files) -> stats (both) -> joinp -> select (keep needed columns) -> sort
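The join pipeline above, sketched with two invented files (`people.csv`, `cities.csv`) and assuming `qsv` is on PATH; joinp's handling of the duplicate key column may vary by version:

```shell
# Invented sample data for illustration only.
printf 'id,name\n1,Ann\n2,Bob\n' > people.csv
printf 'id,city\n1,Lisbon\n2,Porto\n' > cities.csv

if command -v qsv >/dev/null 2>&1; then
  qsv index people.csv && qsv index cities.csv
  # Equi-join on id, then keep only the needed columns.
  qsv joinp id people.csv id cities.csv | qsv select id,name,city
else
  echo "qsv not installed; commands shown for illustration only"
fi
```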
sniff -> index -> stats --cardinality --stats-jsonl -> describegpt --all
excel (to CSV) -> index -> stats -> select -> tojsonl / qsv_to_parquet
- Comma (`,`): default, no flag needed
- Tab (`\t`): use `--delimiter '\t'` or the file extension `.tsv`
- Semicolon (`;`): use `--delimiter ';'` or the file extension `.ssv`
- Auto-detection: set the `QSV_SNIFF_DELIMITER=1` environment variable
- The `--no-headers` flag changes behavior significantly; most commands assume headers exist
- Use `--output file.csv` to write results to a file
- qsv handles `.sz` (Snappy compressed) files transparently
- `cat rows` requires the same column order; use `cat rowskey` for different schemas
- `dedup` loads all data into memory and sorts internally; use the `--sorted` flag if input is already sorted to enable streaming mode with constant memory
- `sort` loads the entire file into memory; for huge files use `sqlp` with ORDER BY
- Use `qsv_to_parquet` for faster performance. Parquet works ONLY with `sqlp` and DuckDB; all other qsv commands need CSV/TSV/SSV input
- Use `qsv_search_tools` to discover commands beyond the initially loaded core tools. There are 51+ qsv skill-based commands covering selection, filtering, transformation, aggregation, joining, validation, formatting, conversion, and more.
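Delimiter handling can be sketched as follows, assuming `qsv` is on PATH; `data.tsv` is invented data:

```shell
# Invented tab-delimited sample for illustration only.
printf 'a\tb\n1\t2\n' > data.tsv

if command -v qsv >/dev/null 2>&1; then
  qsv headers data.tsv                      # .tsv extension implies the tab delimiter
  QSV_SNIFF_DELIMITER=1 qsv count data.tsv  # or let qsv sniff the delimiter
else
  echo "qsv not installed; commands shown for illustration only"
fi
```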
- Long-running operations are bounded by the server timeout (`QSV_MCP_OPERATION_TIMEOUT_MS`, max 30 min). Allow operations to run to completion.
- `dedup`, `sort`, `reverse`, `table`, `transpose`, `pragmastat`, and `stats` (with extended stats) load entire files into memory. For files >1GB, prefer `extdedup`/`extsort` via `qsv_command`.
- Confirm the working directory with `qsv_get_working_dir`.
- Let `qsv_frequency` run to completion. Only fall back to `qsv_sqlp` with GROUP BY if the server timeout is exceeded.
- Use `qsv_slice` or `qsv_sqlp` with LIMIT to inspect subsets.