Accelerates qsv CSV processing with index files, stats cache, Polars engine, and Parquet conversion for large files and smart commands.
## Index Files (`.csv.idx`)

Created by: `qsv index`

Used by: `count`, `slice`, `sample`, `split`, `stats`, `frequency`, `schema`, and the other commands marked as index-accelerated in qsv's command list
| Benefit | Without Index | With Index |
|---|---|---|
| Row count | Scan entire file | Instant (stored in index) |
| Random access | Sequential scan | O(1) lookup |
| Multithreaded | Not possible | Enabled for many commands |
| Slicing | Read from start | Jump to position |
Rule: Always run index first if you'll run 2+ commands on the same file.
Auto-indexing: The MCP server auto-indexes files > 10MB.
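A minimal warm-up sketch of the index workflow. `sample.csv` is a made-up input created inline so the commands run as-is; the script skips the qsv calls gracefully on machines where qsv isn't installed:

```shell
# Create a tiny sample file so the commands below are runnable as-is
printf 'id,name\n1,ada\n2,bob\n3,cyd\n' > sample.csv
echo "sample ready"

# Skip gracefully if qsv is not on PATH
command -v qsv >/dev/null 2>&1 || exit 0

qsv index sample.csv                    # writes sample.csv.idx next to the file
qsv count sample.csv                    # instant: row count is read from the index
qsv slice --start 1 --len 2 sample.csv  # jumps straight to the offset, no scan
```

On a real multi-gigabyte file, the difference is that `count` and `slice` return immediately instead of scanning from the top.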
## Stats Cache (`.stats.csv` + `.stats.csv.data.jsonl`)

Created by: `qsv stats --cardinality --stats-jsonl`
Used by: frequency, schema, tojsonl, sqlp, joinp, pivotp, diff, sample (smart commands)
| Smart Command | What It Uses from Cache |
|---|---|
| `frequency` | Cardinality to skip all-unique columns |
| `schema` | Data types for JSON Schema generation |
| `sqlp` | Column types for Polars optimization |
| `joinp` | Cardinality for optimal join order |
| `pivotp` | Cardinality to estimate output width |
| `diff` | Column types for comparison |
Rule: Run stats --cardinality --stats-jsonl before using any smart command.
Auto-caching: The MCP server auto-adds --stats-jsonl to stats commands.
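A sketch of the cache-then-query sequence. `weather.csv` is a hypothetical input created inline, and the qsv calls are skipped when the binary isn't available:

```shell
# Tiny sample input so the commands are runnable as-is
printf 'city,temp\nOslo,3\nOslo,5\nRome,18\n' > weather.csv
echo "weather ready"

# Skip gracefully if qsv is not on PATH
command -v qsv >/dev/null 2>&1 || exit 0

# Populate the stats cache (writes weather.stats.csv + .data.jsonl beside the file)
qsv stats --cardinality --stats-jsonl weather.csv >/dev/null

# frequency now reads cardinality from the cache instead of recomputing it
qsv frequency --limit 5 weather.csv
```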
## Polars Engine

Commands: `sqlp`, `joinp`, `pivotp`, `count` (with `--polars-len`), `schema` (with `--polars`)
| Benefit | Standard (csv crate) | Polars Engine |
|---|---|---|
| Processing model | Row-by-row streaming | Vectorized columnar |
| Memory | Streaming (constant) | Columnar (efficient) |
| Parallelism | Single-threaded | Multi-threaded |
| Large files | Limited by memory | Larger-than-memory |
| SQL support | N/A | Full SQL dialect |
Rule: Use Polars commands (sqlp, joinp, pivotp) for files > 100MB or complex queries.
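As an illustration of the rule, a top-N query via `sqlp` in place of a full `sort`. The file and column names are made up; `_t_1` is sqlp's alias for the first input table (per qsv's sqlp help), and the qsv call is skipped when the binary isn't installed:

```shell
# Hypothetical input, created inline so the command is runnable as-is
printf 'item,amount\npen,2\nlaptop,900\nmug,9\n' > sales.csv
echo "sales ready"

# Skip gracefully if qsv is not on PATH
command -v qsv >/dev/null 2>&1 || exit 0

# Top-N without a full in-memory sort: Polars only keeps the 2 winning rows
qsv sqlp sales.csv 'SELECT item, amount FROM _t_1 ORDER BY amount DESC LIMIT 2'
```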
For repeated SQL queries on a large CSV (> 10MB), consider converting it to Parquet with `qsv_to_parquet`. Parquet is a columnar format that speeds up repeated SQL queries in `sqlp`; use `read_parquet('file.parquet')` as the table source. DuckDB is the preferred engine for Parquet queries, though `sqlp` with SKIP_INPUT mode also works. Note that `sqlp` can query CSV of any size directly: Parquet is an optimization for repeated queries, not a requirement. Parquet works ONLY with `sqlp` and DuckDB; all other qsv commands require CSV/TSV/SSV input.
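A sketch of the convert-once, query-many pattern from the CLI side. The file names are hypothetical; the conversion step assumes a qsv build whose `sqlp` supports `--format parquet` (the MCP `qsv_to_parquet` tool does the same job), and everything after the guard is skipped when qsv isn't installed:

```shell
# Hypothetical input, created inline
printf 'id,val\n1,10\n2,20\n' > data.csv
echo "data ready"

# Skip gracefully if qsv is not on PATH
command -v qsv >/dev/null 2>&1 || exit 0

# One-time conversion to columnar Parquet (assumes --format parquet support)
qsv sqlp data.csv 'SELECT * FROM _t_1' --format parquet --output data.parquet

# Repeated queries hit the Parquet file; SKIP_INPUT tells sqlp not to expect CSV
qsv sqlp SKIP_INPUT "SELECT SUM(val) FROM read_parquet('data.parquet')"
```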
Memory-intensive (🤯, load the entire file into memory): `dedup`, `reverse`, `sort`, `stats` (with extended stats), `table`, `transpose`

Proportional memory (grows with cardinality or join-side size): `frequency`, `join`, `schema`, `tojsonl`

Streaming (constant memory): everything else - `select`, `search`, `slice`, `replace`, `count`, etc.
```
File size?
├── < 10MB: Any command works fine
├── 10MB - 100MB:
│   ├── Always: index first
│   ├── Repeated SQL: consider Parquet with qsv_to_parquet
│   ├── Prefer: streaming commands
│   └── OK: memory-intensive if < available RAM
├── 100MB - 1GB:
│   ├── Always: index + stats cache first
│   ├── Repeated SQL: consider Parquet with qsv_to_parquet
│   ├── Prefer: Polars commands (sqlp, joinp, pivotp)
│   ├── Avoid: sort, reverse, table (load entire file)
│   └── Alternative: sqlp with ORDER BY LIMIT instead of sort
└── > 1GB:
    ├── Must: index + stats cache
    ├── Repeated SQL: convert to Parquet with qsv_to_parquet
    ├── Must: Polars commands only for joins/queries
    ├── Avoid: all 🤯 commands
    └── Consider: split into chunks, process, cat rows
```
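The decision tree above can be sketched as a small helper. This is a rough illustration, not part of qsv: the thresholds use decimal bytes for simplicity, and the function name is made up:

```shell
# pick_strategy: map a file size in bytes to the recommended qsv approach
pick_strategy() {
  size=$1
  if   [ "$size" -lt 10000000 ];   then echo "any command"
  elif [ "$size" -lt 100000000 ];  then echo "index first, prefer streaming"
  elif [ "$size" -lt 1000000000 ]; then echo "index + stats cache, prefer Polars"
  else echo "index + stats cache, Polars only, consider Parquet"
  fi
}

pick_strategy 5000000      # -> any command
pick_strategy 2000000000   # -> index + stats cache, Polars only, consider Parquet
```

An agent could call something like this before choosing between streaming commands and the Polars engine.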
| Tip | Why |
|---|---|
| Use `--output file.csv` | Avoids stdout buffering overhead |
| Use `count` before `stats` | Fast row count for progress bars |
| Use `select` early in pipeline | Fewer columns = faster processing |
| Use `--no-headers` only when needed | Header detection is cheap |
| Use `slice --len N` for previews | Don't read the entire file to inspect it |
| Prefer `joinp` over `join` | Polars engine is significantly faster |
| Use `frequency --limit N` | Don't compute all unique values |
| Use `stats --cardinality` | Enables smart optimizations downstream |
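Several of these tips combined in one sketch: select columns early, preview with `slice` instead of reading the whole file, and write via `--output`. The input file is invented, and the qsv calls are skipped when the binary isn't installed:

```shell
# Hypothetical input, created inline so the pipeline is runnable as-is
printf 'id,name,score,notes\n1,ada,90,x\n2,bob,70,y\n3,cyd,80,z\n' > scores.csv
echo "scores ready"

# Skip gracefully if qsv is not on PATH
command -v qsv >/dev/null 2>&1 || exit 0

# Drop unneeded columns first, then preview only 2 rows, writing via --output
qsv select id,score scores.csv | qsv slice --len 2 --output preview.csv
```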
The MCP server limits concurrent qsv operations (default: 1). For multiple independent files, the agent can issue separate tool calls.
Operation timeouts are configurable via `QSV_MCP_OPERATION_TIMEOUT_MS`.