Skill

csv-query

Runs SQL queries on CSV/TSV/Excel files using Polars engine via qsv_sqlp. Provides stats, frequency analysis, indexing, search, select, and Parquet export for efficient data exploration.

data-engineering

database

npx claudepluginhub dathere/qsv --plugin qsv-data-wrangling

Popularity

Parent stars

3,643

Parent forks

105

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/qsv-data-wrangling:csv-query

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

mcp__qsv__qsv_sniffmcp__qsv__qsv_countmcp__qsv__qsv_headersmcp__qsv__qsv_indexmcp__qsv__qsv_statsmcp__qsv__qsv_frequencymcp__qsv__qsv_searchmcp__qsv__qsv_selectmcp__qsv__qsv_sqlpmcp__qsv__qsv_commandmcp__qsv__qsv_to_parquetmcp__qsv__qsv_list_filesmcp__qsv__qsv_search_toolsmcp__qsv__qsv_get_working_dirmcp__qsv__qsv_set_working_dir

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

Query tabular data files using SQL via the Polars-powered `sqlp` command.

SKILL.md

130 lines · ~1.5k tokens

Similar Skills

csvkit

Manipulates CSV data with csvlook, csvsql, csvjoin, csvstat. Streams records lazily; handles datasets larger than RAM. Invoke via csvlook.

hatch3r

qsv-performance

3.6k

Accelerates qsv CSV processing with index files, stats cache, Polars engine, and Parquet conversion for large files and smart commands.

qsv-data-wrangling

csv-processor

Parse, transform, clean, and analyze CSV files: auto-detect formats, filter/sort/merge/pivot, generate stats/outliers, with Python (pandas) and JavaScript examples.

devkit

Stats

LanguageRust

Parent stars3,643

Parent forks105

MaintenanceExcellent

Last CommitApr 6, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

CSV Query

Query tabular data files using SQL via the Polars-powered sqlp command.

Cowork note: If relative paths don't resolve, call qsv_get_working_dir and qsv_set_working_dir to sync the working directory.

Decision Tree

Is the query simple (single column filter, basic select)?

Yes -> Consider select + search for simpler operations
No -> Use sqlp for full SQL support

Does the query involve joins, GROUP BY, window functions, or complex expressions?

Yes -> Use sqlp (Polars SQL engine)

Is the CSV file very large (> 10MB)?

Yes -> Consider converting to Parquet with qsv_to_parquet for faster repeated queries. Note: sqlp can also query CSV files of any size directly.

Steps

Prepare the file: Run qsv_index and qsv_stats with cardinality: true, stats_jsonl: true to create index and stats cache.
Read the stats cache: Read <FILESTEM>.stats.csv (e.g., data.stats.csv for data.csv) to understand column metadata before writing SQL. This is the most important step for writing efficient queries.
Run frequency on key columns: For columns you plan to GROUP BY, filter on, or join on, run qsv_frequency to see actual value distributions. This reveals the best filter values and whether a GROUP BY will produce a manageable result set.
Write and run SQL: Use qsv_sqlp with the SQL query informed by stats and frequency data. The table name in SQL is the filename stem (e.g., data.csv -> SELECT * FROM data). For Parquet files, use read_parquet('data.parquet') as the table source instead.
Refine if needed: Check results and adjust the query.

Using Stats to Write Better SQL

After reading the .stats.csv cache, use these columns to inform your SQL:

Stats Column	How to Use in SQL
`type`	Use correct casts and comparisons — don't quote integers, use date functions for Date/DateTime columns
`min` / `max`	Write precise WHERE clauses using actual data range (e.g., `WHERE price BETWEEN 10.5 AND 999.99` instead of arbitrary bounds)
`cardinality`	Estimate GROUP BY result size — low cardinality (< 100) is fast; high cardinality (> 10K) may need LIMIT or a different approach
`nullcount`	Only add COALESCE or IS NOT NULL where `nullcount` > 0 — skip null handling for columns with zero nulls
`sort_order`	Skip ORDER BY if data is already sorted on that column (sort_order = "Ascending"/"Descending")
`mean` / `stddev`	Write outlier filters: `WHERE col BETWEEN mean - 3stddev AND mean + 3stddev`
`median` / `q1` / `q3`	For skewed data (when mean and median diverge), use quartile-based ranges: `WHERE col BETWEEN q1 AND q3` instead of mean ± stddev
`skewness`	If skewness > 1 or < -1, prefer median/quartile-based filters over mean-based ones
`cv`	High CV (> 100%) signals high relative variability — add LIMIT to GROUP BY queries and consider binning continuous values
`outliers_percentage`	If > 5%, consider excluding outliers before aggregation: `WHERE col BETWEEN lower_inner_fence AND upper_inner_fence`
`sparsity`	Columns with sparsity > 0.5 are mostly null — avoid using them as join keys or GROUP BY columns

Using Frequency for Filter Values

Run qsv_frequency --select col --limit 20 before writing WHERE clauses on categorical columns:

Pick selective filters: If frequency shows "active" has 90% of rows, filtering on WHERE status = 'active' is wasteful — filter on the rare values instead
Validate expected values: If you plan WHERE category IN ('A','B','C'), check frequency first to confirm those values exist and see if you're missing any
Avoid GROUP BY on high-cardinality columns: If frequency shows thousands of unique values, GROUP BY will produce a huge result — add LIMIT or aggregate differently

SQL Syntax Guide

The sqlp command uses Polars SQL dialect:

-- Basic select
SELECT col1, col2 FROM data WHERE col1 > 100

-- Aggregation
SELECT category, COUNT(*) as cnt, AVG(price) as avg_price
FROM data GROUP BY category ORDER BY cnt DESC

-- Window functions
SELECT *, ROW_NUMBER() OVER (PARTITION BY dept ORDER BY salary DESC) as rank
FROM employees

-- String operations
SELECT * FROM data WHERE col1 LIKE '%pattern%'

-- Date operations
SELECT *, EXTRACT(YEAR FROM date_col) as year FROM data

-- Multiple files (join)
SELECT a.*, b.name FROM file1 a JOIN file2 b ON a.id = b.id

-- CASE expressions
SELECT *, CASE WHEN amount > 1000 THEN 'high' ELSE 'low' END as tier FROM data

Table Naming Convention

File: sales_2024.csv -> Table: sales_2024
File: my-data.csv -> Table: "my-data" (quote if contains special chars)
Multiple files: each file is a separate table

Notes

sqlp uses the Polars engine - some PostgreSQL-specific syntax may not be supported
For very complex queries that fail, suggest DuckDB as an alternative
The stats cache helps Polars choose optimal data types for columns
Results go to stdout by default; use --output file.csv for large result sets
Column names are case-sensitive in SQL queries
Use LIMIT to preview large result sets before running full queries
sqlp can query multiple CSV files in a single SQL statement (useful for joins)

csv-query

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

csv-query

Popularity

Invocation

Tool Access

Context Preview

SKILL.md

CSV Query

Decision Tree

Steps

Using Stats to Write Better SQL

Using Frequency for Filter Values

SQL Syntax Guide

Table Naming Convention

Notes

Similar Skills

Help us improve

CSV Query

Decision Tree

Steps

Using Stats to Write Better SQL

Using Frequency for Filter Values

SQL Syntax Guide

Table Naming Convention

Notes