Generates publication-quality charts from CSV/TSV/Excel files using Python and qsv for profiling, stats, frequencies, and queries.
From the qsv-data-wrangling plugin: `npx claudepluginhub dathere/qsv --plugin qsv-data-wrangling`
Create publication-quality data visualizations from tabular data files. Uses qsv to profile and prepare data, then generates Python charts with best practices for clarity, accuracy, and design.
Cowork note: If relative paths don't resolve, call `qsv_get_working_dir` and `qsv_set_working_dir` to sync the working directory.
First, profile the data to understand what you are plotting:
a. Index and detect: Run qsv_index, then qsv_sniff to detect format and encoding.
b. Understand structure: Run qsv_headers and qsv_count to get column names and row count.
c. Profile columns: Run qsv_stats with cardinality: true, stats_jsonl: true to understand types, ranges, and distributions. Read .stats.csv to inform chart design:
- type → choose appropriate axis type (numeric, categorical, date)
- min/max → set axis ranges
- cardinality → determine if column is categorical (low) or continuous (high)
- nullcount → note missing data that could affect the chart

d. Check distributions: Run qsv_frequency with limit: 20 on columns you plan to plot — this reveals the actual values and whether grouping or filtering is needed.
e. Run moarstats for visualization hints: Run qsv_moarstats with advanced: true. Read the enriched .stats.csv for chart design decisions:
| Stats Column | Visualization Hint |
|---|---|
| skewness / pearson_skewness | If \|skewness\| > 1, use log scale or split view; histogram will be lopsided on linear scale |
| bimodality_coefficient | If >= 0.555, data is bimodal — overlay two distributions or use separate panels per group |
| kurtosis | If > 3, heavy tails — add outlier annotations or use box plot alongside histogram |
| outliers_percentage | If > 5%, annotate outliers in scatter plots; if > 10%, consider separate outlier panel |
| q1, q3, iqr | Set box plot boundaries; whiskers at inner fences (q1 - 1.5*iqr, q3 + 1.5*iqr) |
| cv | If CV > 100%, data is highly variable relative to mean — use normalized/percentage scale |
| sparsity | If > 0.5, too many nulls to visualize meaningfully — warn user or show completeness bar |
| mode, mode_count | If mode dominates (> 50% of rows), bar chart of top-N values is more informative than histogram |
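The thresholds in this table translate directly into code. A minimal sketch, assuming the enriched stats row is available as a dict keyed by the column names above (empty or missing values are treated as zero):

```python
def visualization_hints(row):
    """Translate one enriched-stats row (a dict of column stats) into
    chart-design hints, following the thresholds in the table above."""
    hints = []
    if abs(float(row.get("skewness", 0) or 0)) > 1:
        hints.append("skewed: use log scale or split view")
    if float(row.get("bimodality_coefficient", 0) or 0) >= 0.555:
        hints.append("bimodal: overlay distributions or use separate panels")
    if float(row.get("kurtosis", 0) or 0) > 3:
        hints.append("heavy tails: annotate outliers, add box plot")
    if float(row.get("outliers_percentage", 0) or 0) > 5:
        hints.append("annotate outliers in scatter plots")
    if float(row.get("cv", 0) or 0) > 100:
        hints.append("highly variable: use normalized/percentage scale")
    if float(row.get("sparsity", 0) or 0) > 0.5:
        hints.append("too sparse to visualize meaningfully")
    return hints
```

Running this per column before writing chart code makes the design decisions explicit and reviewable.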
f. Preview data: Run qsv_slice with len: 5 to see actual values and formats.
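The profiling output can then drive chart decisions programmatically. A hedged sketch that reads the stats file and classifies each column's role (it assumes the stats CSV exposes `field`, `type`, and `cardinality` headers; exact names can vary with qsv version and flags):

```python
import csv

def classify_columns(stats_csv_path, cat_threshold=20):
    """Classify each column for charting from a qsv stats output file.
    Returns {column_name: 'temporal' | 'continuous' | 'categorical' | 'other'}."""
    roles = {}
    with open(stats_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            coltype = row.get("type", "")
            card = int(row.get("cardinality") or 0)
            if coltype in ("Date", "DateTime"):
                role = "temporal"
            elif coltype in ("Integer", "Float") and card > cat_threshold:
                role = "continuous"
            elif card and card <= cat_threshold:
                role = "categorical"
            else:
                role = "other"
            roles[row.get("field", "")] = role
    return roles
```

The `cat_threshold=20` cutoff mirrors the cardinality heuristic used in the chart-selection table below and is a tunable assumption, not a qsv constant.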
Use qsv to prepare visualization-ready data:
- qsv_search or qsv_sqlp to subset rows
- qsv_sqlp for GROUP BY, window functions, computed columns
- qsv_select to keep only what's needed
- qsv_sqlp with ORDER BY for ordered categories or time series

Export the prepared data to a CSV file for Python to read.
If the user didn't specify, recommend based on data and question:
| Data Relationship | Recommended Chart | How qsv Helps Choose |
|---|---|---|
| Trend over time | Line chart | stats shows Date/DateTime type |
| Comparison across categories | Bar chart (horizontal if many) | frequency shows category counts; cardinality < 20 |
| Part-to-whole composition | Stacked bar or area chart | frequency shows proportions; avoid pie unless < 6 categories |
| Distribution of values | Histogram or box plot | stats shows min/max/mean/stddev; moarstats shows kurtosis |
| Correlation between two variables | Scatter plot | stats shows two numeric columns |
| Ranking | Horizontal bar chart | frequency with --limit for top-N |
| Matrix of relationships | Heatmap | Two categorical columns with low cardinality |
| Two-variable comparison over time | Dual-axis line or grouped bar | Two numeric columns + one Date column |
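The selection logic in the table above can be sketched as a rough heuristic over column roles from the profiling step (role names and the function itself are illustrative, not part of qsv):

```python
def recommend_chart(col_roles):
    """Suggest a chart type from column roles
    ('temporal', 'continuous', 'categorical'), mirroring the table above."""
    roles = sorted(col_roles.values())
    if "temporal" in roles and "continuous" in roles:
        return "line chart"
    if roles.count("continuous") >= 2:
        return "scatter plot"
    if "categorical" in roles and "continuous" in roles:
        return "bar chart (horizontal if many categories)"
    if roles.count("categorical") >= 2:
        return "heatmap"
    if roles == ["continuous"]:
        return "histogram or box plot"
    return "table or bar chart of frequencies"
```

Treat the result as a starting recommendation to confirm against the user's actual question, not a final decision.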
Write Python code using matplotlib + seaborn (default) or plotly (if interactivity is requested):

```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the prepared CSV
df = pd.read_csv('prepared_data.csv')

# Set professional style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("husl")

# Create figure with appropriate size
fig, ax = plt.subplots(figsize=(10, 6))

# [chart-specific code]

# Always include:
ax.set_title('Clear, Descriptive Title', fontsize=14, fontweight='bold')
ax.set_xlabel('X-Axis Label', fontsize=11)
ax.set_ylabel('Y-Axis Label', fontsize=11)

# Format numbers appropriately:
# - Percentages: '45.2%' not '0.452'
# - Currency: '$1.2M' not '1200000'
# - Large numbers: '2.3K' or '1.5M' not '2300' or '1500000'

# Remove chart junk
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.tight_layout()
plt.savefig('chart_name.png', dpi=150, bbox_inches='tight')
plt.show()
```
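The number-formatting guidance above can be applied with matplotlib tick formatters. A minimal sketch using FuncFormatter (the abbreviation thresholds are illustrative choices):

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, PercentFormatter

def human_format(x, _pos=None):
    """Abbreviate large tick values: 2300 -> '2.3K', 1500000 -> '1.5M'."""
    for divisor, suffix in ((1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(x) >= divisor:
            return f"{x / divisor:.1f}{suffix}".replace(".0", "")
    return f"{x:g}"

fig, ax = plt.subplots()
ax.yaxis.set_major_formatter(FuncFormatter(human_format))
# For a ratio column stored as 0-1, show percentages instead:
# ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
```

For currency, prefix the returned string with `$` inside the formatter rather than hard-coding tick labels.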
Before finalizing, review the chart against design principles for color, typography, layout, and accuracy.
For a trend over time (line chart):

```
qsv_sqlp: SELECT date_col, SUM(value) as total
          FROM data GROUP BY date_col ORDER BY date_col
```

For comparison across categories (bar chart):

```
qsv_frequency: --select category_col --limit 10
```

Or for aggregated values:

```
qsv_sqlp: SELECT category, SUM(amount) as total
          FROM data GROUP BY category ORDER BY total DESC LIMIT 10
```

For a distribution (histogram or box plot):

```
qsv_stats: check min, max, mean, stddev, cardinality
qsv_moarstats: --advanced for kurtosis, bimodality
qsv_sqlp: SELECT FLOOR(value/10)*10 as bin, COUNT(*) as cnt
          FROM data GROUP BY bin ORDER BY bin
```

For correlation between two variables (scatter plot):

```
qsv_select: pick the two numeric columns
qsv_stats: verify both are numeric types with reasonable ranges
```

For comparison across groups (ranked bar chart):

```
qsv_sqlp: SELECT group_col, AVG(metric) as avg_metric, COUNT(*) as n
          FROM data GROUP BY group_col ORDER BY avg_metric DESC
```
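Because the histogram recipe pre-bins the data with qsv_sqlp, the Python side should draw the bins as bars rather than calling plt.hist on raw values. A sketch (the inline DataFrame stands in for the exported CSV; file and column names are illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the binned counts qsv_sqlp would export
# (in practice: bins = pd.read_csv('prepared_bins.csv')).
bins = pd.DataFrame({"bin": [0, 10, 20, 30], "cnt": [5, 12, 7, 3]})
width = bins["bin"].diff().median()  # infer bin width from row spacing

fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(bins["bin"], bins["cnt"], width=width, align="edge", edgecolor="white")
ax.set_xlabel("Value (binned)")
ax.set_ylabel("Count")
plt.tight_layout()
plt.savefig("histogram.png", dpi=150, bbox_inches="tight")
```

`align="edge"` keeps each bar anchored at its bin's lower bound, matching the FLOOR-based binning in the SQL.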
- stats and frequency reveal the right chart type and catch data issues before plotting
- Use qsv_sqlp to aggregate before passing to Python — don't load millions of rows into pandas
- Data-clean before visualizing