From claude-data-analyst
Generate a data dictionary for a dataset, combining automatic profiling with the user's description of what the data represents. Use when the user wants documentation of columns — names, types, semantic meaning, units, allowed values, and nullability — for a CSV/Parquet/Excel file.
Install with `npx claudepluginhub danielrosehill/claude-code-plugins --plugin claude-data-analyst`. This skill uses the workspace's default tool permissions.
Produce a data dictionary by merging schema inspection with the user's semantic description of the dataset.
Output format: markdown (default), csv, or json.

Profiling tools:
- `duckdb -c "DESCRIBE SELECT * FROM '<file>'"` — fast schema + inferred types.
- `csvstat` — null counts, uniqueness, min/max per column.
- `uv run --with pandas python -c '...'` — for dtype coercion and sampling.

For each column, collect the inferred type, null count, uniqueness, min/max, and sample values.
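The profiling pass can be sketched in pandas roughly like this (a minimal sketch, assuming CSV input; the function name and collected fields are illustrative, not the skill's exact implementation):

```python
import pandas as pd

def profile_columns(path: str) -> dict:
    """Collect per-column stats: inferred dtype, null count, uniqueness, min/max, samples."""
    df = pd.read_csv(path)
    profile = {}
    for col in df.columns:
        s = df[col]
        stats = {
            "dtype": str(s.dtype),
            "null_count": int(s.isna().sum()),
            "unique_count": int(s.nunique()),
            "sample": s.dropna().head(3).tolist(),
        }
        # min/max only make sense for numeric columns
        if pd.api.types.is_numeric_dtype(s):
            stats["min"], stats["max"] = s.min(), s.max()
        profile[col] = stats
    return profile
```

The resulting dict maps column names to their stats, ready to merge with the user's semantic descriptions in the next step.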
Parse the user's description and map sentences to columns. For each column, fill:
- Description (semantic meaning)
- Unit
- Allowed values
- PII flag (name-based heuristics)

If a column isn't covered by the user's description, mark its Description field as [NEEDS REVIEW] rather than guessing, and list these columns at the end for user confirmation.
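A name-based PII-flag heuristic could look like the following sketch (the keyword list is an illustrative assumption, not the skill's actual rule set):

```python
import re

# Column-name substrings that commonly indicate personally identifiable
# information. This keyword list is an assumption for illustration only.
PII_PATTERNS = re.compile(
    r"(email|phone|ssn|address|name|dob|birth|passport|ip_addr)",
    re.IGNORECASE,
)

def pii_flag(column_name: str) -> bool:
    """Return True if the column name suggests it may contain PII."""
    return bool(PII_PATTERNS.search(column_name))
```

A name-only heuristic is cheap but coarse: it flags `customer_email` correctly, yet misses PII stored under opaque names, so flagged columns still belong in the [NEEDS REVIEW] confirmation list.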
At the top of the dictionary:
Default — write <dataset>-dictionary.md:
# Data Dictionary — <dataset name>
## Overview
...
## Columns
### `column_name`
- **Type**: ...
- **Description**: ...
- **Unit**: ...
- **Nullable**: ...
- **Allowed values**: ...
- **Sample**: ...
- **Notes**: ...
For csv output, flatten to one row per column with standard dictionary columns. For json, emit a structured schema object compatible with JSON Schema / Frictionless Data.
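For the json variant, the structured output could be serialized along these lines (a sketch of a Frictionless-style Table Schema; the entry keys and defaults are assumptions):

```python
import json

def to_frictionless(columns: list) -> str:
    """Serialize column entries into a Frictionless Table Schema-style object."""
    schema = {
        "fields": [
            {
                "name": c["name"],
                "type": c.get("type", "any"),
                "description": c.get("description", "[NEEDS REVIEW]"),
                # Frictionless expresses non-nullability as a required constraint
                "constraints": {"required": not c.get("nullable", True)},
            }
            for c in columns
        ]
    }
    return json.dumps(schema, indent=2)
```

Columns with no description default to `[NEEDS REVIEW]`, so the json output carries the same confirmation markers as the markdown dictionary.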
End with a [NEEDS REVIEW] section listing columns the user should clarify.