From astronomer-data
Generates detailed profiles of database tables including metadata, row counts, column statistics, cardinality analysis, sample data, and quality checks for completeness, uniqueness, and freshness.
npx claudepluginhub astronomer/agents --plugin astronomer-data

This skill uses the workspace's default tool permissions.
Generate a comprehensive profile of a table that a new team member could use to understand the data.
Query column metadata:
SELECT COLUMN_NAME, DATA_TYPE, COMMENT
FROM <database>.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = '<schema>' AND TABLE_NAME = '<table>'
ORDER BY ORDINAL_POSITION
If the table name isn't fully qualified, search INFORMATION_SCHEMA.TABLES to locate it first.
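As a quick local sanity check, the same metadata lookup can be sketched against SQLite, which exposes column metadata through `pragma_table_info` rather than `INFORMATION_SCHEMA` (the `users` table and its columns are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, email TEXT, created_at TEXT)")

# SQLite's analogue of INFORMATION_SCHEMA.COLUMNS ordered by ORDINAL_POSITION.
cols = con.execute(
    "SELECT name, type FROM pragma_table_info('users') ORDER BY cid"
).fetchall()
print(cols)  # name/type pairs in declaration order
```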
Run via run_sql:
SELECT
COUNT(*) as total_rows,
COUNT(*) / 1000000.0 as millions_of_rows
FROM <table>
For each column, gather appropriate statistics based on data type:
Numeric columns:
SELECT
MIN(column_name) as min_val,
MAX(column_name) as max_val,
AVG(column_name) as avg_val,
STDDEV(column_name) as std_dev,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY column_name) as median,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
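A minimal sketch of the numeric profile above, runnable in SQLite. SQLite lacks `STDDEV` and `PERCENTILE_CONT`, so the median and standard deviation are computed client-side; the `orders` table and `amount` column are illustrative stand-ins for `<table>` and `column_name`:

```python
import sqlite3
import statistics

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (amount REAL)")
con.executemany("INSERT INTO orders VALUES (?)",
                [(10.0,), (20.0,), (30.0,), (40.0,), (None,)])

# Aggregates SQLite supports natively (NULLs are ignored by MIN/MAX/AVG).
row = con.execute("""
    SELECT MIN(amount), MAX(amount), AVG(amount),
           SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END),
           COUNT(DISTINCT amount)
    FROM orders
""").fetchone()
min_val, max_val, avg_val, null_count, distinct_count = row

# Median and sample stddev computed client-side over non-null values.
values = [v for (v,) in con.execute(
    "SELECT amount FROM orders WHERE amount IS NOT NULL")]
median = statistics.median(values)
std_dev = statistics.stdev(values)
```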
Text columns:
SELECT
MIN(LEN(column_name)) as min_length,
MAX(LEN(column_name)) as max_length,
AVG(LEN(column_name)) as avg_length,
SUM(CASE WHEN column_name IS NULL OR column_name = '' THEN 1 ELSE 0 END) as empty_count,
COUNT(DISTINCT column_name) as distinct_count
FROM <table>
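The text-column profile translates almost directly to SQLite, where the length function is spelled `LENGTH` instead of `LEN`; the `users`/`email` names below are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (email TEXT)")
con.executemany("INSERT INTO users VALUES (?)",
                [("a@x.com",), ("longer@example.com",), ("",), (None,)])

# LENGTH replaces LEN; NULLs are skipped by MIN/MAX/AVG but counted as empty.
row = con.execute("""
    SELECT MIN(LENGTH(email)), MAX(LENGTH(email)), AVG(LENGTH(email)),
           SUM(CASE WHEN email IS NULL OR email = '' THEN 1 ELSE 0 END),
           COUNT(DISTINCT email)
    FROM users
""").fetchone()
min_len, max_len, avg_len, empty_count, distinct_count = row
```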
Date/timestamp columns:
SELECT
MIN(column_name) as earliest,
MAX(column_name) as latest,
DATEDIFF('day', MIN(column_name), MAX(column_name)) as date_range_days,
SUM(CASE WHEN column_name IS NULL THEN 1 ELSE 0 END) as null_count
FROM <table>
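A SQLite sketch of the date profile: SQLite has no `DATEDIFF`, so the day span comes from a `julianday()` difference. The `events`/`created_at` names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (created_at TEXT)")
con.executemany("INSERT INTO events VALUES (?)",
                [("2024-01-01",), ("2024-01-15",), ("2024-03-01",), (None,)])

# julianday(MAX) - julianday(MIN) stands in for DATEDIFF('day', ...).
row = con.execute("""
    SELECT MIN(created_at), MAX(created_at),
           CAST(julianday(MAX(created_at)) - julianday(MIN(created_at)) AS INTEGER),
           SUM(CASE WHEN created_at IS NULL THEN 1 ELSE 0 END)
    FROM events
""").fetchone()
earliest, latest, date_range_days, null_count = row
```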
For columns that look like categorical/dimension keys:
SELECT
column_name,
COUNT(*) as frequency,
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER(), 2) as percentage
FROM <table>
GROUP BY column_name
ORDER BY frequency DESC
LIMIT 20
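The frequency query above runs nearly verbatim in SQLite 3.25+ (which added window functions), so its behavior can be checked locally; the `orders`/`status` names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (status TEXT)")
con.executemany("INSERT INTO orders VALUES (?)",
                [("shipped",)] * 6 + [("pending",)] * 3 + [("cancelled",)])

# SUM(COUNT(*)) OVER () totals across all groups, giving each group's share.
rows = con.execute("""
    SELECT status,
           COUNT(*) AS frequency,
           ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS percentage
    FROM orders
    GROUP BY status
    ORDER BY frequency DESC
    LIMIT 20
""").fetchall()
```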
This reveals the dominant values, each value's share of total rows, and whether the distribution is heavily skewed or contains unexpected categories.
Get representative rows:
SELECT *
FROM <table>
LIMIT 10
If the table is large and you want variety, sample from different time periods or categories.
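One way to sketch that sampling idea in SQLite: a plain random sample via `ORDER BY RANDOM()`, plus a stratified variant that takes a few rows per category with a window function. The `events` table and `region` column are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER, region TEXT)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [(i, "us" if i % 2 else "eu") for i in range(100)])

# Random sample instead of the first N physical rows.
sample = con.execute(
    "SELECT * FROM events ORDER BY RANDOM() LIMIT 10").fetchall()

# Stratified variant: up to 2 random rows per region.
stratified = con.execute("""
    SELECT id, region FROM (
        SELECT id, region,
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY RANDOM()) AS rn
        FROM events
    ) WHERE rn <= 2
""").fetchall()
```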
Summarize quality across the dimensions checked above: completeness (null and empty rates), uniqueness (distinct counts relative to total rows), and freshness (how recent the latest timestamp is).
Provide a structured profile:
2-3 sentences describing what this table contains, who uses it, and how fresh it is.
| Column | Type | Nulls% | Distinct | Description |
|---|---|---|---|---|
| ... | ... | ... | ... | ... |
List any data quality concerns discovered.
3-5 useful queries for common questions about this data.