From dts
Explore data with rich context, visualizations, and data quality analysis
npx claudepluginhub big-time-data/homebrew-dts --plugin dts<model_or_topic> [--deep] [--quality] [--issue <issue-id>]# Explore Data Command A richer version of `/query` designed for open-ended data exploration. Automatically includes data quality checks and visualization suggestions. ## Arguments - **$ARGUMENTS**: The model name, topic, or question to explore - **--deep**: Include full lineage analysis and related models - **--quality**: Run comprehensive data quality checks - **--issue**: Link this workflow to an issue tracker ID (e.g., `data-tools-123`). When provided, all observer calls include this reference for traceability. ## Environment The agent determines which environment(s) to query bas...
A richer version of /query designed for open-ended data exploration. Automatically includes data quality checks and visualization suggestions.
data-tools-123).
When provided, all observer calls include this reference for traceability.The agent determines which environment(s) to query based on the task:
Schema context is established by dts-prime at session start. Always use fully-qualified table names (e.g., ANALYTICS.MARTS.orders).
/explore-data orders # Basic exploration of orders model
/explore-data orders --deep # Include upstream/downstream analysis
/explore-data "customer behavior" # Explore a topic across models
/explore-data inventory --quality # Focus on data quality
/explore-data orders --issue data-tools-456 # Link to issue tracker
This command follows a structured exploration approach:
Understand what data is available:
# Find relevant models
dbt_list(selector: "<topic>")
# Get lineage context
dbt_lineage(node: "<model>", direction: "both")
# Get schema details
db_get_schemata(level: "columns", table: "<model>")
Record your initial understanding:
observer_context(
context_type: "reasoning",
content: "Exploring $ARGUMENTS. Available models: [...]. Starting with [model] because [reason].",
tags: ["exploration", "<topic>"],
issue_ref: "<issue-id>" # If --issue provided
)
Always start with high-level metrics:
Volume and Recency:
SELECT
COUNT(*) as total_rows,
MIN(created_at) as earliest_record,
MAX(created_at) as latest_record,
MAX(updated_at) as last_update
FROM <model>
Key Distributions:
SELECT
<categorical_column>,
COUNT(*) as count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER(), 1) as pct
FROM <model>
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
Always include basic quality checks:
Null Analysis:
SELECT
COUNT(*) as total,
COUNT(<column_1>) as has_col1,
COUNT(<column_2>) as has_col2,
-- etc for key columns
FROM <model>
Recency Check:
SELECT
CASE
WHEN MAX(created_at) < CURRENT_DATE - INTERVAL '7 days' THEN 'stale'
WHEN MAX(created_at) < CURRENT_DATE - INTERVAL '1 day' THEN 'delayed'
ELSE 'current'
END as freshness_status
FROM <model>
When deep exploration is requested:
Upstream Dependencies:
dbt_lineage(node: "<model>", direction: "upstream")
# Then explore each upstream model for data quality issues
Downstream Impact:
dbt_lineage(node: "<model>", direction: "downstream")
# Identify which downstream models would be affected by issues
Cross-Model Joins:
-- Check join integrity with parent tables
SELECT
COUNT(*) as total,
COUNT(b.id) as matched,
COUNT(*) - COUNT(b.id) as orphaned
FROM <model> a
LEFT JOIN <parent_model> b ON a.<fk_column> = b.id
When quality focus is requested:
Duplicate Detection:
SELECT
<natural_key_columns>,
COUNT(*) as occurrences
FROM <model>
GROUP BY <natural_key_columns>
HAVING COUNT(*) > 1
ORDER BY occurrences DESC
LIMIT 20
Value Range Validation:
SELECT
'<column>' as column_name,
MIN(<column>) as min_val,
MAX(<column>) as max_val,
AVG(<column>) as avg_val,
COUNT(CASE WHEN <column> < 0 THEN 1 END) as negative_count,
COUNT(CASE WHEN <column> IS NULL THEN 1 END) as null_count
FROM <model>
Referential Integrity:
SELECT
'<foreign_key>' as fk_column,
COUNT(*) as total,
COUNT(CASE WHEN <foreign_key> IS NOT NULL
AND NOT EXISTS (SELECT 1 FROM <ref_table> WHERE id = <foreign_key>)
THEN 1 END) as orphaned
FROM <model>
Exploration results should be comprehensive:
## Data Exploration: $ARGUMENTS
**Issue**: `<issue-id>` (if --issue provided)
### Model Overview
- **Table**: <schema.table>
- **Rows**: <count>
- **Date Range**: <earliest> to <latest>
- **Freshness**: <status>
### Key Metrics
| Metric | Value |
|--------|-------|
| Total records | X |
| Unique <entity> | Y |
| Date range | A to B |
### Data Quality Summary
| Check | Status | Details |
|-------|--------|---------|
| Completeness | <status> | <X>% of records have all required fields |
| Freshness | <status> | Last update: <date> |
| Duplicates | <status> | <N> potential duplicates found |
| Orphans | <status> | <N> records missing foreign keys |
### Distribution Analysis
<key categorical breakdowns>
### Visualizations
- [Chart 1]: <description>
- [Chart 2]: <description>
### Lineage Context (if --deep)
- **Upstream**: <models>
- **Downstream**: <models>
### Issues Found
- <issue 1>
- <issue 2>
### Recommendations
- <recommendation 1>
- <recommendation 2>
Always suggest visualizations for:
{"chart_type": "line", "x_column": "date", "y_column": "count", "title": "Volume Over Time"}
{"chart_type": "bar", "x_column": "category", "y_column": "count", "title": "Distribution by Category"}
{"chart_type": "table", "columns": ["check", "status", "value"], "title": "Data Quality Summary"}
| Aspect | /query | /explore-data |
|---|---|---|
| Focus | Answer specific question | Open-ended exploration |
| Quality checks | Only if relevant | Always included |
| Visualizations | When helpful | Always suggested |
| Lineage | If needed | Included with --deep |
| Output | Targeted findings | Comprehensive overview |