# data-exploration
Connects to dlt pipelines, profiles tables, scans schemas, plans charts with ibis and altair, and outputs analysis_plan.md artifacts for data exploration and analysis.
```shell
npx claudepluginhub dlt-hub/dlthub-ai-workbench --plugin data-exploration
```

This skill uses the workspace's default tool permissions.
Connect to a dlt pipeline, understand the data, and plan one chart at a time. Outputs a `<date>_<pipeline_name>_analysis_plan.md` artifact that `build-notebook` consumes. Use today's date in `YYYY-MM-DD` format (e.g., `2026-03-10`).
Queries and explores data loaded by dlt pipelines using Python, dlt dataset API, ReadableRelation, and ibis expressions. For table exploration, row counts, and ad-hoc reports.
Processes data analysis queries by loading workspace context, classifying question complexity from L1-L5, and generating charts, narratives, and metrics from datasets.
Interviews users to extract tribal knowledge about datasets/databases, generating reusable data context skills for documentation and analysis.
Parse $ARGUMENTS:

- pipeline-name (optional): the dlt pipeline name. If omitted, infer from session context. If ambiguous, ask the user and stop.
- question (optional, after --): a specific business question (e.g., `-- what's the revenue trend?`)

Before discovery, check what's already available:

- If pipeline-name was passed via $ARGUMENTS or the session already has a pipeline context (e.g., arriving from rest-api-pipeline after validate-data or view-data), skip list_pipelines and go straight to list_tables.
- If a *_analysis_plan.md exists, skip to the iteration path (see "Iteration: existing analysis_plan.md" below).
- If given a .duckdb file instead of a named pipeline, connect with an explicit destination: `dlt.pipeline(pipeline_name="adhoc", destination=dlt.destinations.duckdb("<path>"))`. Then proceed normally — pipeline.dataset() works the same way.

See workflow.md for high-intent vs low-intent definitions. One chart per invocation — if the user asks multiple questions, pick the first one and save the rest as [ ] pending questions.
## Iteration: existing analysis_plan.md

If *_<pipeline_name>_analysis_plan.md already exists (glob for any date prefix; pick most recent): read it, skip Steps 1–2 entirely, and ask for the next question (or present remaining [ ] questions). Plan one chart, append as ## Chart N, hand off to build-notebook. See the full iteration loop in workflow.md.
Use the dlt MCP tools as the primary discovery path:

- list_pipelines — discover available pipelines. If multiple exist and target is ambiguous, ask the user and stop.
- list_tables — enumerate tables in the selected pipeline.
- get_table_schema — fetch column names and types for relevant tables.

If MCP tools are unavailable, fall back to Python:
```python
import dlt

pipeline = dlt.attach("<pipeline_name>")  # attach to the existing pipeline state
dataset = pipeline.dataset()              # readable dataset over the destination
dataset.row_counts().df()                 # row count per table, as a DataFrame
```
Follow data access patterns in references/dlt-relation-api.md.
Collect table names, column names, and column types. This is enough to plan a chart for a specific question. No row counts, no stats, no anomaly detection.
Use list_tables + get_table_schema MCP tools (or table.columns_schema in Python).
Profile all tables relevant to the user's domain:

- get_row_counts MCP tool or dataset.row_counts().df()
- get_table_schema MCP tool or table.columns_schema
- execute_sql_query MCP tool or .to_ibis() with group_by/aggregate

From the profiling evidence, infer 5-10 plain-language business questions the data can answer. Present as multi-select with table/column hints for each option. Always include an "Other" option for custom questions.
Avoid PII-flagged columns as chart dimensions or metrics.
Plan exactly one chart per invocation. Do not batch multiple charts — the iteration loop handles additional charts.
For the user's question (from argument or selection), decide:
If the columns needed for the question don't exist in any table, record them under ## Data Gaps.

Show the chart spec and ask for confirmation or adjustment. Use this format:
```
Chart: <title>
Type: <chart type>
X: <table.column> (<grain>)
Y: <aggregation>(table.column)
Source: <table>
"<one-line description>"
```
If "Adjust", ask one targeted follow-up — don't re-run the full interview.
After the spec is confirmed, generate the SQL query and altair chart code.
When generating the query and chart code:

- Prefer SQL via dataset("SELECT ... FROM table_name ...").df().
- Use ibis (dataset["table"].to_ibis()) only for complex joins or computed columns.
- Match column types from get_table_schema; see references/dlt-relation-api.md for the full API reference.
- Use altair encoding shorthands (:T temporal, :Q quantitative, :N nominal, :O ordinal).

Write or append to <date>_<pipeline_name>_analysis_plan.md (use today's date in YYYY-MM-DD format). See references/analysis-plan-format.md for the full template.
The file has these sections:

- Questions list with checkboxes: [x] charted, [ ] pending.
- For the high-intent path: Profile Summary may have minimal info (table/column names only). That's fine.
- For the low-intent path: Profile Summary includes row counts, anomaly notes, and PII flags.
Mark the charted question with [x] in the Questions list. Remaining [ ] questions are available for the next iteration.
After writing or appending to analysis_plan.md, you MUST propose building the notebook. Never end a session that produced a chart without this step.
Tell the user the plan was updated, then ask: "Ready to build the notebook — shall I invoke build-notebook?" If they agree, invoke it. If they decline, remind them they can run build-notebook later.
Troubleshooting:

- Pipeline not found: check list_pipelines, or use an explicit .duckdb path via dlt.pipeline(..., destination=dlt.destinations.duckdb("<path>")).
- MCP tools failing: run uv run dlt ai status to diagnose. If the MCP server is not running or misconfigured, attempt to fix it (e.g., dlt ai init). Only fall back to the Python path (dlt.attach / dlt.pipeline) if MCP cannot be restored.