From rest-api-pipeline

Validates schemas and data loaded by a dlt pipeline: mermaid diagrams, dashboard/MCP queries, type fixes (Decimal for money), nested structures, missing columns.

Install:

```
npx claudepluginhub dlt-hub/dlthub-ai-workbench --plugin rest-api-pipeline
```

This skill uses the workspace's default tool permissions.
After a successful pipeline load, verify the schema and data make sense. Fix data types, nested structures, and missing columns as needed.
Parse $ARGUMENTS:

- pipeline-name (optional): the dlt pipeline name. If omitted, infer from session context. If ambiguous, ask the user and stop.
- hints (optional, after --): specific validation concerns

Generate a schema diagram:

```
dlt pipeline <pipeline_name> schema --format mermaid
```
Show the mermaid diagram to the user. This gives a quick overview of tables, columns, types, and relationships (parent/child).
Tell the user to run the Workspace Dashboard:

```
dlt pipeline <pipeline_name> show
```
This opens a browser with table schemas, row counts, and sample data.
You also have MCP tools available for inspecting the pipeline's schema and data.
Ask the user if the schema and data look right. Common issues to address:
Use `processing_steps` in the resource config to transform data before loading. Available steps: `map`, `filter`, `yield_map`. For example, to convert a string amount to Decimal:

```
# requires: from decimal import Decimal
"processing_steps": [
    {"map": lambda item: {**item, "amount": Decimal(item["amount"])}},
]
```
IMPORTANT: NEVER convert monetary amounts or precision-sensitive values to float. Always use Decimal.
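The map step above boils down to a plain function applied to each record before loading. A minimal sketch (the `amount` field name and sample record are illustrative, not from the source):

```python
from decimal import Decimal

def to_decimal_amount(item: dict) -> dict:
    # map-style step: return a copy of the record with "amount" upgraded
    # to Decimal so monetary precision is preserved through the load.
    # Decimal(str(...)) also handles values that arrive as floats.
    return {**item, "amount": Decimal(str(item["amount"]))}

record = {"id": 1, "amount": "19.99"}
converted = to_decimal_amount(record)
# converted["amount"] is Decimal("19.99"), not a float
```

Returning a new dict (rather than mutating `item`) keeps the step side-effect free, which makes it safe to rerun.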
dlt auto-unnests nested arrays into child tables (e.g., `results` inside a response becomes `<resource>__results`). This is often fine for analytics. If the user wants a flat structure, use `yield_map` to flatten, or adjust `data_selector` to point deeper into the response.
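If flattening is preferred, a yield_map-style step is just a generator that expands one response item into several flat rows. A sketch under assumed payload shape (the `results` key and `response_id` column are hypothetical):

```python
def flatten_results(item: dict):
    # yield_map-style step: emit one flat row per nested result,
    # carrying the parent response id along so lineage is not lost.
    for row in item.get("results", []):
        yield {**row, "response_id": item.get("id")}

response = {"id": "page-1", "results": [{"value": 1}, {"value": 2}]}
rows = list(flatten_results(response))
# two flat rows, each tagged with response_id "page-1"
```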
Columns that are all-null on first load won't have inferred types. Options:

- Add `columns` hints to the resource config: `"columns": {"field": {"data_type": "text"}}`
- Use `group_by` or other API params to populate the columns

Re-run the pipeline after changes (dev_mode gives a fresh dataset each time). Use debug-pipeline to inspect traces and load packages after each run. Inspect again with MCP or `dlt pipeline <name> schema --format mermaid`. Repeat until the user is happy with the schema.
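As a sketch of the first option, a resource entry carrying a columns hint might look like the following (the resource name, endpoint path, and field name are placeholders, not from the source):

```python
# Hypothetical resource entry for a rest_api source config.
# The "columns" hint pins a type for a column that is all-null on the
# first load, so dlt does not have to infer it from data.
resource_config = {
    "name": "invoices",                                   # placeholder name
    "endpoint": {"path": "invoices"},                     # placeholder path
    "columns": {"discount_code": {"data_type": "text"}},  # explicit type hint
}
```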
Next: new-endpoint for more resources, view-data for querying, or the data-exploration toolkit for interactive notebooks and reports. See also: debug-pipeline.