Data pipeline and ETL -- extraction, transformation, loading, data quality, orchestration.
Triggers: /godmode:pipeline, "build a data pipeline", "ETL"

ls dags/ dbt_project.yml dagster.yaml 2>/dev/null
grep -r "airflow\|dagster\|prefect\|kafka" \
  requirements.txt package.json 2>/dev/null
Name: <pipeline>
Type: batch | streaming | micro-batch | CDC
Schedule: cron | event-triggered | continuous
SLA: <max latency>
Sources: <name>: <type> (<format>, <volume/day>)
Transforms: 1. <step> (input -> output)
Destinations: <target>: <type> (<write method>)
Idempotent: yes/no
Error handling: skip | fail | dead-letter | retry
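A hypothetical filled-in spec, illustrating the template above (all names and numbers invented):

```
Name: orders_daily
Type: batch
Schedule: cron (0 2 * * *)
SLA: 30 min
Sources: orders_api: REST (JSON, ~2M rows/day)
Transforms: 1. drop_nulls (raw -> clean)
            2. currency_normalize (clean -> normalized)
Destinations: warehouse.orders: PostgreSQL (upsert)
Idempotent: yes
Error handling: dead-letter
```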
IF data changes hourly: batch with cron. IF sub-second latency needed: streaming (Kafka). IF already using PostgreSQL: CDC with Debezium.
Extraction: track watermarks, retry with backoff, log metrics. Patterns: API pagination with rate limit, DB incremental by updated_at, file dedup.
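The incremental-by-watermark pattern with backoff can be sketched as follows; `fetch_page` is a hypothetical page-fetching callable, and the retry counts are illustrative:

```python
import time

def extract_since(watermark, fetch_page, max_retries=3):
    """Incremental pull of records newer than `watermark`, page by page,
    retrying each page with exponential backoff on transient errors."""
    records, page = [], 0
    while True:
        for attempt in range(max_retries):
            try:
                batch = fetch_page(since=watermark, page=page)
                break
            except ConnectionError:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
        else:
            raise RuntimeError(f"page {page} failed after {max_retries} retries")
        if not batch:  # empty page: extraction complete
            return records
        records.extend(batch)
        page += 1
```

The caller persists the max `updated_at` seen as the next run's watermark.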
Transformation: pure functions only -- no DB calls, no side effects. Composable via .pipe().
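`.pipe()` here presumably refers to pandas-style chaining; a dependency-free stand-in showing the same composable, pure-function shape (class and transforms invented for illustration):

```python
class Frame:
    """Toy container mimicking pandas' .pipe() chaining."""
    def __init__(self, rows):
        self.rows = rows
    def pipe(self, fn, *args, **kwargs):
        return fn(self, *args, **kwargs)

# Pure transforms: take a Frame, return a new Frame, no side effects.
def drop_nulls(frame, key):
    return Frame([r for r in frame.rows if r.get(key) is not None])

def rename(frame, old, new):
    return Frame([{(new if k == old else k): v for k, v in r.items()}
                  for r in frame.rows])

result = (Frame([{"id": 1, "amt": 10}, {"id": None, "amt": 5}])
          .pipe(drop_nulls, "id")
          .pipe(rename, "amt", "amount"))
```

Because each step is pure, steps can be unit-tested in isolation and reordered freely.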
Loading strategies:
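The strategy list itself appears to be elided here; common write methods are append, overwrite, and upsert. A minimal upsert sketch (SQLite syntax; table and columns invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

def upsert_users(conn, rows):
    """Idempotent load: insert new keys, overwrite existing ones,
    so re-running the pipeline never duplicates rows."""
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
        rows,
    )
    conn.commit()

upsert_users(conn, [(1, "ada"), (2, "bob")])
upsert_users(conn, [(1, "ada lovelace")])  # safe re-run: update, not duplicate
```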
Every pipeline needs quality gates, observability, and error recovery (not optional):
IF quality < 95%: alert and investigate. IF count change > 50%: block load and alert.
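The two gates above can be sketched as a pre-load check; function and return shape are illustrative:

```python
def quality_gate(rows_total, rows_rejected, prev_total):
    """Apply the two rules: quality < 95% alerts; |count change| > 50% blocks."""
    quality_pct = 100.0 * (rows_total - rows_rejected) / max(rows_total, 1)
    alerts, block_load = [], False
    if quality_pct < 95.0:
        alerts.append(f"quality {quality_pct:.1f}% < 95%: alert and investigate")
    if prev_total and abs(rows_total - prev_total) / prev_total > 0.5:
        alerts.append("row count changed > 50% vs previous run: block load")
        block_load = True
    return quality_pct, alerts, block_load
```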
Structured logging at every stage. Metrics: duration_seconds, rows_processed/rejected, last_success, data_freshness, quality_score. Alert: failure, 2x duration, quality < 95%, no data > 2 hours.
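One way to emit those metrics as a structured log line (the quality definition here -- accepted over processed -- is an assumption; adapt to your own):

```python
import json
import time

def log_stage(stage, rows_processed, rows_rejected, started_at):
    """Emit one JSON log line per stage carrying the metrics named above."""
    entry = {
        "stage": stage,
        "duration_seconds": round(time.time() - started_at, 3),
        "rows_processed": rows_processed,
        "rows_rejected": rows_rejected,
        "quality_score": round(
            100.0 * (rows_processed - rows_rejected) / max(rows_processed, 1), 2
        ),
        "last_success": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    print(json.dumps(entry))
    return entry
```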
DLQ for bad records. Retry with exponential backoff. Checkpoint and resume for large batches. Circuit breaker if source fails N times.
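Of the recovery patterns above, the dead-letter queue can be sketched as (names illustrative):

```python
def transform_with_dlq(rows, transform, dlq):
    """Apply `transform` row by row; records that raise go to the
    dead-letter queue with the error, instead of failing the batch."""
    out = []
    for row in rows:
        try:
            out.append(transform(row))
        except Exception as exc:
            dlq.append({"row": row, "error": str(exc)})
    return out
```

Pair this with retry/backoff on the source call and a consecutive-failure counter to implement the circuit breaker.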
Append .godmode/pipeline-results.tsv:
timestamp stage source target records_in records_out rejected quality_pct status
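A sketch of the append in that column order, writing the header on first use (the quality_pct formula -- accepted over input -- is an assumption):

```python
import os
import time

COLUMNS = ("timestamp", "stage", "source", "target", "records_in",
           "records_out", "rejected", "quality_pct", "status")

def append_result(path, stage, source, target,
                  records_in, records_out, rejected, status):
    """Append one TSV row in the column order above; header written once."""
    quality_pct = round(100.0 * (records_in - rejected) / max(records_in, 1), 2)
    is_new = not os.path.exists(path)
    with open(path, "a") as f:
        if is_new:
            f.write("\t".join(COLUMNS) + "\n")
        ts = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
        row = (ts, stage, source, target, records_in,
               records_out, rejected, quality_pct, status)
        f.write("\t".join(str(v) for v in row) + "\n")
```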
KEEP if: pipeline runs end-to-end AND quality checks pass AND SLA met.
DISCARD if: quality fails OR errors OR SLA exceeded.
STOP when ALL of:
- Pipeline runs end-to-end with zero errors
- Quality checks validate all stages
- SLA met
- Backfill tested
On failure: git reset --hard HEAD~1. Never pause.
| Failure | Action |
|---|---|
| Schema changed | Fail loudly, update contract |
| Duplicates | Use upsert, add dedup step |
| DLQ growing | Investigate rejection reason |
| Exceeds SLA | Profile stages, parallelize |
| Connection timeout | Retry with backoff, check pool |