# data-engineer

Design data pipelines: ETL/ELT flows, scheduling, error handling, and monitoring strategy.
Install: `npx claudepluginhub silviaare95/xari-plugins --plugin data-engineer`

This skill uses the workspace's default tool permissions.
Designs scalable batch/streaming data pipelines, warehouses, lakehouses using Spark, dbt, Airflow, Kafka/Flink, and cloud platforms like Snowflake, BigQuery, Databricks.
Builds production Apache Airflow DAGs using best practices for operators, sensors, testing, and deployment. Use it for data pipelines, workflow orchestration, and batch jobs.
Design a data pipeline for: $ARGUMENTS
Define the data contract: source and destination schemas, field types, nullability, and freshness expectations.
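A data contract can be sketched as a typed record plus a validator. The field names below (an "orders" feed) are illustrative assumptions, not part of any specific design:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical contract for an "orders" feed; every field name is illustrative.
@dataclass(frozen=True)
class OrderRecord:
    order_id: str          # required, unique per order
    customer_id: str       # required, references the customers table
    amount_cents: int      # non-negative; integer cents avoid float drift
    created_at: datetime   # event time, UTC
    coupon_code: Optional[str] = None  # the only nullable field

def validate(record: OrderRecord) -> list[str]:
    """Return the list of contract violations (empty list = valid)."""
    errors = []
    if not record.order_id:
        errors.append("order_id is empty")
    if record.amount_cents < 0:
        errors.append("amount_cents is negative")
    return errors
```

Freezing the dataclass makes records hashable and prevents in-flight mutation between stages.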
Choose the architecture: batch, streaming, or hybrid, driven by data volume, latency requirements, and the SLA.
Design the pipeline stages:
    [Source] → [Extract] → [Transform] → [Validate] → [Load] → [Destination]
                                ↓
                      [Dead Letter Queue]
For each stage, define its input, output, and error-handling strategy.
Define scheduling & orchestration: trigger type (cron, event, or sensor), task dependencies, retry policy, and backfill behavior.
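As one possible orchestration, here is a minimal Airflow DAG for a daily batch run; the DAG id, schedule, retry settings, and task names are assumptions, not part of any specific design:

```python
# Sketch of an Airflow 2.x DAG (config fragment, not a full pipeline).
# The callables are stubs; real tasks would call the extract/transform/load code.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="orders_pipeline",               # hypothetical name
    schedule="@daily",                      # one run per day
    start_date=datetime(2024, 1, 1),
    catchup=False,                          # no automatic historical backfill
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=lambda: None)
    transform = PythonOperator(task_id="transform", python_callable=lambda: None)
    load = PythonOperator(task_id="load", python_callable=lambda: None)

    extract >> transform >> load            # linear dependency chain
```

Setting `retries` in `default_args` applies the same retry policy to every task unless a task overrides it.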
Define monitoring & alerting: freshness, volume, and data-quality metrics, each with a threshold and an alert channel.
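Two common checks, freshness against the SLA and volume against a historical norm, can be sketched as plain functions; the thresholds are illustrative:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_lag: timedelta) -> bool:
    """True while the data is within the allowed lag; False should page/alert."""
    return datetime.now(timezone.utc) - last_loaded_at <= max_lag

def check_volume(row_count: int, expected: int, tolerance: float = 0.2) -> bool:
    """True while today's row count is within ±tolerance of the expected norm."""
    return abs(row_count - expected) <= tolerance * expected
```

In practice these run as a final pipeline task or a separate monitor, emitting metrics to whatever alerting system is in place.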
## Pipeline: <source> → <destination>
### Overview
- **Type**: batch | streaming | hybrid
- **Frequency**: every X hours | real-time | on trigger
- **Volume**: ~N rows/day, ~X GB/month
- **SLA**: data available within X hours of source update
### Stages
| Stage | Input | Output | Error Strategy |
|-------|-------|--------|---------------|
| Extract | <source format> | raw JSON/CSV | retry 3x, alert |
| Transform | raw | cleaned + typed | skip bad rows → DLQ |
| Validate | cleaned | validated | reject → DLQ |
| Load | validated | <destination table> | upsert, idempotent |
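The "upsert, idempotent" load strategy can be illustrated with SQLite's `ON CONFLICT` clause; the table and key names are hypothetical:

```python
import sqlite3

# Idempotent load via upsert: re-inserting an existing key updates it in place,
# so replaying a batch after a partial failure is safe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount_cents INT)")

def load(rows):
    conn.executemany(
        """INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)
           ON CONFLICT(order_id) DO UPDATE SET amount_cents = excluded.amount_cents""",
        rows,
    )
    conn.commit()

load([("a1", 500)])
load([("a1", 500), ("a2", 700)])   # re-running the same rows changes nothing
```

The same pattern exists as `MERGE` in Snowflake/BigQuery and `INSERT ... ON CONFLICT` in Postgres; the key requirement is a stable unique key from the data contract.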
### Schema
<Source and destination schemas with field mappings>
### Error Handling
- **Retries**: <strategy>
- **Dead Letter Queue**: <where bad records go>
- **Alerting**: <when and how>
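A retry policy like the "retry 3x, alert" strategy in the stages table can be sketched with exponential backoff; the alert hook is a placeholder assumption:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, alert=print):
    """Call fn, retrying with exponential backoff; alert and re-raise on final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as err:
            if attempt == attempts:
                alert(f"failed after {attempts} attempts: {err}")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Transient faults (network blips, lock contention) resolve within the backoff window; persistent ones surface through the alert and the raised exception.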
### Monitoring
- <metric 1>: <threshold + alert>
- <metric 2>: <threshold + alert>