# data-engineer

Design data pipelines: ETL/ELT flows, scheduling, error handling, and monitoring strategy.
Install: `npx claudepluginhub silviaare95/xari-plugins --plugin data-engineer`

This skill uses the workspace's default tool permissions.
Designs scalable batch/streaming data pipelines, warehouses, lakehouses using Spark, dbt, Airflow, Kafka/Flink, and cloud platforms like Snowflake, BigQuery, Databricks.
Builds production Apache Airflow DAGs using best practices for operators, sensors, testing, and deployment. Use it for data pipelines, workflow orchestration, and batch jobs.
Design a data pipeline for: $ARGUMENTS
Define the data contract: source and destination schemas, field types, nullability, and freshness expectations.
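A data contract can be sketched as a typed record plus a validator. The field names below (an "orders" feed) are illustrative assumptions, not part of any specific design:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical contract for an "orders" feed; every field name is illustrative.
@dataclass(frozen=True)
class OrderRecord:
    order_id: str          # required, unique per order
    customer_id: str       # required, references the customers table
    amount_cents: int      # non-negative; integer cents avoid float drift
    created_at: datetime   # event time, UTC
    coupon_code: Optional[str] = None  # the only nullable field

def validate(record: OrderRecord) -> list[str]:
    """Return the list of contract violations (empty list = valid)."""
    errors = []
    if not record.order_id:
        errors.append("order_id is empty")
    if record.amount_cents < 0:
        errors.append("amount_cents is negative")
    return errors
```

Freezing the dataclass makes records hashable and prevents in-flight mutation between stages.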
Choose the architecture: batch, streaming, or hybrid, driven by data volume, latency requirements, and the SLA.
Design the pipeline stages:
    [Source] → [Extract] → [Transform] → [Validate] → [Load] → [Destination]
                                ↓
                      [Dead Letter Queue]
For each stage, define its input, output, and error-handling strategy.
Define scheduling & orchestration: trigger type (cron, event, or sensor), task dependencies, retry policy, and backfill behavior.
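As one possible orchestration, here is a minimal Airflow DAG for a daily batch run; the DAG id, schedule, retry settings, and task names are assumptions, not part of any specific design:

```python
# Sketch of an Airflow 2.x DAG (config fragment, not a full pipeline).
# The callables are stubs; real tasks would call the extract/transform/load code.
from datetime import datetime, timedelta
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="orders_pipeline",               # hypothetical name
    schedule="@daily",                      # one run per day
    start_date=datetime(2024, 1, 1),
    catchup=False,                          # no automatic historical backfill
    default_args={"retries": 3, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=lambda: None)
    transform = PythonOperator(task_id="transform", python_callable=lambda: None)
    load = PythonOperator(task_id="load", python_callable=lambda: None)

    extract >> transform >> load            # linear dependency chain
```

Setting `retries` in `default_args` applies the same retry policy to every task unless a task overrides it.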
Define monitoring & alerting: freshness, volume, and data-quality metrics, each with a threshold and an alert channel.
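Two common checks, freshness against the SLA and volume against a historical norm, can be sketched as plain functions; the thresholds are illustrative:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_lag: timedelta) -> bool:
    """True while the data is within the allowed lag; False should page/alert."""
    return datetime.now(timezone.utc) - last_loaded_at <= max_lag

def check_volume(row_count: int, expected: int, tolerance: float = 0.2) -> bool:
    """True while today's row count is within ±tolerance of the expected norm."""
    return abs(row_count - expected) <= tolerance * expected
```

In practice these run as a final pipeline task or a separate monitor, emitting metrics to whatever alerting system is in place.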
## Pipeline: <source> → <destination>
### Overview
- **Type**: batch | streaming | hybrid
- **Frequency**: every X hours | real-time | on trigger
- **Volume**: ~N rows/day, ~X GB/month
- **SLA**: data available within X hours of source update
### Stages
| Stage | Input | Output | Error Strategy |
|-------|-------|--------|---------------|
| Extract | <source format> | raw JSON/CSV | retry 3x, alert |
| Transform | raw | cleaned + typed | skip bad rows → DLQ |
| Validate | cleaned | validated | reject → DLQ |
| Load | validated | <destination table> | upsert, idempotent |
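The "upsert, idempotent" load strategy can be illustrated with SQLite's `ON CONFLICT` clause; the table and key names are hypothetical:

```python
import sqlite3

# Idempotent load via upsert: re-inserting an existing key updates it in place,
# so replaying a batch after a partial failure is safe.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount_cents INT)")

def load(rows):
    conn.executemany(
        """INSERT INTO orders (order_id, amount_cents) VALUES (?, ?)
           ON CONFLICT(order_id) DO UPDATE SET amount_cents = excluded.amount_cents""",
        rows,
    )
    conn.commit()

load([("a1", 500)])
load([("a1", 500), ("a2", 700)])   # re-running the same rows changes nothing
```

The same pattern exists as `MERGE` in Snowflake/BigQuery and `INSERT ... ON CONFLICT` in Postgres; the key requirement is a stable unique key from the data contract.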
### Schema
<Source and destination schemas with field mappings>
### Error Handling
- **Retries**: <strategy>
- **Dead Letter Queue**: <where bad records go>
- **Alerting**: <when and how>
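A retry policy like the "retry 3x, alert" strategy in the stages table can be sketched with exponential backoff; the alert hook is a placeholder assumption:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0, alert=print):
    """Call fn, retrying with exponential backoff; alert and re-raise on final failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as err:
            if attempt == attempts:
                alert(f"failed after {attempts} attempts: {err}")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
```

Transient faults (network blips, lock contention) resolve within the backoff window; persistent ones surface through the alert and the raised exception.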
### Monitoring
- <metric 1>: <threshold + alert>
- <metric 2>: <threshold + alert>