From data-architecture
Design batch and streaming data pipelines. Plan ingestion, transformation, quality checks, and failure recovery. Use when building ETL/ELT systems or data infrastructure.
npx claudepluginhub sethdford/claude-skills --plugin architect-data-architecture
How this skill is triggered: by the user, by Claude, or both
Slash command
/data-architecture:data-pipeline-design
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Design robust, maintainable data pipelines that reliably move, transform, and validate data at scale.
Designs scalable data pipelines for batch and streaming processing with Airflow, Prefect, dbt, Kafka, Spark, Delta Lake, and Great Expectations. Guides architecture, ingestion, orchestration, transformation, quality, and monitoring.
Designs data pipelines and ETL processes covering extraction, transformation, loading, data quality checks, orchestration, and patterns for batch, streaming, CDC, ELT. Useful for building pipelines, data flows, syncing, or moving data between systems.
Designs data pipeline architectures for batch ETL, streaming, or hybrid scenarios including tech stacks, ASCII diagrams, data quality strategies, and cost analysis. Useful for real-time processing, BI reporting, or migrations.
You are designing data pipelines (batch or streaming). Plan the data flow, transformations, quality gates, failure recovery, and monitoring. Start by reviewing the source systems, target requirements, latency expectations, and volume projections.
Based on modern data engineering practices (Spark, Airflow, Kafka, Beam):
Choose Processing Model: Decide between batch (for example, daily jobs), streaming (for example, real-time features), or a hybrid Lambda architecture that pairs a batch layer for accuracy with a streaming layer for speed. Weigh the latency SLA against cost.
Design Data Stages: Raw ingestion (as-is from source) → Bronze. Cleansing and normalization → Silver. Business logic and enrichment → Gold. This layered medallion architecture separates concerns.
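The layering can be illustrated with a minimal in-memory sketch. The record format (`customer,amount` text lines) and the function names `to_bronze`, `to_silver`, and `to_gold` are hypothetical, chosen only to show how each stage keeps a single responsibility; a real pipeline would persist each layer in a table or file store.

```python
from collections import defaultdict


def to_bronze(raw_lines):
    # Bronze: store each source record as-is, adding only its position.
    return [{"line_no": i, "raw": line} for i, line in enumerate(raw_lines)]


def to_silver(bronze):
    # Silver: parse and normalize; skip records that cannot be cleaned.
    silver = []
    for rec in bronze:
        parts = [p.strip() for p in rec["raw"].split(",")]
        if len(parts) != 2:
            continue
        customer, amount = parts
        try:
            silver.append({"customer": customer.lower(), "amount": float(amount)})
        except ValueError:
            continue
    return silver


def to_gold(silver):
    # Gold: apply business logic, here total amount per customer.
    totals = defaultdict(float)
    for rec in silver:
        totals[rec["customer"]] += rec["amount"]
    return dict(totals)


raw = ["Alice, 10.5", "bob,4", "ALICE,2.5", "broken line", "carol,not-a-number"]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)
```

Because Bronze keeps the untouched input, the Silver and Gold logic can be changed and re-run later without going back to the source system.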
Implement Quality Gates: Validate data at each stage and fail the pipeline when quality drops below an agreed threshold. Track anomalies such as unexpected null rates, shifts in value distributions, and cardinality changes.
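A quality gate of this kind might look like the sketch below. The thresholds, the `QualityGateError` class, and the `quality_gate` function are assumptions made for illustration, not part of any particular library; tools such as Great Expectations provide richer versions of the same idea.

```python
class QualityGateError(Exception):
    """Raised when a batch fails a check and the pipeline should stop."""


def null_rate(records, field):
    # Fraction of records where the field is missing or None.
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def quality_gate(records, field, max_null_rate, min_distinct):
    # Check 1: the null rate must stay at or below the agreed threshold.
    rate = null_rate(records, field)
    if rate > max_null_rate:
        raise QualityGateError(f"{field}: null rate {rate:.2f} exceeds {max_null_rate:.2f}")
    # Check 2: cardinality must not collapse (for example, a stuck default value).
    distinct = len({r[field] for r in records if r.get(field) is not None})
    if distinct < min_distinct:
        raise QualityGateError(f"{field}: only {distinct} distinct values, expected at least {min_distinct}")
    return records


good = [{"country": "DE"}, {"country": "FR"}, {"country": "US"}, {"country": None}]
quality_gate(good, "country", max_null_rate=0.3, min_distinct=2)  # passes silently
```

Raising an exception lets the orchestrator mark the run as failed, so downstream stages never consume a batch that did not pass the gate.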
Handle Failures and Recovery: Make transformations idempotent so that retries are safe. Checkpoint state for streaming pipelines and resume from the last checkpoint on failure. Route unparseable records to a dead-letter queue.
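The three recovery ideas can be combined in a small sketch. It assumes JSON messages with hypothetical `id` and `amount` fields, and uses plain dictionaries to stand in for the sink, the checkpoint store, and the dead-letter queue; the function name `process_from_checkpoint` is likewise illustrative.

```python
import json


def process_from_checkpoint(messages, checkpoint, sink, dead_letters):
    # Resume from the last committed offset (0 on the first run).
    start = checkpoint.get("offset", 0)
    for i in range(start, len(messages)):
        msg = messages[i]
        try:
            event = json.loads(msg)
            key = event["id"]
            value = float(event["amount"])
        except (ValueError, KeyError, TypeError) as exc:
            # Unparseable record: park it for inspection and keep going.
            dead_letters[msg] = type(exc).__name__
        else:
            # Keyed upsert: writing the same event twice leaves the same state.
            sink[key] = value
        # Commit progress only after the record has been handled.
        checkpoint["offset"] = i + 1
    return checkpoint


messages = ['{"id": "a", "amount": 5}', "not json", '{"id": "b"}', '{"id": "a", "amount": 7}']
sink, dead_letters, checkpoint = {}, {}, {}
process_from_checkpoint(messages, checkpoint, sink, dead_letters)
# Replaying everything from offset 0 produces the same sink contents.
process_from_checkpoint(messages, {"offset": 0}, sink, dead_letters)
print(sink, checkpoint, sorted(dead_letters))
```

The upsert is what makes a replay safe: if the process crashes between the write and the checkpoint update, the record is handled again on restart and the sink ends up in the same state.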
Plan Monitoring and Alerting: Track freshness (time since the last successful run), latency (time from source to sink), volume (record counts by stage), and error rates. Alert on anomalies and SLA misses.
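As a rough sketch of such checks, the function below evaluates freshness, stage-to-stage volume, and error rate for one run. The function name `check_pipeline_health`, its parameters, and the thresholds in the example are all hypothetical; in practice these metrics would usually be emitted to a monitoring system rather than computed inline.

```python
from datetime import datetime, timedelta, timezone


def check_pipeline_health(last_success, now, max_staleness,
                          stage_counts, max_drop_ratio,
                          error_count, max_error_rate):
    alerts = []
    # Freshness: how long ago did the pipeline last succeed?
    if now - last_success > max_staleness:
        alerts.append(f"freshness: last success was {now - last_success} ago")
    # Volume: flag a large record loss between consecutive stages.
    for (prev_name, prev_n), (name, n) in zip(stage_counts, stage_counts[1:]):
        if prev_n > 0 and (prev_n - n) / prev_n > max_drop_ratio:
            alerts.append(f"volume: {prev_name} -> {name} dropped from {prev_n} to {n}")
    # Error rate: errors relative to the records that entered the pipeline.
    ingested = stage_counts[0][1] if stage_counts else 0
    if ingested > 0 and error_count / ingested > max_error_rate:
        alerts.append(f"errors: {error_count} of {ingested} records failed")
    return alerts


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
alerts = check_pipeline_health(
    last_success=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    now=now,
    max_staleness=timedelta(hours=24),
    stage_counts=[("bronze", 1000), ("silver", 990), ("gold", 400)],
    max_drop_ratio=0.5,
    error_count=5,
    max_error_rate=0.01,
)
for alert in alerts:
    print(alert)
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.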