Design batch and streaming data pipelines. Plan ingestion, transformation, quality checks, and failure recovery. Use when building ETL/ELT systems or data infrastructure.
From data-architecture: `npx claudepluginhub sethdford/claude-skills --plugin architect-data-architecture`. This skill uses the workspace's default tool permissions.
Design robust, maintainable data pipelines that reliably move, transform, and validate data at scale.
You are designing data pipelines (batch or streaming). Plan data flow, transformations, quality gates, failure recovery, and monitoring. Start by reviewing the source systems, target requirements, latency expectations, and volume projections.
Based on modern data engineering practices (Spark, Airflow, Kafka, Beam):
Choose Processing Model: Batch (daily jobs)? Streaming (real-time features)? Hybrid (a Lambda architecture combines batch and streaming for both accuracy and speed)? Weigh the latency SLA against cost.
Design Data Stages: Raw ingestion (as-is from source) → Bronze. Cleansing and normalization → Silver. Business logic and enrichment → Gold. This layered medallion architecture separates concerns.
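As a sketch of these layers, the following PySpark snippet walks raw records through Bronze, Silver, and Gold; the paths, column names, and the "orders" dataset are hypothetical assumptions:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: ingest records as-is from the source, adding only ingestion metadata.
bronze = (spark.read.json("s3://lake/raw/orders/")
          .withColumn("_ingested_at", F.current_timestamp()))
bronze.write.mode("append").parquet("s3://lake/bronze/orders/")

# Silver: cleanse and normalize (dedupe, fix types, drop malformed rows).
silver = (bronze
          .dropDuplicates(["order_id"])
          .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
          .filter(F.col("order_id").isNotNull()))
silver.write.mode("overwrite").parquet("s3://lake/silver/orders/")

# Gold: apply business logic and enrichment for downstream consumers.
gold = (silver.groupBy("customer_id")
        .agg(F.sum("amount").alias("lifetime_value"),
             F.count("*").alias("order_count")))
gold.write.mode("overwrite").parquet("s3://lake/gold/customer_value/")
```

Each layer writes to its own storage path, so downstream stages can be rebuilt from the layer above without re-touching the source system.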
Implement Quality Gates: Validate at each stage, and fail the pipeline when data quality drops below thresholds. Track anomalies: unexpected null rates, shifts in value distributions, cardinality changes.
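A minimal sketch of one such gate between stages, assuming illustrative column names and thresholds; it fails the run when a null-rate check breaches its limit:

```python
from pyspark.sql import DataFrame, functions as F

def check_quality(df: DataFrame, column: str, max_null_rate: float = 0.01) -> None:
    """Fail fast if the null rate of `column` exceeds the threshold."""
    total = df.count()
    nulls = df.filter(F.col(column).isNull()).count()
    null_rate = nulls / total if total else 1.0
    if null_rate > max_null_rate:
        # Raising here stops bad data from propagating to Silver/Gold.
        raise ValueError(
            f"Quality gate failed: {column} null rate {null_rate:.2%} "
            f"exceeds threshold {max_null_rate:.2%}"
        )

# Example: gate the Silver write on the cleansed frame.
# check_quality(silver, "customer_id", max_null_rate=0.001)
```

The same pattern extends to distribution and cardinality checks (e.g., comparing `F.countDistinct(column)` against the previous run).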
Handle Failures and Recovery: Idempotent transformations allow safe retries. Checkpoint state for streaming pipelines; resume from last checkpoint on failure. Use dead-letter queues for unparseable records.
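As an illustration of checkpointed recovery plus a dead-letter path, here is a Structured Streaming sketch; the Kafka topic, broker address, schema, and storage paths are assumptions. The checkpoint lets the job resume from its last committed offsets after a failure, and records that fail JSON parsing are routed to a dead-letter sink instead of crashing the stream:

```python
from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("resumable-stream").getOrCreate()

schema = T.StructType([
    T.StructField("order_id", T.StringType()),
    T.StructField("amount", T.DoubleType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "orders")
       .load())

# from_json yields a null struct for unparseable payloads,
# which is how we split good records from dead letters.
parsed = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("rec"),
    F.col("value").cast("string").alias("raw_value"))

good = parsed.filter(F.col("rec").isNotNull()).select("rec.*")
bad = parsed.filter(F.col("rec").isNull()).select("raw_value")

# Checkpointed sinks: safe to restart the job after any failure.
(good.writeStream.format("parquet")
 .option("path", "s3://lake/bronze/orders/")
 .option("checkpointLocation", "s3://lake/_checkpoints/orders/")
 .start())

(bad.writeStream.format("parquet")
 .option("path", "s3://lake/dead_letter/orders/")
 .option("checkpointLocation", "s3://lake/_checkpoints/orders_dlq/")
 .start())

spark.streams.awaitAnyTermination()
```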
Plan Monitoring and Alerting: Track freshness (when was the last successful run?), latency (time from source to sink), volume (record counts by stage), and error rates. Alert on anomalies and SLA misses.
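A hedged sketch of these checks in plain Python; the SLA value, volume floor, and alert hook are illustrative assumptions, not a specific monitoring stack:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLA = timedelta(hours=2)   # assumed SLA: data less than 2h old
MIN_EXPECTED_ROWS = 10_000           # assumed volume floor per run

def check_pipeline_health(last_success: datetime, row_count: int,
                          error_count: int, alert) -> None:
    """Emit alerts on freshness, volume, and error-rate anomalies."""
    now = datetime.now(timezone.utc)
    if now - last_success > FRESHNESS_SLA:
        alert(f"Freshness SLA missed: last success at {last_success.isoformat()}")
    if row_count < MIN_EXPECTED_ROWS:
        alert(f"Volume anomaly: {row_count} rows (expected >= {MIN_EXPECTED_ROWS})")
    if error_count > 0:
        alert(f"{error_count} records failed processing this run")

# Example wiring, with a trivial alert hook:
# check_pipeline_health(run_finished_at, rows_written, errors,
#                       alert=lambda msg: print("ALERT:", msg))
```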