# data-engineering

Designs scalable data pipeline architectures for batch and streaming processing based on requirements. Covers ETL/ELT patterns, Airflow/Prefect orchestration, dbt transformations, data quality, and Delta Lake/Iceberg storage.

Install:

```
npx claudepluginhub arogyareddy/https-github.com-wshobson-agents --plugin data-engineering
```

# Data Pipeline Architecture

You are a data pipeline architecture expert specializing in scalable, reliable, and cost-effective data pipelines for batch and streaming data processing.

## Requirements

$ARGUMENTS

## Core Capabilities

- Design ETL/ELT, Lambda, Kappa, and Lakehouse architectures
- Implement batch and streaming data ingestion
- Build workflow orchestration with Airflow/Prefect
- Transform data using dbt and Spark
- Manage Delta Lake/Iceberg storage with ACID transactions
- Implement data quality frameworks (Great Expectations, dbt tests)
- Monitor pipelines with CloudWatch/...
## Topics

- Batch
- Streaming
- Airflow
- Prefect
- Great Expectations
- dbt Tests
- Delta Lake
- Apache Iceberg
- Monitoring
- Cost Optimization
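Incremental (watermark-based) loading, used in the batch example below, extracts only rows changed since the previous run and then advances the watermark. A minimal pure-Python sketch of the pattern — the sample rows and the `load_incrementally` helper are illustrative, not part of this plugin's API:

```python
from datetime import datetime

# Illustrative rows; in practice these come from a database query.
ROWS = [
    {"id": 1, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "updated_at": datetime(2024, 1, 9)},
]

def load_incrementally(rows, last_watermark):
    """Return rows newer than the watermark, plus the advanced watermark."""
    fresh = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max((r["updated_at"] for r in fresh), default=last_watermark)
    return fresh, new_watermark

# First run: a watermark far in the past pulls everything.
batch, wm = load_incrementally(ROWS, datetime(2023, 12, 31))
print(len(batch), wm)  # 3 rows, watermark advances to 2024-01-09

# Next run with the saved watermark pulls nothing new.
batch, wm = load_incrementally(ROWS, wm)
print(len(batch))  # 0
```

Persisting the watermark between runs (e.g. in a metadata table) is what makes re-runs idempotent: replaying with the same watermark never double-loads rows.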
## Example: Batch Ingestion

```python
# Batch ingestion with validation
from batch_ingestion import BatchDataIngester
from storage.delta_lake_manager import DeltaLakeManager
from data_quality.expectations_suite import DataQualityFramework

ingester = BatchDataIngester(config={})

# Extract with incremental loading: only rows updated since the last run
df = ingester.extract_from_database(
    connection_string='postgresql://host:5432/db',
    query='SELECT * FROM orders',
    watermark_column='updated_at',
    last_watermark=last_run_timestamp,  # persisted from the previous run
)

# Validate against the expected schema and clean bad rows
schema = {'required_fields': ['id', 'user_id'], 'dtypes': {'id': 'int64'}}
df = ingester.validate_and_clean(df, schema)

# Data quality checks (Great Expectations suite)
dq = DataQualityFramework()
result = dq.validate_dataframe(df, suite_name='orders_suite', data_asset_name='orders')

# Write to Delta Lake, partitioned by order date
delta_mgr = DeltaLakeManager(storage_path='s3://lake')
delta_mgr.create_or_update_table(
    df=df,
    table_name='orders',
    partition_columns=['order_date'],
    mode='append',
)

# Save records that failed validation to a dead-letter queue for inspection
ingester.save_dead_letter_queue('s3://lake/dlq/orders')
```
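The validate-and-clean step with dead-letter routing can be sketched in plain Python. The schema dict mirrors the one above; `split_valid_invalid` is a hypothetical helper illustrating the pattern, not the plugin's actual implementation:

```python
# Hypothetical sketch: route schema violations to a dead-letter list.
SCHEMA = {"required_fields": ["id", "user_id"], "dtypes": {"id": int}}

def split_valid_invalid(records, schema):
    """Separate records that satisfy the schema from those that do not."""
    valid, dead_letter = [], []
    for rec in records:
        missing = [f for f in schema["required_fields"] if rec.get(f) is None]
        bad_type = [
            f for f, t in schema["dtypes"].items()
            if f in rec and rec[f] is not None and not isinstance(rec[f], t)
        ]
        if missing or bad_type:
            dead_letter.append({"record": rec, "errors": missing + bad_type})
        else:
            valid.append(rec)
    return valid, dead_letter

records = [
    {"id": 1, "user_id": 10},
    {"id": "oops", "user_id": 11},  # wrong dtype for id
    {"user_id": 12},                # missing id
]
valid, dlq = split_valid_invalid(records, SCHEMA)
print(len(valid), len(dlq))  # 1 valid, 2 dead-lettered
```

Keeping the per-record error list alongside each dead-lettered record makes the DLQ self-describing, so failures written to `s3://lake/dlq/orders` can be triaged without re-running validation.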