etl-patterns

Install

Install the plugin:

$ npx claudepluginhub majesticlabs-dev/majestic-marketplace --plugin majestic-data

Want just this skill? Add it to a custom plugin, then install with one command.

Description

Production ETL patterns orchestrator. Routes to core reliability patterns and incremental load strategies.

Tool Access

This skill is limited to using the following tools:

Read, Write, Edit, Grep, Glob, Bash
Skill Content

ETL Patterns

Orchestrator for production-grade Extract-Transform-Load patterns.

Skill Routing

| Need | Skill | Content |
|---|---|---|
| Reliability patterns | etl-core-patterns | Idempotency, checkpointing, error handling, chunking, retry, logging |
| Load strategies | etl-incremental-patterns | Backfill, timestamp-based, CDC, pipeline orchestration |

Pattern Selection Guide

By Reliability Need

| Need | Pattern | Skill |
|---|---|---|
| Repeatable runs | Idempotency | etl-core-patterns |
| Resume after failure | Checkpointing | etl-core-patterns |
| Handle bad records | Error handling + DLQ | etl-core-patterns |
| Memory management | Chunked processing | etl-core-patterns |
| Network resilience | Retry with backoff | etl-core-patterns |
| Observability | Structured logging | etl-core-patterns |
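
Network resilience in the table above usually means wrapping flaky extract/load calls in exponential backoff. A minimal sketch (the `retry_with_backoff` helper name is illustrative, not part of the skill's API):

```python
import random
import time

def retry_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn; on failure, retry with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # 1x, 2x, 4x... the base delay, with jitter to avoid thundering herds
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Jitter matters when many workers retry the same endpoint at once; without it, they all come back at the same instant.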

By Load Strategy

| Scenario | Pattern | Skill |
|---|---|---|
| Small tables (<100K rows) | Full refresh | etl-incremental-patterns |
| Large tables | Timestamp incremental | etl-incremental-patterns |
| Real-time sync | CDC events | etl-incremental-patterns |
| Historical migration | Parallel backfill | etl-incremental-patterns |
| Zero-downtime refresh | Swap pattern | etl-incremental-patterns |
| Multi-step pipelines | Pipeline orchestration | etl-incremental-patterns |
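
The timestamp-incremental row is the workhorse for large tables: fetch only rows modified since the last processed watermark. A sketch using SQLite for illustration (table and column names are placeholders; the real skill targets your warehouse):

```python
import sqlite3

def incremental_by_timestamp(conn, table, ts_column, last_seen):
    """Return only rows whose timestamp is strictly after the watermark."""
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {ts_column} > ? ORDER BY {ts_column}",
        (last_seen,),
    )
    return cur.fetchall()
```

Note the strict `>`: pairing it with a saved `max(ts_column)` watermark avoids reprocessing the boundary row, but assumes the timestamp column is reliably set on every update.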

Quick Reference

Idempotency Options

# Small datasets: Delete-then-insert
# Large datasets: UPSERT on conflict
# Change detection: Row hash comparison
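A minimal sketch of the UPSERT and row-hash options, using SQLite's `ON CONFLICT` clause for illustration (the `target` table schema and helper names are assumptions, not the skill's actual API):

```python
import hashlib
import sqlite3

def row_hash(record):
    """Stable hash over sorted keys, for change detection between loads."""
    payload = "|".join(f"{k}={record[k]}" for k in sorted(record))
    return hashlib.sha256(payload.encode()).hexdigest()

def upsert_records(conn, records):
    """Idempotent load: insert new ids, update existing ones in place."""
    conn.executemany(
        "INSERT INTO target (id, value) VALUES (:id, :value) "
        "ON CONFLICT(id) DO UPDATE SET value = excluded.value",
        records,
    )
```

Running `upsert_records` twice with the same batch leaves the table unchanged, which is exactly the idempotency property the reliability table calls for.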

Load Strategy Decision

Is table < 100K rows?
  → Full refresh

Has reliable timestamp column?
  → Timestamp incremental

Source supports CDC?
  → CDC event processing

Need zero downtime?
  → Swap pattern (temp table → rename)

One-time historical load?
  → Parallel backfill with date ranges
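
The decision tree above can be sketched as a first-match-wins function (the function name and return strings are illustrative only):

```python
def choose_load_strategy(row_count, has_timestamp=False, supports_cdc=False,
                         needs_zero_downtime=False, one_time_backfill=False):
    """Walk the load-strategy decision tree; first matching branch wins."""
    if row_count < 100_000:
        return "full_refresh"
    if has_timestamp:
        return "timestamp_incremental"
    if supports_cdc:
        return "cdc_events"
    if needs_zero_downtime:
        return "swap"
    if one_time_backfill:
        return "parallel_backfill"
    return "full_refresh"  # fallback when no cheaper strategy applies
```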

Common Pipeline Structure

# Helpers (Checkpoint, ETLProcessor, incremental_by_timestamp, upsert_records)
# come from etl-core-patterns and etl-incremental-patterns.
import pandas as pd

# 1. Setup
checkpoint = Checkpoint('.etl_checkpoint.json')
processor = ETLProcessor()

# 2. Extract (with incremental)
df = incremental_by_timestamp(source_table, 'updated_at')

# 3. Transform (with error handling)
transformed = processor.process_batch(df.to_dict('records'))

# 4. Load (with idempotency)
upsert_records(pd.DataFrame(transformed))

# 5. Checkpoint
checkpoint.set_last_processed('sync', df['updated_at'].max())

# 6. Handle failures
processor.save_failures('failures/')
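
The `Checkpoint` object in the pipeline above is the resume-after-failure piece. A minimal sketch of what such a JSON-file checkpoint store could look like (an assumption; the real implementation lives in etl-core-patterns):

```python
import json
import os

class Checkpoint:
    """Persist per-job watermarks to a JSON file so a failed run can resume."""

    def __init__(self, path):
        self.path = path
        self._state = {}
        if os.path.exists(path):
            with open(path) as f:
                self._state = json.load(f)

    def get_last_processed(self, key, default=None):
        return self._state.get(key, default)

    def set_last_processed(self, key, value):
        self._state[key] = value
        with open(self.path, "w") as f:
            json.dump(self._state, f)
```

On restart, the pipeline reads the last watermark and re-runs only the window after it; writing the checkpoint after the load step (not before) keeps the pipeline at-least-once rather than silently skipping data.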

Related Skills

  • data-validation - Validate data quality during ETL
  • data-quality - Monitor data quality metrics
  • pandas-coder - DataFrame transformations
Stats

Stars: 30 · Forks: 6 · Last commit: Feb 15, 2026