Transform multiple database tables in parallel with maximum efficiency
Transforms multiple database tables to staging format using parallel sub-agents for maximum speed.
To install:

```
/plugin marketplace add treasure-data/aps_claude_tools
/plugin install treasure-data-cdp-staging-plugins-cdp-staging@treasure-data/aps_claude_tools
```

I'll help you transform multiple database tables to staging format using parallel sub-agent execution for maximum performance.
FIRST, use the AskUserQuestion tool to interactively collect all required parameters.
Call AskUserQuestion with these questions:
```json
{
  "questions": [
    {
      "question": "Which tables do you want to transform? (Comma-separated, e.g., table1, table2 OR db.table1, db.table2)",
      "header": "Tables",
      "multiSelect": false,
      "options": [
        {
          "label": "Table list",
          "description": "I'll provide comma-separated table names"
        }
      ]
    },
    {
      "question": "Which SQL engine strategy should be used for these tables?",
      "header": "SQL Engine",
      "multiSelect": false,
      "options": [
        {
          "label": "Presto/Trino",
          "description": "Use Presto/Trino for all tables (recommended, default, fastest)"
        },
        {
          "label": "Hive",
          "description": "Use Hive for all tables (batch processing, large datasets)"
        }
      ]
    },
    {
      "question": "What is the source database containing these tables?",
      "header": "Source DB",
      "multiSelect": false,
      "options": [
        {
          "label": "client_src",
          "description": "Standard client source database"
        },
        {
          "label": "demo_db",
          "description": "Demo/sample database"
        }
      ]
    },
    {
      "question": "Staging database name? (Default: client_stg)",
      "header": "Staging DB",
      "multiSelect": false,
      "options": [
        {
          "label": "client_stg",
          "description": "Use default staging database (recommended)"
        }
      ]
    },
    {
      "question": "Config/Lookup database name? (Default: client_config)",
      "header": "Config DB",
      "multiSelect": false,
      "options": [
        {
          "label": "client_config",
          "description": "Use default config database (recommended)"
        }
      ]
    }
  ]
}
```
After collecting answers, launch parallel sub-agents (one per table) with the appropriate staging-transformer agent and all parameters:
I will extract individual tables from your input:
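The extraction step can be sketched as follows. This is a minimal illustration, not the command's actual implementation; `parse_tables` and the `default_db` fallback are hypothetical, assuming bare table names default to the source database collected above.

```python
# Sketch: split a comma-separated table list into (database, table) pairs.
# Assumption (not defined by this command): names without a "db." prefix
# fall back to a default source database.

def parse_tables(raw: str, default_db: str = "client_src") -> list[tuple[str, str]]:
    """Parse 'db.table1, table2' into [(db, table1), (default_db, table2)]."""
    pairs = []
    for item in raw.split(","):
        name = item.strip()
        if not name:
            continue  # tolerate trailing commas and extra whitespace
        if "." in name:
            db, table = name.split(".", 1)
        else:
            db, table = default_db, name
        pairs.append((db, table))
    return pairs

print(parse_tables("client_src.customers_histunion, orders_histunion"))
```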
I will determine processing strategy:
- `staging-transformer-presto`
- `staging-transformer-hive`

I will create parallel sub-agent calls:
I will track all sub-agent progress:
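The fan-out-and-track pattern above can be sketched with Python's standard thread pool. `transform_table` is a hypothetical stand-in for a staging-transformer sub-agent call; only the orchestration shape is the point here.

```python
# Sketch: run one transform per table concurrently and collect per-table status.
from concurrent.futures import ThreadPoolExecutor, as_completed

def transform_table(table: str) -> str:
    # Placeholder for a real staging-transformer sub-agent invocation.
    return f"{table}: done"

def run_parallel(tables: list[str]) -> dict[str, str]:
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(tables) or 1) as pool:
        futures = {pool.submit(transform_table, t): t for t in tables}
        for fut in as_completed(futures):
            table = futures[fut]
            try:
                results[table] = fut.result()
            except Exception as exc:  # record failures without stopping siblings
                results[table] = f"failed: {exc}"
    return results

print(run_parallel(["customers_histunion", "orders_histunion"]))
```

One failing table does not cancel the others; each outcome is recorded so the consolidated report can distinguish successes from failures.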
After ALL tables complete successfully, I will run a single consolidated git workflow.
Example: the user requests "Transform tables A, B, C", and Main Claude creates 3 parallel sub-agent calls:

```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│   Sub-Agent 1   │ │   Sub-Agent 2   │ │   Sub-Agent 3   │
│    (Table A)    │ │    (Table B)    │ │    (Table C)    │
│    staging-     │ │    staging-     │ │    staging-     │
│  transformer-   │ │  transformer-   │ │  transformer-   │
│     presto      │ │     presto      │ │     presto      │
└─────────────────┘ └─────────────────┘ └─────────────────┘
        ↓                   ↓                   ↓
  [Files for A]       [Files for B]       [Files for C]
        ↓                   ↓                   ↓
        └───────────────────┴───────────────────┘
                            ↓
              [Consolidated Git Workflow]
              [Single PR with all tables]
```
Each sub-agent ensures complete compliance:
- ✅ Column Limit Management (max 200 columns)
- ✅ JSON Detection & Extraction (automatic)
- ✅ Date Processing (4 outputs per date column)
- ✅ Email/Phone Validation (with hashing)
- ✅ String Standardization (UPPER, TRIM, NULL handling)
- ✅ Deduplication Logic (if configured)
- ✅ Join Processing (if specified)
- ✅ Incremental Processing (state tracking)
- ✅ SQL File Creation (init, incremental, upsert)
- ✅ DIG File Management (conditional creation)
- ✅ Configuration Update (src_params.yml)
- ✅ Treasure Data Compatibility (VARCHAR/BIGINT timestamps)
Files created (Presto):

- staging/init_queries/{source_db}_{table}_init.sql
- staging/queries/{source_db}_{table}.sql
- staging/queries/{source_db}_{table}_upsert.sql (if dedup)
- staging/config/src_params.yml (all tables)
- staging/staging_transformation.dig (created once if not exists)

Files created (Hive):

- staging_hive/queries/{source_db}_{table}.sql
- staging_hive/config/src_params.yml (all tables)
- staging_hive/staging_hive.dig (created once if not exists)

User: Transform tables: client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion
→ Parallel execution with 3 staging-transformer-presto agents
→ All files to staging/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
User: Transform tables using Hive: client_src.events_histunion, client_src.profiles_histunion
→ Parallel execution with 2 staging-transformer-hive agents
→ All files to staging_hive/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 2x sequential)
User: Transform table1 using Hive, table2 using Presto, table3 using Hive
→ Parallel execution:
- Table1 → staging-transformer-hive
- Table2 → staging-transformer-presto
- Table3 → staging-transformer-hive
→ Files distributed to appropriate directories
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
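The mixed-engine routing above can be sketched as a small parser. `assign_agents` is a hypothetical helper illustrating the mapping; it assumes Presto as the default when no engine is named, matching the recommended default above.

```python
# Sketch: map each table to an agent from a per-table "using <engine>" hint.
# Tables without a hint default to Presto (the recommended engine).

AGENTS = {
    "presto": "staging-transformer-presto",
    "hive": "staging-transformer-hive",
}

def assign_agents(spec: str) -> dict[str, str]:
    """Parse 'table1 using Hive, table2 using Presto' into table -> agent."""
    assignments = {}
    for item in spec.split(","):
        words = item.strip().split()
        if not words:
            continue
        table = words[0]
        engine = words[-1].lower() if "using" in words else "presto"
        assignments[table] = AGENTS[engine]
    return assignments

print(assign_agents("table1 using Hive, table2 using Presto, table3"))
```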
If some tables succeed and others fail:

- Report Clear Status:
  - ✅ Successfully transformed: table1, table2
  - ❌ Failed: table3 (error message)
- Preserve Successful Work
- Git Safety

If all tables fail:
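The status report above can be sketched as a small formatter. The `results` shape (table name mapped to `None` on success or an error message on failure) is an assumption for illustration, not defined by this command.

```python
# Sketch: render per-table results into the partial-failure status report.
# Assumed input shape: {table: None} on success, {table: "error"} on failure.
from typing import Optional

def status_report(results: dict[str, Optional[str]]) -> str:
    ok = [t for t, err in results.items() if err is None]
    failed = [(t, err) for t, err in results.items() if err is not None]
    lines = []
    if ok:
        lines.append("✅ Successfully transformed: " + ", ".join(ok))
    for table, err in failed:
        lines.append(f"❌ Failed: {table} ({err})")
    return "\n".join(lines)

print(status_report({"table1": None, "table2": None, "table3": "timeout"}))
```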
Review Pull Request:
Title: "Batch transform 5 tables to staging"
Body:
- Transformed tables: table1, table2, table3, table4, table5
- Engine: Presto/Trino
- All validation gates passed ✅
- Files created: 15 SQL files, 1 config update
Verify Generated Files:
```bash
# For Presto
ls -l staging/queries/
ls -l staging/init_queries/
cat staging/config/src_params.yml

# For Hive
ls -l staging_hive/queries/
cat staging_hive/config/src_params.yml
```
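To know what to look for, the expected file set per table can be derived from the layout listed earlier. `expected_files` is a hypothetical helper sketching that derivation; the dedup/upsert condition is an assumption based on the "(if dedup)" note above.

```python
# Sketch: list the expected output paths for one table, per the file layout
# above. Dedup adds an upsert query; Hive uses the staging_hive/ tree.

def expected_files(source_db: str, table: str, engine: str = "presto",
                   dedup: bool = False) -> list[str]:
    if engine == "hive":
        return [f"staging_hive/queries/{source_db}_{table}.sql",
                "staging_hive/config/src_params.yml"]
    files = [f"staging/init_queries/{source_db}_{table}_init.sql",
             f"staging/queries/{source_db}_{table}.sql",
             "staging/config/src_params.yml"]
    if dedup:
        files.insert(2, f"staging/queries/{source_db}_{table}_upsert.sql")
    return files

print(expected_files("client_src", "customers_histunion"))
```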
Test Workflow:
```bash
cd staging  # or staging_hive
td wf push
td wf run staging_transformation.dig  # or staging_hive.dig
```
Monitor All Tables:
```sql
SELECT table_name, inc_value, project_name
FROM client_config.inc_log
WHERE table_name IN ('table1', 'table2', 'table3')
ORDER BY inc_value DESC
```
| Tables | Sequential Time | Parallel Time | Speedup |
|---|---|---|---|
| 2 | ~10 min | ~5 min | 2x |
| 3 | ~15 min | ~5 min | 3x |
| 5 | ~25 min | ~5 min | 5x |
| 10 | ~50 min | ~5 min | 10x |
Note: Actual times vary based on table complexity and data volume.
All batch transformations follow the compliance checklist above.
Ready to proceed? Please provide your table list and I'll launch parallel sub-agents for maximum efficiency!
Format Examples:
- Transform tables: table1, table2, table3 (same database)
- Transform client_src.table1, client_src.table2 (explicit database)
- Transform table1 using Hive, table2 using Presto (mixed engines)