Transform multiple database tables in parallel with maximum efficiency
Transforms multiple database tables to staging format using parallel sub-agents for maximum speed.
To install:

```
/plugin marketplace add treasure-data/aps_claude_tools
/plugin install treasure-data-cdp-staging-plugins-cdp-staging@treasure-data/aps_claude_tools
```

I'll help you transform multiple database tables to staging format using parallel sub-agent execution for maximum performance.
FIRST, use the AskUserQuestion tool to interactively collect all required parameters.
Call AskUserQuestion with these questions:
```json
{
  "questions": [
    {
      "question": "Which tables do you want to transform? (Comma-separated, e.g., table1, table2 OR db.table1, db.table2)",
      "header": "Tables",
      "multiSelect": false,
      "options": [
        {
          "label": "Table list",
          "description": "I'll provide comma-separated table names"
        }
      ]
    },
    {
      "question": "Which SQL engine strategy should be used for these tables?",
      "header": "SQL Engine",
      "multiSelect": false,
      "options": [
        {
          "label": "Presto/Trino",
          "description": "Use Presto/Trino for all tables (recommended, default, fastest)"
        },
        {
          "label": "Hive",
          "description": "Use Hive for all tables (batch processing, large datasets)"
        }
      ]
    },
    {
      "question": "What is the source database containing these tables?",
      "header": "Source DB",
      "multiSelect": false,
      "options": [
        {
          "label": "client_src",
          "description": "Standard client source database"
        },
        {
          "label": "demo_db",
          "description": "Demo/sample database"
        }
      ]
    },
    {
      "question": "Staging database name? (Default: client_stg)",
      "header": "Staging DB",
      "multiSelect": false,
      "options": [
        {
          "label": "client_stg",
          "description": "Use default staging database (recommended)"
        }
      ]
    },
    {
      "question": "Config/Lookup database name? (Default: client_config)",
      "header": "Config DB",
      "multiSelect": false,
      "options": [
        {
          "label": "client_config",
          "description": "Use default config database (recommended)"
        }
      ]
    }
  ]
}
```
After collecting answers, launch parallel sub-agents (one per table) with the appropriate staging-transformer agent and all parameters:
I will extract individual tables from your input:
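The extraction step can be sketched as follows. This is a minimal illustration, not the command's actual implementation; `parse_tables` and the `default_db` fallback are hypothetical, assuming bare table names default to the source database collected above.

```python
# Sketch: split a comma-separated table list into (database, table) pairs.
# Assumption (not defined by this command): names without a "db." prefix
# fall back to a default source database.

def parse_tables(raw: str, default_db: str = "client_src") -> list[tuple[str, str]]:
    """Parse 'db.table1, table2' into [(db, table1), (default_db, table2)]."""
    pairs = []
    for item in raw.split(","):
        name = item.strip()
        if not name:
            continue  # tolerate trailing commas and extra whitespace
        if "." in name:
            db, table = name.split(".", 1)
        else:
            db, table = default_db, name
        pairs.append((db, table))
    return pairs

print(parse_tables("client_src.customers_histunion, orders_histunion"))
```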
I will determine processing strategy:
- `staging-transformer-presto`
- `staging-transformer-hive`

I will create parallel sub-agent calls:
I will track all sub-agent progress:
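The fan-out-and-track pattern above can be sketched with Python's standard thread pool. `transform_table` is a hypothetical stand-in for a staging-transformer sub-agent call; only the orchestration shape is the point here.

```python
# Sketch: run one transform per table concurrently and collect per-table status.
from concurrent.futures import ThreadPoolExecutor, as_completed

def transform_table(table: str) -> str:
    # Placeholder for a real staging-transformer sub-agent invocation.
    return f"{table}: done"

def run_parallel(tables: list[str]) -> dict[str, str]:
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(tables) or 1) as pool:
        futures = {pool.submit(transform_table, t): t for t in tables}
        for fut in as_completed(futures):
            table = futures[fut]
            try:
                results[table] = fut.result()
            except Exception as exc:  # record failures without stopping siblings
                results[table] = f"failed: {exc}"
    return results

print(run_parallel(["customers_histunion", "orders_histunion"]))
```

One failing table does not cancel the others; each outcome is recorded so the consolidated report can distinguish successes from failures.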
After ALL tables complete successfully, I will run a single consolidated git workflow.
Example: the user requests "Transform tables A, B, C", and Main Claude creates 3 parallel sub-agent calls:

```
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│   Sub-Agent 1   │ │   Sub-Agent 2   │ │   Sub-Agent 3   │
│    (Table A)    │ │    (Table B)    │ │    (Table C)    │
│    staging-     │ │    staging-     │ │    staging-     │
│  transformer-   │ │  transformer-   │ │  transformer-   │
│     presto      │ │     presto      │ │     presto      │
└─────────────────┘ └─────────────────┘ └─────────────────┘
        ↓                   ↓                   ↓
  [Files for A]       [Files for B]       [Files for C]
        ↓                   ↓                   ↓
        └───────────────────┴───────────────────┘
                            ↓
              [Consolidated Git Workflow]
              [Single PR with all tables]
```
Each sub-agent ensures complete compliance:
- ✅ Column Limit Management (max 200 columns)
- ✅ JSON Detection & Extraction (automatic)
- ✅ Date Processing (4 outputs per date column)
- ✅ Email/Phone Validation (with hashing)
- ✅ String Standardization (UPPER, TRIM, NULL handling)
- ✅ Deduplication Logic (if configured)
- ✅ Join Processing (if specified)
- ✅ Incremental Processing (state tracking)
- ✅ SQL File Creation (init, incremental, upsert)
- ✅ DIG File Management (conditional creation)
- ✅ Configuration Update (src_params.yml)
- ✅ Treasure Data Compatibility (VARCHAR/BIGINT timestamps)
Files created (Presto):

- staging/init_queries/{source_db}_{table}_init.sql
- staging/queries/{source_db}_{table}.sql
- staging/queries/{source_db}_{table}_upsert.sql (if dedup)
- staging/config/src_params.yml (all tables)
- staging/staging_transformation.dig (created once if not exists)

Files created (Hive):

- staging_hive/queries/{source_db}_{table}.sql
- staging_hive/config/src_params.yml (all tables)
- staging_hive/staging_hive.dig (created once if not exists)

User: Transform tables: client_src.customers_histunion, client_src.orders_histunion, client_src.products_histunion
→ Parallel execution with 3 staging-transformer-presto agents
→ All files to staging/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
User: Transform tables using Hive: client_src.events_histunion, client_src.profiles_histunion
→ Parallel execution with 2 staging-transformer-hive agents
→ All files to staging_hive/ directory
→ Single consolidated git workflow
→ Time: ~1x (vs 2x sequential)
User: Transform table1 using Hive, table2 using Presto, table3 using Hive
→ Parallel execution:
- Table1 → staging-transformer-hive
- Table2 → staging-transformer-presto
- Table3 → staging-transformer-hive
→ Files distributed to appropriate directories
→ Single consolidated git workflow
→ Time: ~1x (vs 3x sequential)
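The mixed-engine routing above can be sketched as a small parser. `assign_agents` is a hypothetical helper illustrating the mapping; it assumes Presto as the default when no engine is named, matching the recommended default above.

```python
# Sketch: map each table to an agent from a per-table "using <engine>" hint.
# Tables without a hint default to Presto (the recommended engine).

AGENTS = {
    "presto": "staging-transformer-presto",
    "hive": "staging-transformer-hive",
}

def assign_agents(spec: str) -> dict[str, str]:
    """Parse 'table1 using Hive, table2 using Presto' into table -> agent."""
    assignments = {}
    for item in spec.split(","):
        words = item.strip().split()
        if not words:
            continue
        table = words[0]
        engine = words[-1].lower() if "using" in words else "presto"
        assignments[table] = AGENTS[engine]
    return assignments

print(assign_agents("table1 using Hive, table2 using Presto, table3"))
```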
If some tables succeed and others fail:

- Report Clear Status:
  - ✅ Successfully transformed: table1, table2
  - ❌ Failed: table3 (error message)
- Preserve Successful Work
- Git Safety

If all tables fail:
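The status report above can be sketched as a small formatter. The `results` shape (table name mapped to `None` on success or an error message on failure) is an assumption for illustration, not defined by this command.

```python
# Sketch: render per-table results into the partial-failure status report.
# Assumed input shape: {table: None} on success, {table: "error"} on failure.
from typing import Optional

def status_report(results: dict[str, Optional[str]]) -> str:
    ok = [t for t, err in results.items() if err is None]
    failed = [(t, err) for t, err in results.items() if err is not None]
    lines = []
    if ok:
        lines.append("✅ Successfully transformed: " + ", ".join(ok))
    for table, err in failed:
        lines.append(f"❌ Failed: {table} ({err})")
    return "\n".join(lines)

print(status_report({"table1": None, "table2": None, "table3": "timeout"}))
```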
Review Pull Request:
Title: "Batch transform 5 tables to staging"
Body:
- Transformed tables: table1, table2, table3, table4, table5
- Engine: Presto/Trino
- All validation gates passed ✅
- Files created: 15 SQL files, 1 config update
Verify Generated Files:
```bash
# For Presto
ls -l staging/queries/
ls -l staging/init_queries/
cat staging/config/src_params.yml

# For Hive
ls -l staging_hive/queries/
cat staging_hive/config/src_params.yml
```
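To know what to look for, the expected file set per table can be derived from the layout listed earlier. `expected_files` is a hypothetical helper sketching that derivation; the dedup/upsert condition is an assumption based on the "(if dedup)" note above.

```python
# Sketch: list the expected output paths for one table, per the file layout
# above. Dedup adds an upsert query; Hive uses the staging_hive/ tree.

def expected_files(source_db: str, table: str, engine: str = "presto",
                   dedup: bool = False) -> list[str]:
    if engine == "hive":
        return [f"staging_hive/queries/{source_db}_{table}.sql",
                "staging_hive/config/src_params.yml"]
    files = [f"staging/init_queries/{source_db}_{table}_init.sql",
             f"staging/queries/{source_db}_{table}.sql",
             "staging/config/src_params.yml"]
    if dedup:
        files.insert(2, f"staging/queries/{source_db}_{table}_upsert.sql")
    return files

print(expected_files("client_src", "customers_histunion"))
```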
Test Workflow:
```bash
cd staging  # or staging_hive
td wf push
td wf run staging_transformation.dig  # or staging_hive.dig
```
Monitor All Tables:
```sql
SELECT table_name, inc_value, project_name
FROM client_config.inc_log
WHERE table_name IN ('table1', 'table2', 'table3')
ORDER BY inc_value DESC
```
| Tables | Sequential Time | Parallel Time | Speedup |
|---|---|---|---|
| 2 | ~10 min | ~5 min | 2x |
| 3 | ~15 min | ~5 min | 3x |
| 5 | ~25 min | ~5 min | 5x |
| 10 | ~50 min | ~5 min | 10x |
Note: Actual times vary based on table complexity and data volume.
All batch transformations follow the compliance checklist above.
Ready to proceed? Please provide your table list and I'll launch parallel sub-agents for maximum efficiency!
Format Examples:
- Transform tables: table1, table2, table3 (same database)
- Transform client_src.table1, client_src.table2 (explicit database)
- Transform table1 using Hive, table2 using Presto (mixed engines)