Validate hist-union workflow and SQL files against production quality gates
Validates hist-union workflows and SQL files against production quality gates.
/plugin marketplace add treasure-data/aps_claude_tools/plugin install treasure-data-cdp-histunion-plugins-cdp-histunion@treasure-data/aps_claude_toolsI'll help you validate your hist-union workflow files to ensure they meet production standards.
First, let me gather information about what you'd like to validate:
<question_tool> Use the AskUserQuestion tool to gather:
Questions to ask:
Question 1: "What would you like to validate?"
Question 2: "What validation mode do you want?"
After gathering this information:
File: hist_union/hist_union_runner.dig
Checks:
_parallel: true configured for concurrent executionFiles: hist_union/queries/*.sql
For each SQL file, checks:
Requires MCP access to Treasure Data
For each table pair, checks:
Checks against template requirements:
No MCP required - Validates file structure and SQL syntax only
Use when: Quick syntax check without database access
Checks: File structure, YAML syntax, SQL syntax, basic patterns
Duration: ~10 seconds
Requires MCP - Validates against actual table schemas
Use when: Pre-deployment validation, full compliance check
Checks: Everything in Mode 1 + schema matching, column validation
Duration: ~30-60 seconds (depends on table count)
Scanning hist_union directory...
✅ Found workflow file: hist_union_runner.dig
✅ Found N SQL files in queries/
Validating hist_union_runner.dig...
✅ YAML syntax valid
✅ timezone set to UTC
✅ _export section present with td.database and lkup_db
✅ +create_inc_log_table task present
✅ +hist_union_tasks section present
✅ _parallel: true configured
✅ No schedule block found
✅ All referenced SQL files exist
Validating hist_union/queries/klaviyo_events.sql...
✅ SQL syntax valid (Presto/Trino compatible)
✅ CREATE TABLE statement found
✅ Table name: client_src.klaviyo_events_histunion
✅ INSERT INTO with UNION ALL structure found
✅ Watermark filtering present for hist table
✅ Watermark filtering present for inc table
✅ Watermark update for hist table found
✅ Watermark update for inc table found
✅ project_name = 'hist_union' verified
✅ No backticks found (using double quotes)
Validating schemas via MCP tool...
Table: klaviyo_events
✅ Inc table exists: client_src.klaviyo_events
✅ Hist table exists: client_src.klaviyo_events_hist
✅ Retrieved inc schema: 45 columns
✅ Retrieved hist schema: 44 columns
✅ Schema difference: inc has 'incremental_date', hist does not
✅ CREATE TABLE matches inc schema (45 columns)
✅ Column order matches inc schema
✅ NULL handling present for 'incremental_date' in hist SELECT
✅ All 45 inc columns present in SQL
✅ UNION ALL column counts match (45 = 45)
Checking template compliance...
Table: klaviyo_lists
⚠️ Full load table detected
✅ Uses Case 3 template (DROP TABLE + no WHERE clause)
✅ Watermarks still updated
Table: klaviyo_events
✅ Incremental table
✅ Uses Case 2 template (inc has extra columns)
✅ CREATE TABLE IF NOT EXISTS used
✅ WHERE clauses present for watermark filtering
✅ COALESCE with default value 0
Generating validation report...
✅ Report created with all findings
✅ Errors highlighted (if any)
✅ Warnings noted (if any)
✅ Recommendations provided (if any)
═══════════════════════════════════════════════════════════
HIST-UNION VALIDATION REPORT
═══════════════════════════════════════════════════════════
Validation Mode: [Syntax Only / Full Schema Validation]
Timestamp: 2024-10-13 14:30:00 UTC
Workflow File: hist_union/hist_union_runner.dig
SQL Files: 5
Overall Status: ✅ PASSED / ❌ FAILED / ⚠️ WARNINGS
───────────────────────────────────────────────────────────
WORKFLOW FILE: hist_union_runner.dig
───────────────────────────────────────────────────────────
✅ YAML Syntax: Valid
✅ Structure: Complete (all required sections present)
✅ Parallel Execution: Configured (_parallel: true)
✅ inc_log Task: Present
✅ Schedule: None (correct)
✅ SQL References: All 5 files exist
───────────────────────────────────────────────────────────
SQL FILE: queries/klaviyo_events.sql
───────────────────────────────────────────────────────────
✅ SQL Syntax: Valid (Presto/Trino)
✅ Template: Case 2 (Inc has extra columns)
✅ Table: client_src.klaviyo_events_histunion
✅ CREATE TABLE: Present
✅ UNION ALL: Correct structure
✅ Watermarks: Both hist and inc updates present
✅ NULL Handling: Correct for 'incremental_date'
✅ Schema Match: All 45 columns present in correct order
───────────────────────────────────────────────────────────
SQL FILE: queries/klaviyo_lists.sql
───────────────────────────────────────────────────────────
✅ SQL Syntax: Valid (Presto/Trino)
✅ Template: Case 3 (Full load)
⚠️ Table Type: FULL LOAD table
✅ DROP TABLE: Present
✅ CREATE TABLE: Correct (no IF NOT EXISTS)
✅ WHERE Clauses: Absent (correct for full load)
✅ UNION ALL: Correct structure
✅ Watermarks: Both hist and inc updates present
✅ Schema Match: All 52 columns present in correct order
... (for all SQL files)
───────────────────────────────────────────────────────────
ISSUES FOUND
───────────────────────────────────────────────────────────
❌ ERROR: queries/shopify_products.sql
- Line 15: Column 'incremental_date' missing in CREATE TABLE
- Expected: 'incremental_date varchar' based on inc table schema
- Fix: Add 'incremental_date varchar' to CREATE TABLE statement
❌ ERROR: queries/users.sql
- Line 45: Using backticks around column "index"
- Fix: Replace `index` with "index" (Presto/Trino requires double quotes)
⚠️ WARNING: hist_union_runner.dig
- Line 25: Task +shopify_variants_histunion references non-existent SQL file
- Expected: queries/shopify_variants.sql
- Fix: Create missing SQL file or remove task
⚠️ WARNING: queries/onetrust_profiles.sql
- Missing watermark update for hist table
- Should have: INSERT INTO inc_log for onetrust_profiles_hist
- Fix: Add watermark update after UNION ALL insert
───────────────────────────────────────────────────────────
RECOMMENDATIONS
───────────────────────────────────────────────────────────
💡 Consider adding these improvements:
1. Add comments to SQL files explaining schema differences
2. Document which tables are full load vs incremental
3. Add error handling tasks in workflow
4. Consider adding validation queries after inserts
💡 Performance optimizations:
1. Review parallel task limit based on TD account
2. Consider partitioning very large tables
3. Review watermark index on inc_log table
User: "Validate my hist-union files"
I will:
1. Scan hist_union directory
2. Validate workflow YAML syntax
3. Validate all SQL file syntax
4. Check file references
5. Generate report with findings
User: "Validate hist-union files with full schema check"
I will:
1. Scan hist_union directory
2. Validate workflow and SQL syntax
3. Use MCP tool to get all table schemas
4. Compare CREATE TABLE with actual schemas
5. Verify column order and NULL handling
6. Check template compliance
7. Generate comprehensive report
User: "Validate just the klaviyo_events.sql file"
I will:
1. Read queries/klaviyo_events.sql
2. Validate SQL syntax
3. Check template structure
4. Optionally get schema via MCP
5. Generate focused report for this file
✅ All checks passed!
Next steps:
1. Deploy to Treasure Data: td wf push hist_union
2. Run workflow: td wf run hist_union_runner.dig
3. Monitor execution: td wf logs hist_union_runner.dig
4. Verify results in target tables
❌ Validation found N errors and M warnings
Next steps:
1. Review validation report for details
2. Fix all critical errors (❌)
3. Address warnings (⚠️ ) if possible
4. Re-run validation
5. Deploy only after all errors are resolved
Before deploying, ensure:
Ready to validate? Specify validation mode (syntax-only or full-schema) and I'll run comprehensive validation against all production quality gates.