Validate YAML configuration for hybrid ID unification before SQL generation
Validates hybrid ID unification YAML configuration for syntax, structure, and cross-references before SQL generation.
/plugin marketplace add treasure-data/aps_claude_tools
/plugin install treasure-data-cdp-hybrid-idu-plugins-cdp-hybrid-idu@treasure-data/aps_claude_tools
Validate your unify.yml configuration file to confirm that it is properly structured and ready for SQL generation. This command checks syntax, structure, and validation rules, and recommends optimizations.
unify.yml
Check presence and structure of:
Validate individual sections:
Keys Section:
- valid_regexp is a valid regex pattern (if provided)
- invalid_texts is an array (if provided)
Tables Section:
Canonical IDs Section:
- merge_by_keys references existing keys
- merge_iterations is a positive integer (if provided)
Master Tables Section (if present):
Provide recommendations for:
Generate comprehensive report with:
/cdp-hybrid-idu:hybrid-unif-config-validate
I'll prompt you for:
- YAML file path
YAML file: /path/to/unify.yml
name: customer_unification
keys:
  - name: email
    valid_regexp: ".*@.*"
    invalid_texts: ['', 'N/A', 'null']
  - name: customer_id
    invalid_texts: ['', 'N/A']
tables:
  - table: customer_profiles
    key_columns:
      - {column: email_std, key: email}
      - {column: customer_id, key: customer_id}
  - table: orders
    key_columns:
      - {column: email_address, key: email}
canonical_ids:
  - name: unified_id
    merge_by_keys: [email, customer_id]
    merge_iterations: 15
master_tables:
  - name: customer_master
    canonical_id: unified_id
    attributes:
      - name: best_email
        source_columns:
          - {table: customer_profiles, column: email_std, priority: 1}
          - {table: orders, column: email_address, priority: 2}
✅ YAML VALIDATION SUCCESSFUL
File Structure:
✅ Valid YAML syntax
✅ All required sections present
✅ Proper indentation and formatting
Keys Section (2 keys):
✅ email: Valid regex pattern, invalid_texts defined
✅ customer_id: invalid_texts defined
⚠ Consider adding valid_regexp for customer_id for better validation
Tables Section (2 tables):
✅ customer_profiles: 2 key columns mapped
✅ orders: 1 key column mapped
✅ All referenced keys exist
Canonical IDs Section:
✅ Name: unified_id
✅ Merge keys: email, customer_id (both exist)
✅ Iterations: 15 (recommended range: 10-20)
Master Tables Section (1 master table):
✅ customer_master: References unified_id
✅ Attribute 'best_email': 2 sources with priorities
✅ All source tables exist
Cross-References:
✅ All merge_by_keys defined in keys section
✅ All key_columns reference existing keys
✅ All master table sources exist
✅ No canonical ID name conflicts
Recommendations:
💡 Consider adding valid_regexp for customer_id (e.g., "^[A-Z0-9]+$")
💡 Add more master table attributes for richer customer profiles
💡 Consider array attributes (top_3_emails) for historical tracking
Summary:
✅ 0 errors
⚠ 1 warning
💡 3 recommendations
✓ Configuration is ready for SQL generation!
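The per-section checks described earlier (required sections, regex validity, array type, positive integer) can be sketched in Python. This is an illustration only, not the command's actual implementation. It assumes unify.yml has already been parsed into a plain dict; the names validate_sections and is_positive_int are hypothetical, and Python's re module stands in for whatever regex engine the target platform applies to valid_regexp.

```python
import re

# Sections the document lists as required; keys and tables must also be non-empty.
REQUIRED_SECTIONS = ("name", "keys", "tables", "canonical_ids")
NON_EMPTY_SECTIONS = ("keys", "tables")


def is_positive_int(value):
    """True for integers greater than zero (bool is excluded on purpose)."""
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def validate_sections(config):
    """Return a list of error strings for a parsed unify.yml mapping (hypothetical helper)."""
    errors = []

    # Presence checks for the required top-level sections.
    for section in REQUIRED_SECTIONS:
        if section not in config:
            errors.append(f"Missing required section: {section}")
        elif section in NON_EMPTY_SECTIONS and not config[section]:
            errors.append(f"Empty {section} section")

    # Keys section: optional valid_regexp must compile, optional invalid_texts must be a list.
    for key in config.get("keys") or []:
        name = key.get("name", "<unnamed>")
        pattern = key.get("valid_regexp")
        if pattern is not None:
            try:
                re.compile(pattern)
            except re.error as exc:
                errors.append(f"Key '{name}': valid_regexp is not a valid regex pattern ({exc})")
        texts = key.get("invalid_texts")
        if texts is not None and not isinstance(texts, list):
            errors.append(f"Key '{name}': invalid_texts must be an array")

    # Canonical IDs section: optional merge_iterations must be a positive integer.
    for canonical_id in config.get("canonical_ids") or []:
        iterations = canonical_id.get("merge_iterations")
        if iterations is not None and not is_positive_int(iterations):
            errors.append(f"merge_iterations must be a positive integer, got: {iterations!r}")

    return errors
```

Calling validate_sections on the example configuration above (as a dict) should return an empty list; each failed check adds one message, so a caller can count errors for the summary.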
- name field present
- keys section present with at least one key
- tables section present with at least one table
- canonical_ids section present
Error: Invalid YAML: mapping values are not allowed here
Solution: Check indentation (use spaces, not tabs), ensure colons have space after them
Error: Invalid YAML: could not find expected ':'
Solution: Check for missing colons in key-value pairs
Error: Missing required section: keys
Solution: Add keys section with at least one key definition
Error: Empty tables section
Solution: Add at least one table with key_columns
Error: Key 'phone' referenced in table 'orders' but not defined in keys section
Solution: Add phone key to keys section or remove reference
Error: Merge key 'phone_number' not found in keys section
Solution: Add phone_number to keys section or remove from merge_by_keys
Error: Master table source 'customer_360' not found in tables section
Solution: Add customer_360 to tables section or use correct table name
Error: merge_iterations must be a positive integer, got: 'auto'
Solution: Either remove merge_iterations (auto-calculate) or specify integer (e.g., 15)
Error: Priority must be a positive integer, got: 'high'
Solution: Use numeric priority (1 for highest, 2 for second, etc.)
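The cross-reference errors listed above all come down to set membership: collect the defined key and table names, then confirm that every reference points at one of them. The Python sketch below illustrates that idea under stated assumptions. It is not the command's real code: check_cross_references is a hypothetical name, the config is assumed to be an already-parsed dict shaped like the example unify.yml, and priority is assumed to be optional.

```python
def check_cross_references(config):
    """Return cross-reference error strings for a parsed unify.yml mapping (hypothetical helper)."""
    errors = []
    defined_keys = {key["name"] for key in config.get("keys", [])}
    defined_tables = {table["table"] for table in config.get("tables", [])}

    # Every key_columns entry must point at a key defined in the keys section.
    for table in config.get("tables", []):
        table_name = table["table"]
        for mapping in table.get("key_columns", []):
            key_name = mapping["key"]
            if key_name not in defined_keys:
                errors.append(
                    f"Key '{key_name}' referenced in table '{table_name}' "
                    "but not defined in keys section"
                )

    # Every merge_by_keys entry must also be a defined key.
    for canonical_id in config.get("canonical_ids", []):
        for key_name in canonical_id.get("merge_by_keys", []):
            if key_name not in defined_keys:
                errors.append(f"Merge key '{key_name}' not found in keys section")

    # Master table sources must name existing tables; priority, if given, must be a positive integer.
    for master in config.get("master_tables", []):
        for attribute in master.get("attributes", []):
            for source in attribute.get("source_columns", []):
                source_table = source["table"]
                if source_table not in defined_tables:
                    errors.append(
                        f"Master table source '{source_table}' not found in tables section"
                    )
                priority = source.get("priority")
                valid = isinstance(priority, int) and not isinstance(priority, bool) and priority > 0
                if priority is not None and not valid:
                    errors.append(f"Priority must be a positive integer, got: {priority!r}")

    return errors
```

Building the name sets once keeps each lookup cheap and makes the order of the reported errors follow the order of the sections in the file.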
✅ Configuration validated successfully!
Ready for:
• SQL generation (Databricks or Snowflake)
• Direct execution after generation
Next steps:
1. /cdp-hybrid-idu:hybrid-generate-databricks
2. /cdp-hybrid-idu:hybrid-generate-snowflake
3. /cdp-hybrid-idu:hybrid-setup (complete workflow)
❌ Configuration has errors that must be fixed
Errors (must fix):
1. Missing required section: canonical_ids
2. Undefined key 'phone' referenced in table 'orders'
Suggestions:
• Add canonical_ids section with name and merge_by_keys
• Add phone key to keys section or remove from orders
Would you like help fixing these issues? (y/n)
I can help you:
Validation passes when:
Ready to validate your YAML configuration?
Provide your unify.yml file path to begin validation!