Generate professional HTML/PDF merge statistics report from ID unification results for Snowflake or Databricks with expert analysis and visualizations
Generates comprehensive HTML/PDF reports analyzing ID unification merge statistics with expert insights and visualizations.
/plugin marketplace add treasure-data/aps_claude_tools/plugin install treasure-data-cdp-hybrid-idu-plugins-cdp-hybrid-idu@treasure-data/aps_claude_toolsI'll generate a comprehensive, professional HTML report analyzing your ID unification merge statistics with:
Platform Support:
Output Format:
For Snowflake:
INDRESH_TEST, CUSTOMER_CDP)PUBLIC, ID_UNIFICATION)For Databricks:
customer_data, cdp_prod)id_unification, unified_profiles)td_id, unified_customer_id, master_id)
{canonical_id}_lookup, {canonical_id}_master_table, etc.id_unification_report.html)Snowflake:
1. Verify Snowflake MCP tools are available
2. Test connection to specified database.schema
3. Validate canonical ID tables exist:
- {database}.{schema}.{canonical_id}_lookup
- {database}.{schema}.{canonical_id}_master_table
- {database}.{schema}.{canonical_id}_source_key_stats
- {database}.{schema}.{canonical_id}_result_key_stats
4. Confirm access permissions
Databricks:
1. Verify Databricks MCP tools are available (or use Snowflake fallback)
2. Test connection to specified catalog.schema
3. Validate canonical ID tables exist
4. Confirm access permissions
I'll execute 16 specialized queries to collect comprehensive statistics:
Core Statistics Queries:
Source Key Statistics
Result Key Statistics
Canonical ID Metrics
Top Merged Profiles
Merge Distribution Analysis
Key Type Distribution
Master Table Quality Metrics
Configuration Metadata
Platform-Specific SQL Adaptation:
For Snowflake:
SELECT COUNT(*) as total_identities,
COUNT(DISTINCT canonical_id) as unique_canonical_ids
FROM {database}.{schema}.{canonical_id}_lookup;
For Databricks:
SELECT COUNT(*) as total_identities,
COUNT(DISTINCT canonical_id) as unique_canonical_ids
FROM {catalog}.{schema}.{canonical_id}_lookup;
I'll perform expert-level calculations:
Deduplication Rates:
For each key type:
- Source distinct count (pre-unification)
- Final canonical IDs (post-unification)
- Deduplication % = (source - final) / source * 100
Merge Ratios:
- Average identities per customer = total_identities / unique_canonical_ids
- Distribution across categories
- Outlier detection (10+ merges)
Convergence Analysis:
- Parse from execution logs if available
- Calculate from iteration metadata tables
- Estimate convergence quality
Data Quality Scores:
- Coverage % for each attribute
- Completeness assessment
- Quality grading (Excellent, Good, Needs Improvement)
I'll generate a pixel-perfect HTML report with:
Design Features:
Report Structure:
<!DOCTYPE html>
<html>
<head>
- Professional CSS styling
- Chart/visualization styles
- Print media queries
</head>
<body>
<header>
- Report title
- Executive tagline
</header>
<metadata-bar>
- Database/Catalog info
- Canonical ID name
- Generation timestamp
- Platform indicator
</metadata-bar>
<section: Executive Summary>
- 4 KPI metric cards
- Key findings insight box
</section>
<section: Identity Resolution Performance>
- Source vs result comparison table
- Deduplication rate analysis
- Horizontal bar charts
- Expert insights
</section>
<section: Merge Distribution Analysis>
- Category breakdown table
- Distribution visualizations
- Pattern analysis insights
</section>
<section: Top Merged Profiles>
- Top 10 ranked table
- Complexity badges
- Investigation recommendations
</section>
<section: Source Table Configuration>
- Column mapping table
- Source contributions
- Multi-key strategy analysis
</section>
<section: Master Table Data Quality>
- 6 coverage cards with progress bars
- Sample records table
- Quality assessment
</section>
<section: Convergence Performance>
- Iteration breakdown table
- Convergence progression chart
- Efficiency analysis
</section>
<section: Expert Recommendations>
- 4 recommendation cards
- Strategic next steps
- Downstream activation ideas
</section>
<section: Summary Statistics>
- Comprehensive metrics table
- All key numbers documented
</section>
<footer>
- Generation metadata
- Platform information
- Report description
</footer>
</body>
</html>
Pre-Output Validation:
1. Verify all sections have data
2. Check calculations are correct
3. Validate percentages sum properly
4. Ensure no missing values
5. Confirm HTML is well-formed
File Output:
1. Write HTML to specified path
2. Create backup if file exists
3. Set proper file permissions
4. Verify file was written successfully
Report Summary:
ā Report generated: {file_path}
ā File size: {size} KB
ā Sections included: 9
ā Statistics queries: 16
ā Data quality score: {score}%
ā Ready for: Browser viewing, PDF export, sharing
User Input:
Platform: Snowflake
Database: INDRESH_TEST
Schema: PUBLIC
Canonical ID: td_id
Output: snowflake_merge_report.html
Process:
ā Connected to Snowflake via MCP
ā Database: INDRESH_TEST.PUBLIC validated
ā Tables found:
- td_id_lookup (19,512 records)
- td_id_master_table (4,940 records)
- td_id_source_key_stats (4 records)
- td_id_result_key_stats (4 records)
Executing queries:
ā Query 1: Source statistics retrieved
ā Query 2: Result statistics retrieved
ā Query 3: Canonical ID counts (19,512 ā 4,940)
ā Query 4: Top 10 merged profiles identified
ā Query 5: Merge distribution calculated
ā Query 6: Key type distribution analyzed
ā Query 7: Master table coverage (100% email, 99.39% phone)
ā Query 8: Sample records extracted
ā Query 9-11: Metadata retrieved
Calculating metrics:
ā Merge ratio: 3.95:1
ā Fragmentation reduction: 74.7%
ā Deduplication rates:
- customer_id: 23.9%
- email: 32.0%
- phone: 14.8%
ā Data quality score: 99.7%
Generating HTML report:
ā Executive summary section
ā Performance analysis section
ā Merge distribution section
ā Top profiles section
ā Source configuration section
ā Data quality section
ā Convergence section
ā Recommendations section
ā Summary statistics section
ā Report saved: snowflake_merge_report.html (142 KB)
ā Open in browser to view
ā Print to PDF for distribution
Generated Report Contents:
Executive Summary:
- 4,940 unified profiles
- 19,512 total identities
- 3.95:1 merge ratio
- 74.7% fragmentation reduction
Identity Resolution:
- customer_id: 6,489 ā 4,940 (23.9% reduction)
- email: 7,261 ā 4,940 (32.0% reduction)
- phone: 5,762 ā 4,910 (14.8% reduction)
Merge Distribution:
- 89.0% profiles: 3-5 identities (normal)
- 8.1% profiles: 6-10 identities (high engagement)
- 2.3% profiles: 10+ identities (complex)
Top Merged Profile:
- mS9ssBEh4EsN: 38 identities merged
Data Quality:
- Email: 100% coverage
- Phone: 99.39% coverage
- Names: 100% coverage
- Location: 100% coverage
Expert Recommendations:
- Implement incremental processing
- Monitor profiles with 20+ merges
- Enable downstream activation
- Set up quality monitoring
User Input:
Platform: Databricks
Catalog: customer_cdp
Schema: id_unification
Canonical ID: unified_customer_id
Output: databricks_merge_report.html
Process:
ā Connected to Databricks (or using Snowflake MCP fallback)
ā Catalog: customer_cdp.id_unification validated
ā Tables found:
- unified_customer_id_lookup
- unified_customer_id_master_table
- unified_customer_id_source_key_stats
- unified_customer_id_result_key_stats
[Same query execution and report generation as Snowflake]
ā Report saved: databricks_merge_report.html
mcp__snowflake__execute_query tool9 Report Sections:
16 Statistical Queries:
The command automatically finds tables based on your canonical ID name:
For canonical ID = {canonical_id}:
Lookup Table: {canonical_id}_lookup
Master Table: {canonical_id}_master_table
Source Stats: {canonical_id}_source_key_stats
Result Stats: {canonical_id}_result_key_stats
Unification Metadata: unification_metadata
Column Lookup: column_lookup
Filter Lookup: filter_lookup
All tables must be in the same database.schema (Snowflake) or catalog.schema (Databricks)
Styling:
Sections:
Visualizations:
Interactivity:
To create a PDF from the HTML report:
Issue: "Tables not found"
Solution:
1. Verify canonical ID name is correct
2. Check database/catalog and schema names
3. Ensure unification workflow completed successfully
4. Confirm table naming: {canonical_id}_lookup, {canonical_id}_master_table, etc.
Issue: "MCP tools not available"
Solution:
1. For Snowflake: Verify Snowflake MCP server is configured
2. For Databricks: Fall back to Snowflake MCP with proper connection string
3. Check network connectivity
4. Validate credentials
Issue: "No data in statistics tables"
Solution:
1. Verify unification workflow ran completely
2. Check that statistics SQL files were executed
3. Confirm data exists in lookup and master tables
4. Re-run the unification workflow if needed
Issue: "Permission denied"
Solution:
1. Verify READ access to all tables
2. For Snowflake: Grant SELECT on schema
3. For Databricks: Grant USE CATALOG, USE SCHEMA, SELECT
4. Check role/user permissions
Generated report will:
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
> Platform: Snowflake
> Database: PROD_CDP
> Schema: ID_UNIFICATION
> Canonical ID: master_customer_id
> Output: (press Enter for default)
ā Report generated: id_unification_report.html
/cdp-hybrid-idu:hybrid-unif-merge-stats-creator
> Platform: Databricks
> Catalog: analytics_prod
> Schema: unified_ids
> Canonical ID: td_id
> Output: /reports/weekly/td_id_stats_2025-10-15.html
ā Report generated: /reports/weekly/td_id_stats_2025-10-15.html
Generate reports for different environments:
# Production
/hybrid-unif-merge-stats-creator
Platform: Snowflake
Database: PROD_CDP
Output: prod_merge_stats.html
# Staging
/hybrid-unif-merge-stats-creator
Platform: Snowflake
Database: STAGING_CDP
Output: staging_merge_stats.html
# Compare metrics across environments
Generate reports at different stages:
reports/
2025-10-15_td_id_merge_stats.html
2025-10-08_td_id_merge_stats.html
2025-10-01_td_id_merge_stats.html
Track improvements over time by comparing:
Ready to generate your merge statistics report?
Please provide:
Example:
I want to generate a merge statistics report for:
Platform: Snowflake
Database: INDRESH_TEST
Schema: PUBLIC
Canonical ID: td_id
Output: my_unification_report.html
I'll analyze your ID unification results and create a comprehensive, beautiful HTML report with expert insights!