Skill

Catalog Steward

Governs SAP Datasphere catalogs by enriching metadata, managing glossaries and tags, defining KPIs, and analyzing lineage impacts. Improves discoverability and assesses changes.

data-engineering

database

npx claudepluginhub mariodefelipe/sap-datasphere-plugin-for-claude-cowork

Tool Access

This skill uses the workspace's default tool permissions.

Preview

The Catalog Steward skill empowers you to take control of your SAP Datasphere's internal data governance. This skill focuses on **enriching metadata, managing business glossaries, defining KPIs, controlling tag taxonomies, and performing lineage-based impact analysis**—all essential for enabling self-service analytics and preventing governance chaos.

Supporting Assets

references/catalog-governance-guide.md

SKILL.md

Stats

Stars: 12

Forks: 3

Last Commit: Feb 9, 2026

From datasphere

Catalog Steward Skill

Overview

The Catalog Steward skill empowers you to take control of your SAP Datasphere's internal data governance. This skill focuses on enriching metadata, managing business glossaries, defining KPIs, controlling tag taxonomies, and performing lineage-based impact analysis—all essential for enabling self-service analytics and preventing governance chaos.

Unlike the Data Product Publisher skill (which publishes external marketplace products), the Catalog Steward skill is about making your internal Datasphere repository discoverable, understandable, and trustworthy. When users search your catalog, they should find well-named assets with clear descriptions, consistent business terminology, quality metrics, and transparent lineage.

Why Catalog Governance Matters

Self-Service Analytics: Business users can find and trust data without submitting tickets
Compliance & Auditability: Clear lineage and ownership trails support regulatory requirements
Impact Analysis: Understand change ripple effects before modifying critical assets
Terminology Alignment: Glossaries ensure "Revenue" means the same thing across teams
Data Quality Transparency: Quality scores help users select the right datasets
Governance at Scale: Consistent metadata reduces technical debt and tribal knowledge

Core Workflows

1. Metadata Enrichment

Metadata enrichment transforms technical asset names and sparse descriptions into discoverable, business-friendly documentation.

Workflow: Analyze and Suggest Business-Friendly Names

When to use: During onboarding, after importing source system tables, or during catalog cleanup sprints.

Steps:

Search for undernamed assets:
- Use search_catalog to find tables/views with missing or cryptic names (e.g., "T_SALES_001")
- Filter by asset type (Dimension, Fact, View, Model)
- Identify candidates for enrichment
Analyze content with column inspection:
- Use get_asset_details to inspect table/view structure
- Review key columns to infer business meaning
- Identify primary dimensions and measures
- Example: "T_SALES_001" contains CUST_ID, ORDER_DT, AMOUNT → suggests "Customer Orders Fact"
Suggest and apply business names:
- Map technical names to business-friendly alternatives
- Follow naming conventions (see references for templates)
- Apply updated names via catalog metadata endpoints
- Document rationale in internal notes
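
The candidate search in the first step can be sketched as a simple filter over asset records. This is a minimal sketch that assumes the assets have already been fetched (for example via search_catalog) into plain dictionaries; the field names `technical_name` and `business_name`, and the regular expression defining a "cryptic" name, are illustrative assumptions rather than part of the skill.

```python
import re

# Hypothetical heuristic: a name counts as cryptic if it starts with a short
# technical prefix (T_, V_, TMP_, Z_) or ends with a numeric suffix like _001.
CRYPTIC_PATTERN = re.compile(r"^(T|V|TMP|Z)_|_\d{2,}$", re.IGNORECASE)


def is_cryptic(name: str) -> bool:
    """Return True if the technical name matches the cryptic-name heuristic."""
    return bool(CRYPTIC_PATTERN.search(name))


def enrichment_candidates(assets: list[dict]) -> list[str]:
    """Return technical names that look cryptic or have no business name yet."""
    return [
        a["technical_name"]
        for a in assets
        if is_cryptic(a["technical_name"]) or not a.get("business_name")
    ]


# Illustrative records; the field names are assumptions, not a real API shape.
assets = [
    {"technical_name": "T_SALES_001", "business_name": ""},
    {"technical_name": "Customer_Orders", "business_name": "Customer Orders Fact"},
]
candidates = enrichment_candidates(assets)
print(candidates)
```

The pattern is deliberately narrow; in practice it would be tuned to the prefixes and suffixes your source systems actually produce.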

Best Practices:

Use plural nouns for fact tables and singular nouns for dimensions
Use business domain terminology (not IT jargon)
Avoid ambiguity: "Sales" → "Monthly Sales Orders" or "Daily Sales Revenue"
Create a naming convention document and version it

Workflow: Write Meaningful Descriptions

When to use: When onboarding new users, before publishing catalog assets, or during quality audits.

Steps:

Gather context:
- Use get_asset_details to extract technical metadata
- Review related objects (upstream sources, downstream consumers)
- Identify responsible team or owner
Write descriptions following a template:
- What: One-sentence summary of what the asset contains
- Why: Business purpose or use case
- Key columns: 2-3 most important dimensions/measures
- Refresh frequency: How often is it updated
- Caveats: Data quality issues, exclusions, or limitations
- Example template (see references)
Link to upstream sources:
- Document source systems or parent tables
- Use get_object_definition to trace lineage
- Include transformation logic (if relevant)
Review and version:
- Have data owner approve descriptions
- Track description changes in catalog versioning

Best Practices:

Keep descriptions under 500 words; link to detailed documentation elsewhere
Use plain language; assume the audience is a business analyst, not a DBA
Include examples of typical queries or use cases
Flag experimental or deprecated assets clearly
Update descriptions when business meaning changes (not just when data structure changes)

Workflow: Auto-Suggest Tags Based on Content Analysis

When to use: During bulk catalog onboarding or when implementing a new tag taxonomy.

Steps:

Analyze asset content:
- Use get_asset_details to inspect column names, types, and distributions
- Use analyze_column_distribution to understand data characteristics
- Identify data types (financial, HR, product, customer, operational)
- Detect common patterns (dates, IDs, amounts)
Match against tag taxonomy:
- Map identified characteristics to your tag taxonomy (see references)
- Example: columns contain "SALARY", "EMPLOYEE_ID" → suggest tags: hr, sensitive, employee-master
Propose tags with confidence scoring:
- High confidence: tags match multiple column patterns
- Medium confidence: tags match domain or naming conventions
- Low confidence: tags are contextual or require human review
Review and apply:
- Present suggestions with reasoning
- Allow manual override for edge cases
- Batch-apply approved tags
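
The matching and scoring steps above could look roughly like the following. This is a sketch under stated assumptions: the `TAG_RULES` table is a hypothetical stand-in for your own tag taxonomy, and the thresholds simplify the confidence levels (two or more matching columns count as high, exactly one as medium). Low-confidence, contextual tags are left to human review and are not generated here.

```python
# Hypothetical keyword table; replace with entries from your own taxonomy.
TAG_RULES = {
    "hr": {"SALARY", "EMPLOYEE_ID", "HIRE_DATE"},
    "finance": {"AMOUNT", "CURRENCY", "INVOICE_ID"},
}


def suggest_tags(columns: list[str]) -> dict[str, str]:
    """Map each suggested tag to a confidence label based on column matches."""
    cols = {c.upper() for c in columns}
    suggestions = {}
    for tag, keywords in TAG_RULES.items():
        hits = len(cols & keywords)
        if hits >= 2:
            suggestions[tag] = "high"    # several column patterns agree
        elif hits == 1:
            suggestions[tag] = "medium"  # a single matching column
    return suggestions


result = suggest_tags(["salary", "employee_id", "amount"])
print(result)
```

Returning the confidence alongside the tag keeps the reasoning visible to the reviewer who approves or overrides the suggestion.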

Best Practices:

Use a controlled vocabulary (see tag taxonomy in references)
Combine multiple tag types (domain, sensitivity, cadence, owner)
Review auto-suggestions; don't apply blindly
Document why assets receive specific tags
Update tags when asset usage patterns change

Workflow: Bulk Metadata Updates Across Multiple Assets

When to use: After organizational changes, standardization initiatives, or when implementing governance policies.

Steps:

Identify batch scope:
- Use list_catalog_assets to find assets matching criteria (e.g., all tables from a source system, all models owned by a team)
- Validate that batch scope is correct (test with small sample first)
Define update template:
- Standardize naming patterns, tags, descriptions, or ownership
- Create template for changes (see references)
- Document change rationale and approval
Execute updates in phases:
- Phase 1: Apply changes to test/sandbox catalogs
- Phase 2: Validate against downstream consumers using lineage
- Phase 3: Apply to production with versioning
- Phase 4: Communicate changes to users
Track and audit changes:
- Log all bulk changes with timestamp, author, and reason
- Enable catalog versioning to support rollback if needed
- Notify affected teams of changes
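
One way to make the phased rollout auditable is to compute the change log before anything is written. The sketch below works on in-memory dictionaries only; `plan_bulk_update` and the record fields are hypothetical names for illustration, and pushing the result to the catalog is out of scope.

```python
from copy import deepcopy
from datetime import datetime, timezone


def plan_bulk_update(assets, changes, author, reason, dry_run=True):
    """Apply `changes` to every asset and return (assets, change_log).

    With dry_run=True the returned assets are unchanged copies, so only the
    log needs to be reviewed before the real run.
    """
    updated, log = [], []
    for asset in assets:
        new_asset = deepcopy(asset)
        for field, value in changes.items():
            if new_asset.get(field) != value:
                log.append({
                    "asset": asset["name"],
                    "field": field,
                    "before": asset.get(field),
                    "after": value,
                    "author": author,
                    "reason": reason,
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                })
                new_asset[field] = value
        updated.append(new_asset)
    return (deepcopy(assets) if dry_run else updated), log


# Illustrative sample batch.
sample = [
    {"name": "Sales_Orders", "owner": "old-team"},
    {"name": "Sales_Returns", "owner": "sales-ops"},
]
preview, log = plan_bulk_update(sample, {"owner": "sales-ops"}, "steward", "Reorg")
applied, _ = plan_bulk_update(sample, {"owner": "sales-ops"}, "steward", "Reorg", dry_run=False)
```

The `before` and `after` values in each log entry double as the before/after comparison and as the data needed for a rollback.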

Best Practices:

Always test bulk updates on a sample first
Use lineage analysis to identify downstream impacts
Batch updates by logical group (not random collections)
Communicate timing and rationale to stakeholders
Provide before/after comparisons for major changes

2. Glossary Term Management

A business glossary is the "source of truth" for terminology. It ensures that "Gross Margin," "EBITDA," and "Market Share" mean the same thing across all teams.

Workflow: Create and Maintain a Business Glossary

When to use: At governance program launch, when onboarding new business domains, or when terminology conflicts arise.

Steps:

Identify core business concepts:
- Interview business owners and analysts
- Review existing reports, dashboards, and analysis
- Document terms with multiple definitions (conflicts to resolve)
- Prioritize high-impact terms (used in multiple models, KPIs, or reports)
Create glossary term entries:
- Use the glossary term template (see references)
- Define each term with business meaning, not technical definition
- Include approved synonyms and related terms
- Document calculation methodology (for metrics)
- Assign owner and approval authority
- Set version and last-reviewed date
Build glossary hierarchy:
- Group terms by business domain (Sales, Finance, HR, Operations)
- Create parent-child relationships (e.g., "Revenue" → "Product Revenue", "Service Revenue")
- Link related terms (see section on glossary relationships)
Enable feedback and evolution:
- Publish draft glossary and collect feedback from stakeholders
- Review conflicts and make approval decisions
- Version published glossary (v1.0, v1.1, etc.)
- Schedule annual reviews with business owners

Best Practices:

Start with the 20-30 highest-impact terms rather than trying to cover the entire organization's vocabulary
Involve business owners, not just IT, in definition
Make glossary searchable and always discoverable (don't hide in PDFs)
Include usage examples and anti-examples (what it is NOT)
Document historical changes (why did definition change?)
Link to actual data implementations (models, measures)

Workflow: Link Glossary Terms to Technical Assets

When to use: After glossary terms are approved, during model development, or during metadata enrichment sprints.

Steps:

Identify linking opportunities:
- Use search_catalog to find assets matching glossary terms
- Example: search for "revenue" → find all views, models, measures with revenue-related logic
- Use get_asset_details to inspect calculated fields and measures
Create term-to-asset mappings:
- Link glossary term "Revenue" to measure TOTAL_REVENUE in model Sales_Summary
- Document how technical asset implements the glossary definition
- Capture calculation logic or transformation rules
- Note any deviations or approximations
Enable bidirectional navigation:
- Users viewing glossary term should see which assets implement it
- Users viewing assets should see which glossary terms apply
- Create cross-reference views or dashboards
Validate consistency:
- Check that all uses of the term apply the same definition
- Flag deviations or variant calculations
- Schedule reviews when definitions or implementations change

Best Practices:

One glossary term can map to multiple technical assets (same concept, different contexts)
Document if an asset implements the term exactly or is an approximation
Include transformation rules or calculation logic in the mapping
Update mappings when either glossary terms or asset definitions change
Use mappings to detect duplicate or conflicting implementations

Workflow: Term Approval Workflows

When to use: When implementing formal governance, during terminology disputes, or when adding new glossary terms.

Steps:

Define approval roles:
- Proposer: Business analyst or data owner
- Domain Owner: Accountable for terms in their domain (Sales, Finance, etc.)
- Governance Lead: Final approval authority
- Use RACI matrix (see references) to clarify roles
Create proposal-to-approval workflow:
- Proposer submits term with definition, calculation, and rationale
- Domain owner reviews for alignment with business standards
- Governance lead checks for conflicts, clarity, and compliance
- Feedback is provided; proposer revises if needed
- Final approval records who, when, and rationale
Track approval status:
- Status: Draft → Proposed → Approved → Published
- Escalation path for disputes (which executive resolves conflicts?)
- SLA for reviews (e.g., 5 business days)
Manage versioning:
- When a term definition changes, trigger re-approval
- Previous versions remain available (audit trail)
- Notify users when definitions change

Best Practices:

Clarify approval authority upfront (don't create bottlenecks)
Use lightweight workflow for low-risk terms, formal workflow for KPIs or financial terms
Document why terms were rejected (helps future proposals)
Include legal or compliance review for regulatory terms
Set clear SLAs to prevent indefinite reviews

Workflow: Glossary Hierarchies and Relationships

When to use: As glossary grows beyond 30-50 terms, when standardizing across domains, or when implementing enterprise-wide terminology.

Steps:

Design hierarchical structure:
- Create top-level categories (Business Domains: Finance, Sales, HR, etc.)
- Create sub-categories (Finance → Revenue, Expenses, Assets)
- Create specific terms (Revenue → Product Revenue, Service Revenue)
- Support 2-3 levels of depth (too deep = hard to navigate)
Define relationship types:
- Synonym: Alternative names for the same concept (e.g., "Revenue" = "Turnover")
- Related: Conceptually connected but distinct (e.g., "Revenue" related to "Cost of Goods Sold")
- Parent-Child: Hierarchical containment (e.g., "Revenue" → "Product Revenue")
- Derived: One term calculated from others (e.g., "Profit Margin" derived from "Profit" and "Revenue")
Build navigation paths:
- Enable browsing by domain (discover all financial terms)
- Enable searching across domains (find all revenue-related terms)
- Create "Related Terms" suggestions on term detail pages
- Build term dependency maps for KPI validation
Maintain consistency:
- Review hierarchies during governance reviews
- Consolidate synonyms and related terms to reduce duplication
- Update relationships when definitions change

Best Practices:

Don't create trees deeper than 3 levels; use relationships instead
Document relationship semantics (what makes two terms "related"?)
Use hierarchies to organize domains, not to create arbitrary classification
Enable free-text search as primary discovery mechanism
Use term relationships to detect definition conflicts

Workflow: Ensure Consistent Terminology Across the Organization

When to use: During governance audits, when merging business units, or when enforcing standards.

Steps:

Audit current terminology:
- Search catalog for variant names and definitions (e.g., "Revenue", "Sales", "Turnover", "Top Line")
- Interview teams to understand why variants exist
- Use search_catalog to find all objects using each variant
- Document conflicts in a consolidation backlog
Resolve conflicts through glossary:
- For each conflict, create a single approved glossary term
- Declare one variant as canonical; others as synonyms
- Document why this definition was chosen
- Get stakeholder approval before enforcement
Enforce consistency:
- Link all variant implementations to approved glossary term
- Update descriptions/names in catalog to use approved terminology
- Add metadata (tags) to identify which implementations are authoritative vs. legacy
- Deprecate non-conforming implementations gradually
Ongoing audits:
- Schedule quarterly reviews of new assets for terminology alignment
- Audit popular models/views for consistent term usage
- Include terminology checklist in data product publishing workflow
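
The audit step can be partly automated by scanning asset names for non-canonical variants. The following is a minimal sketch; the `SYNONYMS` table is a hypothetical outcome of the conflict-resolution step (with "revenue" declared canonical), and simple substring matching will produce false positives that need human review.

```python
# Hypothetical result of conflict resolution: canonical term -> variants.
SYNONYMS = {"revenue": ["sales", "turnover", "top line"]}


def build_lookup(synonyms: dict[str, list[str]]) -> dict[str, str]:
    """Flatten the synonym table into variant -> canonical term."""
    lookup = {}
    for canonical, variants in synonyms.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant] = canonical
    return lookup


def find_variant_usage(asset_names: list[str], synonyms: dict[str, list[str]]) -> dict:
    """Map each asset using a non-canonical variant to (variant, canonical)."""
    lookup = build_lookup(synonyms)
    findings = {}
    for name in asset_names:
        normalized = name.lower().replace("_", " ")
        for variant, canonical in lookup.items():
            if variant != canonical and variant in normalized:
                findings[name] = (variant, canonical)
                break
    return findings


backlog = find_variant_usage(
    ["Monthly_Turnover", "Revenue_By_Region", "Top_Line_Summary"], SYNONYMS
)
print(backlog)
```

The findings can feed the consolidation backlog directly, one entry per asset that still uses legacy terminology.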

Best Practices:

Enforce consistency gradually (phase out old terms over 6-12 months)
Document migration path for teams using old terminology
Recognize that business language evolves; update glossary annually
Use glossary to enforce standards, not to restrict valid language
Support common synonyms as alternate search terms

3. KPI (Key Performance Indicator) Definition

KPIs translate business objectives into measurable metrics. The catalog ensures KPIs are well-defined, validated against data, and linked to accountability.

Workflow: Define KPIs Within the Catalog

When to use: When launching new strategic initiatives, during business planning cycles, or when formalizing informal metrics.

Steps:

Gather KPI requirements:
- Interview executive sponsors and business owners
- Document strategic objective each KPI supports
- Define calculation methodology (detailed formula)
- Identify refresh cadence (daily, weekly, monthly)
- Assign accountability (who owns this KPI?)
- Define target/threshold values
Create KPI definition using template (see references):
- Name: Business-friendly name (e.g., "Customer Lifetime Value")
- Code: Unique identifier (e.g., "CLV_001")
- Strategic Objective: Which business goal does this KPI support?
- Definition: Plain-language description
- Calculation: Detailed formula with logic
- Dimensions: How is KPI sliced? (by customer segment, region, product, time)
- Data Sources: Which tables/models feed this KPI?
- Owner: Who is accountable?
- Review Frequency: When is this KPI reviewed?
- Version: Creation date and change log
Validate against data landscape:
- Use get_asset_details to inspect source tables
- Use analyze_column_distribution to check data availability and quality
- Verify required dimensions/measures exist
- Document any data gaps or approximations
Publish and socialize:
- Create KPI detail page in catalog with calculation visible
- Share KPI definition with stakeholders
- Link to dashboards/reports that use this KPI
- Establish governance (who approves changes?)
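
The template fields listed above map naturally onto a small record type. This sketch assumes nothing about how Datasphere stores KPIs; `KpiDefinition` and `missing_fields` are hypothetical names, and the objective, definition, and calculation strings are placeholders for illustration only.

```python
from dataclasses import dataclass, field, fields


@dataclass
class KpiDefinition:
    """One KPI entry following the template fields from the steps above."""
    name: str
    code: str
    strategic_objective: str
    definition: str
    calculation: str
    dimensions: list[str] = field(default_factory=list)
    data_sources: list[str] = field(default_factory=list)
    owner: str = ""
    review_frequency: str = ""
    version: str = "1.0"

    def missing_fields(self) -> list[str]:
        """Names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Placeholder values; a real definition comes from the requirements interviews.
clv = KpiDefinition(
    name="Customer Lifetime Value",
    code="CLV_001",
    strategic_objective="Increase customer retention",
    definition="Expected net revenue from a customer over the relationship",
    calculation="avg_order_value * orders_per_year * expected_years",
    dimensions=["customer segment", "region"],
)
missing = clv.missing_fields()
print(missing)
```

Checking for empty fields before publication is a cheap way to catch KPIs that were drafted without an owner or data sources.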

Best Practices:

Keep KPI definitions simple; complexity breeds misunderstanding
Include examples: "If X happened, would KPI increase or decrease?"
Document known limitations and caveats (e.g., "excludes international operations")
Version KPI definitions; don't silently change calculations
Link KPI to glossary terms for consistency

Workflow: Link KPIs to Underlying Datasets and Measures

When to use: During KPI validation, when optimizing data models, or when documenting lineage.

Steps:

Map KPI to source measures:
- Use get_object_definition to inspect model structure
- Identify which measures feed each KPI calculation
- Example: KPI "Profit Margin" uses measures Total_Profit and Total_Revenue
- Document transformation logic (if any)
Trace lineage to source systems:
- Use list_catalog_assets or lineage analysis to trace back to source tables
- Document data flow: Source System → ETL → Model → Measure → KPI
- Identify any data transformations or aggregations
- Document refresh timing at each stage
Create bidirectional links:
- KPI detail page shows source measures
- Measure detail page shows which KPIs consume it
- Enable impact analysis: "change this measure → affects these KPIs"
Validate availability and completeness:
- Ensure all required source columns exist
- Check that historical data is available for trending
- Verify refresh frequency supports KPI review cycle
- Document any data quality issues in lineage

Best Practices:

Map each KPI to its smallest constituent measures (enables reuse)
Document assumptions in data flow (e.g., "excludes canceled orders")
Use lineage to identify shared dependencies (optimization opportunities)
Automate lineage updates when data models change
Create data dictionaries linking business metrics to technical measures

Workflow: KPI Ownership and Accountability

When to use: During KPI launch, during governance reviews, or when resolving KPI disputes.

Steps:

Assign clear ownership:
- KPI Owner: Accountable for definition and business interpretation (executive)
- Data Owner: Accountable for underlying data quality (data team)
- Dashboard Owner: Accountable for reporting infrastructure (BI team)
- Use RACI matrix (see references) to clarify secondary responsibilities
Document ownership in catalog:
- Assign owner to KPI definition with contact information
- Create KPI ownership matrix (spreadsheet or dashboard)
- Link KPI to team or department
- Document escalation path for KPI disputes
Enable accountability:
- Schedule monthly KPI reviews with owners
- Track KPI performance trends
- Document explanations when KPIs miss targets
- Use KPI dashboards to highlight performance issues early
Rotate and transition ownership:
- When owner changes roles, assign replacement
- Document transition in KPI versioning
- Provide new owner with calculation documentation and historical context

Best Practices:

Assign single accountable owner (not a committee)
Ensure owner has authority to make decisions about KPI
Connect KPI ownership to performance management/compensation (creates accountability)
Review ownership quarterly; update when roles change
Document succession plan for critical KPI owners

Workflow: KPI Validation

When to use: Before publishing KPIs, during data quality issues, or when results seem suspicious.

Steps:

Validate calculation logic:
- Walk through calculation step-by-step
- Check for logic errors (incorrect operators, filters, aggregations)
- Verify dimensional alignment (are dimensions aggregated correctly?)
- Test with known scenarios (e.g., "if all customers had 100 orders, KPI should be X")
Validate data quality:
- Use analyze_column_distribution on source columns
- Check for missing values, outliers, or data quality issues
- Validate assumptions (e.g., "all dates are in YYYY-MM-DD format")
- Review data freshness: is data current enough for KPI?
Validate against reality:
- Compare KPI results to manual calculations (if available)
- Run KPI on historical data; check for expected trends
- Benchmark against external data if available (e.g., compare "Market Share" KPI to published reports)
- Interview business owners: "does this number feel right?"
Document validation results:
- Create validation report (see references)
- Document any discrepancies and their root causes
- Establish data quality requirements for KPI use
- Define KPI confidence level (trusted, needs monitoring, experimental)
Set up ongoing monitoring:
- Create KPI quality dashboard (shows data freshness, completeness, outliers)
- Set up alerts for data quality issues
- Schedule monthly validation checks
- Document changes to source data that might affect KPI
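
Two of the checks above, testing the calculation against a known scenario and measuring missing values, are easy to express as code. This is a sketch on plain Python values; it does not call analyze_column_distribution, and the function names and sample column are hypothetical.

```python
def profit_margin(total_profit: float, total_revenue: float) -> float:
    """Profit Margin as profit divided by revenue."""
    if total_revenue == 0:
        raise ValueError("Revenue is zero; profit margin is undefined")
    return total_profit / total_revenue


def completeness(values: list) -> float:
    """Share of non-missing values in a column sample (0.0 to 1.0)."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


# Known scenario: a profit of 25 on revenue of 100 must give a margin of 0.25.
assert profit_margin(25.0, 100.0) == 0.25

# Illustrative sample of a source column with one missing value.
sample = [120.0, None, 80.0, 95.5]
ratio = completeness(sample)
print(ratio)
```

Keeping the known-scenario assertions next to the formula means they can be rerun whenever the source data or the calculation changes.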

Best Practices:

Never publish KPI without validation
Include data quality caveats in KPI definition
Validate with business owners, not just data teams
Document validation assumptions (so others can replicate)
Schedule re-validation when source data changes significantly

Workflow: KPI Lifecycle Management

When to use: When KPIs become irrelevant, during business strategy reviews, or when merging business units.

Steps:

Establish KPI lifecycle states:
- Proposed: New KPI being evaluated
- Active: Currently tracked and reviewed
- Monitored: Less critical but still watched
- Deprecated: Phased out or replaced by newer KPI
- Archived: Historically important, no longer used
Transition KPIs through lifecycle:
- Proposed → Active: After validation and stakeholder approval
- Active → Deprecated: When business objective changes or KPI becomes outdated
- Deprecated → Archived: After 6-12 month sunset period
- Document reason and date for each transition
Manage sunset of deprecated KPIs:
- Communicate sunset date to stakeholders well in advance
- Identify replacement KPI (if applicable)
- Provide training on new KPI
- Archive old dashboards/reports gradually
- Keep historical data accessible for trend analysis
Review and refresh KPI portfolio:
- Conduct annual KPI portfolio review
- Assess each KPI: Still aligned with strategy? Still accurate? Still relevant?
- Identify KPIs for deprecation
- Identify new KPIs needed for emerging priorities
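
The documented transitions can be enforced with a small lookup table. The sketch below encodes only the three transitions listed in the steps; how the Monitored state is entered or left is not specified, so it is intentionally not wired in. The `transition` helper and the record layout are hypothetical.

```python
from datetime import date

# Only the transitions named in the steps above; Monitored is left out
# because its entry and exit rules are not specified.
ALLOWED = {
    "Proposed": {"Active"},
    "Active": {"Deprecated"},
    "Deprecated": {"Archived"},
}


def transition(kpi: dict, new_state: str, reason: str, on: date) -> dict:
    """Return a copy of the KPI in its new state, recording reason and date."""
    current = kpi["state"]
    if new_state not in ALLOWED.get(current, set()):
        raise ValueError(f"Transition {current} -> {new_state} is not allowed")
    history = kpi.get("history", []) + [
        {"from": current, "to": new_state, "reason": reason, "date": on.isoformat()}
    ]
    return {**kpi, "state": new_state, "history": history}


kpi = {"code": "CLV_001", "state": "Proposed"}
active = transition(kpi, "Active", "Validated and approved", date(2026, 1, 15))
print(active["state"], active["history"])
```

Because each call returns a new record with an appended history entry, the reason and date for every transition are preserved for later audits.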

Best Practices:

Document why KPIs were deprecated (important context for future teams)
Don't delete KPI definitions; archive them with historical data
Communicate KPI changes to all stakeholders early
Link deprecated KPI to replacement (if applicable)
Review KPI portfolio annually, not ad hoc

4. Tag Management

Tags are lightweight metadata that enable discovery and governance. A well-designed tag taxonomy makes the catalog navigable at scale.

Workflow: Design a Tag Taxonomy

When to use: At governance program launch or when current tagging scheme becomes unwieldy.

Steps:

Define tag categories:
- Domain Tags: Business domain (Finance, Sales, HR, Operations, Product)
- Sensitivity Tags: PII, Confidential, Internal, Public
- Cadence Tags: Real-time, Daily, Weekly, Monthly, Ad-hoc
- Owner/Team Tags: owned-by-finance, owned-by-sales, etc.
- Quality Tags: Certified, Under Review, Experimental, Legacy
- Use Case Tags: KPI, Reporting, Analysis, AI/ML, Regulatory
- See references for detailed taxonomy design patterns
Create controlled vocabulary:
- Define each tag with clear definition
- Document when to use each tag (vs. related tags)
- Establish naming convention (lowercase, no spaces, hyphens for compound terms)
- Example: Use hr-employee-master not HR Employee Master or hr_emp_master
Design hierarchy (if needed):
- Flat hierarchy: Simple tag list, good for small catalogs (<100 assets)
- Hierarchical: Parent/child relationships, good for large catalogs (>500 assets)
- Example hierarchy: domain:finance, domain:finance:accounting, domain:finance:revenue
Publish and train:
- Create tag guide with examples
- Train data owners on tagging conventions
- Publish tag definitions in easily searchable location
- Include tag guide in onboarding documentation
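
The naming convention (lowercase, no spaces, hyphens for compound terms) and the colon-separated hierarchy from the example can be checked mechanically. This is a minimal sketch; treating digits as valid characters is an assumption, and `is_valid_tag` is a hypothetical helper.

```python
import re

# One segment: lowercase letters or digits, with hyphens between words.
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# A tag is one or more segments joined by colons, e.g. domain:finance:revenue.
TAG_PATTERN = re.compile(rf"{SEGMENT}(?::{SEGMENT})*")


def is_valid_tag(tag: str) -> bool:
    """Return True if the whole tag follows the naming convention."""
    return TAG_PATTERN.fullmatch(tag) is not None


for tag in ["hr-employee-master", "HR Employee Master", "hr_emp_master",
            "domain:finance:accounting"]:
    print(tag, is_valid_tag(tag))
```

Running this check when tags are proposed keeps near-duplicates such as underscore and mixed-case variants out of the controlled vocabulary.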

Best Practices:

Start simple; expand tags over time
Limit tag count (20-50 active tags); too many tags defeat discovery
Use domain category heavily; use other categories sparingly
Don't use tags for information that should be in descriptions
Review tags quarterly; consolidate if duplicates emerge
See references for industry-specific tag examples

Workflow: Apply Tags Consistently Across Assets

When to use: During catalog onboarding, in metadata enrichment sprints, or during quality audits.

Steps:

Define tagging standards:
- Which asset types get tagged? (tables, views, models, measures)
- How many tags per asset? (typically 3-5)
- Which tag categories are mandatory? (e.g., domain, sensitivity)
- What approval is needed? (self-service vs. peer review)
Apply tags systematically:
- Use list_catalog_assets to find untagged or under-tagged assets
- Apply domain tag based on asset purpose
- Apply sensitivity tag based on data content (PII, financial, health, etc.)
- Apply cadence tag based on refresh frequency
- Apply quality tag based on readiness level
Document tagging decisions:
- For each asset, document why specific tags were applied
- Include examples in tag definition (to ensure consistent interpretation)
- Review tagging periodically; adjust tags if meanings change
Enable peer review:
- Have data owner review proposed tags
- Tag changes should be visible in audit trail
- Create dashboard showing tagging coverage (% of assets with required tags)
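
The coverage figure for such a dashboard is the share of assets that carry every mandatory tag category. The sketch below assumes tags are written as `category:value` and that domain and sensitivity are the mandatory categories; the asset records and helper names are hypothetical.

```python
REQUIRED = ("domain", "sensitivity")  # assumed mandatory categories


def missing_categories(tags: list[str], required=REQUIRED) -> list[str]:
    """Return the required categories that none of the tags cover."""
    present = {t.split(":", 1)[0] for t in tags}
    return [c for c in required if c not in present]


def coverage(assets: list[dict], required=REQUIRED) -> float:
    """Percentage of assets that have all required tag categories."""
    if not assets:
        return 0.0
    compliant = sum(1 for a in assets if not missing_categories(a["tags"], required))
    return 100.0 * compliant / len(assets)


# Illustrative asset records.
assets = [
    {"name": "GL_Postings", "tags": ["domain:finance", "sensitivity:internal", "cadence:daily"]},
    {"name": "Leads", "tags": ["domain:sales"]},
    {"name": "Scratch_Table", "tags": []},
    {"name": "Employees", "tags": ["domain:hr", "sensitivity:pii"]},
]
pct = coverage(assets)
print(pct, missing_categories(assets[1]["tags"]))
```

The same `missing_categories` result can be shown per asset so owners see exactly which tag is absent.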

Best Practices:

Tag all new assets before publishing to catalog
Use tags to enforce governance (e.g., all PII data must have sensitivity:pii tag)
Don't over-tag; too many tags reduce usability
Keep tag count consistent (don't tag one asset with 2 tags and another with 20)
Review tags when asset purpose or data changes

Workflow: Tag-Based Search and Discovery

When to use: When enabling self-service analytics or building faceted search interfaces.

Steps:

Enable tag-based filtering:
- Use search_catalog with tag filters
- Support multi-tag searches (e.g., "show all assets tagged domain:finance AND sensitivity:internal")
- Support tag hierarchies in search (search for domain:finance returns all finance sub-tags)
Create tag-based browsing:
- Create tag clouds or faceted navigation in catalog UI
- Enable "related tags" suggestions (if browsing domain:sales, suggest cadence:daily)
- Show tag frequency (how many assets have each tag?)
Enable self-service discovery:
- Train users to search by tag (simpler than writing complex queries)
- Create tag guides for different personas (executives, analysts, engineers)
- Build dashboards that link tags to data discovery metrics
Track usage patterns:
- Monitor which tags are most searched
- Identify unused tags (candidates for removal)
- Use search analytics to refine tagging strategy
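
Multi-tag (AND) filtering with hierarchy support can be sketched as a prefix match on colon-separated tags. This runs over an in-memory list and is not the search_catalog tool itself; `search_by_tags`, `tag_matches`, and the sample assets are hypothetical.

```python
def tag_matches(asset_tag: str, query_tag: str) -> bool:
    """True if asset_tag equals query_tag or is a sub-tag of it."""
    return asset_tag == query_tag or asset_tag.startswith(query_tag + ":")


def search_by_tags(assets: list[dict], required_tags: list[str]) -> list[str]:
    """Return names of assets that satisfy every query tag (AND semantics)."""
    return [
        a["name"]
        for a in assets
        if all(any(tag_matches(t, q) for t in a["tags"]) for q in required_tags)
    ]


# Illustrative asset records.
assets = [
    {"name": "GL_Postings", "tags": ["domain:finance:accounting", "sensitivity:internal"]},
    {"name": "Revenue_Model", "tags": ["domain:finance:revenue", "sensitivity:confidential"]},
    {"name": "Leads", "tags": ["domain:sales", "sensitivity:internal"]},
]
internal_finance = search_by_tags(assets, ["domain:finance", "sensitivity:internal"])
all_finance = search_by_tags(assets, ["domain:finance"])
print(internal_finance, all_finance)
```

Appending the colon before the prefix check prevents a query for `domain:fin` from accidentally matching `domain:finance`.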

Best Practices:

Make tag search as prominent as free-text search
Enable "did you mean" suggestions for similar tags
Show tag-based recommendations (users viewing asset with tag X also viewed assets with tag Y)
Use tag search to validate tagging strategy (if tags aren't searched, reconsider their value)
Create saved tag searches for common discovery patterns

Workflow: Tag Governance

When to use: When tagging decisions affect multiple teams, when enforcing compliance, or when implementing self-service governance.

Steps:

Define tagging authority:
- Who can create new tags? (centralized: only governance team; decentralized: domain teams with approval)
- Who can apply tags? (any asset owner; or only owners of sensitive assets)
- Who can modify or delete tags? (governance team only)
- Document process in governance policy
Implement approval workflows (if needed):
- For sensitive tags (PII, Financial) or compliance tags, require approval
- Asset owner proposes tags; governance team reviews and approves
- Approval captures who approved and timestamp
- Rejected tags include feedback explaining why they were rejected
Monitor tagging compliance:
- Create dashboard showing tagging coverage by domain/team
- Flag assets missing required tags (domain, sensitivity)
- Monthly tagging audit: review high-change assets
- Identify teams with inconsistent tagging practices
Enforce tagging standards:
- Prevent publishing of assets without required tags
- Create alerts for untagged or mis-tagged sensitive assets
- Include tagging checklist in data product publishing workflow
- Tie tagging compliance to team scorecards (if appropriate)

Best Practices:

Keep tagging lightweight; heavy approval workflows reduce adoption
Tag sensitive assets with higher approval rigor
Enable bulk tag application (don't require tagging each asset individually)
Provide clear feedback when tagging violates standards
Review tagging governance quarterly

5. Lineage and Impact Analysis

Understanding data lineage answers critical questions: "Where does this data come from?" "Which models consume this table?" "What breaks if I change this column?"

Workflow: Use Impact and Lineage Analysis to Trace Data Flows

When to use: Before modifying critical tables, during root cause analysis of data issues, or when documenting data flows.

Steps:

Access lineage tools:
- Use get_object_definition to inspect asset structure
- Use list_catalog_assets with lineage context to identify related assets
- Access Datasphere lineage visualization (typically in asset detail page)
- Filter lineage by direction (upstream, downstream, bidirectional)
Trace upstream lineage:
- Start from asset of interest (e.g., a model or measure)
- Follow lineage upstream to source tables
- Identify transformations at each step
- Document assumptions and business logic in transformations
- Example: Model_Sales → View_Sales_Orders → Table_ORDERS (SAP source)
Trace downstream lineage:
- Start from table or view
- Follow lineage downstream to consuming models, measures, KPIs
- Identify all downstream impacts (critical for change assessment)
- Example: Table_CUSTOMERS → View_Customer_Enriched → Model_Customer_Analytics → KPI_Churn_Rate
Analyze bidirectional flows:
- Identify circular dependencies (should be rare)
- Find shared data flows (tables consumed by multiple models)
- Identify choke points (tables with many downstream consumers)
- Document data flow bottlenecks
Document lineage:
- Create lineage diagram showing major data flows
- Document business logic at each transformation
- Include metadata about refresh timing
- Publish documentation with visual lineage
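
The upstream and downstream tracing steps above can be sketched as a graph walk. This is a minimal illustration, assuming the lineage has already been exported into a plain list of (upstream, downstream) pairs; the `EDGES` list, `build_graph`, and `trace` are hypothetical names and are not part of the Datasphere MCP tools. The asset names come from the examples in the steps.

```python
from collections import deque

# Hypothetical lineage edges (upstream, downstream), copied from the examples above.
EDGES = [
    ("Table_ORDERS", "View_Sales_Orders"),
    ("View_Sales_Orders", "Model_Sales"),
    ("Table_CUSTOMERS", "View_Customer_Enriched"),
    ("View_Customer_Enriched", "Model_Customer_Analytics"),
    ("Model_Customer_Analytics", "KPI_Churn_Rate"),
]

def build_graph(edges, reverse=False):
    """Build an adjacency map; reverse=True maps each asset to its upstream parents."""
    graph = {}
    for src, dst in edges:
        key, value = (dst, src) if reverse else (src, dst)
        graph.setdefault(key, []).append(value)
    return graph

def trace(graph, start):
    """Breadth-first walk returning every asset reachable from start, in visit order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:  # the seen set also protects against circular dependencies
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

downstream = trace(build_graph(EDGES), "Table_CUSTOMERS")
upstream = trace(build_graph(EDGES, reverse=True), "Model_Sales")
print(downstream)  # ['View_Customer_Enriched', 'Model_Customer_Analytics', 'KPI_Churn_Rate']
print(upstream)    # ['View_Sales_Orders', 'Table_ORDERS']
```

The same `trace` function serves both directions; only the orientation of the adjacency map changes.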

Best Practices:

Lineage should be automatically captured; manually document only complex logic
Include transformation rationale (not just "what" but "why")
Document data quality changes through lineage (where does quality degrade?)
Review lineage when source systems change
Use lineage to identify optimization opportunities (redundant transformations, etc.)

Workflow: Generate Impact Reports

When to use: Before making changes to critical assets, during root cause analysis, or during governance audits.

Steps:

Define change scope:
- Identify specific table, column, model, or measure being changed
- Document nature of change (delete column, rename, change calculation, deprecate table)
- Estimate impact scope (how many downstream assets affected?)
- Assess impact severity (internal tools only vs. customer-facing dashboards)
Generate impact analysis:
- Use lineage tools to identify all downstream consumers
- Classify consumers by impact type:
  - Direct: Assets directly consuming the changed object
  - Indirect: Assets consuming direct consumers
  - KPI: KPIs affected by change
  - Reports/Dashboards: BI artifacts consuming changed assets
- Identify affected stakeholders (which teams will feel impact?)
Assess impact severity:
- For each impacted asset, assess:
  - Criticality: Is this a critical KPI? Customer-facing? Regulatory?
  - Detectability: Would broken data be noticed immediately, or would it stay silently wrong?
  - Blast Radius: How many end users affected?
- Document each assessment with rationale
Create impact report:
- Change Summary: What is being changed and why?
- Downstream Impacts: List all affected assets with severity
- Mitigation Plans: How to minimize impact? (e.g., phased rollout, temporary shadow calculation)
- Testing Plan: How to validate that the change doesn't break downstream assets?
- Rollback Plan: How to revert if issue discovered?
- Stakeholder Notifications: Which teams need to be informed?
Use report to gain approvals:
- Share impact report with affected stakeholders
- Get sign-off from asset owners before proceeding
- Document approval (who approved, when, any conditions)
- Update report as approvals are gathered
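
As a rough sketch of the consumer classification in the steps above, the snippet below splits downstream assets into direct and indirect consumers and collects them into a small report dictionary. It assumes a hypothetical `CONSUMERS_OF` map (asset to the assets that read from it) and assumes that KPIs can be recognized by a `KPI_` name prefix; neither is something the skill or Datasphere guarantees.

```python
# Hypothetical map: asset -> assets that consume it directly.
CONSUMERS_OF = {
    "Table_CUSTOMERS": ["View_Customer_Enriched"],
    "View_Customer_Enriched": ["Model_Customer_Analytics"],
    "Model_Customer_Analytics": ["KPI_Churn_Rate"],
}

def classify_consumers(consumers_of, changed):
    """Split downstream consumers of `changed` into direct and indirect sets."""
    direct = set(consumers_of.get(changed, []))
    indirect, stack = set(), list(direct)
    while stack:
        node = stack.pop()
        for nxt in consumers_of.get(node, []):
            # Skip assets already classified, and the changed asset itself (cycles).
            if nxt != changed and nxt not in direct and nxt not in indirect:
                indirect.add(nxt)
                stack.append(nxt)
    return direct, indirect

direct, indirect = classify_consumers(CONSUMERS_OF, "Table_CUSTOMERS")
report = {
    "changed": "Table_CUSTOMERS",
    "direct": sorted(direct),
    "indirect": sorted(indirect),
    # Assumption: KPIs follow a "KPI_" naming prefix.
    "kpis": sorted(a for a in direct | indirect if a.startswith("KPI_")),
}
print(report)
```

Severity, mitigation, and rollback fields would still be filled in by people; the sketch only covers the part that lineage can answer mechanically.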

Best Practices:

Always generate impact report before modifying critical assets
Use impact reports to surface hidden downstream dependencies
Include indirect impacts (sometimes more important than direct)
Define severity thresholds (when is impact too high?)
Use impact analysis to identify opportunities to consolidate redundant implementations

Workflow: Upstream Analysis

When to use: When investigating data quality issues, understanding data freshness, or documenting data sources.

Steps:

Trace back to original sources:
- Start from asset with issue (model, measure, or dashboard)
- Follow lineage upstream to source tables
- Identify each transformation step
- Document data quality at each stage
Identify source systems:
- For each upstream table, identify source system (SAP, Salesforce, custom app, etc.)
- Document extraction frequency (real-time, batch, delayed)
- Identify data quality issues in source (missing values, duplicates, delays)
Analyze data quality degradation:
- Identify where data quality issues are introduced
- Example: "Missing Customer Names in source → not populated in enriched view → shows as blanks in dashboard"
- Document which transformations impact quality
Identify opportunities to improve:
- Use upstream analysis to fix quality issues at source (better than downstream workarounds)
- Identify redundant transformations that could be consolidated
- Propose moving transformations closer to source (for efficiency)

Best Practices:

Document source system characteristics (reliability, update frequency)
Use upstream analysis to identify data quality root causes
Periodically review upstream dependencies; document changes
Build dashboards tracking data freshness across lineage
Work with source system owners to improve data quality upstream

Workflow: Downstream Analysis

When to use: When deprecating assets, understanding asset usage, or during data quality investigations.

Steps:

Identify all downstream consumers:
- Start from table, view, or measure
- Trace forward to consuming models, measures, KPIs
- Continue tracing to dashboards, reports, or AI/ML models
- Use list_catalog_assets to identify all references
Categorize downstream usage:
- Critical: KPIs, customer-facing dashboards, regulatory reports
- Important: Internal dashboards, analyst-used models, operational reports
- Experimental: Prototype dashboards, one-time analyses
- Assess what happens if asset becomes unavailable
Understand consumption patterns:
- Which downstream assets are actively used?
- Which downstream assets are stale/unused? (candidates for cleanup)
- Which measures appear in multiple KPIs? (indicates high leverage)
- Which dashboards have most users? (highest-risk to break)
Plan changes safely:
- For deprecation: Provide replacement assets before turning off original
- For modifications: Test changes against actual downstream consumers
- Communicate changes early to downstream owners
- Build deprecation timeline that allows downstream adjustment

Best Practices:

Use downstream analysis to understand asset criticality
Identify "hidden" downstream consumers (often forgotten dependencies)
Use downstream usage patterns to prioritize data governance efforts
Periodically clean up unused downstream assets (reduces technical debt)
Track downstream usage to validate data product adoption

Workflow: Change Impact Assessment Before Modifications

When to use: Before modifying any production asset, during data quality fixes, or during optimization projects.

Steps:

Plan change:
- Document what is being changed (table structure, calculation logic, assumptions)
- Document why change is needed (bug fix, performance, business requirement)
- Document expected benefits and risks
Analyze downstream impacts:
- Use the Downstream Analysis workflow above to identify all consumers
- Generate an impact report (see the Generate Impact Reports workflow above)
- Identify critical vs. non-critical impacts
- Document stakeholders who need notification
Design safe change approach:
- Option 1 (No-Impact Approach): Add new column/table; don't modify existing ones
- Option 2 (Backward-Compatible): Support old and new logic simultaneously during transition
- Option 3 (Phased Rollout): Change in phases; monitor for issues at each phase
- Option 4 (Temporary Shadow): Run new logic alongside old; validate before switching
- Choose approach based on risk and downstream impact
Build testing plan:
- Test change in development/test environment first
- Test with actual data volumes and realistic downstream consumption patterns
- Validate that downstream assets still produce correct results
- Document test cases and results
Execute with monitoring:
- Apply change (in phases if using phased rollout)
- Monitor downstream dashboards/reports for unexpected changes
- Monitor data quality metrics (completeness, accuracy, freshness)
- Have rollback plan ready
- Document any issues discovered during rollout
Communicate and document:
- Notify downstream stakeholders of change and any impacts
- Document change in asset versioning/changelog
- Update lineage documentation if data flows changed
- Conduct post-mortem if issues discovered
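
The temporary shadow approach (Option 4) can be illustrated with a small comparison harness: run the old and new logic on the same rows and list every row where the results differ. This is only a sketch; `old_margin`, `new_margin`, and the sample rows are hypothetical stand-ins for whatever calculation is actually being changed.

```python
import math

def old_margin(row):
    """Existing calculation (hypothetical): margin as a fraction of revenue."""
    return (row["revenue"] - row["cost"]) / row["revenue"]

def new_margin(row):
    """Proposed calculation (hypothetical): same margin, guarded against zero revenue."""
    if row["revenue"] == 0:
        return 0.0
    return (row["revenue"] - row["cost"]) / row["revenue"]

def shadow_compare(rows, old_fn, new_fn, rel_tol=1e-9):
    """Run both calculations on each row and collect rows where the results differ."""
    mismatches = []
    for i, row in enumerate(rows):
        try:
            old_value = old_fn(row)
        except ZeroDivisionError:
            old_value = None  # the old logic failed on this row
        new_value = new_fn(row)
        if old_value is None or not math.isclose(old_value, new_value, rel_tol=rel_tol):
            mismatches.append((i, old_value, new_value))
    return mismatches

rows = [
    {"revenue": 200.0, "cost": 150.0},
    {"revenue": 0.0, "cost": 10.0},
]
mismatches = shadow_compare(rows, old_margin, new_margin)
print(mismatches)  # [(1, None, 0.0)]
```

An empty mismatch list is a reasonable precondition for switching over; a non-empty one gives the exact rows to review with downstream owners.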

Best Practices:

Always analyze impact before change; never assume "no one will notice"
Use phased or shadow approaches for high-risk changes
Build automated tests to validate downstream assets after changes
Keep rollback capability for at least 24-48 hours after change
Review impact assessment process regularly; improve based on incidents

6. Data Quality Scoring and Tracking

Data quality scores help users select trustworthy datasets and identify improvement opportunities.

Workflow: Define Quality Dimensions and Scoring

When to use: During governance program launch or when implementing data quality initiatives.

Steps:

Identify quality dimensions:
- Completeness: Are all required values present? (complement of the missing/null rate)
- Accuracy: Do values match source of truth or business rules?
- Timeliness: Is data fresh? How long since last update?
- Consistency: Do related values align? (e.g., sum of parts = total)
- Uniqueness: Are there unintended duplicates?
- Validity: Do values match expected format/range?
- See references for detailed scoring templates
Define scoring methodology:
- For each dimension, establish measurement logic
- Example Completeness: "Score = 100 * (non-null rows / total rows)"
- Example Timeliness: "Score = 100 if updated in last 24 hours; decreases 5 points per day stale"
- Document assumptions and edge cases
- Establish minimum thresholds (e.g., Completeness must be ≥95%)
Aggregate scores:
- Calculate overall quality score from dimension scores
- Use a weighted average if some dimensions are more important (see references for templates)
- Example: Overall = Completeness * 30% + Accuracy * 30% + Timeliness * 25% + Consistency * 15%
- Establish overall quality tiers: Certified (90 and above), Trusted (80 to below 90), Monitor (70 to below 80), Issue (below 70)
Establish scorecard template:
- Create quality scorecard showing all dimensions
- Include trends over time (are we improving or degrading?)
- Document current blockers to achieving higher scores
- Set improvement targets
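
The example formulas in the steps above translate fairly directly into code. The sketch below uses the weights and tiers from the examples. It makes two assumptions that the text leaves open: an empty table scores zero for completeness, and timeliness loses 5 points for each whole day elapsed once the data is more than 24 hours old, with a floor of 0.

```python
from datetime import datetime, timedelta

# Weights from the example above.
WEIGHTS = {"completeness": 0.30, "accuracy": 0.30, "timeliness": 0.25, "consistency": 0.15}

def completeness_score(non_null_rows, total_rows):
    """Score = 100 * (non-null rows / total rows)."""
    if total_rows == 0:
        return 0.0  # assumption: an empty table scores zero
    return 100.0 * non_null_rows / total_rows

def timeliness_score(last_updated, now):
    """100 if updated within 24 hours; otherwise minus 5 points per whole day elapsed."""
    age = now - last_updated
    if age <= timedelta(hours=24):
        return 100.0
    return max(0.0, 100.0 - 5.0 * age.days)  # assumption: floor at zero

def overall_score(scores):
    """Weighted average of the dimension scores."""
    return sum(scores[dim] * weight for dim, weight in WEIGHTS.items())

def quality_tier(score):
    """Map an overall score to the tiers named above."""
    if score >= 90:
        return "Certified"
    if score >= 80:
        return "Trusted"
    if score >= 70:
        return "Monitor"
    return "Issue"

scores = {
    "completeness": completeness_score(980, 1000),  # 98.0
    "accuracy": 92.0,                                # from validation rules (assumed value)
    "timeliness": timeliness_score(datetime(2026, 1, 7, 12), datetime(2026, 1, 10, 12)),
    "consistency": 90.0,                             # assumed value
}
overall = overall_score(scores)
tier = quality_tier(overall)
print(f"{overall:.2f}", tier)  # 91.75 Certified
```

Worked by hand: 98 * 0.30 + 92 * 0.30 + 85 * 0.25 + 90 * 0.15 = 29.4 + 27.6 + 21.25 + 13.5 = 91.75.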

Best Practices:

Start with simple dimensions (completeness, timeliness); add advanced dimensions over time
Base scores on automated measurements when possible (avoid manual scoring)
Review scoring methodology quarterly; update if dimensions/thresholds change
Communicate quality scores to all data consumers
Link quality issues to root causes (helps with remediation)

Workflow: Score Assets on Quality Metrics

When to use: During metadata enrichment, when onboarding new data sources, or in quality audits.

Steps:

Measure quality dimensions:
- Use analyze_column_distribution to assess completeness (what % of rows have values?)
- Check timeliness (when was data last refreshed?)
- Run validation rules to assess accuracy (do values meet business rules?)
- Compare with source to assess consistency
Calculate quality scores:
- For each dimension, calculate score using methodology from above
- Aggregate dimension scores into overall quality tier
- Document any assumptions or manual overrides
- Identify root causes of low scores
Assign quality tags:
- Tag assets with quality tier (Certified, Trusted, Monitor, Issue)
- Tag assets with specific quality issues (duplicate_data, stale_data, missing_values, etc.)
- Use tags to surface quality issues in catalog search
Publish quality metadata:
- Add quality score and dimensions to asset detail page
- Create quality dashboard showing scores across portfolio
- Enable sorting/filtering by quality score
- Show trend charts (is quality improving over time?)

Best Practices:

Automate quality scoring; don't rely on manual assessments
Review quality scores weekly or monthly (not annually)
Link quality issues to improvement projects (make it actionable)
Highlight quick wins (easy-to-fix quality issues)
Recognize teams that improve data quality

Workflow: Quality Dashboards and Trending

When to use: When establishing quality culture, during quality improvement programs, or for executive visibility.

Steps:

Build quality scorecards:
- Create dashboard showing quality scores across all assets
- Show breakdown by domain (Finance, Sales, HR, etc.)
- Show breakdown by source system or team
- Display as heatmap or scorecard format
Visualize trends over time:
- Track quality score trends (improving or degrading?)
- Identify tables with declining quality (investigate why)
- Celebrate tables with improving quality (recognize teams)
- Use trends to justify investment in quality initiatives
Enable drill-down analysis:
- Click table to see detailed quality metrics (dimension scores)
- See which columns are causing low quality
- View quality issues identified in validation rules
- Link to remediation projects or tickets
Use for accountability:
- Assign quality scorecards to team owners
- Monthly review of team's quality scorecard
- Tie quality improvements to performance goals
- Use quality metrics in hiring/promotion decisions (if appropriate)

Best Practices:

Make quality visible to all users (public dashboard, not hidden)
Set realistic improvement targets (don't expect 100% overnight)
Celebrate improvements; don't just criticize poor quality
Link quality metrics to business impact (show cost/risk of poor quality)
Review quality dashboards at least monthly

7. Catalog Review Workflows

Regular catalog reviews prevent stale data, ensure accurate metadata, and maintain governance standards.

Workflow: Periodic Asset Review Scheduling

When to use: To establish ongoing governance cadence.

Steps:

Define review schedule:
- Critical Assets: Monthly review (KPIs, customer-facing models, regulatory data)
- Important Assets: Quarterly review (heavily-used dashboards, core measures)
- Standard Assets: Annual review (everything else)
- Establish review calendar with assigned owners
Create review checklist:
- Does the asset meet current business needs?
- Is metadata up-to-date (name, description, tags)?
- Is quality acceptable? Any known issues?
- Is asset actively used? By whom?
- Is ownership clear and current?
- Are there any deprecated or redundant assets to clean up?
Conduct reviews:
- Send review request to asset owner with checklist
- Owner reviews and confirms or updates metadata
- Owner identifies any issues or improvement opportunities
- Governance team follows up on unfinished reviews
Track and report:
- Monitor review completion rates
- Report on findings (common issues, needed improvements)
- Create action items for improvements identified
- Schedule follow-up reviews for problematic assets
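
A review calendar like the one described above can be derived from each asset's criticality and last review date. The sketch below is illustrative only: it approximates monthly, quarterly, and annual cadences as 30, 90, and 365 days, and the asset records are hypothetical dictionaries rather than output from a catalog tool.

```python
from datetime import date, timedelta

# Assumption: cadences approximated in days (monthly, quarterly, annual).
REVIEW_INTERVAL_DAYS = {"critical": 30, "important": 90, "standard": 365}

def next_review(last_reviewed, criticality):
    """Return the date on which the next review falls due."""
    return last_reviewed + timedelta(days=REVIEW_INTERVAL_DAYS[criticality])

def overdue_assets(assets, today):
    """Names of assets whose next review date has already passed."""
    return [
        a["name"]
        for a in assets
        if next_review(a["last_reviewed"], a["criticality"]) < today
    ]

assets = [
    {"name": "KPI_Churn_Rate", "criticality": "critical", "last_reviewed": date(2026, 1, 1)},
    {"name": "Model_Sales", "criticality": "important", "last_reviewed": date(2026, 1, 1)},
    {"name": "Table_ORDERS", "criticality": "standard", "last_reviewed": date(2025, 1, 1)},
]
overdue = overdue_assets(assets, today=date(2026, 2, 15))
print(overdue)  # ['KPI_Churn_Rate', 'Table_ORDERS']
```

The overdue list is a natural input for the automated notifications mentioned in the best practices below.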

Best Practices:

Keep review lightweight; 5-minute checklist is better than hour-long review
Automate notifications and tracking (don't rely on email)
Make reviews part of team's regular cadence (e.g., first Friday of month)
Recognize teams with high-quality catalogs
Use review findings to improve governance processes

Workflow: Stale Asset Identification and Cleanup

When to use: During governance audits or when trying to reduce catalog clutter.

Steps:

Identify stale assets:
- Use list_catalog_assets to find assets with:
  - No recent updates (e.g., not modified in 1+ year)
  - No recent usage (e.g., not consumed by any dashboards/models)
  - No owner or owner no longer in organization
- Flag assets as potentially stale
- Create stale asset inventory
Investigate stale assets:
- For each stale asset, determine why it's not used
- Is it truly unused? Or is usage not tracked?
- Use downstream analysis to check for hidden dependencies
- Interview potential users (is this asset still needed?)
Plan consolidation or deprecation:
- Option 1: Consolidate with similar active asset (reduce duplication)
- Option 2: Deprecate with migration path to replacement asset
- Option 3: Archive with clear documentation (in case needed in future)
- Option 4: Delete if truly redundant and no users identified
- Get stakeholder approval before action
Execute cleanup:
- For deprecation: Communicate timeline, provide replacement access
- For consolidation: Migrate any remaining users to replacement asset
- For archival: Move to archive location, keep documentation accessible
- For deletion: Only after confirming no users/lineage dependencies
Monitor and report:
- Track cleanup progress and completed actions
- Report on amount of technical debt removed
- Celebrate catalog cleanliness improvements
- Establish policies to prevent stale assets from accumulating again
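
The three staleness criteria from the first step can be checked mechanically once asset metadata is available. The sketch below assumes each asset has been flattened into a dictionary with hypothetical `last_modified`, `consumers`, and `owner` fields; how those fields are populated (for example from list_catalog_assets output) is left open because the tool's response shape is not documented here.

```python
from datetime import date, timedelta

def stale_reasons(asset, today, max_age_days=365):
    """Return the staleness criteria an asset meets (empty list means not flagged)."""
    reasons = []
    if today - asset["last_modified"] > timedelta(days=max_age_days):
        reasons.append("no_recent_updates")
    if not asset["consumers"]:
        reasons.append("no_recent_usage")
    if not asset.get("owner"):
        reasons.append("no_owner")
    return reasons

# Hypothetical asset records for illustration.
assets = [
    {"name": "T_SALES_001", "last_modified": date(2024, 6, 1), "consumers": [], "owner": None},
    {
        "name": "Table_CUSTOMERS",
        "last_modified": date(2026, 1, 20),
        "consumers": ["View_Customer_Enriched"],
        "owner": "sales-data-team",
    },
]

today = date(2026, 2, 9)
inventory = {}
for asset in assets:
    reasons = stale_reasons(asset, today)
    if reasons:
        inventory[asset["name"]] = reasons
print(inventory)
```

A flagged asset is only a candidate: the investigation step above still applies, since missing usage data can look identical to no usage.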

Best Practices:

Don't delete without confirming there are no users (hidden dependencies are common)
Keep archived assets documented; don't let them silently disappear
Communicate deprecations early; give long lead time (6+ months)
Use cleanup projects to establish ongoing maintenance culture
Review and approve stale asset cleanup at governance committee level

Workflow: Ownership Assignment and Accountability

When to use: During onboarding, when ownership gaps are identified, or during governance reviews.

Steps:

Define ownership model:
- Technical Owner: Responsible for data model, refresh, quality
- Business Owner: Responsible for business interpretation, accuracy
- Executive Sponsor: Accountable for strategic alignment
- Document roles and responsibilities in RACI matrix (see references)
Identify ownership gaps:
- Use list_catalog_assets to find assets without assigned owner
- Create ownership inventory (asset → assigned owner)
- Identify teams with too much ownership (capacity issues)
- Identify gaps where ownership is unclear
Assign ownership:
- Match assets to appropriate owners based on:
  - Team responsible for data model
  - Team most familiar with business context
  - Team consuming data most heavily
- Get owner approval before assigning
- Document escalation path if owner unavailable
Enable accountability:
- Use ownership assignments to route review requests
- Track owner response rates and quality of reviews
- Recognize owners with high-quality asset governance
- Provide support/training to struggling owners
Maintain ownership:
- Review ownership quarterly
- Update when owners change roles
- Establish succession planning for critical asset owners
- Document owner transitions with knowledge transfer
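
Finding ownership gaps and overloaded owners is a simple counting exercise once an ownership inventory exists. The sketch below assumes a hypothetical list of asset records with an `owner` field, and an arbitrary capacity threshold of two assets per owner chosen only to keep the example small; a real threshold would depend on team size.

```python
from collections import Counter

def ownership_report(assets, max_assets_per_owner=2):
    """List assets without an owner and owners holding more than the threshold."""
    unowned = [a["name"] for a in assets if not a.get("owner")]
    counts = Counter(a["owner"] for a in assets if a.get("owner"))
    overloaded = sorted(o for o, n in counts.items() if n > max_assets_per_owner)
    return {"unowned": unowned, "overloaded": overloaded}

# Hypothetical ownership inventory (asset -> assigned owner).
assets = [
    {"name": "Model_Sales", "owner": "finance-team"},
    {"name": "View_Sales_Orders", "owner": "finance-team"},
    {"name": "Table_ORDERS", "owner": "finance-team"},
    {"name": "Table_CUSTOMERS", "owner": "sales-team"},
    {"name": "T_SALES_001", "owner": None},
]
gaps = ownership_report(assets)
print(gaps)  # {'unowned': ['T_SALES_001'], 'overloaded': ['finance-team']}
```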

Best Practices:

Assign single owner (not committees); clarifies accountability
Ensure owner has time/authority to manage asset (avoid overloading)
Rotate ownership periodically (prevents siloing of knowledge)
Provide owners with tools and dashboards to manage their assets
Tie ownership to performance reviews/compensation (creates accountability)

MCP Tools Reference

This skill leverages these Datasphere MCP tools:

search_catalog - Search catalog by name, description, or metadata; filter by type, domain, tag
get_asset_details - Retrieve full metadata for table, view, model, or measure (structure, lineage, ownership)
list_catalog_assets - List assets matching criteria (type, owner, status, tag); supports pagination
search_repository - Search source system definitions and imported objects
get_object_definition - Retrieve detailed definition of object (structure, calculations, lineage)
get_deployed_objects - List deployed models/measures and their status
analyze_column_distribution - Analyze column data types, cardinality, completeness, distributions

Best Practices Summary

Metadata Enrichment:

Use business terminology, not technical jargon
Keep descriptions under 500 words; link to detailed docs
Auto-suggest tags; require human review before applying
Batch updates by logical group; test on samples first

Glossary Management:

Start with 20-30 high-impact terms; grow over time
Involve business owners in definitions
Link glossary terms to actual technical implementations
Version glossary; communicate changes to stakeholders
Resolve terminology conflicts through formal approval process

KPI Definition:

Define calculation logic clearly; include examples
Validate KPI against underlying data before publishing
Assign single owner; document accountability
Review KPI portfolio annually for relevance
Version KPI definitions; don't silently change calculations

Tag Management:

Design controlled vocabulary; limit active tags to 20-50
Apply tags consistently across all assets
Use tags for governance enforcement (required tags for sensitive data)
Review and consolidate tags quarterly

Lineage & Impact:

Always analyze downstream impact before modifying critical assets
Use impact reports to gain stakeholder approval for changes
Identify and consolidate redundant implementations
Build automated tests to validate changes against downstream consumers

Data Quality:

Automate quality scoring; avoid manual assessments
Make quality scores visible to all users
Link quality issues to remediation projects
Review quality metrics monthly; celebrate improvements

Catalog Reviews:

Keep reviews lightweight (5-minute checklist)
Conduct critical asset reviews monthly; standard assets annually
Identify and clean up stale assets regularly
Maintain ownership assignments; rotate periodically

Common Anti-Patterns and Solutions

Anti-Pattern: Metadata written for IT, not business

Solution: Use business analyst as template reviewer; remove jargon

Anti-Pattern: Too many tags; users can't navigate

Solution: Consolidate to 20-30 core tags; deprecate duplicates

Anti-Pattern: KPI definitions silently change (breaks downstream calculations)

Solution: Version KPI definitions; communicate changes; validate impact

Anti-Pattern: Assets with no owner; governance unenforceable

Solution: Systematically assign owners; include ownership in publishing workflow

Anti-Pattern: Quality issues discovered downstream; no visibility upstream

Solution: Build quality dashboards; surface issues early; tie to remediation projects

Anti-Pattern: Lineage not captured; impact analysis impossible

Solution: Ensure lineage is automatically captured from data models; manually document complex logic

Anti-Pattern: Glossary becomes unmaintainable; conflicting definitions

Solution: Implement formal approval workflow; version terms; resolve conflicts through governance

Integration with Data Product Publishing

The Catalog Steward skill complements the Data Product Publisher skill:

Catalog Steward: Organizes internal repository (metadata, quality, lineage, governance)
Data Product Publisher: Publishes curated products to external marketplace

Before publishing a data product, use Catalog Steward to:

Ensure all source assets have clear ownership and quality certification
Validate glossary terms and KPI definitions
Verify lineage and impact analysis (understand ripple effects)
Establish quality SLAs for published product
Assign business/technical owners responsible for product quality

Getting Started

Audit Current State:
- Use list_catalog_assets to inventory all assets
- Use get_asset_details to assess metadata quality
- Identify highest-value governance improvements
Design Your Governance Model:
- Define roles and responsibilities (RACI)
- Design tag taxonomy and glossary structure
- Establish quality scoring methodology
Execute Pilot Project:
- Pick 1-2 business domains for pilot
- Enrich metadata, add glossary terms, implement tagging
- Build sample quality dashboard
- Get stakeholder feedback and refine approach
Scale Governance Program:
- Extend to additional domains
- Build automation for metadata enrichment, quality scoring, lineage capture
- Establish review cadences and ownership assignments
- Train data owners and catalog curators
Measure and Optimize:
- Track catalog usage metrics (searches, views, discovery patterns)
- Monitor governance compliance (tagging, quality, ownership)
- Conduct quarterly reviews; adjust policies based on feedback
- Celebrate wins; recognize teams driving adoption

Catalog Steward Skill

Overview

Why Catalog Governance Matters

Self-Service Analytics: Business users can find and trust data without submitting tickets
Compliance & Auditability: Clear lineage and ownership trails support regulatory requirements
Impact Analysis: Understand change ripple effects before modifying critical assets
Terminology Alignment: Glossaries ensure "Revenue" means the same thing across teams
Data Quality Transparency: Quality scores help users select the right datasets
Governance at Scale: Consistent metadata reduces technical debt and tribal knowledge

Core Workflows

1. Metadata Enrichment

Metadata enrichment transforms technical asset names and sparse descriptions into discoverable, business-friendly documentation.

Workflow: Analyze and Suggest Business-Friendly Names

When to use: During onboarding, after importing source system tables, or during catalog cleanup sprints.

Steps:

Search for poorly named assets:
- Use search_catalog to find tables/views with missing or cryptic names (e.g., "T_SALES_001")
- Filter by asset type (Dimension, Fact, View, Model)
- Identify candidates for enrichment
Analyze content with column inspection:
- Use get_asset_details to inspect table/view structure
- Review key columns to infer business meaning
- Identify primary dimensions and measures
- Example: "T_SALES_001" contains CUST_ID, ORDER_DT, AMOUNT → suggests "Customer Orders Fact"
Suggest and apply business names:
- Map technical names to business-friendly alternatives
- Follow naming conventions (see references for templates)
- Apply updated names via catalog metadata endpoints
- Document rationale in internal notes

Best Practices:

Use plural nouns for fact tables, singular for dimensions
Use business domain terminology (not IT jargon)
Avoid ambiguity: "Sales" → "Monthly Sales Orders" or "Daily Sales Revenue"
Create a naming convention document and version it

Workflow: Write Meaningful Descriptions

When to use: When onboarding new users, before publishing catalog assets, or during quality audits.

Steps:

Gather context:
- Use get_asset_details to extract technical metadata
- Review related objects (upstream sources, downstream consumers)
- Identify responsible team or owner
Write descriptions following a template:
- What: One-sentence summary of what the asset contains
- Why: Business purpose or use case
- Key columns: 2-3 most important dimensions/measures
- Refresh frequency: How often it is updated
- Caveats: Data quality issues, exclusions, or limitations
- Example template (see references)
Link to upstream sources:
- Document source systems or parent tables
- Use get_object_definition to trace lineage
- Include transformation logic (if relevant)
Review and version:
- Have data owner approve descriptions
- Track description changes in catalog versioning

Best Practices:

Keep descriptions under 500 words; link to detailed documentation elsewhere
Use plain language; assume audience is business analyst (not DBA)
Include examples of typical queries or use cases
Flag experimental or deprecated assets clearly
Update descriptions when business meaning changes (not just when data structure changes)

Workflow: Auto-Suggest Tags Based on Content Analysis

When to use: During bulk catalog onboarding or when implementing a new tag taxonomy.

Steps:

Analyze asset content:
- Use get_asset_details to inspect column names, types, and distributions
- Use analyze_column_distribution to understand data characteristics
- Identify data domains (financial, HR, product, customer, operational)
- Detect common patterns (dates, IDs, amounts)
Match against tag taxonomy:
- Map identified characteristics to your tag taxonomy (see references)
- Example: columns contain "SALARY", "EMPLOYEE_ID" → suggest tags: hr, sensitive, employee-master
Propose tags with confidence scoring:
- High confidence: tags match multiple column patterns
- Medium confidence: tags match domain or naming conventions
- Low confidence: tags are contextual or require human review
Review and apply:
- Present suggestions with reasoning
- Allow manual override for edge cases
- Batch-apply approved tags
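
One way to picture the matching and confidence steps above is a keyword rule table applied to column names. The sketch below is a simplified illustration, not the skill's actual logic: `TAG_RULES` is a hypothetical taxonomy fragment, and confidence is reduced to a count of matching columns (two or more is "high", exactly one is "medium"). Contextual, low-confidence tags are left to human review and are not generated here.

```python
# Hypothetical taxonomy fragment: tag -> column-name keywords that suggest it.
TAG_RULES = {
    "hr": ["SALARY", "EMPLOYEE"],
    "sensitive": ["SALARY", "SSN"],
    "finance": ["AMOUNT", "REVENUE"],
}

def suggest_tags(columns):
    """Return {tag: (confidence, matching_columns)} for tags with at least one match."""
    suggestions = {}
    for tag, keywords in TAG_RULES.items():
        hits = [c for c in columns if any(k in c.upper() for k in keywords)]
        if len(hits) >= 2:
            suggestions[tag] = ("high", hits)    # matches multiple column patterns
        elif len(hits) == 1:
            suggestions[tag] = ("medium", hits)  # single naming-convention match
    return suggestions

suggestions = suggest_tags(["EMPLOYEE_ID", "SALARY", "HIRE_DT"])
for tag, (confidence, hits) in suggestions.items():
    print(tag, confidence, hits)
```

Keeping the matching columns alongside each suggestion gives reviewers the reasoning they need before anything is batch-applied.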

Best Practices:

Use a controlled vocabulary (see tag taxonomy in references)
Combine multiple tag types (domain, sensitivity, cadence, owner)
Review auto-suggestions; don't apply blindly
Document why assets receive specific tags
Update tags when asset usage patterns change

Workflow: Bulk Metadata Updates Across Multiple Assets

When to use: After organizational changes, standardization initiatives, or when implementing governance policies.

Steps:

Identify batch scope:
- Use list_catalog_assets to find assets matching criteria (e.g., all tables from a source system, all models owned by a team)
- Validate that batch scope is correct (test with small sample first)
Define update template:
- Standardize naming patterns, tags, descriptions, or ownership
- Create template for changes (see references)
- Document change rationale and approval
Execute updates in phases:
- Phase 1: Apply changes to test/sandbox catalogs
- Phase 2: Validate against downstream consumers using lineage
- Phase 3: Apply to production with versioning
- Phase 4: Communicate changes to users
Track and audit changes:
- Log all bulk changes with timestamp, author, and reason
- Enable catalog versioning to support rollback if needed
- Notify affected teams of changes

Best Practices:

Always test bulk updates on a sample first
Use lineage analysis to identify downstream impacts
Batch updates by logical group (not random collections)
Communicate timing and rationale to stakeholders
Provide before/after comparisons for major changes

2. Glossary Term Management

A business glossary is the "source of truth" for terminology. It ensures that "Gross Margin," "EBITDA," and "Market Share" mean the same thing across all teams.

Workflow: Create and Maintain a Business Glossary

When to use: At governance program launch, when onboarding new business domains, or when terminology conflicts arise.

Steps:

Identify core business concepts:
- Interview business owners and analysts
- Review existing reports, dashboards, and analyses
- Document terms with multiple definitions (conflicts to resolve)
- Prioritize high-impact terms (used in multiple models, KPIs, or reports)
Create glossary term entries:
- Use the glossary term template (see references)
- Define each term with business meaning, not technical definition
- Include approved synonyms and related terms
- Document calculation methodology (for metrics)
- Assign owner and approval authority
- Set version and last-reviewed date
Build glossary hierarchy:
- Group terms by business domain (Sales, Finance, HR, Operations)
- Create parent-child relationships (e.g., "Revenue" → "Product Revenue", "Service Revenue")
- Link related terms (see section on glossary relationships)
Enable feedback and evolution:
- Publish draft glossary and collect feedback from stakeholders
- Review conflicts and make approval decisions
- Version published glossary (v1.0, v1.1, etc.)
- Schedule annual reviews with business owners

Best Practices:

Start with 20-30 highest-impact terms, not the entire organization
Involve business owners, not just IT, in definition
Make glossary searchable and always discoverable (don't hide in PDFs)
Include usage examples and anti-examples (what it is NOT)
Document historical changes (why did definition change?)
Link to actual data implementations (models, measures)

Workflow: Link Glossary Terms to Technical Assets

When to use: After glossary terms are approved, during model development, or during metadata enrichment sprints.

Steps:

Identify linking opportunities:
- Use search_catalog to find assets matching glossary terms
- Example: search for "revenue" → find all views, models, measures with revenue-related logic
- Use get_asset_details to inspect calculated fields and measures
Create term-to-asset mappings:
- Link glossary term "Revenue" to measure TOTAL_REVENUE in model Sales_Summary
- Document how technical asset implements the glossary definition
- Capture calculation logic or transformation rules
- Note any deviations or approximations
Enable bidirectional navigation:
- Users viewing glossary term should see which assets implement it
- Users viewing assets should see which glossary terms apply
- Create cross-reference views or dashboards
Validate consistency:
- Check that all uses of the term apply the same definition
- Flag deviations or variant calculations
- Schedule reviews when definitions or implementations change

Best Practices:

One glossary term can map to multiple technical assets (same concept, different contexts)
Document if an asset implements the term exactly or is an approximation
Include transformation rules or calculation logic in the mapping
Update mappings when either glossary terms or asset definitions change
Use mappings to detect duplicate or conflicting implementations

Workflow: Term Approval Workflows

When to use: When implementing formal governance, during terminology disputes, or when adding new glossary terms.

Steps:

Define approval roles:
- Proposer: Business analyst or data owner
- Domain Owner: Accountable for terms in their domain (Sales, Finance, etc.)
- Governance Lead: Final approval authority
- Use RACI matrix (see references) to clarify roles
Create proposal-to-approval workflow:
- Proposer submits term with definition, calculation, and rationale
- Domain owner reviews for alignment with business standards
- Governance lead checks for conflicts, clarity, and compliance
- Feedback is provided; proposer revises if needed
- Final approval records who, when, and rationale
Track approval status:
- Status: Draft → Proposed → Approved → Published
- Escalation path for disputes (which executive resolves conflicts?)
- SLA for reviews (e.g., 5 business days)
Manage versioning:
- When a term definition changes, trigger re-approval
- Previous versions remain available (audit trail)
- Notify users when definitions change
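
The status progression and audit trail described above can be sketched as a small state machine. The class and method names are hypothetical, and sending a revised definition back to Draft is one possible reading of "trigger re-approval", not a rule stated by this guide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Status order taken from the workflow: Draft → Proposed → Approved → Published.
NEXT_STATUS = {"Draft": "Proposed", "Proposed": "Approved", "Approved": "Published"}


@dataclass
class TermProposal:
    term: str
    status: str = "Draft"
    history: list[dict] = field(default_factory=list)   # audit trail

    def _record(self, new_status: str, actor: str, rationale: str) -> None:
        """Append who, when, and why to the audit trail, then switch status."""
        self.history.append({
            "from": self.status,
            "to": new_status,
            "actor": actor,
            "at": datetime.now(timezone.utc).isoformat(),
            "rationale": rationale,
        })
        self.status = new_status

    def advance(self, actor: str, rationale: str) -> str:
        """Move to the next status in the chain."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"{self.term!r} is already {self.status}")
        self._record(NEXT_STATUS[self.status], actor, rationale)
        return self.status

    def revise_definition(self, actor: str, rationale: str) -> None:
        """Assumption: a changed definition returns to Draft for re-approval."""
        self._record("Draft", actor, rationale)


proposal = TermProposal("Revenue")
proposal.advance("analyst", "initial submission")
proposal.advance("governance lead", "no conflicts found")
print(proposal.status)   # Approved
```

Because every transition is appended to `history`, earlier decisions remain available even after a term is revised.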

Best Practices:

Clarify approval authority upfront (don't create bottlenecks)
Use lightweight workflow for low-risk terms, formal workflow for KPIs or financial terms
Document why terms were rejected (helps future proposals)
Include legal or compliance review for regulatory terms
Set clear SLAs to prevent indefinite reviews

Workflow: Glossary Hierarchies and Relationships

When to use: As glossary grows beyond 30-50 terms, when standardizing across domains, or when implementing enterprise-wide terminology.

Steps:

Design hierarchical structure:
- Create top-level categories (Business Domains: Finance, Sales, HR, etc.)
- Create sub-categories (Finance → Revenue, Expenses, Assets)
- Create specific terms (Revenue → Product Revenue, Service Revenue)
- Support 2-3 levels of depth (too deep = hard to navigate)
Define relationship types:
- Synonym: Alternative names for the same concept (e.g., "Revenue" = "Turnover")
- Related: Conceptually connected but distinct (e.g., "Revenue" related to "Cost of Goods Sold")
- Parent-Child: Hierarchical containment (e.g., "Revenue" → "Product Revenue")
- Derived: One term calculated from others (e.g., "Profit Margin" derived from "Profit" and "Revenue")
Build navigation paths:
- Enable browsing by domain (discover all financial terms)
- Enable searching across domains (find all revenue-related terms)
- Create "Related Terms" suggestions on term detail pages
- Build term dependency maps for KPI validation
Maintain consistency:
- Review hierarchies during governance reviews
- Consolidate synonyms and related terms to reduce duplication
- Update relationships when definitions change
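
The depth guideline above lends itself to an automated check. The sketch below assumes the hierarchy is available as a simple child-to-parent mapping; the mapping, the `level` and `too_deep` helpers, and the extra "Online Product Revenue" term are all illustrative.

```python
# child -> parent; top-level categories have no entry of their own.
PARENT = {
    "Revenue": "Finance",
    "Expenses": "Finance",
    "Product Revenue": "Revenue",
    "Service Revenue": "Revenue",
}

MAX_LEVELS = 3   # the workflow suggests 2-3 levels of depth


def level(term: str, parent: dict[str, str]) -> int:
    """Return 1 for a top-level category, 2 for its children, and so on."""
    depth, seen = 1, {term}
    while term in parent:
        term = parent[term]
        if term in seen:
            raise ValueError(f"cycle in hierarchy at {term!r}")
        seen.add(term)
        depth += 1
    return depth


def too_deep(parent: dict[str, str], max_levels: int = MAX_LEVELS) -> list[str]:
    """List terms that sit deeper than the allowed number of levels."""
    return sorted(t for t in parent if level(t, parent) > max_levels)


print(level("Product Revenue", PARENT))   # 3
print(too_deep(PARENT))                   # []

# A hypothetical fourth level gets flagged for review.
deeper = {**PARENT, "Online Product Revenue": "Product Revenue"}
print(too_deep(deeper))                   # ['Online Product Revenue']
```

Terms flagged this way are candidates for a "Related" or "Derived" relationship instead of another nesting level.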

Best Practices:

Don't create deep trees (more than 3 levels); use relationships instead
Document relationship semantics (what makes two terms "related"?)
Use hierarchies to organize domains, not to create arbitrary classification
Enable free-text search as primary discovery mechanism
Use term relationships to detect definition conflicts

Workflow: Ensure Consistent Terminology Across the Organization

When to use: During governance audits, when merging business units, or when enforcing standards.

Steps:

Audit current terminology:
- Search catalog for variant names and definitions (e.g., "Revenue", "Sales", "Turnover", "Top Line")
- Interview teams to understand why variants exist
- Use search_catalog to find all objects using each variant
- Document conflicts in a consolidation backlog
Resolve conflicts through glossary:
- For each conflict, create a single approved glossary term
- Declare one variant as canonical; others as synonyms
- Document why this definition was chosen
- Get stakeholder approval before enforcement
Enforce consistency:
- Link all variant implementations to approved glossary term
- Update descriptions/names in catalog to use approved terminology
- Add metadata (tags) to identify which implementations are authoritative vs. legacy
- Deprecate non-conforming implementations gradually
Ongoing audits:
- Schedule quarterly reviews of new assets for terminology alignment
- Audit popular models/views for consistent term usage
- Include terminology checklist in data product publishing workflow
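
Once one variant is declared canonical, the audit step can be partly automated by matching asset names against the approved synonyms. This is a rough sketch over plain strings: it assumes asset names have already been collected (for example from search results), and the `SYNONYMS` content and helper names are illustrative.

```python
import re

# Canonical term -> approved synonyms (variants taken from the audit example).
SYNONYMS = {"Revenue": ["Sales", "Turnover", "Top Line"]}


def normalise(text: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics into one space."""
    return " ".join(part for part in re.split(r"[^0-9a-z]+", text.casefold()) if part)


def build_lookup(synonyms: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalised variant (and the canonical name) to its canonical term."""
    lookup: dict[str, str] = {}
    for canonical, variants in synonyms.items():
        for name in [canonical, *variants]:
            key = normalise(name)
            if lookup.get(key, canonical) != canonical:
                raise ValueError(f"{name!r} is claimed by two canonical terms")
            lookup[key] = canonical
    return lookup


def canonical_terms(asset_name: str, lookup: dict[str, str]) -> list[str]:
    """Return canonical terms whose variants appear as whole words in the name."""
    padded = f" {normalise(asset_name)} "
    return sorted({canon for variant, canon in lookup.items() if f" {variant} " in padded})


LOOKUP = build_lookup(SYNONYMS)
for asset in ["V_TURNOVER_BY_REGION", "TOP_LINE_2024", "HR_HEADCOUNT"]:
    print(asset, canonical_terms(asset, LOOKUP))
```

Name matching only produces candidates for the consolidation backlog; whether an asset really implements the approved definition still needs a human review.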

Best Practices:

Enforce consistency gradually (phase out old terms over 6-12 months)
Document migration path for teams using old terminology
Recognize that business language evolves; update glossary annually
Use glossary to enforce standards, not to restrict valid language
Support common synonyms as alternate search terms

3. KPI (Key Performance Indicator) Definition

KPIs translate business objectives into measurable metrics. The catalog ensures KPIs are well-defined, validated against data, and linked to accountability.

Workflow: Define KPIs Within the Catalog

When to use: When launching new strategic initiatives, during business planning cycles, or when formalizing informal metrics.

Steps:

Gather KPI requirements:
- Interview executive sponsors and business owners
- Document strategic objective each KPI supports
- Define calculation methodology (detailed formula)
- Identify refresh cadence (daily, weekly, monthly)
- Assign accountability (who owns this KPI?)
- Define target/threshold values
Create KPI definition using template (see references):
- Name: Business-friendly name (e.g., "Customer Lifetime Value")
- Code: Unique identifier (e.g., "CLV_001")
- Strategic Objective: Which business goal does this KPI support?
- Definition: Plain-language description
- Calculation: Detailed formula with logic
- Dimensions: How is KPI sliced? (by customer segment, region, product, time)
- Data Sources: Which tables/models feed this KPI?
- Owner: Who is accountable?
- Review Frequency: When is this KPI reviewed?
- Version: Creation date and change log
Validate against data landscape:
- Use get_asset_details to inspect source tables
- Use analyze_column_distribution to check data availability and quality
- Verify required dimensions/measures exist
- Document any data gaps or approximations
Publish and socialize:
- Create KPI detail page in catalog with calculation visible
- Share KPI definition with stakeholders
- Link to dashboards/reports that use this KPI
- Establish governance (who approves changes?)

Best Practices:

Keep KPI definitions simple; complexity breeds misunderstanding
Include examples: "If X happened, would KPI increase or decrease?"
Document known limitations and caveats (e.g., "excludes international operations")
Version KPI definitions; don't silently change calculations
Link KPI to glossary terms for consistency

Workflow: Link KPIs to Underlying Datasets and Measures

When to use: During KPI validation, when optimizing data models, or when documenting lineage.

Steps:

Map KPI to source measures:
- Use get_object_definition to inspect model structure
- Identify which measures feed each KPI calculation
- Example: KPI "Profit Margin" uses measures Total_Profit and Total_Revenue
- Document transformation logic (if any)
Trace lineage to source systems:
- Use list_catalog_assets or lineage analysis to trace back to source tables
- Document data flow: Source System → ETL → Model → Measure → KPI
- Identify any data transformations or aggregations
- Document refresh timing at each stage
Create bidirectional links:
- KPI detail page shows source measures
- Measure detail page shows which KPIs consume it
- Enable impact analysis: "change this measure → affects these KPIs"
Validate availability and completeness:
- Ensure all required source columns exist
- Check that historical data is available for trending
- Verify refresh frequency supports KPI review cycle
- Document any data quality issues in lineage
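
The bidirectional links described above amount to keeping a KPI-to-measure mapping and its inverse. A minimal sketch, assuming the mapping is maintained as a plain dictionary; "Revenue Growth" is a hypothetical second KPI added only to show a shared measure.

```python
from collections import defaultdict

# KPI -> measures it is calculated from.
KPI_MEASURES = {
    "Profit Margin": ["Total_Profit", "Total_Revenue"],
    "Revenue Growth": ["Total_Revenue"],   # hypothetical KPI for illustration
}


def invert(kpi_measures: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build the reverse index: measure -> KPIs that consume it."""
    consumers: dict[str, set[str]] = defaultdict(set)
    for kpi, measures in kpi_measures.items():
        for measure in measures:
            consumers[measure].add(kpi)
    return {measure: sorted(kpis) for measure, kpis in consumers.items()}


MEASURE_KPIS = invert(KPI_MEASURES)


def affected_kpis(measure: str) -> list[str]:
    """Answer 'change this measure → affects these KPIs'."""
    return MEASURE_KPIS.get(measure, [])


print(affected_kpis("Total_Revenue"))   # ['Profit Margin', 'Revenue Growth']
print(affected_kpis("Total_Profit"))    # ['Profit Margin']
```

Deriving the reverse index from a single source mapping keeps the two directions from drifting apart.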

Best Practices:

Map each KPI to its smallest constituent measures (enables reuse)
Document assumptions in data flow (e.g., "excludes canceled orders")
Use lineage to identify shared dependencies (optimization opportunities)
Automate lineage updates when data models change
Create data dictionaries linking business metrics to technical measures

Workflow: KPI Ownership and Accountability

When to use: During KPI launch, during governance reviews, or when resolving KPI disputes.

Steps:

Assign clear ownership:
- KPI Owner: Accountable for definition and business interpretation (executive)
- Data Owner: Accountable for underlying data quality (data team)
- Dashboard Owner: Accountable for reporting infrastructure (BI team)
- Use RACI matrix (see references) to clarify secondary responsibilities
Document ownership in catalog:
- Assign owner to KPI definition with contact information
- Create KPI ownership matrix (spreadsheet or dashboard)
- Link KPI to team or department
- Document escalation path for KPI disputes
Enable accountability:
- Schedule monthly KPI reviews with owners
- Track KPI performance trends
- Document explanations when KPIs miss targets
- Use KPI dashboards to highlight performance issues early
Rotate and transition ownership:
- When owner changes roles, assign replacement
- Document transition in KPI versioning
- Provide new owner with calculation documentation and historical context

Best Practices:

Assign single accountable owner (not a committee)
Ensure owner has authority to make decisions about KPI
Connect KPI ownership to performance management/compensation (creates accountability)
Review ownership quarterly; update when roles change
Document succession plan for critical KPI owners

Workflow: KPI Validation

When to use: Before publishing KPIs, during data quality issues, or when results seem suspicious.

Steps:

Validate calculation logic:
- Walk through calculation step-by-step
- Check for logic errors (incorrect operators, filters, aggregations)
- Verify dimensional alignment (are dimensions aggregated correctly?)
- Test with known scenarios (e.g., "if all customers had 100 orders, KPI should be X")
Validate data quality:
- Use analyze_column_distribution on source columns
- Check for missing values, outliers, or data quality issues
- Validate assumptions (e.g., "all dates are in YYYY-MM-DD format")
- Review data freshness: is data current enough for KPI?
Validate against reality:
- Compare KPI results to manual calculations (if available)
- Run KPI on historical data; check for expected trends
- Benchmark against external data if available (e.g., compare "Market Share" KPI to published reports)
- Interview business owners: "does this number feel right?"
Document validation results:
- Create validation report (see references)
- Document any discrepancies and their root causes
- Establish data quality requirements for KPI use
- Define KPI confidence level (trusted, needs monitoring, experimental)
Set up ongoing monitoring:
- Create KPI quality dashboard (shows data freshness, completeness, outliers)
- Set up alerts for data quality issues
- Schedule monthly validation checks
- Document changes to source data that might affect KPI
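
Testing a calculation against known scenarios can be as simple as a table of inputs and expected results. The sketch below uses "Profit Margin" computed as profit divided by revenue; the scenario values, the zero-revenue handling, and the tolerance are assumptions chosen for the example.

```python
import math
from typing import Optional


def profit_margin(total_profit: float, total_revenue: float) -> Optional[float]:
    """Profit divided by revenue; None when revenue is zero (margin undefined)."""
    if total_revenue == 0:
        return None
    return total_profit / total_revenue


# (profit, revenue, expected margin) for hand-checked scenarios.
SCENARIOS = [
    (50.0, 200.0, 0.25),
    (0.0, 200.0, 0.0),
    (-20.0, 100.0, -0.2),   # a loss should give a negative margin
]


def run_scenarios() -> list[str]:
    """Return a description of every scenario whose result is off."""
    failures = []
    for profit, revenue, expected in SCENARIOS:
        actual = profit_margin(profit, revenue)
        if actual is None or not math.isclose(actual, expected, rel_tol=1e-9, abs_tol=1e-12):
            failures.append(f"profit={profit}, revenue={revenue}: got {actual}, expected {expected}")
    return failures


def matches_manual(kpi_value: float, manual_value: float, rel_tol: float = 0.001) -> bool:
    """Compare a KPI result with a manual calculation within a relative tolerance."""
    return math.isclose(kpi_value, manual_value, rel_tol=rel_tol)


print(run_scenarios())                 # []
print(matches_manual(0.2501, 0.25))    # True
```

Keeping the scenarios next to the KPI definition lets others replicate the validation when source data changes.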

Best Practices:

Never publish KPI without validation
Include data quality caveats in KPI definition
Validate with business owners, not just data teams
Document validation assumptions (so others can replicate)
Schedule re-validation when source data changes significantly

Workflow: KPI Lifecycle Management

When to use: When KPIs become irrelevant, during business strategy reviews, or when merging business units.

Steps:

Establish KPI lifecycle states:
- Proposed: New KPI being evaluated
- Active: Currently tracked and reviewed
- Monitored: Less critical but still watched
- Deprecated: Phased out or replaced by newer KPI
- Archived: Historically important, no longer used
Transition KPIs through lifecycle:
- Proposed → Active: After validation and stakeholder approval
- Active → Deprecated: When business objective changes or KPI becomes outdated
- Deprecated → Archived: After 6-12 month sunset period
- Document reason and date for each transition
Manage sunset of deprecated KPIs:
- Communicate sunset date to stakeholders well in advance
- Identify replacement KPI (if applicable)
- Provide training on new KPI
- Archive old dashboards/reports gradually
- Keep historical data accessible for trend analysis
Review and refresh KPI portfolio:
- Conduct annual KPI portfolio review
- Assess each KPI: Still aligned with strategy? Still accurate? Still relevant?
- Identify KPIs for deprecation
- Identify new KPIs needed for emerging priorities
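
The lifecycle transitions above can be enforced with a small lookup of allowed moves. This sketch encodes only the three transitions listed in this workflow (none are listed for the Monitored state), and it approximates the shortest sunset period as 180 days; the record layout and dates are hypothetical.

```python
from datetime import date

ALLOWED_TRANSITIONS = {
    ("Proposed", "Active"),
    ("Active", "Deprecated"),
    ("Deprecated", "Archived"),
}
MIN_SUNSET_DAYS = 180   # roughly 6 months, the short end of the 6-12 month range


def transition(kpi: dict, new_state: str, reason: str, on: date) -> dict:
    """Return an updated copy of the KPI record, or raise if the move is not allowed."""
    move = (kpi["state"], new_state)
    if move not in ALLOWED_TRANSITIONS:
        raise ValueError(f"transition {move[0]} -> {move[1]} is not defined")
    if new_state == "Archived":
        sunset_days = (on - kpi["state_since"]).days
        if sunset_days < MIN_SUNSET_DAYS:
            raise ValueError(f"sunset period too short: {sunset_days} days")
    # Record the reason and date for each transition.
    log = kpi["log"] + [{"from": move[0], "to": new_state, "reason": reason, "date": on}]
    return {**kpi, "state": new_state, "state_since": on, "log": log}


kpi = {"code": "CLV_001", "state": "Active", "state_since": date(2024, 1, 1), "log": []}
kpi = transition(kpi, "Deprecated", "replaced by a newer KPI", date(2025, 1, 15))
kpi = transition(kpi, "Archived", "sunset period complete", date(2025, 9, 1))
print(kpi["state"])   # Archived
```

The record is archived rather than deleted, so the transition log stays available as historical context.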

Best Practices:

Document why KPIs were deprecated (important context for future teams)
Don't delete KPI definitions; archive them with historical data
Communicate KPI changes to all stakeholders early
Link deprecated KPI to replacement (if applicable)
Review KPI portfolio annually, not ad hoc

4. Tag Management

Tags are lightweight metadata that enable discovery and governance. A well-designed tag taxonomy makes the catalog navigable at scale.

Workflow: Design a Tag Taxonomy

When to use: At governance program launch or when current tagging scheme becomes unwieldy.

Steps:

Define tag categories:
- Domain Tags: Business domain (Finance, Sales, HR, Operations, Product)
- Sensitivity Tags: PII, Confidential, Internal, Public
- Cadence Tags: Real-time, Daily, Weekly, Monthly, Ad-hoc
- Owner/Team Tags: Owned_by_Finance, Owned_by_Sales, etc.
- Quality Tags: Certified, Under_Review, Experimental, Legacy
- Use Case Tags: KPI, Reporting, Analysis, AI/ML, Regulatory
- See references for detailed taxonomy design patterns
Create controlled vocabulary:
- Define each tag with clear definition
- Document when to use each tag (vs. related tags)
- Establish naming convention (lowercase, no spaces, hyphens for compound terms)
- Example: Use hr-employee-master not HR Employee Master or hr_emp_master
Design hierarchy (if needed):
- Flat hierarchy: Simple tag list, good for small catalogs (<100 assets)
- Hierarchical: Parent/child relationships, good for large catalogs (>500 assets)
- Example hierarchy: domain:finance, domain:finance:accounting, domain:finance:revenue
Publish and train:
- Create tag guide with examples
- Train data owners on tagging conventions
- Publish tag definitions in easily searchable location
- Include tag guide in onboarding documentation
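
The naming convention above (lowercase, no spaces, hyphens for compound terms, with colons separating hierarchy levels) can be checked with a regular expression. The pattern and the `suggest_tag` helper are one possible encoding of that convention, not something enforced by Datasphere.

```python
import re

# One segment: lowercase letters/digits, optionally joined by single hyphens.
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# A tag: one or more segments separated by colons, e.g. domain:finance:revenue.
TAG_PATTERN = re.compile(rf"{SEGMENT}(?::{SEGMENT})*")


def is_valid_tag(tag: str) -> bool:
    """True if the whole tag follows the naming convention."""
    return TAG_PATTERN.fullmatch(tag) is not None


def suggest_tag(raw: str) -> str:
    """Lowercase the input and turn runs of spaces or underscores into hyphens."""
    return re.sub(r"[\s_]+", "-", raw.strip().lower())


for candidate in ["hr-employee-master", "HR Employee Master", "hr_emp_master",
                  "domain:finance:accounting"]:
    print(f"{candidate!r}: valid={is_valid_tag(candidate)}, suggestion={suggest_tag(candidate)!r}")
```

`suggest_tag` only fixes casing and separators; it cannot expand abbreviations, so `hr_emp_master` becomes `hr-emp-master` and still needs a human to choose the full term.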

Best Practices:

Start simple; expand tags over time
Limit tag count (20-50 active tags); too many tags defeat discovery
Use domain category heavily; use other categories sparingly
Don't use tags for information that should be in descriptions
Review tags quarterly; consolidate if duplicates emerge
See references for industry-specific tag examples

Workflow: Apply Tags Consistently Across Assets

When to use: During catalog onboarding, in metadata enrichment sprints, or during quality audits.

Steps:

Define tagging standards:
- Which asset types get tagged? (tables, views, models, measures)
- How many tags per asset? (typically 3-5)
- Which tag categories are mandatory? (e.g., domain, sensitivity)
- What approval is needed? (self-service vs. peer review)
Apply tags systematically:
- Use list_catalog_assets to find untagged or under-tagged assets
- Apply domain tag based on asset purpose
- Apply sensitivity tag based on data content (PII, financial, health, etc.)
- Apply cadence tag based on refresh frequency
- Apply quality tag based on readiness level
Document tagging decisions:
- For each asset, document why specific tags were applied
- Include examples in tag definition (to ensure consistent interpretation)
- Review tagging periodically; adjust tags if meanings change
Enable peer review:
- Have data owner review proposed tags
- Tag changes should be visible in audit trail
- Create dashboard showing tagging coverage (% of assets with required tags)
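
The coverage figure for such a dashboard is the share of assets that carry every mandatory tag category. A minimal sketch, assuming tags use a `category:value` form and that asset tags have already been exported into a dictionary; the sample assets are illustrative.

```python
REQUIRED_CATEGORIES = ("domain", "sensitivity")

# Asset name -> tags (sample data for illustration).
ASSETS = {
    "Sales_Summary": ["domain:sales", "sensitivity:internal", "cadence:daily"],
    "Table_CUSTOMERS": ["domain:sales"],
    "View_Sales_Orders": [],
}


def missing_categories(tags: list[str], required=REQUIRED_CATEGORIES) -> list[str]:
    """Return the mandatory categories that no tag on the asset covers."""
    present = {tag.split(":", 1)[0] for tag in tags}
    return [category for category in required if category not in present]


def coverage(assets: dict[str, list[str]], required=REQUIRED_CATEGORIES) -> float:
    """Percentage of assets that have all required tag categories."""
    if not assets:
        return 0.0
    compliant = sum(1 for tags in assets.values() if not missing_categories(tags, required))
    return 100.0 * compliant / len(assets)


print(f"coverage: {coverage(ASSETS):.1f}%")               # coverage: 33.3%
for name, tags in ASSETS.items():
    print(name, "missing:", missing_categories(tags))
```

The per-asset output doubles as a worklist of assets that still need tags before publishing.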

Best Practices:

Tag all new assets before publishing to catalog
Use tags to enforce governance (e.g., all PII data must have sensitivity:pii tag)
Don't over-tag; too many tags reduce usability
Keep tag count consistent (don't tag one asset with 2 tags and another with 20)
Review tags when asset purpose or data changes

Workflow: Tag-Based Search and Discovery

When to use: When enabling self-service analytics or building faceted search interfaces.

Steps:

Enable tag-based filtering:
- Use search_catalog with tag filters
- Support multi-tag searches (e.g., "show all assets tagged domain:finance AND sensitivity:internal")
- Support tag hierarchies in search (a search for domain:finance also returns assets tagged with finance sub-tags such as domain:finance:revenue)
Create tag-based browsing:
- Create tag clouds or faceted navigation in catalog UI
- Enable "related tags" suggestions (if browsing domain:sales, suggest cadence:daily)
- Show tag frequency (how many assets have each tag?)
Enable self-service discovery:
- Train users to search by tag (simpler than writing complex queries)
- Create tag guides for different personas (executives, analysts, engineers)
- Build dashboards that link tags to data discovery metrics
Track usage patterns:
- Monitor which tags are most searched
- Identify unused tags (candidates for removal)
- Use search analytics to refine tagging strategy
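
Multi-tag filtering with hierarchy support can be sketched over an in-memory asset list. This example does not call `search_catalog`; it assumes tags have already been retrieved and that hierarchy levels are separated by colons. Asset names and tags are illustrative.

```python
from collections import Counter

ASSETS = {
    "Model_Revenue": ["domain:finance:revenue", "sensitivity:internal"],
    "View_Ledger": ["domain:finance:accounting", "sensitivity:confidential"],
    "Model_Sales": ["domain:sales", "sensitivity:internal", "cadence:daily"],
}


def tag_matches(asset_tag: str, query_tag: str) -> bool:
    """True if asset_tag equals query_tag or is one of its sub-tags."""
    return asset_tag == query_tag or asset_tag.startswith(query_tag + ":")


def filter_assets(assets: dict[str, list[str]], required_tags: list[str]) -> list[str]:
    """Return assets that match every query tag (AND semantics)."""
    return sorted(
        name for name, tags in assets.items()
        if all(any(tag_matches(tag, query) for tag in tags) for query in required_tags)
    )


def tag_frequency(assets: dict[str, list[str]]) -> Counter:
    """Count how many assets carry each tag."""
    return Counter(tag for tags in assets.values() for tag in tags)


print(filter_assets(ASSETS, ["domain:finance"]))
# ['Model_Revenue', 'View_Ledger']
print(filter_assets(ASSETS, ["domain:finance", "sensitivity:internal"]))
# ['Model_Revenue']
print(tag_frequency(ASSETS)["sensitivity:internal"])   # 2
```

Matching on the query tag plus a trailing colon avoids false hits such as `domain:financial` when searching for `domain:finance`.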

Best Practices:

Make tag search as prominent as free-text search
Enable "did you mean" suggestions for similar tags
Show tag-based recommendations (users viewing asset with tag X also viewed assets with tag Y)
Use tag search to validate tagging strategy (if tags aren't searched, reconsider their value)
Create saved tag searches for common discovery patterns

Workflow: Tag Governance

When to use: When tagging decisions affect multiple teams, when enforcing compliance, or when implementing self-service governance.

Steps:

Define tagging authority:
- Who can create new tags? (centralized: only governance team; decentralized: domain teams with approval)
- Who can apply tags? (any asset owner; or only owners of sensitive assets)
- Who can modify or delete tags? (governance team only)
- Document process in governance policy
Implement approval workflows (if needed):
- For sensitive tags (PII, Financial) or compliance tags, require approval
- Asset owner proposes tags; governance team reviews and approves
- Approval captures who approved and timestamp
- Rejected tags include feedback explaining why
Monitor tagging compliance:
- Create dashboard showing tagging coverage by domain/team
- Flag assets missing required tags (domain, sensitivity)
- Monthly tagging audit: review high-change assets
- Identify teams with inconsistent tagging practices
Enforce tagging standards:
- Prevent publishing of assets without required tags
- Create alerts for untagged or mis-tagged sensitive assets
- Include tagging checklist in data product publishing workflow
- Tie tagging compliance to team scorecards (if appropriate)

Best Practices:

Keep tagging lightweight; heavy approval workflows reduce adoption
Tag sensitive assets with higher approval rigor
Enable bulk tag application (don't require tagging each asset individually)
Provide clear feedback when tagging violates standards
Review tagging governance quarterly

5. Lineage and Impact Analysis

Understanding data lineage answers critical questions: "Where does this data come from?" "Which models consume this table?" "What breaks if I change this column?"

Workflow: Use Impact and Lineage Analysis to Trace Data Flows

When to use: Before modifying critical tables, during root cause analysis of data issues, or when documenting data flows.

Steps:

Access lineage tools:
- Use get_object_definition to inspect asset structure
- Use list_catalog_assets with lineage context to identify related assets
- Access Datasphere lineage visualization (typically in asset detail page)
- Filter lineage by direction (upstream, downstream, bidirectional)
Trace upstream lineage:
- Start from asset of interest (e.g., a model or measure)
- Follow lineage upstream to source tables
- Identify transformations at each step
- Document assumptions and business logic in transformations
- Example: Model_Sales → View_Sales_Orders → Table_ORDERS (SAP source)
Trace downstream lineage:
- Start from table or view
- Follow lineage downstream to consuming models, measures, KPIs
- Identify all downstream impacts (critical for change assessment)
- Example: Table_CUSTOMERS → View_Customer_Enriched → Model_Customer_Analytics → KPI_Churn_Rate
Analyze bidirectional flows:
- Identify circular dependencies (should be rare)
- Find shared data flows (tables consumed by multiple models)
- Identify choke points (tables with many downstream consumers)
- Document data flow bottlenecks
Document lineage:
- Create lineage diagram showing major data flows
- Document business logic at each transformation
- Include metadata about refresh timing
- Publish documentation with visual lineage
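
Upstream and downstream tracing is a graph traversal over lineage edges. The sketch below works on a hand-written edge list built from the two example flows above; in practice the edges would come from the catalog's lineage information, and the function names here are hypothetical.

```python
from collections import defaultdict, deque

# (source, consumer) pairs: data flows from the first asset to the second.
EDGES = [
    ("Table_ORDERS", "View_Sales_Orders"),
    ("View_Sales_Orders", "Model_Sales"),
    ("Table_CUSTOMERS", "View_Customer_Enriched"),
    ("View_Customer_Enriched", "Model_Customer_Analytics"),
    ("Model_Customer_Analytics", "KPI_Churn_Rate"),
]


def build_graph(edges, direction: str) -> dict[str, list[str]]:
    """Adjacency list that follows edges forwards (downstream) or backwards (upstream)."""
    graph = defaultdict(list)
    for source, consumer in edges:
        if direction == "downstream":
            graph[source].append(consumer)
        else:
            graph[consumer].append(source)
    return graph


def trace(start: str, edges, direction: str = "downstream") -> list[str]:
    """Breadth-first walk; returns reachable assets in discovery order, excluding start."""
    if direction not in ("upstream", "downstream"):
        raise ValueError("direction must be 'upstream' or 'downstream'")
    graph = build_graph(edges, direction)
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:       # also guards against circular dependencies
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(trace("Model_Sales", EDGES, "upstream"))
# ['View_Sales_Orders', 'Table_ORDERS']
print(trace("Table_CUSTOMERS", EDGES, "downstream"))
# ['View_Customer_Enriched', 'Model_Customer_Analytics', 'KPI_Churn_Rate']
```

Counting the length of each table's downstream trace is one way to spot choke points with many consumers.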

Best Practices:

Lineage should be automatically captured; manually document only complex logic
Include transformation rationale (not just "what" but "why")
Document data quality changes through lineage (where does quality degrade?)
Review lineage when source systems change
Use lineage to identify optimization opportunities (redundant transformations, etc.)

Workflow: Generate Impact Reports

When to use: Before making changes to critical assets, during root cause analysis, or during governance audits.

Steps:

Define change scope:
- Identify specific table, column, model, or measure being changed
- Document nature of change (delete column, rename, change calculation, deprecate table)
- Estimate impact scope (how many downstream assets affected?)
- Assess impact severity (internal tools only vs. customer-facing dashboards)
Generate impact analysis:
- Use lineage tools to identify all downstream consumers
- Classify consumers by impact type:
  - Direct: Assets directly consuming the changed object
  - Indirect: Assets consuming direct consumers
  - KPI: KPIs affected by change
  - Reports/Dashboards: BI artifacts consuming changed assets
- Identify affected stakeholders (which teams will feel impact?)
Assess impact severity:
- For each impacted asset, assess:
  - Criticality: Is this a critical KPI? Customer-facing? Regulatory?
  - Detectability: Would broken data be noticed immediately, or would it be silently wrong?
  - Blast Radius: How many end users affected?
- Document each assessment with rationale
Create impact report:
- Change Summary: What is being changed and why?
- Downstream Impacts: List all affected assets with severity
- Mitigation Plans: How to minimize impact? (e.g., phased rollout, temporary shadow calculation)
- Testing Plan: How to validate change doesn't break downstream?
- Rollback Plan: How to revert if issue discovered?
- Stakeholder Notifications: Which teams need to be informed?
Use report to gain approvals:
- Share impact report with affected stakeholders
- Get sign-off from asset owners before proceeding
- Document approval (who approved, when, any conditions)
- Update report as approvals gathered
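
Classifying consumers as direct or indirect follows from their distance to the changed object in the lineage graph. A minimal sketch, assuming a downstream adjacency mapping is already available; identifying KPIs by a `KPI_` name prefix is a simplification made for this example only.

```python
from collections import deque

# Asset -> assets that consume it directly (from the downstream example flow).
DOWNSTREAM = {
    "Table_CUSTOMERS": ["View_Customer_Enriched"],
    "View_Customer_Enriched": ["Model_Customer_Analytics"],
    "Model_Customer_Analytics": ["KPI_Churn_Rate"],
}


def classify_consumers(changed: str, downstream: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group every downstream consumer of `changed` by impact type."""
    report = {"direct": [], "indirect": [], "kpi": []}
    depth = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in downstream.get(node, []):
            if consumer in depth:           # already visited via another path
                continue
            depth[consumer] = depth[node] + 1
            queue.append(consumer)
            bucket = "direct" if depth[consumer] == 1 else "indirect"
            report[bucket].append(consumer)
            if consumer.startswith("KPI_"):  # naming assumption, see lead-in
                report["kpi"].append(consumer)
    return report


impact = classify_consumers("Table_CUSTOMERS", DOWNSTREAM)
print(impact["direct"])     # ['View_Customer_Enriched']
print(impact["indirect"])   # ['Model_Customer_Analytics', 'KPI_Churn_Rate']
print(impact["kpi"])        # ['KPI_Churn_Rate']
```

The resulting groups map onto the "Downstream Impacts" section of the report; severity, mitigation, and sign-off remain manual assessments.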

Best Practices:

Always generate impact report before modifying critical assets
Use impact reports to surface hidden downstream dependencies
Include indirect impacts (sometimes more important than direct)
Define severity thresholds (when is impact too high?)
Use impact analysis to identify opportunities to consolidate redundant implementations

Workflow: Upstream Analysis

When to use: When investigating data quality issues, understanding data freshness, or documenting data sources.

Steps:

Trace back to original sources:
- Start from asset with issue (model, measure, or dashboard)
- Follow lineage upstream to source tables
- Identify each transformation step
- Document data quality at each stage
Identify source systems:
- For each upstream table, identify source system (SAP, Salesforce, custom app, etc.)
- Document extraction frequency (real-time, batch, delayed)
- Identify data quality issues in source (missing values, duplicates, delays)
Analyze data quality degradation:
- Identify where data quality issues are introduced
- Example: "Missing Customer Names in source → not populated in enriched view → shows as blanks in dashboard"
- Document which transformations impact quality
Identify opportunities to improve:
- Use upstream analysis to fix quality issues at source (better than downstream workarounds)
- Identify redundant transformations that could be consolidated
- Propose moving transformations closer to source (for efficiency)

Best Practices:

Document source system characteristics (reliability, update frequency)
Use upstream analysis to identify data quality root causes
Periodically review upstream dependencies; document changes
Build dashboards tracking data freshness across lineage
Work with source system owners to improve data quality upstream

Workflow: Downstream Analysis

When to use: When deprecating assets, understanding asset usage, or during data quality investigations.

Steps:

Identify all downstream consumers:
- Start from table, view, or measure
- Trace forward to consuming models, measures, KPIs
- Continue tracing to dashboards, reports, or AI/ML models
- Use list_catalog_assets to identify all references
Categorize downstream usage:
- Critical: KPIs, customer-facing dashboards, regulatory reports
- Important: Internal dashboards, analyst-used models, operational reports
- Experimental: Prototype dashboards, one-time analyses
- Assess what happens if asset becomes unavailable
Understand consumption patterns:
- Which downstream assets are actively used?
- Which downstream assets are stale/unused? (candidates for cleanup)
- Which measures appear in multiple KPIs? (indicates high leverage)
- Which dashboards have most users? (highest-risk to break)
Plan changes safely:
- For deprecation: Provide replacement assets before turning off original
- For modifications: Test changes against actual downstream consumers
- Communicate changes early to downstream owners
- Build deprecation timeline that allows downstream adjustment

Best Practices:

Use downstream analysis to understand asset criticality
Identify "hidden" downstream consumers (often forgotten dependencies)
Use downstream usage patterns to prioritize data governance efforts
Periodically clean up unused downstream assets (reduces technical debt)
Track downstream usage to validate data product adoption

Workflow: Change Impact Assessment Before Modifications

When to use: Before modifying any production asset, during data quality fixes, or during optimization projects.

Steps:

Plan change:
- Document what is being changed (table structure, calculation logic, assumptions)
- Document why change is needed (bug fix, performance, business requirement)
- Document expected benefits and risks
Analyze downstream impacts:
- Use downstream analysis workflow (above) to identify all consumers
- Generate impact report (see above workflow)
- Identify critical vs. non-critical impacts
- Document stakeholders who need notification
Design safe change approach:
- Option 1 (No-Impact Approach): Add new column/table; don't modify existing ones
- Option 2 (Backward-Compatible): Support old and new logic simultaneously during transition
- Option 3 (Phased Rollout): Change in phases; monitor for issues at each phase
- Option 4 (Temporary Shadow): Run new logic alongside old; validate before switching
- Choose approach based on risk and downstream impact
Build testing plan:
- Test change in development/test environment first
- Test with actual data volumes and realistic downstream consumption patterns
- Validate that downstream assets still produce correct results
- Document test cases and results
Execute with monitoring:
- Apply change (in phases if using phased rollout)
- Monitor downstream dashboards/reports for unexpected changes
- Monitor data quality metrics (completeness, accuracy, freshness)
- Have rollback plan ready
- Document any issues discovered during rollout
Communicate and document:
- Notify downstream stakeholders of change and any impacts
- Document change in asset versioning/changelog
- Update lineage documentation if data flows changed
- Conduct post-mortem if issues discovered

Best Practices:

Always analyze impact before change; never assume "no one will notice"
Use phased or shadow approaches for high-risk changes
Build automated tests to validate downstream assets after changes
Keep rollback capability for at least 24 hours after change, ideally 48
Review impact assessment process regularly; improve based on incidents

6. Data Quality Scoring and Tracking

Data quality scores help users select trustworthy datasets and identify improvement opportunities.

Workflow: Define Quality Dimensions and Scoring

When to use: During governance program launch or when implementing data quality initiatives.

Steps:

Identify quality dimensions:
- Completeness: Are all required values present? (inverse of missing/null rates)
- Accuracy: Do values match source of truth or business rules?
- Timeliness: Is data fresh? How long since last update?
- Consistency: Do related values align? (e.g., sum of parts = total)
- Uniqueness: Are there unintended duplicates?
- Validity: Do values match expected format/range?
- See references for detailed scoring templates
Define scoring methodology:
- For each dimension, establish measurement logic
- Example Completeness: "Score = 100 * (non-null rows / total rows)"
- Example Timeliness: "Score = 100 if updated in last 24 hours; decreases 5 points per day stale, down to a minimum of 0"
- Document assumptions and edge cases
- Establish minimum thresholds (e.g., Completeness must be ≥95%)
Aggregate scores:
- Calculate overall quality score from dimension scores
- Use weighted average if some dimensions more important (see references for templates)
- Example: Overall = (Completeness * 30% + Accuracy * 30% + Timeliness * 25% + Consistency * 15%)
- Establish overall quality tiers: Certified (90+), Trusted (80-89), Monitor (70-79), Issue (<70)
Establish scorecard template:
- Create quality scorecard showing all dimensions
- Include trends over time (are we improving or degrading?)
- Document current blockers to achieving higher scores
- Set improvement targets
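
The example formulas above translate directly into code. In this sketch the weights and tier boundaries come from the examples in this workflow; counting every started day beyond the first 24 hours as "a day stale" and flooring the score at 0 are assumptions about how to read the timeliness rule.

```python
import math

WEIGHTS = {"completeness": 0.30, "accuracy": 0.30, "timeliness": 0.25, "consistency": 0.15}


def completeness_score(non_null_rows: int, total_rows: int) -> float:
    """Score = 100 * (non-null rows / total rows); 0 for an empty table."""
    if total_rows == 0:
        return 0.0
    return 100.0 * non_null_rows / total_rows


def timeliness_score(hours_since_update: float) -> float:
    """100 within 24 hours, then 5 points off per started day stale, never below 0."""
    if hours_since_update <= 24:
        return 100.0
    days_stale = math.ceil((hours_since_update - 24) / 24)
    return max(0.0, 100.0 - 5.0 * days_stale)


def overall_score(dimension_scores: dict[str, float], weights=WEIGHTS) -> float:
    """Weighted average of the dimension scores."""
    missing = set(weights) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(dimension_scores[name] * weight for name, weight in weights.items())


def tier(score: float) -> str:
    """Map an overall score to a quality tier."""
    if score >= 90:
        return "Certified"
    if score >= 80:
        return "Trusted"
    if score >= 70:
        return "Monitor"
    return "Issue"


scores = {
    "completeness": completeness_score(980, 1000),   # 98.0
    "accuracy": 90.0,                                 # assumed result of validation rules
    "timeliness": timeliness_score(30),               # 95.0
    "consistency": 80.0,                              # assumed
}
total = overall_score(scores)
print(round(total, 2), tier(total))                   # 92.15 Certified
```

Because the tiers are checked with `>=`, a fractional score such as 89.5 falls into Trusted rather than between tiers.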

Best Practices:

Start with simple dimensions (completeness, timeliness); add advanced dimensions over time
Base scores on automated measurements when possible (avoid manual scoring)
Review scoring methodology quarterly; update if dimensions/thresholds change
Communicate quality scores to all data consumers
Link quality issues to root causes (helps with remediation)

Workflow: Score Assets on Quality Metrics

When to use: During metadata enrichment, when onboarding new data sources, or in quality audits.

Steps:

Measure quality dimensions:
- Use analyze_column_distribution to assess completeness (what % of rows have values?)
- Check timeliness (when was data last refreshed?)
- Run validation rules to assess accuracy (do values meet business rules?)
- Compare with source to assess consistency
Calculate quality scores:
- For each dimension, calculate score using methodology from above
- Aggregate dimension scores into overall quality tier
- Document any assumptions or manual overrides
- Identify root causes of low scores
Assign quality tags:
- Tag assets with quality tier (Certified, Trusted, Monitor, Issue)
- Tag assets with specific quality issues (duplicate_data, stale_data, missing_values, etc.)
- Use tags to surface quality issues in catalog search
Publish quality metadata:
- Add quality score and dimensions to asset detail page
- Create quality dashboard showing scores across portfolio
- Enable sorting/filtering by quality score
- Show trend charts (is quality improving over time?)
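
The tagging step above could be sketched as follows. This is an illustration only: the tag names `missing_values` and `stale_data` and the 95% completeness minimum appear in this document, while the timeliness threshold, the function name, and the shape of the `scores` dictionary are hypothetical. It does not call any Datasphere tool; it only derives a tag list from scores you have already calculated.

```python
# Hypothetical minimum thresholds per dimension. Only the 95% completeness
# threshold appears in the text; the timeliness value is an assumed placeholder.
MIN_THRESHOLDS = {"completeness": 95.0, "timeliness": 80.0}

# Issue tags named in the text, keyed by the dimension that triggers them.
ISSUE_TAGS = {"completeness": "missing_values", "timeliness": "stale_data"}


def quality_tags(tier: str, scores: dict[str, float]) -> list[str]:
    """Return the tier tag plus one issue tag per dimension below its threshold."""
    tags = [tier]
    for dimension, minimum in MIN_THRESHOLDS.items():
        score = scores.get(dimension)
        # Dimensions that were not measured are skipped rather than tagged.
        if score is not None and score < minimum:
            tags.append(ISSUE_TAGS[dimension])
    return tags
```

Keeping the thresholds and tag names in two small dictionaries makes the rules easy to review alongside the controlled tag vocabulary.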

Best Practices:

Automate quality scoring; don't rely on manual assessments
Review quality scores weekly or monthly (not annually)
Link quality issues to improvement projects (make it actionable)
Highlight quick wins (easy-to-fix quality issues)
Recognize teams that improve data quality

Workflow: Quality Dashboards and Trending

When to use: When establishing quality culture, during quality improvement programs, or for executive visibility.

Steps:

Build quality scorecards:
- Create dashboard showing quality scores across all assets
- Show breakdown by domain (Finance, Sales, HR, etc.)
- Show breakdown by source system or team
- Display as heatmap or scorecard format
Visualize trends over time:
- Track quality score trends (improving or degrading?)
- Identify tables with declining quality (investigate why)
- Celebrate tables with improving quality (recognize teams)
- Use trends to justify investment in quality initiatives
Enable drill-down analysis:
- Click a table to see detailed quality metrics (dimension scores)
- See which columns are causing low quality
- View quality issues identified in validation rules
- Link to remediation projects or tickets
Use for accountability:
- Assign quality scorecards to team owners
- Review each team's quality scorecard monthly
- Tie quality improvements to performance goals
- Use quality metrics in hiring/promotion decisions (if appropriate)
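
The domain breakdown and trend steps above could be prototyped before building a dashboard. The sketch below assumes each asset is a plain dictionary with hypothetical `domain` and `score` keys and that score history is a chronological list; the 1-point tolerance for calling a trend "stable" is also an assumption.

```python
from collections import defaultdict
from statistics import mean


def average_by_domain(assets: list[dict]) -> dict[str, float]:
    """Average overall quality score per domain (Finance, Sales, HR, ...)."""
    by_domain = defaultdict(list)
    for asset in assets:
        by_domain[asset["domain"]].append(asset["score"])
    return {domain: mean(scores) for domain, scores in by_domain.items()}


def trend(history: list[float], tolerance: float = 1.0) -> str:
    """Classify a chronological score history by comparing last and first values."""
    if len(history) < 2:
        return "insufficient data"
    change = history[-1] - history[0]
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "degrading"
    return "stable"
```

Comparing only the first and last values is deliberately crude; a real dashboard would more likely fit a slope or use a rolling window.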

Best Practices:

Make quality visible to all users (public dashboard, not hidden)
Set realistic improvement targets (don't expect 100% overnight)
Celebrate improvements; don't just criticize poor quality
Link quality metrics to business impact (show cost/risk of poor quality)
Review quality dashboards at least monthly

7. Catalog Review Workflows

Regular catalog reviews prevent stale data, ensure accurate metadata, and maintain governance standards.

Workflow: Periodic Asset Review Scheduling

When to use: To establish ongoing governance cadence.

Steps:

Define review schedule:
- Critical Assets: Monthly review (KPIs, customer-facing models, regulatory data)
- Important Assets: Quarterly review (heavily-used dashboards, core measures)
- Standard Assets: Annual review (everything else)
- Establish review calendar with assigned owners
Create review checklist:
- Does the asset meet current business needs?
- Is metadata up-to-date (name, description, tags)?
- Is quality acceptable? Any known issues?
- Is asset actively used? By whom?
- Is ownership clear and current?
- Are there any deprecated or redundant assets to clean up?
Conduct reviews:
- Send review request to asset owner with checklist
- Owner reviews and confirms or updates metadata
- Owner identifies any issues or improvement opportunities
- Governance team follows up on unfinished reviews
Track and report:
- Monitor review completion rates
- Report on findings (common issues, needed improvements)
- Create action items for improvements identified
- Schedule follow-up reviews for problematic assets
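
The review cadence above (monthly, quarterly, annual) could be turned into due dates with a small helper. This is a sketch: the criticality labels and function names are hypothetical, and clamping to the last day of a shorter month is an assumed convention.

```python
import calendar
from datetime import date

# Review intervals in months, following the schedule in the text.
REVIEW_INTERVAL_MONTHS = {"critical": 1, "important": 3, "standard": 12}


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_review_date(last_review: date, criticality: str) -> date:
    """Date by which the next review is due for the given criticality."""
    return add_months(last_review, REVIEW_INTERVAL_MONTHS[criticality])


def is_overdue(last_review: date, criticality: str, today: date) -> bool:
    """True once today is past the next review date."""
    return today > next_review_date(last_review, criticality)
```

A review calendar can then be generated by calling `next_review_date` for every asset and grouping the results by owner.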

Best Practices:

Keep reviews lightweight; a 5-minute checklist is better than an hour-long review
Automate notifications and tracking (don't rely on email)
Make reviews part of team's regular cadence (e.g., first Friday of month)
Recognize teams with high-quality catalogs
Use review findings to improve governance processes

Workflow: Stale Asset Identification and Cleanup

When to use: During governance audits or when trying to reduce catalog clutter.

Steps:

Identify stale assets:
- Use list_catalog_assets to find assets with:
  - No recent updates (e.g., not modified in 1+ year)
  - No recent usage (e.g., not consumed by any dashboards/models)
  - No owner, or an owner who is no longer in the organization
- Flag assets as potentially stale
- Create stale asset inventory
Investigate stale assets:
- For each stale asset, determine why it's not used
- Is it truly unused? Or is usage not tracked?
- Use downstream analysis to check for hidden dependencies
- Interview potential users (is this asset still needed?)
Plan consolidation or deprecation:
- Option 1: Consolidate with similar active asset (reduce duplication)
- Option 2: Deprecate with migration path to replacement asset
- Option 3: Archive with clear documentation (in case it is needed in the future)
- Option 4: Delete if truly redundant and no users identified
- Get stakeholder approval before action
Execute cleanup:
- For deprecation: Communicate timeline, provide replacement access
- For consolidation: Migrate any remaining users to replacement asset
- For archival: Move to archive location, keep documentation accessible
- For deletion: Only after confirming no users/lineage dependencies
Monitor and report:
- Track cleanup progress and completed actions
- Report on the amount of technical debt removed
- Celebrate catalog cleanliness improvements
- Establish policies to prevent stale assets from accumulating again
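
The three staleness criteria from the first step can be expressed as a simple filter. The sketch below assumes the asset list has already been retrieved and flattened into dictionaries with hypothetical `name`, `last_modified`, `consumer_count`, and `owner` keys; the actual shape of what `list_catalog_assets` returns is not specified here, so treat the field names as placeholders.

```python
from datetime import date, timedelta


def stale_reasons(asset: dict, today: date, active_owners: set[str],
                  max_age_days: int = 365) -> list[str]:
    """Return the staleness criteria an asset meets (empty list if none)."""
    reasons = []
    if today - asset["last_modified"] > timedelta(days=max_age_days):
        reasons.append("no recent updates")
    if asset["consumer_count"] == 0:
        reasons.append("no recent usage")
    # A missing owner (None) is never in the active set, so it is flagged too.
    if asset.get("owner") not in active_owners:
        reasons.append("no active owner")
    return reasons


def stale_inventory(assets: list[dict], today: date,
                    active_owners: set[str]) -> dict[str, list[str]]:
    """Map each potentially stale asset name to the reasons it was flagged."""
    inventory = {}
    for asset in assets:
        reasons = stale_reasons(asset, today, active_owners)
        if reasons:
            inventory[asset["name"]] = reasons
    return inventory
```

The output is only a list of candidates. A zero consumer count may mean that usage is not tracked, which is why the investigation step comes before any cleanup.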

Best Practices:

Don't delete without confirming there are no users (hidden dependencies are a common surprise)
Keep archived assets documented; don't let them silently disappear
Communicate deprecations early; give a long lead time (6+ months)
Use cleanup projects to establish ongoing maintenance culture
Review and approve stale asset cleanup at governance committee level

Workflow: Ownership Assignment and Accountability

When to use: During onboarding, when ownership gaps are identified, or during governance reviews.

Steps:

Define ownership model:
- Technical Owner: Responsible for data model, refresh, quality
- Business Owner: Responsible for business interpretation, accuracy
- Executive Sponsor: Accountable for strategic alignment
- Document roles and responsibilities in RACI matrix (see references)
Identify ownership gaps:
- Use list_catalog_assets to find assets without assigned owner
- Create ownership inventory (asset → assigned owner)
- Identify teams with too much ownership (capacity issues)
- Identify gaps where ownership is unclear
Assign ownership:
- Match assets to appropriate owners based on:
  - Team responsible for data model
  - Team most familiar with business context
  - Team consuming data most heavily
- Get owner approval before assigning
- Document an escalation path in case the owner is unavailable
Enable accountability:
- Use ownership assignments to route review requests
- Track owner response rates and quality of reviews
- Recognize owners with high-quality asset governance
- Provide support/training to struggling owners
Maintain ownership:
- Review ownership quarterly
- Update when owners change roles
- Establish succession planning for critical asset owners
- Document owner transitions with knowledge transfer
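
The gap-identification step above could be sketched as a small report over an ownership inventory. The record shape (`name` and `owner` keys) and the default limit of 10 assets per owner are hypothetical; pick a limit that reflects your teams' actual capacity.

```python
from collections import Counter


def ownership_report(assets: list[dict], max_assets_per_owner: int = 10) -> dict:
    """List unowned assets and owners holding more assets than the limit."""
    # An asset counts as unowned if the key is missing, None, or empty.
    unowned = [a["name"] for a in assets if not a.get("owner")]
    load = Counter(a["owner"] for a in assets if a.get("owner"))
    overloaded = {owner: n for owner, n in load.items() if n > max_assets_per_owner}
    return {"unowned": unowned, "overloaded": overloaded}
```

The `unowned` list feeds the assignment step, and the `overloaded` mapping highlights teams that may need to hand assets to someone else.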

Best Practices:

Assign a single owner (not a committee) to make accountability clear
Ensure owner has time/authority to manage asset (avoid overloading)
Rotate ownership periodically (prevents siloing of knowledge)
Provide owners with tools and dashboards to manage their assets
Tie ownership to performance reviews/compensation (creates accountability)

MCP Tools Reference

This skill leverages these Datasphere MCP tools:

search_catalog - Search catalog by name, description, or metadata; filter by type, domain, tag
get_asset_details - Retrieve full metadata for table, view, model, or measure (structure, lineage, ownership)
list_catalog_assets - List assets matching criteria (type, owner, status, tag); supports pagination
search_repository - Search source system definitions and imported objects
get_object_definition - Retrieve detailed definition of object (structure, calculations, lineage)
get_deployed_objects - List deployed models/measures and their status
analyze_column_distribution - Analyze column data types, cardinality, completeness, distributions

Best Practices Summary

Metadata Enrichment:

Use business terminology, not technical jargon
Keep descriptions under 500 words; link to detailed docs
Auto-suggest tags; require human review before applying
Batch updates by logical group; test on samples first

Glossary Management:

Start with 20-30 high-impact terms; grow over time
Involve business owners in definitions
Link glossary terms to actual technical implementations
Version glossary; communicate changes to stakeholders
Resolve terminology conflicts through formal approval process

KPI Definition:

Define calculation logic clearly; include examples
Validate KPI against underlying data before publishing
Assign single owner; document accountability
Review KPI portfolio annually for relevance
Version KPI definitions; don't silently change calculations

Tag Management:

Design controlled vocabulary; limit active tags to 20-50
Apply tags consistently across all assets
Use tags for governance enforcement (required tags for sensitive data)
Review and consolidate tags quarterly

Lineage & Impact:

Always analyze downstream impact before modifying critical assets
Use impact reports to gain stakeholder approval for changes
Identify and consolidate redundant implementations
Build automated tests to validate changes against downstream consumers

Data Quality:

Automate quality scoring; avoid manual assessments
Make quality scores visible to all users
Link quality issues to remediation projects
Review quality metrics monthly; celebrate improvements

Catalog Reviews:

Keep reviews lightweight (5-minute checklist)
Conduct critical asset reviews monthly; standard assets annually
Identify and clean up stale assets regularly
Maintain ownership assignments; rotate periodically

Common Anti-Patterns and Solutions

Anti-Pattern: Metadata written for IT, not business

Solution: Have a business analyst review metadata templates; remove jargon

Anti-Pattern: Too many tags; users can't navigate

Solution: Consolidate to 20-30 core tags; deprecate duplicates

Anti-Pattern: KPI definitions silently change (breaks downstream calculations)

Solution: Version KPI definitions; communicate changes; validate impact

Anti-Pattern: Assets with no owner; governance unenforceable

Solution: Systematically assign owners; include ownership in publishing workflow

Anti-Pattern: Quality issues discovered downstream; no visibility upstream

Solution: Build quality dashboards; surface issues early; tie to remediation projects

Anti-Pattern: Lineage not captured; impact analysis impossible

Solution: Ensure lineage is automatically captured from data models; manually document complex logic

Anti-Pattern: Glossary becomes unmaintainable; conflicting definitions

Solution: Implement formal approval workflow; version terms; resolve conflicts through governance

Integration with Data Product Publishing

The Catalog Steward skill complements the Data Product Publisher skill:

Catalog Steward: Organizes internal repository (metadata, quality, lineage, governance)
Data Product Publisher: Publishes curated products to external marketplace

Before publishing a data product, use Catalog Steward to:

Ensure all source assets have clear ownership and quality certification
Validate glossary terms and KPI definitions
Verify lineage and impact analysis (understand ripple effects)
Establish quality SLAs for published product
Assign business/technical owners responsible for product quality

Getting Started

Audit Current State:
- Use list_catalog_assets to inventory all assets
- Use get_asset_details to assess metadata quality
- Identify highest-value governance improvements
Design Your Governance Model:
- Define roles and responsibilities (RACI)
- Design tag taxonomy and glossary structure
- Establish quality scoring methodology
Execute Pilot Project:
- Pick 1-2 business domains for pilot
- Enrich metadata, add glossary terms, implement tagging
- Build sample quality dashboard
- Get stakeholder feedback and refine approach
Scale Governance Program:
- Extend to additional domains
- Build automation for metadata enrichment, quality scoring, lineage capture
- Establish review cadences and ownership assignments
- Train data owners and catalog curators
Measure and Optimize:
- Track catalog usage metrics (searches, views, discovery patterns)
- Monitor governance compliance (tagging, quality, ownership)
- Conduct quarterly reviews; adjust policies based on feedback
- Celebrate wins; recognize teams driving adoption
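
As one way to monitor governance compliance, the share of assets that carry tags, a quality tier, and an owner can be computed from an asset inventory. The sketch assumes hypothetical `tags`, `quality_tier`, and `owner` keys on each record; it is not tied to any specific Datasphere response format.

```python
def compliance_rates(assets: list[dict]) -> dict[str, float]:
    """Percentage of assets that have tags, a quality tier, and an owner."""
    total = len(assets)
    if total == 0:
        return {"tagging": 0.0, "quality": 0.0, "ownership": 0.0}

    def percent_with(key: str) -> float:
        # Missing keys, None, and empty values all count as non-compliant.
        return 100.0 * sum(1 for asset in assets if asset.get(key)) / total

    return {
        "tagging": percent_with("tags"),
        "quality": percent_with("quality_tier"),
        "ownership": percent_with("owner"),
    }
```

Tracking these three percentages per quarter gives a simple baseline for the quarterly reviews mentioned above.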