A skill for producing a holistic design system health assessment across seven dimensions: tokens, components, documentation, adoption, governance, AI readiness, and platform maturity. Output is a findings-based executive summary with a prioritised action list.
Token audits find naming violations. Component audits find unused variants. Adoption analysis finds coverage gaps. But none of these in isolation tells you whether the system is healthy. Health is a function of how well all seven dimensions are working together — a system with excellent tokens and terrible governance is still a fragile system, and a well-adopted system with thin documentation is one team member departure away from collapse.
This assessment is designed to give a snapshot of the whole system, not a deep dive into any single area. It is useful as a starting point for prioritisation, as a quarterly review artefact, or as background for a stakeholder conversation about where investment is needed.
For deeper work on any individual dimension, route to the relevant specialist skill (token-audit, component-audit, etc.).
Before producing output, check for a .ds-ops-config.yml file in the project root. If present, load:
- system.* — pre-populates system size, framework, and theming context
- severity.* — calibrates finding severity thresholds
- integrations.* — enables auto-pull across all seven dimensions (see below)
- recurring.* — enables trend tracking across periodic health assessments

System health benefits from every configured integration — it is the broadest assessment. Pull automatically where available:
Figma MCP (integrations.figma.enabled: true):
npm registry (integrations.npm.enabled: true):
GitHub (integrations.github.enabled: true):
Storybook / Documentation (integrations.storybook.enabled or integrations.documentation.enabled):
These signals are supplementary — the dimension assessment still requires the structured checks in Step 2. But auto-pulled data replaces estimated figures with measured ones, which makes the findings more defensible.
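For reference, a fully configured setup might look like the sketch below, expressed as a TypeScript shape rather than the YAML file itself. The key names (including the recurring block covered next) are the ones this document mentions; the value types and nesting are assumptions.

```typescript
// Sketch of the keys this skill reads from .ds-ops-config.yml.
// Key names come from this document; value types are assumptions.
interface DsOpsConfig {
  system?: {
    size?: number;      // pre-populates system size
    framework?: string; // e.g. "react" (assumed value format)
    theming?: string;   // theming context
  };
  // Calibrates finding severity thresholds; the exact shape is assumed.
  severity?: Record<string, "info" | "warn" | "critical">;
  integrations?: {
    figma?: { enabled: boolean };
    npm?: { enabled: boolean };
    github?: { enabled: boolean };
    storybook?: { enabled: boolean };
    documentation?: { enabled: boolean };
  };
  recurring?: {
    output_directory?: string; // where periodic assessments are written
    retain_count?: number;     // how many past assessments to keep
  };
}
```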
If recurring is configured in .ds-ops-config.yml:
- recurring.output_directory — where each assessment is written
- recurring.retain_count — how many past assessments to keep

Before assessing health, determine what kind of shared UI this actually is. Not everything is a design system, and the distinction changes what advice is useful.
Classify the library type from the codebase signals:
How to detect: Look at the codebase structure, not what the README calls it. A repo named "design-system" with 4 components, no tokens, and no docs is a component library in early stages, not a design system with gaps. Signals: presence of token files (JSON, SCSS variables, CSS custom properties), number of components, presence of a documentation platform config (Fractal, Storybook, Zeroheight), package.json publishing config, and whether there are versioned releases.
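As a rough illustration of how these signals combine, a classifier might look like the sketch below. The RepoSignals fields mirror the signals listed above; the thresholds and the order of the checks are assumptions for the sketch, not rules from this skill.

```typescript
// Illustrative Step 0 classifier. Signal names mirror the list above;
// branch order and thresholds are assumptions.
interface RepoSignals {
  hasTokenFiles: boolean;       // JSON tokens, SCSS variables, or CSS custom properties
  componentCount: number;
  hasDocsPlatform: boolean;     // Fractal, Storybook, or Zeroheight config present
  publishesPackage: boolean;    // package.json publishing config
  hasVersionedReleases: boolean;
}

type LibraryType =
  | "Design system"
  | "Component library"
  | "Pattern library"
  | "Utility collection";

function classifyLibrary(s: RepoSignals): LibraryType {
  // A design system needs tokens plus real distribution discipline.
  if (s.hasTokenFiles && s.hasDocsPlatform && s.hasVersionedReleases) {
    return "Design system";
  }
  // Documented patterns without a reusable component surface.
  if (s.hasDocsPlatform && s.componentCount === 0) {
    return "Pattern library";
  }
  // Reusable components, but not yet a system.
  if (s.componentCount > 0) {
    return "Component library";
  }
  return "Utility collection";
}
```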
Include the classification in the report header as "Library type: [Design system / Component library / Pattern library / Utility collection]" and calibrate all recommendations to match. A pattern library does not need a "contribution workflow" — it needs its patterns to be findable and accurate.
A system health assessment can be completed at different levels of depth depending on what information is available. Ask for or confirm (skip questions already answered by auto-pull):
If access to the system itself is limited, the assessment can be conducted through a structured interview. Note in the output that the assessment is based on reported information rather than direct inspection, and flag which findings would benefit from direct verification.
Small-system note (fewer than 5 components): The seven-dimension assessment still works at this size, but adjust expectations for the Components dimension. Complexity distribution (foundational vs. compound vs. feature) is not meaningful with 1–4 components — skip it. Focus the Components dimension on API clarity, state coverage, and documentation completeness per component instead. For the Adoption dimension, redefine "active adoption" in terms of what percentage of the team's actual needs the system serves, not raw component count. A system with 3 components that covers 80% of a team's interface needs is healthier than a system with 30 components that covers 20%.
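To make the small-system measure concrete, here is a minimal sketch of the "needs served" ratio described above; the inputs are hypothetical.

```typescript
// Needs-based adoption for small systems: what share of the team's actual
// interface needs does the system serve? Inputs here are hypothetical.
function needsCoverage(needsServedBySystem: number, totalInterfaceNeeds: number): number {
  return totalInterfaceNeeds === 0 ? 0 : needsServedBySystem / totalInterfaceNeeds;
}

needsCoverage(8, 10); // 3-component system serving 8 of 10 needs: 0.8
needsCoverage(2, 10); // 30-component system serving 2 of 10 needs: 0.2 (less healthy despite more components)
```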
Before assessing any dimension, establish the system's current maturity stage. This calibrates expectations — an ad-hoc system should not be judged for lacking platform-grade capabilities.
Ask: "Where would you place this system today?"
Use the maturity stage when writing the summary — frame findings as "appropriate for this stage" or "below expectations for this stage" rather than against an absolute scale. A managed system with no AI readiness is expected. A measured system with no AI readiness is a significant gap.
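The same gap therefore reads differently by stage. A minimal sketch, extrapolating from the AI-readiness example above (only the Managed and Measured values come from the text; the rest are assumptions):

```typescript
// Stage-relative calibration: is a missing AI-readiness dimension below
// expectations at this stage?
type MaturityStage = "Ad-hoc" | "Managed" | "Systematic" | "Measured" | "Optimised";

const missingAiReadinessIsAFinding: Record<MaturityStage, boolean> = {
  "Ad-hoc": false,     // assumption: far too early to expect it
  "Managed": false,    // "A managed system with no AI readiness is expected."
  "Systematic": false, // assumption
  "Measured": true,    // "A measured system with no AI readiness is a significant gap."
  "Optimised": true,   // assumption
};
```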
For each dimension, assess the current state and assign a status:
Assess:
Key questions to ask or check:
Assess:
Key questions:
Assess:
Key questions:
Assess:
Key questions:
Note: Adoption is the dimension most commonly assessed by coverage (whether the system is available to teams) rather than actual use. Distinguish clearly between the two.
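A minimal sketch of how the two measures differ, with a hypothetical Team shape as input:

```typescript
// Coverage (the system is available to a team) vs. active use (the team
// actually ships with it). The Team shape is a hypothetical input.
interface Team {
  hasSystemAccess: boolean;
  importsSystemComponents: boolean;
}

function adoptionMetrics(teams: Team[]) {
  const total = teams.length || 1; // avoid divide-by-zero for empty input
  return {
    coverage: teams.filter(t => t.hasSystemAccess).length / total,
    activeUse: teams.filter(t => t.importsSystemComponents).length / total,
  };
}
```

Report both figures; a wide coverage number with a low active-use number is itself a finding.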
Label this dimension based on the library type identified in Step 0:
Assess:
Calibrate for team size: For teams under 5 people, a documented governance framework is overhead, not maturity. What matters is whether someone is responsible and whether decisions are reversible. Frame recommendations accordingly — "make sure someone owns this" rather than "establish a governance model."
Key questions:
Assess:
Key questions:
Assess:
Key questions:
Open with a headline sentence that tells the reader how worried they should be and where to focus — before any tables or structure. Example: "Your system is strong on components and tokens, but governance and documentation are the bottleneck. Here's the dimension-by-dimension picture."
Date: [date]
Assessment type: Direct inspection / Reported information / Mixed
Assessed by: [if applicable]
| Dimension | Status | Key finding |
|---|---|---|
| Tokens | 🟢 / 🟡 / 🟠 / 🔴 | [One-sentence summary] |
| Components | ||
| Documentation | ||
| Adoption | ||
| Governance / Decision-making / Ownership | ||
| AI readiness | ||
| Platform maturity |
Use the dimension label that matches the library type from Step 0: "Governance" for design systems, "Decision-making" for component/pattern libraries, "Ownership" for utility collections.
Status key: 🟢 Strong · 🟡 Functional · 🟠 Weak · 🔴 Absent
Maturity stage: [Ad-hoc / Managed / Systematic / Measured / Optimised] — [one sentence of evidence]. To reach the next stage, the system needs [specific action].
Two to three sentences. What is the honest state of this system? What is the most important thing to pay attention to? Write this like you're briefing a peer, not filing a report.
For each dimension: the status emoji, two to four specific findings with evidence, and a one-sentence summary of the most important action. Skip dimensions with no findings — a single line in the summary table ("🟢 Strong — no issues found") is enough.
Group recommendations into three tiers:
Immediate (next 4 weeks): Actions that, left unaddressed, will make other improvements harder or will actively erode system trust.

Near-term (next quarter): Structural improvements that require planning but are clearly worth doing.

Longer-term (6+ months): Investments that will pay dividends once the immediate and near-term work is done.
End the report with:
A note on context: This assessment sees your system's artefacts — it does not see the history, constraints, or trade-offs behind them. Some findings may flag gaps your team has already considered and accepted. If any finding describes an intentional decision or a known limitation, let me know — I'll calibrate future assessments to your system's actual priorities rather than a generic ideal. The goal is to surface blind spots, not to question choices you've already made deliberately.