Identifies system-wide drift in design systems: components diverging from specs, local token overrides, forked patterns across teams. Classifies severity, origins, and generates response recommendations.
`npx claudepluginhub murphytrueman/design-system-ops`

This skill uses the workspace's default tool permissions.
A skill for identifying and classifying drift in a design system — the accumulated distance between design system intent and actual implementation across consuming products. Produces a drift report with severity ratings, origin classification, and recommended response for each finding.
Drift is the normal condition of a used design system. The question is not whether your system has drifted — it has — but whether the drift is intentional, how severe it is, and whether it is compounding.
Not all drift is bad. A product team that made a deliberate, documented exception to an established pattern is making a design decision. A product team that unknowingly re-implemented a design system component with slightly different spacing is creating maintenance debt. The distinction matters, because the response is different: intentional drift might become a contribution, while accidental drift needs to be corrected and its root cause addressed.
This skill distinguishes between drift types and routes each finding to the appropriate response.
Before producing output, check for a `.ds-ops-config.yml` file in the project root. If present, load:

- `severity.*` — overrides for drift finding severity ratings
- `system.styling` — pre-selects the token drift detection approach (CSS vars, SCSS, Tailwind, CSS-in-JS)
- `integrations.*` — enables auto-pull for drift comparison data (see below)
- `recurring.*` — enables comparison with the previous drift report

If integrations are configured in `.ds-ops-config.yml`, pull data automatically:
**Figma MCP** (`integrations.figma.enabled: true`): uses `integrations.figma.file_key`.

**GitHub** (`integrations.github.enabled: true`): scan `integrations.github.repo` for code patterns that indicate drift:

- `!important` on design token properties

**Chromatic** (`integrations.chromatic.enabled: true`)
If an integration fails, log it and proceed with manual input.
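Pulling the keys above together, a loader can be as small as the sketch below. This is a minimal sketch, assuming TypeScript with the js-yaml package; the key names follow the documentation above, but the value types are illustrative guesses.

```typescript
// Minimal sketch of loading .ds-ops-config.yml. Key names follow the docs
// above; value types are assumptions inferred from those names.
import { existsSync, readFileSync } from "node:fs";
import { load } from "js-yaml";

interface DsOpsConfig {
  severity?: Record<string, "critical" | "high" | "medium" | "low">; // per-finding overrides
  system?: { styling?: "css-vars" | "scss" | "tailwind" | "css-in-js" };
  integrations?: {
    figma?: { enabled?: boolean; file_key?: string };
    github?: { enabled?: boolean; repo?: string };
    chromatic?: { enabled?: boolean };
  };
  recurring?: { output_directory?: string; retain_count?: number };
}

function loadConfig(path = ".ds-ops-config.yml"): DsOpsConfig | null {
  if (!existsSync(path)) return null; // no config: fall back to asking questions
  return load(readFileSync(path, "utf8")) as DsOpsConfig;
}
```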
If `recurring` is configured in `.ds-ops-config.yml`:

- write each drift report to `recurring.output_directory`
- retain the number of past reports set by `recurring.retain_count`

Ask for or confirm (skip questions already answered by auto-pull) the assessment scope and the reference source of truth.
The more specific the scope, the more actionable the report. A drift detection across "the whole system" surfaces patterns but produces a long list of findings with limited prioritisation signal. Scoping to a specific product or a specific component category produces a more actionable output.
Small-system note (fewer than 5 components): For systems this size, scope to the full system — there is no need to sample. Drift patterns are different in small systems: teams are typically smaller and more aligned, so drift is less likely to be accidental and more likely to be intentional divergence (Classification A) or a system gap (Classification E). Simplify the output to a per-component checklist rather than a full drift report. If all components show no drift, state that as the finding and recommend a review cadence.
Drift is always relative to something. Confirm the source of truth being used as the reference:
If there is no clear single source of truth, that is a finding in itself and should be included at the top of the report.
Assess across four dimensions:
**Visual drift:** Differences in the visual treatment of a component or pattern compared to the design system reference. Includes spacing, colour (particularly non-token values), typography, border radius, shadow, and icon usage.
For each instance: name the component or pattern, describe the visual difference, and note whether it appears to be intentional or accidental.
**Behavioural drift:** Differences in interactive behaviour — state transitions, animation, timing, keyboard behaviour, focus management — compared to the designed and documented component behaviour.
Behavioural drift is often the hardest to detect without direct testing, but it carries the highest risk because it includes accessibility regressions.
**API drift:** Components implemented with different props, different prop names, or different prop semantics than the design system's published API. This is most common when teams implement a component locally rather than consuming it from the system, or when a local version was built before the system component existed and was never migrated.
**Token drift:** Raw values used where design tokens should be referenced. Tokens referenced at the wrong tier. Local token overrides that conflict with semantic intent. Token names used inconsistently across implementations.
Detection depends on the styling approach: in CSS custom properties, look for raw values outside var(). In SCSS, look for raw literals not using $ variables. In Tailwind, look for arbitrary value brackets (h-[12px], bg-[#ff0000]) — standard utility classes that resolve to configured tokens are not drift. In CSS-in-JS, look for raw values outside theme object references. When SCSS variables are the token system, tier can be inferred from naming patterns (e.g. $color-blue-500 → primitive, $color-action-primary → semantic) even without a full SCSS parser.
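As an illustration of these heuristics, a line-based scan might look like the following. This is a sketch, not a parser: the regexes are assumptions and will miss edge cases, and the CSS-in-JS case is omitted for brevity.

```typescript
// Heuristic token-drift scan following the detection rules above.
// Regexes are illustrative assumptions, not a complete parser.
const RAW_VALUE = /#[0-9a-fA-F]{3,8}\b|\b\d+(\.\d+)?(px|rem|em)\b/;
const TAILWIND_ARBITRARY = /\b[\w-]+-\[[^\]]+\]/; // e.g. h-[12px], bg-[#ff0000]

type Styling = "css-vars" | "scss" | "tailwind"; // css-in-js omitted for brevity

function findTokenDrift(source: string, styling: Styling) {
  const findings: { line: number; text: string }[] = [];
  source.split("\n").forEach((text, i) => {
    const drifted =
      styling === "tailwind"
        ? TAILWIND_ARBITRARY.test(text) // standard utilities resolve to tokens: not drift
        : RAW_VALUE.test(text) &&
          (styling === "css-vars" ? !text.includes("var(") : !text.includes("$"));
    if (drifted) findings.push({ line: i + 1, text: text.trim() });
  });
  return findings;
}

// Tier inference for SCSS token names, per the naming-pattern note above:
// a trailing numeric scale reads as primitive, otherwise semantic.
// e.g. scssTier("$color-blue-500") -> "primitive"
//      scssTier("$color-action-primary") -> "semantic"
const scssTier = (name: string): "primitive" | "semantic" =>
  /-\d{2,3}$/.test(name) ? "primitive" : "semantic";
```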
Classify every finding before assigning a response:
**Classification A: Intentional divergence.** The product team made a deliberate decision to diverge from the system, for a known reason. This may be appropriate (the system does not serve this context) or a contribution candidate (the need is real and should be in the system).

**Classification B: Version lag.** The implementation matches an older version of the design system. The system has moved on; the product has not. This is not a mistake — it is normal entropy — but it accumulates into a migration burden if left unaddressed.

**Classification C: Accidental drift.** The implementation diverged from the system without intent. Most commonly caused by implementing a component locally when the system version was not yet available, then not migrating once it was.

**Classification D: Misunderstanding.** The implementation reflects a misreading of the documentation or specification. The consumer thought they were using the system correctly and did not know they were not.

**Classification E: System gap.** The drift exists because the system did not have what the product team needed. The divergent implementation is the product team's solution to a design system gap, not a mistake.
Not all drift carries equal risk. Weight severity by component criticality (a sketch of the adjustment follows this list):
Critical path components (core navigation, authentication, checkout, primary data entry) — drift here is automatically elevated one severity level. A Medium finding on a checkout component becomes High.
High-traffic components (buttons, form inputs, cards, modals) — drift here affects the most users. Weight stays as assessed but flag the blast radius.
Utility components (layout wrappers, spacing helpers, icon containers) — drift here is lower risk. A Medium finding may be downgraded to Low if the component is not user-facing.
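The weighting above reduces to a small adjustment function. This is a sketch assuming a four-level scale matching the report's severity key; the criticality labels are taken from the list.

```typescript
// Sketch of the criticality weighting above, assuming a four-level scale.
type Severity = "low" | "medium" | "high" | "critical";
type Criticality = "critical-path" | "high-traffic" | "utility";

const LADDER: Severity[] = ["low", "medium", "high", "critical"];

function weightSeverity(
  assessed: Severity,
  criticality: Criticality,
  userFacing = true,
): Severity {
  const i = LADDER.indexOf(assessed);
  if (criticality === "critical-path") {
    return LADDER[Math.min(i + 1, LADDER.length - 1)]; // elevate one level
  }
  if (criticality === "utility" && !userFacing && assessed === "medium") {
    return "low"; // downgrade allowed for non-user-facing utilities
  }
  return assessed; // high-traffic: unchanged, but flag the blast radius in the report
}
```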
After classifying each drift instance, route it to the appropriate response (a data sketch follows the table):
| Classification | Primary response | Skill to run next |
|---|---|---|
| A — Intentional divergence | Document as a decision record | decision-record |
| B — Version lag | Offer migration path with effort estimate | deprecation-process (for migration guidance) |
| C — Accidental drift | Fix the implementation + review docs that failed to prevent it | design-to-code-check |
| D — Misunderstanding | Update documentation + notify affected teams | change-communication |
| E — System gap | Route to contribution workflow | contribution-workflow |
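For pipelines that consume findings programmatically, the same routing can be held as data. A sketch: the response and skill strings come straight from the table above.

```typescript
// The routing table above as data; strings come from the table.
type Classification = "A" | "B" | "C" | "D" | "E";

const ROUTING: Record<Classification, { response: string; nextSkill: string }> = {
  A: { response: "Document as a decision record", nextSkill: "decision-record" },
  B: { response: "Offer migration path with effort estimate", nextSkill: "deprecation-process" },
  C: { response: "Fix the implementation and review the docs that failed to prevent it", nextSkill: "design-to-code-check" },
  D: { response: "Update documentation and notify affected teams", nextSkill: "change-communication" },
  E: { response: "Route to contribution workflow", nextSkill: "contribution-workflow" },
};
```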
For drift classified as B (version lag), model the migration impact along three lines:

- Per-instance migration cost
- Aggregate migration debt
- Migration path clarity
This modelling converts drift from "a list of problems" to "a prioritised migration plan with estimated effort." Include it as a section in the drift report.
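Since the per-instance inputs are project-specific, any cost model here is an assumption. One simple version sums estimated per-instance effort and groups it by component, so a migration-sprint candidate (one component, many lagging instances) stands out:

```typescript
// Hypothetical migration-debt model: per-instance effort summed across
// instances, grouped by component. Field names are assumptions.
interface VersionLagFinding {
  component: string;
  instances: number;        // call sites still on the old version
  hoursPerInstance: number; // estimated effort to migrate one call site
}

function migrationDebt(findings: VersionLagFinding[]): Map<string, number> {
  const byComponent = new Map<string, number>();
  for (const f of findings) {
    const hours = f.instances * f.hoursPerInstance;
    byComponent.set(f.component, (byComponent.get(f.component) ?? 0) + hours);
  }
  return byComponent; // component -> total estimated hours
}
```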
For organisations with multiple design systems (brand-specific, platform-specific, sub-systems), assess drift between systems:
- `color.action.primary` resolving to blue in one system and green in another is cross-system drift that confuses teams working across products.
- A `Button` in the marketing system and a `Button` in the product system should either share an API or have different names.

Cross-system drift is typically invisible until a team works across system boundaries. Surface it proactively.
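A cross-system token check is mechanical once each system's resolved tokens are exported. The sketch below (input shape and names are assumptions) flags any token that resolves differently across systems:

```typescript
// Flag tokens that resolve to different values across systems.
// Input shape is an assumption: system name -> token path -> resolved value.
function crossSystemDrift(
  systems: Record<string, Record<string, string>>,
): string[] {
  const resolved = new Map<string, Set<string>>();
  for (const tokens of Object.values(systems)) {
    for (const [token, value] of Object.entries(tokens)) {
      if (!resolved.has(token)) resolved.set(token, new Set());
      resolved.get(token)!.add(value);
    }
  }
  return [...resolved.entries()]
    .filter(([, values]) => values.size > 1) // same name, different resolution
    .map(([token]) => token);
}

// Mirrors the example above: color.action.primary resolves to a blue in one
// system and a green in the other, so it is reported as cross-system drift.
crossSystemDrift({
  marketing: { "color.action.primary": "#1d4ed8" },
  product: { "color.action.primary": "#15803d" },
}); // -> ["color.action.primary"]
```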
Open with a headline sentence. Example: "Drift is moderate and mostly accidental — 3 components have diverged from the system, and 2 of those look like the team didn't know the system version existed. Here's the full picture."
- Date: [date]
- Scope: [what was assessed]
- Reference source: [what the assessment was compared against]
- Assessment method: [direct inspection / reported / mixed]
What is the overall drift picture? Is the system drifting in a controlled way or compounding? What is the most important finding? Write this like you're debriefing a colleague, not filing a compliance report.
For each finding:
| ID | Location | Dimension | Classification | Severity | Description | Recommended action |
|---|---|---|---|---|---|---|
| DF-01 | [product/team/component] | [visual/behavioural/API/token] | [A–E] | 🔴/🟠/🟡/⚪ | [specific description] | [specific action] |
Severity key: 🔴 Critical · 🟠 High · 🟡 Medium · ⚪ Low
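Where findings are tracked programmatically, the table's columns map to a record shape like this (a sketch; the field names are derived from the column headers, nothing more):

```typescript
// One row of the findings table as a record. Values mirror the
// table's columns and the severity key above.
interface DriftFinding {
  id: string; // e.g. "DF-01"
  location: string; // product/team/component
  dimension: "visual" | "behavioural" | "API" | "token";
  classification: "A" | "B" | "C" | "D" | "E";
  severity: "critical" | "high" | "medium" | "low";
  description: string;
  recommendedAction: string;
}
```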
**Intentional divergence (A).** List findings. For each, note whether it is a contribution candidate or an accepted exception. If it is an accepted exception, it should be documented as a decision record.

**Version lag (B).** List findings. Group by component or token set. Identify whether a migration sprint would address the bulk of these, or whether they are spread too thin for a coordinated migration.

**Accidental drift (C).** List findings. These are the highest priority for correction because they are unintentional — the product team would want to know.

**Misunderstanding (D).** List findings. These indicate a documentation or communication gap. The finding should be corrected, and the documentation or onboarding that failed to prevent it should be reviewed.

**System gaps (E).** List findings. These are contribution candidates. If the same gap appears across multiple products, the case for adding it to the system is stronger. Cross-reference the contribution workflow.
Step back from individual findings and identify patterns:
Root cause patterns are more actionable than individual findings. Addressing a root cause prevents future drift; addressing individual findings only corrects the current state.
Prioritised list:
If the Figma Console MCP from Southleft is connected (check for figma_capture_screenshot and figma_get_component_for_development tool availability), enhance the drift report with visual evidence.
Capture design reference: Use figma_capture_screenshot to capture the current state of each drifted component as it appears in Figma. This uses the plugin's exportAsync API, which captures the live state — not a cached cloud render. The result is the design-side truth for visual comparison.
Capture implementation reference: Use figma_get_component_for_development to get dev-optimised component data alongside a rendered image. This gives you the design spec in a format that maps directly to implementation properties.
Visual diff in report: For each drift finding classified as "implementation divergence" or "version lag", include both the Figma screenshot and a description of what the code implementation looks like. This gives the reader a visual understanding of the gap, not just a textual description of property mismatches.
When the standard Figma MCP is connected (read-only): Screenshots via REST API are available but reflect the last-published cloud state, not the current live state. Note this limitation — if the Figma file has unpublished changes, the screenshot may not reflect the latest design intent.
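As a hedged sketch of wiring the capture step via the MCP TypeScript SDK: the tool name comes from above, but the client setup and the argument shape are assumptions; consult the server's tool schema before relying on them.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

// Assumes an already-connected MCP client. The tool name is from the
// Figma Console MCP; the argument name (nodeId) is hypothetical.
async function captureDesignReference(client: Client, nodeId: string) {
  return client.callTool({
    name: "figma_capture_screenshot",
    arguments: { nodeId }, // hypothetical parameter; check the tool schema
  });
}
```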
End the report with:
A note on context: This analysis identifies where implementations differ from the system — it cannot always tell why. The classification system (intentional, accidental, version lag, misunderstanding, system gap) is a starting point. If any instance is misclassified, let me know — your corrections make future drift checks more accurate. The goal is to distinguish the drift that needs fixing from the drift that needs documenting.