```shell
npx claudepluginhub haabe/mycelium --plugin mycelium
```

This skill uses the workspace's default tool permissions.
Operational gate for the Explainability Theory Gate (Gate 13). Audits whether a product's AI components meet a defensible XAI bar — disclosure, decision-explanation, recourse, fidelity, system card — scaled to the AI Act risk tier.
This skill is functionally-grounded in Doshi-Velez & Kim's (2017) sense: it operates on artifacts and configuration, not on real users with real tasks. Output explicitly distinguishes validated_functionally from needs_user_testing. The honest tag is what passes Gate 13; user-grounded validation is recommended but not blocking.
`active-stack.yml :: ai_components.detected: true`
`/mycelium:diamond-progress`

Read ${CLAUDE_PLUGIN_ROOT}/jit-tooling/active-stack.yml (Step 1c output of delivery-bootstrap per ${CLAUDE_PLUGIN_ROOT}/jit-tooling/detector.md).
- If `ai_components.detected` is missing or false: report "No AI components detected — XAI Gate N/A. Run /mycelium:delivery-bootstrap if you believe AI is present but undetected." Stop.
- If `ai_components.detected: true` but `user_facing_decisions: unknown` (Step 6 confirmation never answered): prompt the user explicitly: "This product has AI components, but it's not on record whether their outputs reach end users in a user-affecting way. Does the AI's output deny / recommend / rank / generate content shown to users, or otherwise drive their experience?" Do not proceed silently — XAI tier depends on this answer. If the user defers, default to `tier: limited` and note "tier defaulted to limited pending user_facing_decisions confirmation" in the output.

For each service in services.yml (loop — multiple services produce per-service findings):
Source canonical tier from /mycelium:regulatory-review output if available. Read .claude/canvas/privacy-assessment.yml for prior AI Act risk classification. If /mycelium:regulatory-review has run, use its tier; this skill does not re-classify regulatory tiers as that would risk producing divergent classifications across two skills.
If /mycelium:regulatory-review has not run:

- /mycelium:regulatory-review is the canonical AI Act tier classifier. Without it, this skill produces a provisional tier only — which is fine for early development but should not be the final source for L4/L5 transitions.
- Classify provisionally: Annex III match → high; user-affecting AI without an Annex III match → limited; non-user-affecting AI → minimal.
- Record the result in `xai.tier` with a `provisional: true` note until /mycelium:regulatory-review confirms.
- If tier classification yields prohibited: stop immediately. Escalate. Do not run subsequent stages — the product cannot ship under EU AI Act Article 5.
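The provisional rule is simple enough to pin down in a few lines. A minimal sketch, assuming boolean inputs already derived from the assessment; the function name and signature are illustrative, not part of the skill's API:

```python
def provisional_tier(annex_iii_match: bool, user_affecting: bool,
                     prohibited: bool = False) -> str:
    """Provisional EU AI Act tier, pending /mycelium:regulatory-review."""
    if prohibited:
        return "prohibited"  # stop immediately; Article 5 bars shipping
    if annex_iii_match:
        return "high"
    return "limited" if user_affecting else "minimal"
```

Whatever this returns should be written with `provisional: true` until the canonical classifier confirms it.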
Item caps by tier (pre-committed to prevent checklist sprawl):
- minimal: ≤5 total items across all stages
- limited: ≤15 total items
- high: ≤25 total items

Rows = relevant stakeholders for this tier:

- end_user (always)
- affected_non_user (high-risk only — e.g., a person whose data is used but who didn't initiate the interaction)
- deployer_developer (limited+)
- regulator (high-risk)

Columns = Liao, Gruen, Miller (2020) question categories, subset by tier:
| Tier | Questions checked |
|---|---|
| minimal | output (what can it do?), why (basic rationale) |
| limited | + input (what data?), why_not (contrastive), how_to_be_that (recourse) |
| high | + what_if (sensitivity), performance (per-population accuracy), how_global (overall mechanism) |
For each cell relevant at the determined tier, ask the operational question: "Is this answerable for this stakeholder, in the moment of impact, by an interface that exists today?" Verdict per cell: pass / partial / fail / N/A.
This is intentionally Bansal et al. (2021) friendly — the test is "answerable when needed," not "always-on documentation."
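The tier-scoped matrix above can be sketched as plain lookup tables. The row and column names mirror the lists above; the helper itself is illustrative, and individual cells may still come back N/A during the audit:

```python
# Rows: stakeholders relevant at each tier.
STAKEHOLDERS = {
    "minimal": ["end_user"],
    "limited": ["end_user", "deployer_developer"],
    "high": ["end_user", "affected_non_user", "deployer_developer", "regulator"],
}

# Columns: Liao, Gruen & Miller (2020) question categories, cumulative by tier.
QUESTIONS = {
    "minimal": ["output", "why"],
    "limited": ["output", "why", "input", "why_not", "how_to_be_that"],
    "high": ["output", "why", "input", "why_not", "how_to_be_that",
             "what_if", "performance", "how_global"],
}

def matrix_cells(tier: str) -> list[tuple[str, str]]:
    """Every (stakeholder, question) cell to audit at this tier."""
    return [(s, q) for s in STAKEHOLDERS[tier] for q in QUESTIONS[tier]]
```

At the limited tier this enumerates 10 candidate cells, comfortably inside the ≤15 item cap.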
Run only when the product surfaces LLM-generated rationales to users (e.g., "Recommended because…", "Denied because…", chain-of-thought summaries).
`blind_prediction_accuracy = correct_predictions / N`. (Blind prediction: given only the rationale, the auditor predicts the model's output before seeing it.) Thresholds: pass ≥ 0.7; partial 0.5–0.69; fail < 0.5. Below 0.5 means the rationale doesn't actually justify the output — the Lanham et al. (2023) faithfulness gap.

Save raw samples to `.claude/evals/xai-fidelity/<service>/YYYY-MM-DD.json` (`mkdir -p` on first write — the directory may not exist). Aggregate stats land in `services.yml :: <service>.xai.fidelity`.
If the product does not surface LLM-generated rationales to users, set xai.fidelity.verdict: not_applicable and skip — but check Stage 2 for the equivalent surface gaps.
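The threshold logic is mechanical; a sketch, where only the formula and cut-offs come from this stage and the helper name is illustrative:

```python
def fidelity_verdict(correct_predictions: int, n: int) -> tuple[float, str]:
    """blind_prediction_accuracy and its verdict per the Stage 3 thresholds."""
    accuracy = correct_predictions / n
    if accuracy >= 0.7:
        verdict = "pass"
    elif accuracy >= 0.5:
        verdict = "partial"
    else:
        verdict = "fail"  # rationale does not justify the output
    return accuracy, verdict
```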
Reference .claude/templates/ai-system-card.md (Mitchell et al. 2019 format). Required sections (per the template's Required markings):
(8 and 9 are recommended, not required.)
Check whether the product publishes a system card at docs/ai-system-card.md or a documented equivalent. For each required section, mark present / missing. Verdict: pass (all required present), partial (≥70% present), fail (<70% present or no card published).
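The section check reduces to set arithmetic. A sketch; the required-section names below are taken from this document's example output, not from the template itself, so verify them against `.claude/templates/ai-system-card.md`:

```python
REQUIRED_SECTIONS = [  # assumed from the sample output; confirm against the template
    "identity", "intended_use", "model_details", "contact",
    "performance_and_limitations", "explainability", "recourse", "privacy",
]

def system_card_verdict(present: list[str]) -> tuple[list[str], str]:
    """Missing required sections plus the pass/partial/fail verdict."""
    missing = [s for s in REQUIRED_SECTIONS if s not in present]
    coverage = 1 - len(missing) / len(REQUIRED_SECTIONS)
    if not missing:
        return missing, "pass"
    return missing, "partial" if coverage >= 0.7 else "fail"
```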
End-to-end test:
This is the Selbst & Barocas (2018) substance check. Without recourse, the rest of XAI is theatre. Verdict: pass (all five sub-checks pass), partial (path exists but missing SLA or logging), fail (no path or path loops).
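The five sub-checks are not enumerated in this excerpt, so the sketch below collapses the verdict rule onto the fields that appear in the output schema; `path_loops` is an assumed flag standing in for the "path loops" failure case:

```python
def recourse_verdict(path_exists: bool, sla_documented: bool,
                     logs_contestation: bool, path_loops: bool = False) -> str:
    """Stage 5 verdict: fail on a missing or looping path,
    partial when the path exists but SLA or logging is missing."""
    if not path_exists or path_loops:
        return "fail"
    if sla_documented and logs_contestation:
        return "pass"
    return "partial"
```

Applied to the sample output below (path exists, contestation logged, no SLA), this yields partial.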
Write to services.yml per service:
```yaml
services:
  - id: svc-001
    name: "<service name>"
    xai:
      tier: limited
      last_assessed_at: "2026-05-04T12:00:00Z"
      surfaces:
        end_user:
          output: pass
          why: pass
          why_not: partial
          how_to_be_that: fail
          input: pass
        deployer_developer:
          output: pass
          # ...
      recourse:
        path_exists: true
        max_clicks_to_human: 4
        sla_documented: false
        logs_contestation: true
        verdict: partial
      fidelity:
        samples_audited: 10
        blind_prediction_accuracy: 0.65
        verdict: partial
        last_audited_at: "2026-05-04T12:00:00Z"
      system_card:
        path: "docs/ai-system-card.md"
        sections_present: [identity, intended_use, model_details, contact]
        sections_missing: [performance_and_limitations, explainability, recourse, privacy]
        verdict: fail
      validated_functionally: [stage_1_tier, stage_4_system_card, stage_5_recourse]
      needs_user_testing:
        - "stage_2 surfaces — answerable-when-needed verified by static review only"
        - "stage_3 fidelity — sample size 10 limits population-level claim"
```
Idempotency: re-runs overwrite the xai block in place; do not append. Preserves git diff readability across periodic re-audits.
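The overwrite-in-place rule can be sketched on the parsed services list. Plain dicts here; the real skill reads and writes services.yml, and the helper name is illustrative:

```python
def upsert_xai(services: list[dict], service_id: str, xai: dict) -> None:
    """Replace the service's xai block wholesale; never append a second one."""
    for svc in services:
        if svc["id"] == service_id:
            svc["xai"] = xai  # full replacement keeps re-audit diffs readable
            return
    raise KeyError(f"unknown service: {service_id}")
```

Re-running with fresh audit results leaves exactly one `xai` block per service, so periodic re-audits diff cleanly in git.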
After writing canvas, present a remediation list ranked by stakeholder impact:
```
XAI check — <service name> — tier: limited

Findings:
✓ Stage 1 (tier classification) — pass (provisional, /mycelium:regulatory-review not run)
△ Stage 2 (matrix) — 2 cells fail: end_user.how_to_be_that, end_user.why_not
△ Stage 3 (fidelity) — partial (0.65 blind-prediction accuracy, below the 0.7 pass threshold)
✗ Stage 4 (system card) — fail (4 of 8 required sections missing)
△ Stage 5 (recourse) — partial (path exists but no SLA documented)

Remediation (ranked):
1. [end_user] Stage 4 — publish missing system card sections (explainability, recourse, performance, privacy)
2. [end_user] Stage 5 — document the SLA for contestation responses
3. [end_user] Stage 2 — surface contrastive (why_not) explanation in the UI
4. [all] Stage 3 — investigate why fidelity is below threshold; consider tightening the prompt or removing the rationale surface

Validated functionally: stages 1, 4, 5 (static review).
Needs user testing: stages 2 (answerable-when-needed in real flows), 3 (population-level fidelity).

Run /mycelium:regulatory-review to confirm the tier is canonical, then re-run /mycelium:xai-check after remediation.
```
- /mycelium:regulatory-review is canonical for tier classification; this skill consumes its output. G-S7 (disclose AI) and G-S8 (assess AI Act) are intent guardrails that this skill operationalizes.
- /mycelium:security-review covers OWASP. Phase 2.3 will add explanation-attack threats to threat-model.yml — until then, explanation-layer threats are flagged in this skill's output but not enumerated structurally.
- /mycelium:definition-of-done (AI-aware DoD, Phase 2.2) consumes Gate 13 verdicts and needs_user_testing for follow-up.