Drafts a model card for a financial-services AI or model use case, with named sections for intended use, training and reference data, performance, limitations and known failure modes, monitoring plan, controls, change management, and sign-off questions. The card is the firm-side governance artifact that supports model risk committee review, pre-prod gates, the model inventory of record, validator handoff, and regulator response files.
Best for:
- A first-line owner has proposed an AI use case and second-line needs the model card before a tier decision or pre-prod gate.
- A model risk team is refreshing model cards as part of an annual model inventory exercise.
- A new vendor model is replacing an existing one and the card needs to be updated to reflect the swap.
- A regulator response file or examiner request requires the firm's documented view of an in-scope model.
Not the right tool when:
- The use case has not been intaked yet (use ai-use-case-intake first).
- Validation testing has not run and there are no performance results to summarise (use validation-plan; the card consumes its outputs).
- The artifact required is the EU AI Act Annex IV technical documentation. That is the provider-side file; the firm-side card sits alongside it. Use ai-act-triage to scope Annex IV deltas.
Slash command
/ai-governance-model-risk:model-card-builder [use-case ID, intake record, validation report, monitoring evidence, vendor system card, or scope statement]
A model card is the firm-side governance artifact for an AI or model use case: intended use, system, data, performance, robustness, fairness, limitations, monitoring, controls, change management, sign-off questions, source trace. It is what the model risk committee reaches for, what the validator works against, what the pre-prod gate reviewer challenges, what the model inventory of record consumes, what an examiner is handed when they ask "what does this model do, where does it work, where does it fail, who is accountable, and what is monitored."
The card serves both lenses. A 1.5-line model owner uses the skill to consolidate the work as it stands today; a 2-line reviewer uses the same skill to challenge what was drafted. The seam between the two is the source-trace block and the sign-off questions.
The card is a draft until a human reviewer attests. The skill stops short of filing or approving it.
Most of what the card needs is already on the table by the time someone reaches for this skill. A few things to settle before drafting:
- If ai-risk-tiering has not run, route there before building the card.
When the scope record is supplied, the skill consumes it for institution, persona, source posture, sector and cross-cutting overlays, lifecycle stage, and architecture flags. Otherwise it asks the practitioner the few facts it needs, and source posture sets what the card can assert at high confidence and what carries [evidence needed].
The card has the same spine across model types. The order below is roughly how a senior practitioner walks it; in practice the conversation surfaces sections in whatever order the evidence arrives, and the structured object sorts itself.
Tier drives depth, so consume the tier output before deciding how much weight any section carries. That order is load-bearing. Likewise, do not draft validation-related sections (performance, robustness, fairness) before the upstream ai-use-case-intake and ai-risk-tiering artifacts are in hand, along with validation-plan outputs where they exist. Routing to those skills first is faster than re-doing the work.
Metadata names the model and version, owner, validator, second-line reviewer, date, tier, lifecycle stage, classification, and the upstream use-case and scope IDs. Owners are roles or functions, never named individuals.
Intended use sets purpose, intended users, the in-scope decisions, and the out-of-scope decisions. The out-of-scope list is what reviewers probe hardest; it is the firewall against scope drift after the model is in production.
System description covers model type and architecture, dependencies, and autonomy level. For GenAI use cases, this is also where the foundation-model dependence, RAG corpus, and tool inventory land (see GenAI overlay below). Autonomy level is the input that drives the controls section.
Training and reference data lists each data source with purpose (training, fine-tuning, retrieval, evaluation, monitoring, reference), sensitivity (NPI, PHI, public, confidential), licensing, retention, lineage, and exclusions (prohibited rating factors, fairness-sensitive proxies, fields outside lawful basis). For foundation-model training data, firm visibility is typically limited to vendor system-card statements; record at appropriate confidence.
Performance reports segment-level results on protected and policy-relevant dimensions, even when overall performance is acceptable. Overall metrics alone hide segment exposure. Deployed-environment metrics dominate for production cards; lab-only metrics carry separately and are labelled. For GenAI: faithfulness, citation precision, refusal rate, latency, plus task-specific metrics.
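A minimal sketch of why segment-level reporting matters, using made-up toy records. The function names and the (segment, predicted, actual) record shape are illustrative assumptions, not part of the skill or its schema.

```python
from collections import defaultdict

def accuracy_by_segment(records):
    """Accuracy per segment for (segment, predicted, actual) tuples."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment, predicted, actual in records:
        totals[segment] += 1
        hits[segment] += int(predicted == actual)
    return {segment: hits[segment] / totals[segment] for segment in totals}

def overall_accuracy(records):
    """Accuracy across all records, ignoring segment."""
    records = list(records)
    return sum(int(p == a) for _, p, a in records) / len(records)

# Toy data, invented for illustration: segment "A" is large and accurate,
# segment "B" is small and only half right.
toy_records = [("A", 1, 1)] * 9 + [("B", 1, 0), ("B", 1, 1)]

print(overall_accuracy(toy_records))     # about 0.91, looks acceptable
print(accuracy_by_segment(toy_records))  # {'A': 1.0, 'B': 0.5}
```

The overall figure looks acceptable while segment "B" sits at 50 percent, which is the exposure the card is meant to surface.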
Robustness and stress testing covers out-of-time backtests, stress scenarios, and segment robustness. For GenAI: prompt injection, jailbreak, and RAG poisoning tests, plus the vendor-supplied test set. For execution algos, the Rule 15c3-5 control overlap goes here (see capital-markets overlay).
Fairness and explainability names the protected and policy-relevant dimensions, the metric, the disparity assessment, and the method's limitations. For insurance pricing or underwriting, frame against unfair discrimination using NAIC #880 vocabulary in addition to or in place of generic fairness terminology; load the insurance overlay.
Limitations and known failure modes are concrete, each with an example. "Model may not generalise" tells a reviewer nothing. If you cannot give an example, surface the limitation as a sign-off question instead.
Monitoring entries each name a metric, threshold, frequency, owner, and escalation path. All five fields. The owner and the escalation path are what reviewers flag first; the chase across functions (consumer compliance owns complaint rate, infosec owns prompt-injection signal, vendor management owns version notice) is part of the work.
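The five-field rule lends itself to a mechanical check before the card goes to review. The sketch below assumes each monitoring entry is a plain dict; the key names are hypothetical and may not match schemas/model-card.schema.json.

```python
# Hypothetical key names for the five fields every monitoring entry needs.
REQUIRED_FIELDS = ("metric", "threshold", "frequency", "owner", "escalation_path")

def missing_monitoring_fields(entry):
    """Return the required fields that are absent or blank in one entry."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        # A numeric threshold of 0 is a real value, so only None and
        # blank strings count as missing.
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing

entry = {
    "metric": "complaint rate",
    "threshold": 0.02,
    "frequency": "monthly",
    "owner": "consumer compliance",
    "escalation_path": "",
}
print(missing_monitoring_fields(entry))  # ['escalation_path']
```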
Controls are preventive, detective, response, or compensating. Each has a named owner and a pointer to evidence. A control without an evidence pointer is policy, not a control.
Change management lists the re-validation triggers: feature change, training-window shift, performance breach for N consecutive periods. For GenAI: foundation-model swap, RAG corpus scope change, tool addition, autonomy-level change. The card is a living document; foundation-model version monitoring is not optional once a vendor LLM is in the loop.
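The "performance breach for N consecutive periods" trigger can be sketched as a run-length check. The function name, the higher_is_better flag, and the sample values are assumptions for illustration; the card itself sets the metric, the threshold, and N.

```python
def breached_n_consecutive(values, threshold, n, higher_is_better=True):
    """True if the metric breaches the threshold for n periods in a row."""
    if n < 1:
        raise ValueError("n must be at least 1")
    run = 0
    for value in values:
        breached = value < threshold if higher_is_better else value > threshold
        run = run + 1 if breached else 0  # any non-breach resets the run
        if run >= n:
            return True
    return False

# Illustrative monthly values for a higher-is-better metric.
monthly = [0.90, 0.70, 0.72, 0.91]
print(breached_n_consecutive(monthly, threshold=0.80, n=2))  # True
print(breached_n_consecutive(monthly, threshold=0.80, n=3))  # False
```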
Sign-off questions are tagged to sections, answerable from the card itself, and specific to this card. "Is this fit for purpose" applies to every card and challenges nothing. "The agent review of assistant output is the only preventive control between the model and the customer; what is the off-switch criterion if agent-edit rate falls below threshold for two consecutive months" applies to one card and one model.
Source trace and confidence records every material claim, its source, the evidence pointer, and a confidence label. Vendor system cards and vendor evaluations carry vendor-self-attestation confidence (typically low to medium); firm-independent evaluation carries higher confidence. Do not collapse vendor and firm evidence into one line. Items without evidence carry [evidence needed] and route to the engagement issue log.
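One way to keep vendor and firm evidence on separate lines is to carry the source type on every trace entry. This is a sketch under stated assumptions: the TraceEntry fields are hypothetical, and capping vendor self-attestation at "medium" is an illustrative reading of "typically low to medium", not a rule the skill states.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TraceEntry:
    claim: str
    source: str
    source_type: str                 # "vendor" or "firm" (assumed vocabulary)
    evidence_pointer: Optional[str]  # None when no evidence is on file
    confidence: str                  # "low", "medium", or "high"

def confidence_label(entry):
    """Label one claim; unevidenced claims never get a confidence level."""
    if not entry.evidence_pointer:
        return "[evidence needed]"
    # Assumption: vendor self-attestation is capped at medium.
    if entry.source_type == "vendor" and entry.confidence == "high":
        return "medium"
    return entry.confidence

def issue_log_items(entries):
    """Claims that should route to the engagement issue log."""
    return [e.claim for e in entries if confidence_label(e) == "[evidence needed]"]
```

Because each entry keeps its own source type, a vendor claim and a firm test of the same property stay as two lines with two labels.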
Depth flexes with tier and audience. A short tier-4 card may compress sections to two or three lines apiece; a deep tier-1 card expands every section. Empty named sections are not acceptable, but compression is.
When architecture.foundation_model is set, or architecture.uses_rag is true, or architecture.uses_tools is true, the GenAI overlay block fires. It lands inside the named sections rather than as a separate document:
The overlay is mandatory once triggered. Missing the GenAI sections on a GenAI card is what a second-line reviewer flags first when the card lands for challenge.
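The trigger condition reads as a three-way OR. A minimal sketch, assuming the scope's architecture flags arrive as a dict keyed by the flag names given above; the function name and sample value are hypothetical.

```python
def genai_overlay_fires(architecture):
    """True when any of the three GenAI architecture flags is set."""
    return (
        bool(architecture.get("foundation_model"))  # any non-empty value counts as set
        or architecture.get("uses_rag") is True
        or architecture.get("uses_tools") is True
    )

print(genai_overlay_fires({"foundation_model": "vendor-llm"}))  # True
print(genai_overlay_fires({"uses_rag": False, "uses_tools": False}))  # False
```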
When the scope names a sector (banking, insurance, capital markets, payments-fintech), load the matching references/sector-overlays/<sector>.md. The overlay's named fields and sign-off questions land in the card; treating the overlay as background reading is the failure mode. Same pattern for the cross-cutting overlays this skill carries: cyber, privacy, conduct. Climate is not applicable to model cards.
Load only the overlays the scope names. Gold-plating a card with overlays the engagement does not implicate adds noise without challenge value.
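Overlay selection can be sketched as a lookup that resolves only what the scope names and rejects anything else. The function name and the use of file slugs (for example "capital-markets") as scope values are assumptions; the paths follow the reference layout this skill lists.

```python
SECTOR_OVERLAYS = {"banking", "insurance", "capital-markets", "payments-fintech"}
CROSS_CUTTING_OVERLAYS = {"cyber", "privacy", "conduct"}

def overlay_paths(sectors, cross_cutting):
    """Resolve overlay files for exactly the overlays the scope names."""
    paths = []
    for sector in sectors:
        if sector not in SECTOR_OVERLAYS:
            raise ValueError(f"unknown sector overlay: {sector}")
        paths.append(f"references/sector-overlays/{sector}.md")
    for overlay in cross_cutting:
        if overlay not in CROSS_CUTTING_OVERLAYS:
            raise ValueError(f"overlay not carried by this skill: {overlay}")
        paths.append(f"references/cross-cutting/{overlay}.md")
    return paths

print(overlay_paths(["insurance"], ["privacy"]))
# ['references/sector-overlays/insurance.md', 'references/cross-cutting/privacy.md']
```

Raising on an unknown name, such as a climate overlay, keeps a card from silently picking up material the engagement does not implicate.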
The card is only credible when these hold:
- Items without evidence carry [evidence needed] and go to the engagement issue log, not silently into the card.
- ... [verify section] in the source-anchors file (not in the card body).
Tier drives depth. Lifecycle stage drives which sections lean heavy. Audience drives tone (working group is plain, committee is structured, examiner response is formal, board is distilled). Persona sets the review path and the named decision owners. Sector and cross-cutting overlays load from the scope. Source posture sets what the card can assert at high confidence and what carries [evidence needed]. Where firm-specific policy or taxonomy applies, it lives in references/firm-overlay.md (consumed when present) and never in the card directly.
Default to drafting the card against templates/default-output.md. Render as Word, Markdown, or another format as the audience asks for it; a model risk committee usually wants a Word memo, an inventory-of-record extract is a structured object, an examiner response is often Word with the source-trace block in a tidy table. Produce the structured record at schemas/model-card.schema.json when downstream automation, the model inventory of record, or a registered consumer needs it. The reviewer attestation block is filled by the human reviewer; the card is filed only after.
Downstream consumers: board-ai-risk-pack pulls metadata, tier, intended use, performance and limitations summary, and material sign-off questions. validation-plan pulls architecture and data sections to scope validation work. The ai-governance-reviewer agent pulls the structured object for second-line challenge. The model inventory of record pulls the structured object for the central registry. The card itself is what an examiner is handed. The schema is the input contract for those consumers; additive changes only, never silent renames. Breaking changes ship as a versioned migration with the consumers given notice.
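The "additive changes only" rule can be checked mechanically by comparing two versions of the schema. The sketch below looks only at top-level properties of a JSON Schema document and uses invented field names; the real schemas/model-card.schema.json may be nested and would need a recursive walk.

```python
def breaking_changes(old_schema, new_schema):
    """List top-level changes that would break an existing consumer."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    problems = []
    for name in sorted(old_props):
        if name not in new_props:
            # A silent rename shows up as a removal plus an unrelated addition.
            problems.append(f"removed or renamed: {name}")
        elif old_props[name].get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")
    return problems

# Hypothetical field names, for illustration only.
old = {"properties": {"tier": {"type": "string"}, "owner": {"type": "string"}}}
additive = {"properties": {**old["properties"], "lifecycle_stage": {"type": "string"}}}
renamed = {"properties": {"tier": {"type": "string"}, "owner_role": {"type": "string"}}}

print(breaking_changes(old, additive))  # []
print(breaking_changes(old, renamed))   # ['removed or renamed: owner']
```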
- references/source-anchors.md: citations and excerpts for the named anchors.
- references/sector-overlays/{banking,insurance,capital-markets,payments-fintech}.md: sector overlays loaded from scope.
- references/cross-cutting/{cyber,privacy,conduct}.md: cross-cutting overlays loaded from scope.
- references/firm-overlay.md: firm policy, taxonomy, named owners (consumed when present).
- templates/default-output.md: card template.
- schemas/model-card.schema.json: structured-output contract.
- examples/: anonymised public-source-derived scenarios.
- TROUBLESHOOTING.md: recurring defects.
npx claudepluginhub anotb/second-line-financial-services --plugin ai-governance-model-risk