Help us improve
Share bugs, ideas, or general feedback.
From business-operations-skills
Use when a Head of Ops, Knowledge Manager, or TPM-Internal needs to author, validate, or clean up company SOPs and internal runbooks (procurement intake, vendor offboarding, incident-comms cascade, employee onboarding, expense reimbursement, system-access provisioning, customer-escalation playbook) — including 5W2H completeness checks (Who-What-When-Where-Why-How-HowMuch), cross-link and orphan-page validation across a sprawling Notion/Confluence/Obsidian wiki, KB ingestion + hygiene reporting, ops onboarding doc generation, and runbook step verification (named owner, expected duration, observable success signal, rollback path, escalation contact). Pairs Kaoru Ishikawa's 5W2H method, Atul Gawande's *The Checklist Manifesto*, ISO 9001, ITIL v4 Service Operation, FDA 21 CFR Part 211, and Google SRE Workbook runbook discipline with deterministic stdlib-only Python tools that score completeness, detect anti-patterns, and emit prioritized cleanup lists. Distinct from `engineering/llm-wiki` (Karpathy-style personal PKM second brain), `engineering-team/runbook-generator` (system-ops production debugging runbook), `project-management/*` (Jira/Confluence delivery + ticket tracking), and sibling `business-operations/process-mapper` (BPMN process *design*, while knowledge-ops is process *documentation*).
npx claudepluginhub kruxshnx/claude-skills-devin --plugin business-operations-skillsHow this skill is triggered — by the user, by Claude, or both
Slash command
/business-operations-skills:knowledge-opsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Company SOP + internal runbook authoring, 5W2H completeness validation, and KB hygiene reporting for Head-of-Ops / Knowledge-Manager / TPM-Internal personas.
Guides technical evaluation of code review feedback: read fully, restate for understanding, verify against codebase, respond with reasoning or pushback before implementing.
Share bugs, ideas, or general feedback.
Company SOP + internal runbook authoring, 5W2H completeness validation, and KB hygiene reporting for Head-of-Ops / Knowledge-Manager / TPM-Internal personas.
An ops organization three years in accumulates a sprawl: 600 Notion pages, 200 Confluence runbooks, three Obsidian vaults, a Drive/SOPs/ folder, and a Slack #ops-questions channel that exists because nobody can find the canonical doc. Predictable failure modes:
This skill answers the operator's actual question: "Which 20 docs do I fix first, and what specifically is wrong with each?" — with deterministic logic, not intuition.
Drive/SOPs/ directory) and surfacing what's broken: orphan pages, stale pages (no edit > 12 months), glossary drift, missing-owner pages, cross-link map.Four-step deterministic flow (matches the ops org's actual workflow, not an abstract process):
kb_ingester.py --input <vault-dir> on the existing wiki export. Output is a markdown health report: orphan pages, stale pages, glossary drift, missing-owner pages, cross-link map, prioritized cleanup list. The report ranks the top-20 docs to fix first — usually a mix of high-traffic stale docs and compliance-relevant missing-owner docs. Take this list to the cleanup sprint.runbook_validator.py --input <runbook.md>. The validator scores each step against six checks (named owner, expected duration, observable success signal, observable failure signal, rollback path, escalation contact) and produces a per-step traffic-light + overall validity score 0-100 + MUST-FIX issue list. A runbook scoring < 60 is not safe to use in an incident.sop_generator.py --input <metadata.json> --profile <ops|support|finance|hr|it|regulated>. Output is a 5W2H-structured SOP scaffold: Who (RACI), What (process steps), When (triggers + frequency), Where (system + tool), Why (purpose + regulatory basis), How (step-by-step), How-much (cost + time per execution). The regulated profile adds version control, signoff, and audit-trail sections (ISO 9001 / FDA 21 CFR Part 211 / SOC 2 / HIPAA).kb_ingester.py after the cleanup sprint to verify orphan-page count is down and glossary drift is resolved. The metric that matters is "unfindable docs" (orphans) and "unsafe runbooks" (validity score < 60) — not page count.scripts/sop_generator.py — Reads a JSON metadata file describing an SOP (process owner, triggering event, audience role, frequency, regulatory overlay, inputs, outputs, steps outline) and emits a full 5W2H-structured SOP in markdown (or normalized JSON). The --profile flag tunes the output: ops (general internal ops), support (customer-support runbook style), finance (controls + reconciliation focus), hr (sensitive-data flagging), it (system + access focus), regulated (adds version control, signoff matrix, audit-trail). Regulatory overlays (SOC2, HIPAA, ISO13485, GDPR, SOX) attach the appropriate compliance preamble. --sample prints a complete vendor-offboarding SOP example. Stdlib only.
scripts/runbook_validator.py — Reads a runbook (markdown file or JSON) and validates each step against six required attributes: (1) named owner (not "the team", not "ops"), (2) expected duration (concrete number + unit), (3) observable success signal (e.g., "HTTP 200 from /healthz" — not "service is up"), (4) observable failure signal, (5) rollback path (or explicit "this step cannot be rolled back, escalate to X"), (6) escalation contact (named person or named on-call rotation). Output is a per-step traffic-light (GREEN/AMBER/RED), an overall validity score 0-100, and a MUST-FIX issue list. Verdict: ≥ 80 = SAFE-TO-USE, 60-79 = USE-WITH-CAUTION, < 60 = NOT-SAFE. --sample prints a deliberately-broken incident-comms runbook to demonstrate failure detection. Stdlib only.
scripts/kb_ingester.py — Walks a directory of markdown files (Notion export, Confluence space export, Obsidian vault, Drive/SOPs/ directory). Extracts: (a) cross-link map (which page references which, via markdown [link](path) syntax), (b) glossary candidates (frequently used proper nouns and acronyms that recur in 3+ docs without a single canonical definition page), (c) orphan pages (no inbound links from anywhere in the vault), (d) glossary drift (the same term defined or used inconsistently across docs — e.g., "CSM" expanded differently in two places), (e) stale pages (no edit in > 12 months, detected via filesystem mtime or YAML last_reviewed frontmatter), (f) missing-owner pages (no owner: field in frontmatter). Emits a KB health report markdown with a prioritized top-20 cleanup list ranked by staleness × inbound-link-count (high-traffic stale docs first). --sample builds a tiny synthetic 8-page vault in a tmpdir and runs the full pipeline against it. Stdlib only.
references/5w2h_sop_canon.md — Kaoru Ishikawa's 5W2H method, Toyota standard-work discipline, Atul Gawande's checklist manifesto, Atlassian Confluence SOP guidance, ISO 9001 SOP requirements, ITIL v4 Service Operation, FDA 21 CFR Part 211. Eight cited sources covering SOP authoring canon.references/runbook_canon.md — Google SRE Workbook (runbook chapter), Atlassian incident-management runbooks, PagerDuty Incident Response taxonomy, AWS Well-Architected operational excellence pillar, Charity Majors on observability-runbook integration, Susan Fowler on production-ready microservices, ITIL v4 Operations. Seven cited sources covering runbook design canon.references/kb_hygiene_anti_patterns.md — Eight anti-patterns drawn from Notion/Confluence wiki industry research, Mozilla SUMO knowledge-base lessons, Stack Overflow community-management research, the Atlassian Team Playbook, MIT TIK org-wiki studies, Cynthia Lee on glossary drift, and Adam Wiggins on "documentation rot".owner: alex@company.com) or in a top-of-page "Owner:" line. Tribal-knowledge ownership (the person who last edited the page) is treated as missing.--stale-days on kb_ingester.py. Some compliance regimes (FDA, ISO 13485) require shorter review cycles; use --profile regulated and --stale-days 365.engineering/llm-wiki.runbook_validator.py as a checkbox. The validator catches missing structure. It does not catch wrong content. A runbook can score 100 and still tell the operator the wrong thing.process-mapper. Process-mapper documents the flow of work between stages (BPMN, cycle time, bottleneck). Knowledge-ops documents the artifacts operators consume to execute the work (SOP, runbook, glossary). Both can apply to the same process.kb_ingester.py output.--profile regulated. Missing version control on a regulated SOP is an audit finding.engineering/llm-wiki — Karpathy-style personal PKM second brain where one human ingests sources into their own interlinked vault. Knowledge-ops is organizational: many authors, many readers, named owners per doc, formal review cycles, compliance overlays.engineering-team/runbook-generator — system-ops runbook for debugging a production system (logs, alerts, k8s, on-call). Knowledge-ops runbooks are operator runbooks for business processes (incident-comms cascade, vendor offboarding, employee onboarding). The audience is fellow operators, not engineers tailing logs.project-management/* — Jira / Confluence delivery tracking, sprint ticket workflow, project-status reporting. Knowledge-ops is the content in those Confluence pages, not the tracking of who edits them.business-operations/process-mapper (sibling) — BPMN process design: where the stages are, where work waits, which stage is the bottleneck. Knowledge-ops is process documentation: the SOP and runbook artifacts that tell an operator how to execute the process the mapper described.business-operations/internal-comms (sibling) — broadcast announcements, all-hands messaging, change-management comms. Knowledge-ops is the durable reference artifact; internal-comms is the broadcast.ra-qm-team/* — formal regulatory compliance authoring (ISO 13485 QMS, MDR technical files, 21 CFR Part 820). Knowledge-ops borrows the regulatory checklist but is not a substitute for a notified-body audit.Before invoking the tools, the orchestrator (or /cs:grill-bizops) walks the user through these questions one at a time, with a recommended answer + canon citation. Never bundled. Walk depth-first — do not open question 4 until 1-3 are locked.
"Who is the named owner of this SOP / runbook, and do they know they own it?" Recommended: a single human (not "the team"), and yes — they have agreed in writing. Canon: Gawande 2009 (The Checklist Manifesto) — checklists without an owner rot within 12 months. Ownership is the discipline.
"When was this doc last reviewed, and what is the review cadence?"
Recommended: reviewed within the last 12 months (90 days if --profile regulated); cadence written in the frontmatter.
Canon: ISO 9001:2015 §7.5.3 — controlled documents require review-cycle metadata. ITIL v4 echoes this for Service Operation runbooks.
"For each runbook step: what is the observable success signal — by which I mean, what specific output tells you the step worked?"
Recommended: a concrete observable ("HTTP 200 from /healthz", "Slack thread closed with done reaction", "Salesforce opportunity moved to Closed-Won stage") — not "the service is up" or "it works".
Canon: Beyer et al. 2018 (Site Reliability Workbook, Ch. 8) — observable signals are the entire point of a runbook. Vague success criteria are the leading cause of runbook misuse during incidents.
"What is the rollback path for each runbook step that can fail?" Recommended: every step that mutates state has either a rollback path or an explicit "cannot roll back — escalate to X" line. Canon: AWS Well-Architected Framework, Operational Excellence pillar — "you cannot run a process you cannot reverse without first agreeing what 'reverse' means".
"Where does this doc live, and what other docs link to it?" Recommended: in the canonical wiki, and at least 2 inbound links from related docs. An orphan SOP is an unfindable SOP. Canon: Atlassian Team Playbook on documentation health — orphan rate > 20% is the leading indicator of a wiki sprawl problem.
"What is the regulatory overlay on this process — SOC 2, HIPAA, ISO 13485, GDPR, SOX, none?" Recommended: explicit answer. If "none", confirm by checking the data classes the process touches. Canon: FDA 21 CFR Part 211.100 (Written procedures; deviations) — regulated SOPs require version control, change history, and signoff. Skip this step and the doc is an audit finding.
"Is the happy path the only path documented, or are the 2-3 most common failure modes also documented?" Recommended: the top-2 failure modes per process are documented with their own recovery sub-procedure. Canon: Fowler 2016 (Production-Ready Microservices) — operations docs that cover only the happy path are responsible for 60%+ of incident-time waste.
After all 7 are locked, invoke kb_ingester.py → runbook_validator.py → sop_generator.py in sequence.