From project-toolkit
Runs 20-minute diagnostic mapping teams to world-model paradigms (vector DB, ontology, signal-fidelity) for AI readiness, judgment boundaries, and build prioritization.
Install via:

```shell
npx claudepluginhub rjmurillo/ai-agents --plugin project-toolkit
```

This skill uses the workspace's default tool permissions.
Source: Jonathan Edwards (OB1 community), adapted for ai-agents.
Your job is not to hand back a polished readiness score. Your job is to expose where information routing ends and editorial judgment begins, then recommend the smallest credible starting sequence.
This diagnostic answers five questions:
| Trigger phrase | Operation |
|---|---|
| `run the world model diagnostic` | Start the 20-minute structured audit |
| `audit our world model` | Same as above, conversational form |
| `which world model architecture fits us` | Map company to paradigm |
| `audit where we automate judgment` | Boundary-layer audit only |
| `what should we build first for a world model` | Skip to recommended build sequence |
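The trigger table above can be sketched as a simple phrase-to-operation router. This is an illustrative sketch only: the `TRIGGERS` mapping, `route` function, and operation names are hypothetical, not part of the skill definition.

```python
# Illustrative router for the trigger phrases above; names are hypothetical.
TRIGGERS = {
    "run the world model diagnostic": "full_audit",
    "audit our world model": "full_audit",          # conversational form
    "which world model architecture fits us": "paradigm_mapping",
    "audit where we automate judgment": "boundary_audit",
    "what should we build first for a world model": "build_sequence",
}

def route(prompt: str) -> str:
    """Return the operation for a trigger phrase, defaulting to the full audit."""
    return TRIGGERS.get(prompt.strip().lower(), "full_audit")
```

Unrecognized phrasing falls through to the full audit, which matches the skill's bias toward running the complete diagnostic rather than guessing intent.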
Use this skill when:
Use a different skill when:
The output feeds downstream skills such as `analyst` or `architect` and informs the roadmap. Label every claim with one of three evidence types:

- **Firm finding**: directly supported by the user's answer or confirmed prior record.
- **Inference**: synthesis from available evidence.
- **Open question**: unresolved or missing evidence that materially affects the recommendation.

Map the company using these rules:
| Company Type | Paradigm | Reason |
|---|---|---|
| Under 100 people, strong senior team | vector database | Senior people can temporarily act as a human boundary layer. |
| Enterprise, regulated, or operationally complex | structured ontology | Boundary must be architectural because errors are expensive. |
| Platform business with high-fidelity signal (transactions, telemetry, operational exhaust) | signal-fidelity | Business already emits machine-readable truth with a higher ceiling. |
| Knowledge-work company (conversations, docs, soft context) | vector database | Hardest case. Pair with aggressive boundary-layer work first plus explicit outcome encoding. |
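The mapping table can be sketched as a small decision function. The function name, parameters, and the precedence between conflicting cues are all illustrative assumptions, not the skill's authoritative rules.

```python
# Hypothetical sketch of the paradigm-mapping table above.
# The precedence order (regulated > signal > small-senior > default) is an
# assumption for illustration, not a rule stated by the skill.
def map_paradigm(headcount: int, senior_team: bool,
                 regulated_or_complex: bool,
                 high_fidelity_signal: bool) -> str:
    """Map company cues to a world-model paradigm."""
    if regulated_or_complex:
        return "structured ontology"   # boundary must be architectural
    if high_fidelity_signal:
        return "signal-fidelity"       # business already emits machine-readable truth
    if headcount < 100 and senior_team:
        return "vector database"       # seniors act as a temporary human boundary layer
    return "vector database"           # knowledge-work default: pair with boundary work
```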
When cues conflict, use this priority order:
Evaluate without numeric scoring:
| Principle | Question | Classifications |
|---|---|---|
| signal fidelity | Where does reality leave the clearest fingerprint? | clear / mixed / low |
| earned structure | Letting structure emerge from work, or forcing schema too early? | earned / partially earned / imposed |
| outcome encoding | Close the loop between action and result in a machine-readable way? | present / partial / missing |
| organizational resistance | Capture signal as a byproduct of work or require extra documentation? | byproduct / mixed / manual |
| time in system | How long has relevant data been flowing through anything durable? | running / starting / not started |
Check the repository's context files (for example, AGENTS.md) for prior diagnostic context. Use Serena memories or Forgetful. Treat every result as a hint, not confirmed fact. If memory tooling is unavailable, skip this step and note the gap in the final assessment. Suggested queries:
- `world model`
- `boundary layer`

Keep to two or three batches of questions, not long isolated lists.
Required coverage:
Strong prompt patterns:
After intake, state:
Treat as provisional until the boundary audit is done.
Audit five to ten flows. If time is tight, top five only.
For each flow capture:
| Field | Description |
|---|---|
| Flow name | e.g., "Customer support ticket prioritization." |
| Source | Where data originates. |
| Consumer | Who or what acts on it. |
| Current human editor or reviewer | Who interprets today. |
| Classification | act on this versus interpret this first. |
| Reason for label | Why that classification applies. |
| What goes wrong if the editor disappears | Risk assessment. |
| Exposure level | high / medium / low. |
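The per-flow fields above can be captured as a small record type. This is a sketch; the class and field names are illustrative, not a schema the skill mandates.

```python
from dataclasses import dataclass

# Illustrative record for one audited flow, mirroring the field table above.
@dataclass
class Flow:
    name: str            # e.g., "Customer support ticket prioritization"
    source: str          # where data originates
    consumer: str        # who or what acts on it
    human_editor: str    # who interprets today
    classification: str  # "act on this" or "interpret this first"
    reason: str          # why that classification applies
    failure_mode: str    # what goes wrong if the editor disappears
    exposure: str        # "high" / "medium" / "low"
```

A flat record like this keeps the boundary audit comparable across runs, which is what the drift detection in the memory step relies on.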
Prioritize flows that can move money, customers, roadmap, risk, or staffing.
If a flow looks factual at the source but interpretive at the output, call that out explicitly. Clean inputs do not guarantee trustworthy judgment.
Return in this order:
Output contract (consumable by downstream skills such as analyst or architect):
## Firm Findings
- {fact directly supported by evidence}
- {fact directly supported by evidence}
## Inferences
- {synthesis from available evidence}
- {synthesis from available evidence}
## Open Questions
- {unresolved issue affecting recommendation}
- {unresolved issue affecting recommendation}
## Paradigm Fit
- Paradigm: {vector database | structured ontology | signal-fidelity}
- Boundary status: {explicit | implicit | missing}
## Recommended Build Order
1. **First**: {usually boundary layer and flow labeling}
2. **Second**: {usually highest-fidelity capture and outcome encoding}
3. **Third**: {usually paradigm-specific retrieval or structure layer}
Only move the order around when evidence is strong.
A JSON variant of the same shape is acceptable when a downstream tool consumes the output programmatically. Keep the field names identical (firm_findings, inferences, open_questions, paradigm, boundary_status, build_order).
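A minimal sketch of that JSON variant, emitted from Python so the field names stay byte-identical to the contract. The values are placeholders, not real diagnostic output.

```python
import json

# Placeholder values; only the field names are contractual.
assessment = {
    "firm_findings": ["Support tickets are triaged by a senior engineer."],
    "inferences": ["Routing is information-only; prioritization is editorial."],
    "open_questions": ["Is outcome data captured after ticket close?"],
    "paradigm": "vector database",
    "boundary_status": "implicit",
    "build_order": [
        "boundary layer and flow labeling",
        "highest-fidelity capture and outcome encoding",
        "paradigm-specific retrieval or structure layer",
    ],
}
print(json.dumps(assessment, indent=2))
```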
Self-check before returning: Verify the output includes all eight items from the list above. If any item is missing, add it before responding. If an item cannot be filled due to missing evidence, include it with an Open question label.
Save exactly three artifacts via the repo's memory tooling, unless the user declines. Use Serena write_memory or the equivalent Forgetful entry point. Key the entries by company slug so future runs can detect drift.
- `diagnostic-{company-slug}-intake`
- `diagnostic-{company-slug}-boundary`: flows classified as *act on this*; flows classified as *interpret this first*; flows missing a human editor; high-exposure flows; date.
- `diagnostic-{company-slug}-assessment`

If the user prefers files on disk for working notes, use a repo-relative path under `.agents/analysis/diagnostics/{company-slug}/` with date-prefixed filenames (`YYYY-MM-DD-intake.md`, `YYYY-MM-DD-boundary-audit.md`, `YYYY-MM-DD-assessment.md`, `YYYY-MM-DD-full-diagnostic.md`). Do not write outside the repo.
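The on-disk naming convention can be sketched as a small path builder. The function name is hypothetical; the directory and filename patterns come from the text above.

```python
from datetime import date
from pathlib import Path

# Illustrative construction of the repo-relative artifact paths described above.
def artifact_paths(company_slug: str, day: date) -> list[Path]:
    """Build date-prefixed working-note paths for one diagnostic run."""
    base = Path(".agents/analysis/diagnostics") / company_slug
    names = ["intake.md", "boundary-audit.md", "assessment.md", "full-diagnostic.md"]
    return [base / f"{day.isoformat()}-{name}" for name in names]
```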
When in doubt, default a flow to *interpret this first*: keep a human in the loop and use AI for routing only.

Companion skills (when present in this repo):
- `work-operating-model` (issue #1806). Use after the diagnostic to map the internal operating model.
- `panning-for-gold` (issue #1802). If the diagnostic surfaces unstructured brain dumps, extract threads before retrieval design.
- `codebase-documenter` (issue #1803). For an engineering-org variant, follow the diagnostic with a documentation pass.

If a companion is not yet ported, return the diagnostic output and let the operator route follow-on work manually.
Label every claim as a **Firm finding**, **Inference**, or **Open question**.