From codex-collab
This skill should be used when the user wants to "investigate a bug", "debug an issue", "figure out why something is happening", "find the root cause", "troubleshoot", "強い推論で調査", "仮説を立てて検証", "原因を特定", "バグの原因調査", "なぜ動かないか調べて", "問題を切り分け", "原因不明", "デバッグ", or mentions systematic hypothesis-driven debugging. NOTE: Use this for investigating UNKNOWN causes, not for validating design proposals (use devils-advocate for that).
`npx claudepluginhub masup9/codex-collab --plugin codex-collab`

This skill uses the workspace's default tool permissions.
Apply the "Strong Inference" methodology to investigate problems systematically through competing hypotheses and decisive experiments.
Strong Inference is a scientific method that accelerates problem-solving by generating multiple competing hypotheses and designing decisive experiments that eliminate them one by one.
This skill helps developers investigate bugs, performance issues, and unexpected behaviors using a structured, hypothesis-driven approach.
Key Feature: In codex mode, this skill can optionally leverage Codex for hypothesis generation and review while Claude handles verification execution.
When the user presents a problem:
- Collect information
- Clarify scope
Generate 2-4 competing hypotheses that are:
Example hypotheses for "API returns 500 intermittently":
- H1: Database connection pool exhausted under load
- H2: Race condition in cache update causing stale data
- H3: External service timeout not handled properly
- H4: Memory leak causing OOM conditions
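The competing-hypothesis bookkeeping above can be sketched as a small data structure. This is an illustration only; the type and field names are assumptions, not part of the skill's interface, though the `[X]`/`[?]`/`[!]` markers match the ones the skill writes to its state file.

```go
package main

import "fmt"

// Status mirrors the markers used in the investigation state file.
type Status string

const (
	Eliminated Status = "[X]" // evidence contradicts
	Pending    Status = "[?]" // not yet tested
	Supported  Status = "[!]" // evidence aligns
)

// Hypothesis pairs a candidate cause with its current status.
type Hypothesis struct {
	ID      string
	Summary string
	Status  Status
}

func main() {
	hypotheses := []Hypothesis{
		{"H1", "Database connection pool exhausted under load", Pending},
		{"H2", "Race condition in cache update causing stale data", Pending},
		{"H3", "External service timeout not handled properly", Pending},
		{"H4", "Memory leak causing OOM conditions", Pending},
	}
	for _, h := range hypotheses {
		fmt.Printf("%s %s: %s\n", h.Status, h.ID, h.Summary)
	}
}
```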
For each hypothesis, design a "killer experiment" that:
Prioritize experiments by:
Execute verifications in priority order:
Safety guards:
After each verification:
Status markers:
- `[X]` Eliminated (evidence contradicts)
- `[?]` Pending (not yet tested)
- `[!]` Supported (evidence aligns)

When one hypothesis has strong supporting evidence:
| Mode | Hypothesis Gen | Verification Design | Execution | Review |
|---|---|---|---|---|
| `codex` | Codex | Claude | Claude | Codex |
| `claude-only` | Claude | Claude | Claude | Claude |
Modes:
- `codex` (when Codex CLI is available)
- `claude-only` (automatic when Codex is unavailable)

Investigation state is persisted to `tmp/strong-inference/<task-id>.md`:
```markdown
---
schema: strong-inference/v1
task_id: abc123
created: 2026-02-02T12:00:00Z
problem: "API returns 500 intermittently"
mode: codex
---

# Investigation: API returns 500 intermittently

## Hypotheses

### H1: Database connection pool exhausted
- Status: [X] Eliminated
- Evidence: Connection count stable at 5/20 during error window
- Verified: 2026-02-02T12:15:00Z

### H2: Race condition in cache update
- Status: [?] Pending
- Test: Add mutex logging to CacheManager.update()
- Priority: High (matches timing pattern)

### H3: External service timeout
- Status: [!] Supported
- Evidence: Errors correlate with ExternalAPI latency spikes
- Next: Verify timeout handling in ApiClient.fetch()

## Verification Log

| Time | Action | Result |
|------|--------|--------|
| 12:05 | Read db/pool.go | Found pool size config |
| 12:10 | Check connection metrics | Stable at 5/20 |
| 12:15 | Eliminated H1 | Evidence contradicts |
```
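Given the state-file layout above, the status of each hypothesis can be recovered with a simple line scan. This is a sketch under the assumption that every hypothesis has a `### H<n>: ...` heading followed by a `- Status: [X|?|!] ...` line, as in the example; `parseStatuses` is a name I've introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStatuses extracts hypothesis-ID -> status-marker pairs from a
// strong-inference state file body. It assumes the layout shown above:
// "### H1: ..." headings, each followed by a "- Status: [X] ..." line.
func parseStatuses(doc string) map[string]string {
	statuses := map[string]string{}
	current := ""
	for _, line := range strings.Split(doc, "\n") {
		line = strings.TrimSpace(line)
		if strings.HasPrefix(line, "### H") {
			// "### H1: Database connection pool exhausted" -> "H1"
			heading := strings.TrimPrefix(line, "### ")
			current = strings.SplitN(heading, ":", 2)[0]
		} else if current != "" && strings.HasPrefix(line, "- Status: ") {
			marker := strings.TrimPrefix(line, "- Status: ")
			statuses[current] = marker[:3] // "[X]", "[?]" or "[!]"
		}
	}
	return statuses
}

func main() {
	doc := `### H1: Database connection pool exhausted
- Status: [X] Eliminated
### H2: Race condition in cache update
- Status: [?] Pending`
	fmt.Println(parseStatuses(doc))
}
```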
Before executing verification commands:
Stop conditions:
- `max_iterations` reached (default: 10)

Example progress display:

```
Strong Inference Investigation
==============================

Problem: API returns 500 error intermittently

Hypotheses:
  [X] H1: Database connection pool exhausted
      Evidence: Connection count normal (eliminated)
  [!] H2: Race condition in cache update
      Evidence: Timing matches error pattern (supported)
  [?] H3: External service timeout
      Evidence: Pending verification

Current: Designing test for H2
```
Example completion report:

```
Investigation Complete
======================

Problem: API returns 500 error intermittently
Root Cause: Race condition in CacheManager.update()
Confidence: High (3 supporting evidence points)

Evidence Trail:
1. Errors occur only during cache refresh window
2. Adding mutex eliminated the error
3. Race condition visible in thread dump

Recommended Fix:
- Add mutex lock in CacheManager.update() line 45
- Consider using sync.RWMutex for better concurrency

Prevention:
- Add race detector to CI pipeline
- Review other cache operations for similar patterns
```
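The recommended fix (guarding `CacheManager.update()` with `sync.RWMutex`) might look roughly like this. The struct's fields are assumptions, since the report names only the type and method, not their definitions:

```go
package main

import (
	"fmt"
	"sync"
)

// CacheManager is a hypothetical reconstruction of the type named in
// the report; only the mutex usage pattern is the point here.
type CacheManager struct {
	mu    sync.RWMutex
	items map[string]string
}

// update takes the write lock so concurrent refreshes no longer race.
func (c *CacheManager) update(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = value
}

// get takes only the read lock, so readers don't block each other --
// the concurrency benefit of sync.RWMutex over a plain sync.Mutex.
func (c *CacheManager) get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.items[key]
	return v, ok
}

func main() {
	cm := &CacheManager{items: map[string]string{}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			cm.update("k", fmt.Sprint(n))
		}(i)
	}
	wg.Wait()
	v, _ := cm.get("k")
	fmt.Println("final value:", v)
}
```

Running the suite under `go test -race` (or `go run -race`), as the prevention section suggests, would have flagged the original unsynchronized version.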
Use the /strong-inference command:
```shell
# Basic usage - investigate a problem
/strong-inference API sometimes returns 500 errors

# With mode selection
/strong-inference --mode claude-only Why is the test flaky?

# Japanese input ("investigate the cause of this bug")
/strong-inference このバグの原因を調査して
```
Detailed templates in references/:
- `hypothesis-template.md` - Template for Codex hypothesis generation
- `verification-patterns.md` - Common verification strategies