Help us improve
Share bugs, ideas, or general feedback.
From thinking-skills
Expresses forecasts, estimates, and risks as probability ranges with base-rate anchoring and explicit updates when new evidence arrives.
npx claudepluginhub tjboudreaux/cc-thinking-skills --plugin thinking-skillsHow this skill is triggered — by the user, by Claude, or both
Slash command
/thinking-skills:thinking-probabilisticThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Probabilistic thinking, informed by Philip Tetlock's "Superforecasting," treats a forecast as a probability and a range rather than a single confident number. Three moves do almost all the work: **anchor on the base rate**, **express the estimate as a range** (not a point), and **update the number** when new evidence arrives.
Routes probabilistic thinking to the right skill: base-rate anchoring, confidence calibration, expected value, or scenario weighting. Activates on queries about probability, likelihood, and uncertainty.
Applies Bayesian reasoning to update probability estimates with new evidence, helping make better forecasts, avoid overconfidence, and calibrate judgments under uncertainty.
Applies Bayes' Theorem to update beliefs given a specific prior and new evidence. Use when interpreting test results, metrics, or diagnostic signals to avoid overreacting.
Share bugs, ideas, or general feedback.
Probabilistic thinking, informed by Philip Tetlock's "Superforecasting," treats a forecast as a probability and a range rather than a single confident number. Three moves do almost all the work: anchor on the base rate, express the estimate as a range (not a point), and update the number when new evidence arrives.
Core Principle: Start from how often similar things happen, state your estimate as a range with a confidence level, and move the number — explicitly — when the evidence moves.
Stateless-agent note. Across a single task you have no persistent prediction log, so there is no "track my calibration over months" step here. The leverage is in the act of estimating: base rate, range, update. Apply the calibration attitude (assume you're overconfident; widen the range) without pretending to keep a cross-session scorecard you don't have.
Decision flow:
About to state a forecast/estimate/risk?
→ Outcome genuinely uncertain? → yes → BASE RATE, then a RANGE (not a point)
→ New evidence since last estimate? → yes → UPDATE THE NUMBER
→ Can you just look it up / measure it? → yes → DO THAT INSTEAD
thinking-bayesian; use it for the explicit prior × likelihood-ratio update.Convert vague language to numbers:
| Vague Statement | Probability Range |
|---|---|
| "Certain" | 99%+ |
| "Almost certain" | 90-99% |
| "Very likely" | 80-90% |
| "Likely" / "Probable" | 65-80% |
| "Better than even" | 55-65% |
| "Toss-up" | 45-55% |
| "Unlikely" | 20-35% |
| "Very unlikely" | 10-20% |
| "Almost impossible" | 1-10% |
| "Impossible" | <1% |
Express estimates as ranges, not points:
BAD: "The project will take 6 weeks"
GOOD: "I'm 80% confident the project will take 4-8 weeks"
BETTER: "50% confidence: 5-7 weeks; 90% confidence: 3-10 weeks"
Start with how often similar things happen:
Question: Will this feature launch on time?
Base rate: What % of similar features launched on time? ~40%
Adjustment: This team is experienced (+10%), scope is clear (+10%)
Estimate: ~60% probability of on-time launch
State your belief as a number:
## Prediction: Will we hit Q2 revenue target?
Initial estimate: 65%
Reasoning:
- Last 4 quarters: Hit 3/4 targets (75% base rate)
- Current pipeline: Slightly below historical (-10%)
- New product launching: Uncertain impact
What could change the probability?
Key uncertainties:
1. Will Enterprise deal close? (+15% if yes)
2. Will new product cannibalize existing? (-10% if significant)
3. Will competitor launch disrupt? (-20% if aggressive)
For complex predictions, branch scenarios:
Project success: ?
├── Technical risk resolves well (60%)
│ ├── Team stays intact (80%) → 0.60 × 0.80 = 48% → SUCCESS
│ └── Key person leaves (20%) → 0.60 × 0.20 × 0.50 = 6% → PARTIAL
├── Technical risk causes delays (30%)
│ ├── Scope reduced (60%) → 0.30 × 0.60 × 0.70 = 12.6% → SUCCESS
│ └── Scope maintained (40%) → 0.30 × 0.40 = 12% → FAILURE
└── Technical risk blocks project (10%) → 10% → FAILURE
P(Success) = 48% + 12.6% = 60.6% ≈ 60%
When new evidence arrives, update:
Original estimate: 65% hit revenue target
New information: Enterprise deal delayed to Q3
Impact: -15% (was +15% if closed, now neutral)
Updated estimate: 50%
New information: Competitor launch was weak
Impact: +10% (was -20% if aggressive)
Updated estimate: 60%
Make the forecast falsifiable within the task itself: a clear claim, a timeframe, and the range. This lets the user or a later observation verify it — you don't carry a personal scorecard across sessions, but a sharply-stated prediction can still be proven right or wrong.
Prediction: "80% confident the migration completes with <5 min downtime,
range 1-15 min downtime." (Checkable against the actual run.)
These are sanity checks you apply now, within the task — not a longitudinal tracking exercise.
"Would I bet at these odds?"
Prediction: 80% confident project finishes on time
Equivalent: Would I bet $4 to win $1?
If that feels wrong, adjust the probability.
Always check base rates:
Inside view: "Our team is great, we'll definitely finish on time"
Outside view: "What % of similar projects finished on time?"
Inside tends toward overconfidence
Outside provides calibration anchor
Imagine failure, then adjust:
Initial estimate: 85% success
After pre-mortem: Identified 5 failure modes I hadn't considered
Adjusted estimate: 70%
Are your intervals too narrow?
Test: Of your 90% confidence intervals, do 90% contain the actual?
Common finding: Only 60-70% do
Fix: Widen intervals by 50%
## Project: Payment System Rewrite
Timeline estimate:
- 50% confidence: 8-12 weeks
- 80% confidence: 6-16 weeks
- 95% confidence: 4-24 weeks
Key variables:
- API complexity: High uncertainty (+/- 3 weeks)
- Team availability: Medium uncertainty (+/- 2 weeks)
- Integration testing: High uncertainty (+/- 4 weeks)
Commitment: "We're 80% confident we'll deliver in Q2"
## Risk: Database migration causes extended downtime
Probability assessment:
- Base rate for similar migrations: 20% have issues
- Our preparation level: Above average (-5%)
- Complexity of our schema: Above average (+5%)
- Rollback plan quality: Strong (-5%)
Estimate: 15% probability of extended downtime
Mitigation value:
- If issue occurs: 4 hours downtime × $10K/hour = $40K
- Expected loss: 15% × $40K = $6K
- Mitigation cost: $3K for additional testing
- Decision: Mitigation worth it (ROI positive)
## Decision: Adopt new framework
Success probability factors:
| Factor | Probability | Weight |
|--------|-------------|--------|
| Team learns quickly | 70% | 0.3 |
| Framework matures | 80% | 0.2 |
| Performance meets needs | 60% | 0.3 |
| Integration works | 75% | 0.2 |
Combined probability (simplified):
0.70 × 0.80 × 0.60 × 0.75 = 25% (if all must succeed)
OR weighted average: 70% (if partial success acceptable)
Decision: High uncertainty suggests pilot first
# Probabilistic Assessment: [Prediction]
## Prediction
[Clear, falsifiable statement with timeframe]
## Initial Probability
Estimate: [X]%
Base rate: [Similar events: Y%]
Adjustment rationale: [Why different from base rate]
## Confidence Interval
- 50% CI: [Range]
- 80% CI: [Range]
- 95% CI: [Range]
## Key Uncertainties
| Uncertainty | If positive | If negative |
|-------------|-------------|-------------|
| [Factor 1] | +X% | -Y% |
| [Factor 2] | +X% | -Y% |
## Updates (within this task)
| New information | Old P | New P |
|-----------------|-------|-------|
| | | |
## Checkable Outcome
[The specific observation that will prove this forecast right or wrong]
"The fox knows many things, but the hedgehog knows one big thing."
Superforecasters are foxes—they integrate many perspectives, update frequently, and avoid ideological certainty. They're not smarter; they're more calibrated.
"Beliefs are hypotheses to be tested, not treasures to be protected."
Your predictions should change as evidence changes. Holding steady when you should update is a calibration failure.