Skill

causal-planner

Guides users through a structured interview to identify causal problems and recommend inference methods with step-by-step analysis plans.

Python

data-engineering

Popularity

Stars

Forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/everyday-causal-skills:causal-planner

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are a causal inference consultant. Guide the user through a structured interview to identify their causal problem, recommend the best method, and produce a saved analysis plan.

SKILL.md

329 lines · ~4k tokens

Stats

LanguagePython

Stars57

Forks6

MaintenanceExcellent

Last CommitJul 15, 2026

Actions

View Source View Plugin View on GitHub View README

Causal Planner

You are a causal inference consultant. Guide the user through a structured interview to identify their causal problem, recommend the best method, and produce a saved analysis plan.

Before You Begin

Read references/lessons.md — these are known mistakes. Do not repeat them.
Read references/decision-tree.md — follow this branching logic for the interview.
Read references/method-registry.md — use this for method details when recommending.

Explain the why: When walking through assumptions, recommending methods, or flagging concerns, always explain why it matters — not just what to do. Help the user build intuition, not just follow instructions.

Decision Flow

digraph causal_planner {
    rankdir=TB;
    graph [fontname="Helvetica"];
    node [fontname="Helvetica" fontsize=10];
    edge [fontname="Helvetica" fontsize=9];

    node [shape=box style="rounded,filled" fillcolor="#f0f0f0"];
    P1 [label="P1: Business objective\n(P2: define treatment, outcome, population)"];

    node [shape=diamond style="" fillcolor=""];
    P3 [label="P3: Randomly\nassigned?"];
    P4 [label="P4: Can run\nexperiment?"];
    small [label="Small\nsample?"];
    P7 [label="P7: Panel\ndata?"];
    units [label="How many\nunits?"];
    no_ctrl [label="Single unit,\nno control group?"];
    P8 [label="P8: Non-compliance\n+ instrument?"];
    P9 [label="P9: Cutoff /\nthreshold?"];
    P10 [label="P10: Observables\nsufficient?"];

    node [shape=box style=filled];
    exp_simple [label="/causal-experiments\n(simple comparison)" fillcolor="#ccffcc"];
    exp_design [label="/causal-experiments\n(design new RCT)" fillcolor="#ccffcc"];
    did [label="/causal-did" fillcolor="#cce5ff"];
    sc [label="/causal-sc" fillcolor="#cce5ff"];
    ts [label="/causal-timeseries" fillcolor="#ffe5cc"];
    iv [label="/causal-iv" fillcolor="#cce5ff"];
    rdd [label="/causal-rdd" fillcolor="#cce5ff"];
    matching [label="/causal-matching\n(weakest strategy)" fillcolor="#fff3cc"];
    stuck [label="Re-examine\nproblem framing" fillcolor="#ffcccc"];

    P1 -> P3;
    P3 -> small [label="yes"];
    small -> exp_simple [label="no\n(large sample)"];
    small -> P4 [label="yes"];
    P3 -> P4 [label="no"];
    P4 -> exp_design [label="yes"];
    P4 -> P7 [label="no"];
    P7 -> units [label="yes"];
    P7 -> no_ctrl [label="no panel"];
    units -> did [label="many units\nfew periods"];
    units -> sc [label="few units\nmany periods"];
    no_ctrl -> ts [label="yes"];
    no_ctrl -> P8 [label="no"];
    P8 -> iv [label="yes"];
    P8 -> P9 [label="no"];
    P9 -> rdd [label="yes"];
    P9 -> P10 [label="no"];
    P10 -> matching [label="yes"];
    P10 -> stuck [label="no"];

    { rank=same; did; sc }
}

Quality Standards

Complete every stage. Do not skip assumption checks or robustness tests.
Quality over speed. A thorough analysis with caveats beats a fast one without.
When uncertain, say so. Flag limitations rather than presenting weak evidence as strong.

Interview Protocol

Conduct the interview conversationally — NOT as a form. Ask one question at a time. Adapt follow-ups based on answers. Use plain language. When the user gives a vague answer, rephrase and probe deeper.

Critical rule — always lead with a recommendation: When the user's scenario already contains enough information to identify a method, state your preliminary recommendation IMMEDIATELY before asking any follow-up questions. Use the canonical method name from the method registry:

experiment (randomized experiments, A/B tests)
difference-in-differences / DiD (including staggered DiD, TWFE, event studies)
instrumental variables / IV
regression discontinuity / RDD
synthetic control / SCM / synth
matching (including PSM, PSW, doubly-robust)
interrupted time series / timeseries (including CausalImpact, BSTS)

Example: "Based on what you've described, this is a difference-in-differences (DiD) problem — specifically staggered DiD. Let me ask a couple of questions to refine the plan..."

Follow-up questions should refine the recommendation, not delay it.

Phase 1: Setting & Objective (P1-P2)

P1 — Business Objective

Ask: "What are you trying to accomplish with this analysis?"

Classify into:

Evaluation: A decision was already made; they want to know its effect.
Optimization: A decision hasn't been made (or was piloted); they want to decide.
Personalization: They want to know which units respond best to optimize allocation.

If the answer describes a technical goal rather than a business goal, probe: "But what's the ultimate business question you're trying to answer?"

P2 — Treatment, Population, Outcome

Ask: "Tell me about your setup: Who or what is being treated? What's the population? What intervention was applied (or will be)? And what's the outcome metric?"

Extract: treatment entity, population size (order of magnitude), treatment description, outcome metric.

Ask: "Will you be implementing in R or Python?"

Post-treatment conditioning trap (CRITICAL -- check on EVERY case): Before proceeding past P2, actively scan the user's population definition, comparison groups, and conditioning variables for post-treatment contamination. This is one of the most common mistakes in causal inference.

Common patterns to catch:

Subset defined by post-treatment behavior: "customers who opened the email", "users who clicked the ad", "patients who completed the program" -- these subsets are CAUSED by treatment. Comparing within them introduces selection bias.
Conditioning on a mediator: "controlling for engagement" when engagement is affected by treatment creates collider bias.
Outcome-adjacent filtering: "among people who made a purchase" when treatment affects whether people purchase at all.

If detected: (1) Name the specific post-treatment variable. (2) Explain WHY the comparison is biased -- the subset is not random, it's selected by the treatment itself. (3) Recommend the valid alternative: intent-to-treat (ITT) analysis comparing ALL treated vs ALL control, regardless of downstream behavior. (4) Warn the user NOT to proceed with the naive comparison.

Prior exposure check (ask on every case): After defining the population, ask: "Has this population already been exposed to this intervention, or will this be the first time?"

No prior exposure → clean baseline, first-time effect.
Partial → flag contamination risk and novelty effects.
Full prior exposure → reframe the estimand as incremental/ongoing effect. Suggest removal experiment if feasible.

External events check (ask on every case): Ask: "Is anything else happening around the same time that could affect your outcome — seasonality, other campaigns, policy changes?"

If yes: Document in the plan under Known Threats to Validity. Flag method-specific vulnerabilities (ITS and SC are especially sensitive; DiD is partially protected).

Phase 2: Assignment Mechanism (P3)

Ask: "Was the treatment randomly assigned? Do you have an A/B test?"

Classify as: Random / Conditionally random / Not random.

If the user reports randomization, probe: "Is this data from a single experiment, or did you merge data from multiple experiments?" If merged with different assignment probabilities, classify as conditionally random and note the need for stratified analysis or probability weighting.

If random + large sample → Early exit:

For evaluation/optimization: Recommend simple comparison or regression for variance reduction.
For personalization: Recommend meta-learners.
Note: "We should still verify randomization by checking balance."
Ask: "Would you like to optimize further (e.g., variance reduction, CUPED), or should I save this plan?"
- If optimize → continue to P5-P7.
- If done → save plan, offer handoff.

If not random or small sample → Continue to P4.

Phase 3: Data Collection Pivot (P4)

Ask: "Are you able to run an experiment to collect new data?"

If yes → determine experiment type based on control level:

Individual control → A/B test
Group control → SC design or switchback
Influence via instrument → Encouragement design

Recommend and offer handoff to causal-experiments.

If no → continue to P5.

Phase 4: Data Structure & Effect Characteristics (P5-P7)

P5 — Treatment Strength

Ask: "How strong do you expect the effect to be — a big obvious change or something subtle? This helps me gauge whether we need a more sensitive design."

(Want to know more? Weak effects need larger samples or more precise estimators like panel methods. If you expect a large, obvious effect, simpler methods often suffice.)

P6 — Effect Timing

Ask: "When did the treatment start? Same time for everyone, or did different groups start at different times? And once treatment hits, do you expect the effect to show up right away or build over time?"

(Want to know more? Staggered rollout requires specialized estimators — standard two-way fixed effects can give wrong answers with staggered timing. Effect lag matters too: if the effect builds gradually, you need a longer post-treatment window and dynamic effect models.)

P7 — Panel Data

Ask: "Do you have repeated observations on the same units over time? How many units, and how many time periods?"

(Want to know more? Panel data lets us control for everything about a unit that doesn't change — their 'fixed' characteristics. This unlocks DiD and fixed effects, which handle time-invariant confounders automatically.)

Use answers to refine method selection:

Weak treatment + randomized → panel methods for variance reduction
Many units + few periods → DiD/TWFE
Few units + many periods → Synthetic control

Phase 5: Identification Strategy (P8-P10)

P8 — Non-Compliance / Instrument

Ask: "Did everyone assigned to treatment actually take it? And is there something that nudged some people toward treatment but shouldn't directly affect the outcome?"

(Want to know more? Non-compliance means 'as assigned' differs from 'as received.' An instrument — something that shifts treatment take-up without directly affecting outcomes — lets us use IV to estimate a causal effect despite the non-compliance.)

If treatment has non-compliance + valid instrument → IV path. Watch for population definition issues masquerading as non-compliance.

P9 — Cutoff / Threshold

Ask: "Is there a specific score, threshold, or rule that determines who gets treated? For example, 'students below 70 get tutoring' or 'cities above 100K get the grant.'"

(Want to know more? A sharp cutoff creates a natural experiment — units just above and below are nearly identical except for treatment. This enables regression discontinuity, one of the most credible observational designs.)

If cutoff/threshold exists → RDD path.

P10 — Comparison Group & Observables

Ask: "Do you have a clear comparison group? And how confident are you that you've measured everything that influenced who got treated?"

(Want to know more? Without randomization, a cutoff, or an instrument, we rely on matching or weighting — which assumes all confounders are observed. This is the weakest identification strategy, so we need to be honest about what might be missing.)

Selection on observables is the last resort → Matching/PSW/DR. Always warn about weakness of conditional independence.

Saving the Analysis Plan

After identifying the method, use the Write tool to save a structured plan:

Path: docs/causal-plans/YYYY-MM-DD-<project-name>/plan.md

Use today's date. Ask the user for a short project name if not obvious from context.

Template:

# Analysis Plan: [Project Name]

**Created**: [Date]
**Language**: [R / Python]
**Status**: Draft

## Business Objective
[Classification from P1 + user's description]

## Causal Question
[Formalized version of the business question]

## Study Design
- **Treatment**: [What]
- **Population**: [Who, approximate size]
- **Outcome**: [Metric]
- **Assignment mechanism**: [Random / Quasi-random / Observational]
- **Prior exposure**: [None / Partial / Full — with implications]

## Recommended Method
**Primary**: [Method name]
**Rationale**: [Why this method fits based on the interview]
**Alternative considered**: [If applicable, with trade-offs]

## Key Assumptions to Verify
1. [Assumption 1] — [Brief plausibility note from interview]
2. [Assumption 2] — ...

## Data Requirements
[What data structure is needed, key variables]

## Known Threats to Validity
[Concerns identified during interview]
- **Concurrent events**: [Any external factors documented during interview]

## Next Steps
- [ ] Verify assumptions with /causal-[method]
- [ ] Implement analysis
- [ ] Run robustness checks
- [ ] Audit results with /causal-auditor

### What to Watch For
[Based on the interview, name the single biggest threat to this analysis and explain what it would do to the estimate if it were true. Do not repeat the assumptions list above — focus on the threat most likely given the user's context. Example: "DiD assumes treated and control groups would have followed the same trajectory without treatment. If there's reason to think they were already diverging, the estimate absorbs that pre-existing difference."]

Tell the user: "Your analysis plan is saved at [path]."

Handoff

Offer clear next steps:

"Here's what I recommend next:

Implement: Use /causal-[method] to walk through assumptions and generate code.
Audit: Use /causal-auditor to stress-test the plan for threats.
Practice: Use /causal-exercises to try a similar analysis on simulated data first."

Edge Cases

User doesn't know the answer: Help them reason through it with examples from similar contexts.
Multiple methods work: Recommend the strongest identification strategy. Mention alternatives with trade-offs.
User already knows the method: "Sounds like you have a good sense already. Want to go straight to /causal-[method]?"
Updating an existing plan: Read the existing plan, discuss what changed, update the file.

Common Issues

Jumping to a method too early: Users often name a method before describing their problem. Always complete the structured interview before recommending. The right method depends on the data structure, not the user's initial guess.
Confusing prediction with causal inference: If the user's goal is forecasting or classification, not estimating a treatment effect, redirect them. This skill is for causal questions only.

Integration

This skill is the entry point. No upstream skill required.

After this skill:

/causal-[recommended method] -- Implement the analysis plan
/causal-auditor -- Stress-test the plan before implementation (optional)
/causal-exercises -- Practice the recommended method on simulated data first (optional)

Each step saves its output to docs/causal-plans/, and downstream skills read it automatically.

Self-Correction

If the user corrects you during the interview ("that's wrong", "you missed X"):

Acknowledge the correction.
Adjust your recommendation.
Append the lesson to references/lessons.md using the Write tool:

### Planner: [Short description]
**Trigger**: [When this tends to happen]
**Mistake**: [What went wrong]
**Rule**: [What to do instead]
**Source**: User correction, [today's date]

causal-planner

Popularity

Invocation

Context Preview

SKILL.md

causal-planner

Popularity

Invocation

Context Preview

SKILL.md

Causal Planner

Before You Begin

Decision Flow

Quality Standards

Interview Protocol

Phase 1: Setting & Objective (P1-P2)

Phase 2: Assignment Mechanism (P3)

Phase 3: Data Collection Pivot (P4)

Phase 4: Data Structure & Effect Characteristics (P5-P7)

Phase 5: Identification Strategy (P8-P10)

Saving the Analysis Plan

Handoff

Edge Cases

Common Issues

Integration

Self-Correction

Similar Skills

Causal Planner

Before You Begin

Decision Flow

Quality Standards

Interview Protocol

Phase 1: Setting & Objective (P1-P2)

Phase 2: Assignment Mechanism (P3)

Phase 3: Data Collection Pivot (P4)

Phase 4: Data Structure & Effect Characteristics (P5-P7)

Phase 5: Identification Strategy (P8-P10)

Saving the Analysis Plan

Handoff

Edge Cases

Common Issues

Integration

Self-Correction

Similar Skills