Skill

nw-outcome-kpi-framework

From nw

Defines measurable outcome KPIs for user stories and features using Gothelf/Seiden, Maurya, and Doerr frameworks with templates for who/does-what/by-how-much. Used in requirements crafting phase.

documentation

developer-tools

npx claudepluginhub nwave-ai/nwave --plugin nw

Tool Access

This skill uses the workspace's default tool permissions.

Stats

Parent Repo Stars: 484

Parent Repo Forks: 49

Last Commit: Mar 20, 2026

Outcome KPI Framework

"Doing stuff isn't the point. Achieving stuff is." -- Jeff Gothelf

Defines measurable outcome KPIs for user stories and features. Loaded during Phase 4 (Requirements Crafting) to produce outcome-kpis.md. Synthesizes three frameworks: customer-centric OKRs (Gothelf/Seiden), lean metrics (Maurya), and OKR methodology (Doerr).

The Outcome KPI Formula

Primary template from Gothelf/Seiden. Every KPI answers five questions:

| Component | Question | Example |
|-----------|----------|---------|
| Who | Which user segment? | Returning customers with 2+ orders |
| Does What | What observable behavior changes? | Complete checkout without contacting support |
| By How Much | What is the measurable target? | 40% reduction in support tickets |
| Measured By | How do we collect the data? | Support ticket system + checkout analytics |
| Timeframe | When do we measure? | 30 days post-release, then weekly |

Formula: [Who] [Does what] [By how much], with Measured By and Timeframe recorded alongside it.

Apply this as a litmus test: if a KPI cannot answer all five components, it measures an output (feature delivery), not an outcome (behavior change).
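The litmus test can be sketched as a completeness check over the five components. This is a minimal illustration, not part of the skill itself: the `OutcomeKpi` class and its method names are hypothetical. It only verifies that every component is filled in; whether the "Does What" text describes real user behavior still needs human judgment.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OutcomeKpi:
    """One KPI expressed through the five components (hypothetical structure)."""
    who: Optional[str] = None
    does_what: Optional[str] = None
    by_how_much: Optional[str] = None
    measured_by: Optional[str] = None
    timeframe: Optional[str] = None

    def missing_components(self) -> list[str]:
        """Names of components that are absent or blank, in declaration order."""
        return [
            f.name for f in fields(self)
            if not (getattr(self, f.name) or "").strip()
        ]

    def passes_litmus_test(self) -> bool:
        """True only when all five components are answered."""
        return not self.missing_components()


# Example values taken from the component table.
checkout = OutcomeKpi(
    who="Returning customers with 2+ orders",
    does_what="Complete checkout without contacting support",
    by_how_much="40% reduction in support tickets",
    measured_by="Support ticket system + checkout analytics",
    timeframe="30 days post-release, then weekly",
)
# An output-style statement answers none of the other four questions.
output_only = OutcomeKpi(does_what="Launch mobile app v2")

print(checkout.passes_litmus_test())
print(output_only.missing_components())
```

A statement such as "Launch mobile app v2" fails because it names no segment, target, data source, or timeframe.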

Good vs Bad KPIs

| Bad (Output) | Good (Outcome) |
|--------------|----------------|
| Launch mobile app v2 | Mobile users complete purchases 40% more often |
| Build recommendation engine | Users purchase from recommendations, increasing from 10% to 25% |
| Deploy onboarding redesign | New users complete onboarding within 24 hours 30% more often |
| Ship CSV export | Analysts resolve data questions without engineering support 60% of the time |

Leading vs Lagging Indicators

From Gothelf/Seiden: business results are lagging -- teams cannot directly influence them. Target leading indicators instead.

| Type | Definition | Examples | Actionable? |
|------|------------|----------|-------------|
| Lagging (Impact) | Business results that have already happened | Revenue, NPS, market share, churn rate | No -- too slow, too many variables |
| Leading (Outcome) | Behavior changes that predict business results | Purchase completion rate, feature adoption, retention | Yes -- teams can run experiments |
| Leading (Secondary) | Behaviors that predict primary leading indicators | Page visits, trial starts, onboarding steps completed | Yes -- most granular, fastest signal |

Outcome Mapping Chain

Map every KPI through this chain to ensure traceability:

Business KPI (Lagging/Impact)
    Example: "Increase quarterly revenue by 15%"
        |
        v
    Customer Behavior (Leading/Outcome)
        +-- Users complete purchases from recommendations (+25%)
        +-- Users return within 7 days (+20%)
        |
        v
    Secondary Behavior (Leading/Secondary)
        +-- Users browse recommendation pages (+30%)
        +-- Users enable push notifications (+15%)

Each layer decomposes into more granular behavioral metrics. Teams target the highest-leverage behavior.
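The chain above can be modeled as a small tree, which makes the traceability check mechanical: every secondary behavior should sit on a path that ends at a business KPI. The sketch below is illustrative only. The `Metric` class and `trace_paths` function are hypothetical names, and the pairing of each secondary behavior with a specific customer behavior is an assumption, since the diagram does not say which maps to which.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Metric:
    name: str
    layer: str  # "impact", "outcome", or "secondary"
    children: list["Metric"] = field(default_factory=list)


def trace_paths(metric: Metric, prefix: tuple[str, ...] = ()) -> Iterator[tuple[str, ...]]:
    """Yield every root-to-leaf path through the chain as a tuple of names."""
    path = prefix + (metric.name,)
    if not metric.children:
        yield path
    for child in metric.children:
        yield from trace_paths(child, path)


# Assumed pairing: browsing feeds recommendation purchases,
# push notifications feed 7-day returns.
chain = Metric("Increase quarterly revenue by 15%", "impact", [
    Metric("Users complete purchases from recommendations (+25%)", "outcome", [
        Metric("Users browse recommendation pages (+30%)", "secondary"),
    ]),
    Metric("Users return within 7 days (+20%)", "outcome", [
        Metric("Users enable push notifications (+15%)", "secondary"),
    ]),
])

for path in trace_paths(chain):
    print(" -> ".join(path))
```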

Actionable vs Vanity Metrics

From Maurya (Running Lean): actionable metrics "tie specific and repeatable actions to observed results."

| Dimension | Vanity | Actionable |
|-----------|--------|------------|
| Measures | Business size (totals) | Individual behavior (rates) |
| Data type | Gross aggregates | Ratios and unit economics |
| Cause/effect | No insight into why | Directly signal product-market fit |
| Examples | Total users, page views, downloads | Activation rate, retention cohort, churn rate |
| Decision value | Cannot inform action | Drives specific experiments |
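To make the contrast concrete, the sketch below tracks a vanity total and an actionable rate over the same data. All cohort numbers are invented for illustration, and the `rate` helper is a hypothetical name. The running total rises every week regardless of product quality, while the per-cohort activation rate shows that the second cohort actually performed worse.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the cohort is empty."""
    return numerator / denominator if denominator else 0.0


# Hypothetical weekly cohorts: (new signups, signups who activated).
weekly_cohorts = {"week 1": (1000, 300), "week 2": (2000, 500)}

total_users = 0
for week, (signups, activated) in weekly_cohorts.items():
    total_users += signups                       # vanity: a running total that only grows
    activation_rate = rate(activated, signups)   # actionable: ratio within one cohort
    print(f"{week}: total users={total_users}, activation rate={activation_rate:.0%}")
```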

The OMTM (One Metric That Matters)

Pick ONE metric per product stage. Optimizing one metric reveals the next.

| Stage | Focus | Example OMTM |
|-------|-------|--------------|
| Empathy | Problem validation | Interview pain intensity (qualitative) |
| Stickiness | Retention | Churn rate, DAU/MAU ratio |
| Virality | Organic growth | Viral coefficient, referral rate |
| Revenue | Monetization | Customer Lifetime Value, MRR |
| Scale | Growth efficiency | CAC/LTV ratio, payback period |

Good metric characteristics: rate or ratio (not absolute number) | comparable across time | simple enough to remember | predictive | behavior-changing.

Customer Factory (AARRR) Constraint Mapping

From Maurya: model the business as a production line. Identify the bottleneck, then focus KPIs there.

| Stage | Key Question | Example Metric |
|-------|--------------|----------------|
| Acquisition | Are we reaching the right people? | Visitor-to-signup conversion rate |
| Activation | Do users get the "aha moment"? | % completing core action in first session |
| Retention | Do users come back? | Week-1 return rate, DAU/MAU |
| Revenue | Do users pay? | Trial-to-paid conversion rate |
| Referral | Do users tell others? | Referral rate, viral coefficient |

Activation is causal -- it drives retention, revenue, and referral. Prioritize activation KPIs when uncertain.
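One simple way to locate the bottleneck is to compare each stage's actual rate with a target and pick the stage with the largest relative shortfall. This heuristic is an assumption for illustration, not a method taken from Maurya, and every number and name below (`funnel`, `find_bottleneck`) is hypothetical. Comparing against targets avoids the trap of ranking raw rates that are not comparable across stages.

```python
# Hypothetical actual and target conversion rates per funnel stage.
funnel = {
    "acquisition": {"actual": 0.08, "target": 0.10},
    "activation":  {"actual": 0.25, "target": 0.50},
    "retention":   {"actual": 0.60, "target": 0.70},
    "revenue":     {"actual": 0.25, "target": 0.30},
    "referral":    {"actual": 0.10, "target": 0.10},
}


def attainment(stage: dict[str, float]) -> float:
    """Fraction of the target that the stage currently achieves."""
    return stage["actual"] / stage["target"]


def find_bottleneck(stages: dict[str, dict[str, float]]) -> str:
    """Return the stage that falls furthest short of its target."""
    return min(stages, key=lambda name: attainment(stages[name]))


print(find_bottleneck(funnel))
```

With these invented numbers, activation reaches only half of its target, so KPIs would focus there first.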

OKR Integration

From Doerr (Measure What Matters): connect KPIs to strategic objectives.

Writing Key Results

Every Key Result uses the outcome formula. Quality criteria:

Measurable: "It's not a Key Result unless it has a number" (Marissa Mayer)
Outcome-focused: "Increase email subscribers by 20%" not "Launch newsletter"
Time-bound: deadline (typically end of quarter)
Verifiable: no ambiguity about whether met
Aggressive yet realistic: stretch without demoralizing

Committed vs Aspirational

| Type | Expected Score | Resource Allocation | Failure Response |
|------|----------------|---------------------|------------------|
| Committed | 1.0 (must deliver) | Consume most available resources | Requires explanation, replanning |
| Aspirational | 0.7 (stretch goal) | Overcommit slightly beyond capacity | Expected -- carry forward |

Sweet spot: blended aggregate of 0.6-0.7. Consistently hitting 1.0 on aspirational Key Results means the targets are not ambitious enough.
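The scores in this section run from 0.0 to 1.0, but the text does not say how a score is derived. The sketch below assumes linear progress from baseline to target, clamped to that range, which is a common convention rather than something the source prescribes. The function name and all Key Result figures are hypothetical.

```python
from statistics import mean


def score_key_result(baseline: float, target: float, actual: float) -> float:
    """Linear progress from baseline to target, clamped to 0.0-1.0 (assumed scoring rule)."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    progress = (actual - baseline) / (target - baseline)
    return max(0.0, min(1.0, progress))


# Hypothetical quarter-end results: (baseline, target, actual).
key_results = {
    "Email subscribers": (10_000, 12_000, 11_400),   # growing metric
    "Weekly support tickets": (500, 300, 280),       # shrinking metric, target exceeded
    "Onboarding completion %": (40, 60, 45),
}

scores = {name: score_key_result(*values) for name, values in key_results.items()}
aggregate = mean(scores.values())

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print(f"aggregate: {aggregate:.2f}")
```

Because the formula divides by the signed distance between baseline and target, it handles metrics that should go down as well as metrics that should go up.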

OKR Anti-Patterns

| Anti-Pattern | Signal | Fix |
|--------------|--------|-----|
| Output-based KRs | "Launch X", "Build Y", "Ship Z" | Rewrite as behavior: "Users [do what] [by how much]" |
| Too many KRs | >5 KRs per Objective | Cut to 2-4 per Objective, max 3-5 Objectives |
| Vague KRs | No numeric target | Add baseline + target + deadline |
| Sandbagging | Consistently scoring 1.0 | Increase ambition level |
| Backlog retrofitting | OKRs match existing backlog 1:1 | OKRs filter backlog, not justify it |

Mapping: Objective to User Stories

Objective (qualitative, inspirational, timeboxed)
    |
    Key Results (2-4 per Objective, [Who][Does what][By how much])
    |
    Epics (weeks of work, aligned to Key Results)
    |
    User Stories (days of work, with measurable acceptance criteria)

Every story traces back to a Key Result. Orphan stories (no KR link) are potential waste.
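The orphan check is easy to automate once each story records the Key Result it serves. The sketch below assumes a hypothetical backlog export in which every story is a dict with an `id` and an optional `key_result` field; none of these field names or IDs come from the source.

```python
def find_orphans(stories: list[dict], key_result_ids: set[str]) -> list[str]:
    """Return IDs of stories whose key_result is missing or not a known KR."""
    return [
        story["id"]
        for story in stories
        if story.get("key_result") not in key_result_ids
    ]


# Hypothetical data for illustration.
known_key_results = {"KR-1", "KR-2"}
backlog = [
    {"id": "US-101", "key_result": "KR-1"},
    {"id": "US-102", "key_result": None},      # no link at all
    {"id": "US-103", "key_result": "KR-9"},    # links to a KR that does not exist
    {"id": "US-104"},                          # field absent
]

print(find_orphans(backlog, known_key_results))
```

Stories flagged this way are candidates for review, not automatic removal, since the source calls them potential waste.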

KPI Template

Use this exact structure in outcome-kpis.md:

## Feature: {feature-name}

### Objective
{What success looks like in one sentence -- qualitative, inspirational, timeboxed}

### Outcome KPIs

| # | Who | Does What | By How Much | Baseline | Measured By | Type |
|---|-----|-----------|-------------|----------|-------------|------|
| 1 | {segment} | {behavior} | {target} | {current} | {method} | Leading/Lagging |

### Metric Hierarchy
- **North Star**: {the ONE metric that matters most for this feature}
- **Leading Indicators**: {behaviors that predict the north star}
- **Guardrail Metrics**: {metrics that must NOT degrade}

### Measurement Plan
| KPI | Data Source | Collection Method | Frequency | Owner |
|-----|------------|-------------------|-----------|-------|

### Hypothesis
We believe that {proposed solution} for {user segment} will achieve {key result}.
We will know this is true when {who} {does what} {by how much}.
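If KPIs are kept as structured data, the Outcome KPIs table from the template can be generated rather than typed by hand. The sketch below is one possible helper, with `render_kpi_table` as a hypothetical name. The example row reuses the recommendation KPI from earlier in this document; its "Measured By" value is invented. Cell values containing a `|` character would need escaping, which this sketch does not handle.

```python
COLUMNS = ["Who", "Does What", "By How Much", "Baseline", "Measured By", "Type"]


def render_kpi_table(kpis: list[dict[str, str]]) -> str:
    """Render KPI dicts as the markdown table used in the template."""
    lines = [
        "| # | " + " | ".join(COLUMNS) + " |",
        # Separator widths match each header plus its two padding spaces.
        "|---|" + "|".join("-" * (len(col) + 2) for col in COLUMNS) + "|",
    ]
    for number, kpi in enumerate(kpis, start=1):
        cells = [kpi[col] for col in COLUMNS]
        lines.append(f"| {number} | " + " | ".join(cells) + " |")
    return "\n".join(lines)


kpis = [{
    "Who": "Users",
    "Does What": "purchase from recommendations",
    "By How Much": "25% of purchases",
    "Baseline": "10%",
    "Measured By": "Order analytics",  # hypothetical data source
    "Type": "Leading",
}]

print(render_kpi_table(kpis))
```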

KPI Granularity

Per Epic: Define 2-3 north-star KPIs that all contributing stories aim to move
Per Story: Add story-level success criteria tied to the epic-level KPIs
Guardrails: Define at epic level, apply consistently across all stories
Rule of thumb: If the feature has 1-3 stories, one KPI table suffices. If 4+, group by epic.

Smell Tests

Before finalizing KPIs, verify each one passes:

| Check | Question | If No |
|-------|----------|-------|
| Measurable today? | Can you measure it with current instrumentation? | Add instrumentation to requirements |
| Rate not total? | Is it a ratio/rate, not a gross count? | Convert to rate (vanity -> actionable) |
| Outcome not output? | Does it describe user behavior, not feature delivery? | Rewrite as "[Who] [Does what] [By how much]" |
| Has baseline? | Do you know the current value? | Establish baseline before setting target |
| Team can influence? | Can the team directly affect this metric? | Decompose into more granular leading indicator |
| Has guardrails? | Are there metrics that must not degrade? | Add guardrail metrics (e.g., error rate, load time) |
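The smell tests work well as a checklist that returns the remediation for every "no" answer. The sketch below encodes the six checks and their fixes as a lookup table. The key names (`measurable_today` and so on) and the `required_fixes` function are hypothetical, and the yes/no answers still have to come from a person reviewing the KPI.

```python
# Check key -> remediation text from the smell-test table.
SMELL_TESTS = {
    "measurable_today": "Add instrumentation to requirements",
    "is_rate": "Convert to rate (vanity -> actionable)",
    "is_outcome": 'Rewrite as "[Who] [Does what] [By how much]"',
    "has_baseline": "Establish baseline before setting target",
    "team_can_influence": "Decompose into more granular leading indicator",
    "has_guardrails": "Add guardrail metrics (e.g., error rate, load time)",
}


def required_fixes(answers: dict[str, bool]) -> list[str]:
    """Return the fix for each failed check; unanswered checks count as failed."""
    return [
        fix for check, fix in SMELL_TESTS.items()
        if not answers.get(check, False)
    ]


# Hypothetical review of one KPI: no baseline yet, guardrails not discussed.
review = {
    "measurable_today": True,
    "is_rate": True,
    "is_outcome": True,
    "has_baseline": False,
    "team_can_influence": True,
}

for fix in required_fixes(review):
    print(fix)
```

Treating a missing answer as a failure keeps an incomplete review from passing silently.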

Handoff to DEVOPS

The platform-architect needs these from outcome-kpis.md to plan instrumentation:

Data collection requirements: what events/behaviors to instrument, what data points to capture
Dashboard/monitoring needs: which metrics need real-time dashboards vs. weekly reports
Alerting thresholds: guardrail metric boundaries that trigger alerts when breached
Baseline measurement: any metrics needing baseline collection before feature release
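Alerting thresholds are the most mechanical item in this handoff, so a small sketch may help show what "boundaries that trigger alerts when breached" could look like as data. The `Guardrail` class, its fields, and every threshold value are hypothetical. The two metrics are the guardrail examples named in the smell tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    metric: str
    threshold: float
    direction: str  # "max": breach when value exceeds it; "min": breach when value drops below it

    def is_breached(self, value: float) -> bool:
        if self.direction == "max":
            return value > self.threshold
        if self.direction == "min":
            return value < self.threshold
        raise ValueError(f"unknown direction: {self.direction!r}")


# Hypothetical thresholds for illustration only.
guardrails = [
    Guardrail("error_rate", threshold=0.02, direction="max"),
    Guardrail("p95_load_time_seconds", threshold=2.5, direction="max"),
    Guardrail("checkout_completion_rate", threshold=0.60, direction="min"),
]

# Hypothetical current readings.
observed = {
    "error_rate": 0.035,
    "p95_load_time_seconds": 1.8,
    "checkout_completion_rate": 0.64,
}

breached = [g.metric for g in guardrails if g.is_breached(observed[g.metric])]
print(breached)
```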

References

For deeper reading on source frameworks:

Running Lean (Maurya): docs/research/running-lean-research.md
Measure What Matters (Doerr): docs/research/measure-what-matters-research.md
Who Does What By How Much (Gothelf/Seiden): docs/research/who-does-what-research.md