Defines measurable outcome KPIs for user stories and features using the Gothelf/Seiden, Maurya, and Doerr frameworks, with templates for who/does-what/by-how-much. Used in the requirements-crafting phase.
`npx claudepluginhub nwave-ai/nwave --plugin nw`

This skill uses the workspace's default tool permissions.
"Doing stuff isn't the point. Achieving stuff is." -- Jeff Gothelf
Defines measurable outcome KPIs for user stories and features. Loaded during Phase 4 (Requirements Crafting) to produce outcome-kpis.md. Synthesizes three frameworks: Gothelf/Seiden's outcome-based KPIs, Maurya's lean metrics, and Doerr's OKR methodology.
Primary template from Gothelf/Seiden. Every KPI answers five questions:
| Component | Question | Example |
|---|---|---|
| Who | Which user segment? | Returning customers with 2+ orders |
| Does What | What observable behavior changes? | Complete checkout without contacting support |
| By How Much | What is the measurable target? | 40% reduction in support tickets |
| Measured By | How do we collect the data? | Support ticket system + checkout analytics |
| Timeframe | When do we measure? | 30 days post-release, then weekly |
Formula: [Who] [Does what] [By how much]
Apply this as a litmus test: if a KPI cannot answer all five questions, it measures an output (feature delivery), not an outcome (behavior change).
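As a sketch, the five-component template can be expressed as a small data structure with the litmus test as a method. This is hypothetical illustration code, not part of the skill's tooling:

```python
from dataclasses import dataclass, fields

@dataclass
class OutcomeKPI:
    who: str          # Which user segment?
    does_what: str    # What observable behavior changes?
    by_how_much: str  # What is the measurable target?
    measured_by: str  # How do we collect the data?
    timeframe: str    # When do we measure?

    def is_outcome(self) -> bool:
        # Litmus test: a KPI missing any of the five components
        # measures an output, not an outcome.
        return all(getattr(self, f.name).strip() for f in fields(self))

kpi = OutcomeKPI(
    who="Returning customers with 2+ orders",
    does_what="Complete checkout without contacting support",
    by_how_much="40% reduction in support tickets",
    measured_by="Support ticket system + checkout analytics",
    timeframe="30 days post-release, then weekly",
)
assert kpi.is_outcome()
```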
| Bad (Output) | Good (Outcome) |
|---|---|
| Launch mobile app v2 | Mobile users complete purchases 40% more often |
| Build recommendation engine | Users purchase from recommendations, increasing from 10% to 25% |
| Deploy onboarding redesign | New users complete onboarding within 24 hours 30% more often |
| Ship CSV export | Analysts resolve data questions without engineering support 60% of the time |
From Gothelf/Seiden: business results are lagging -- teams cannot directly influence them. Target leading indicators instead.
| Type | Definition | Examples | Actionable? |
|---|---|---|---|
| Lagging (Impact) | Business results already happened | Revenue, NPS, market share, churn rate | No -- too slow, too many variables |
| Leading (Outcome) | Behavior changes predicting business results | Purchase completion rate, feature adoption, retention | Yes -- teams can run experiments |
| Leading (Secondary) | Behaviors predicting primary leading indicators | Page visits, trial starts, onboarding steps completed | Yes -- most granular, fastest signal |
Map every KPI through this chain to ensure traceability:
- **Business KPI (Lagging/Impact)**: "Increase quarterly revenue by 15%"
  - **Customer Behavior (Leading/Outcome)**: users complete purchases from recommendations (+25%); users return within 7 days (+20%)
    - **Secondary Behavior (Leading/Secondary)**: users browse recommendation pages (+30%); users enable push notifications (+15%)
Each layer decomposes into more granular behavioral metrics. Teams target the highest-leverage behavior.
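A minimal sketch of the chain as a data structure, reusing the illustrative metric names from the example above (all hypothetical):

```python
# Hypothetical metric tree: a lagging business KPI decomposes into leading
# outcome behaviors, each of which decomposes into secondary behaviors.
metric_tree = {
    "Increase quarterly revenue by 15%": {
        "Users complete purchases from recommendations (+25%)": [
            "Users browse recommendation pages (+30%)",
        ],
        "Users return within 7 days (+20%)": [
            "Users enable push notifications (+15%)",
        ],
    },
}

LAYERS = ["Lagging/Impact", "Leading/Outcome", "Leading/Secondary"]

def trace(node, depth=0):
    # Walk the tree, printing each metric with its layer so every
    # secondary behavior is traceable back to a business KPI.
    items = node.items() if isinstance(node, dict) else [(m, None) for m in node]
    for metric, children in items:
        print("  " * depth + f"[{LAYERS[depth]}] {metric}")
        if children:
            trace(children, depth + 1)

trace(metric_tree)
```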
From Maurya (Running Lean): actionable metrics "tie specific and repeatable actions to observed results."
| Dimension | Vanity | Actionable |
|---|---|---|
| Measures | Business size (totals) | Individual behavior (rates) |
| Data type | Gross aggregates | Ratios and unit economics |
| Cause/effect | No insight into why | Directly signal product-market fit |
| Examples | Total users, page views, downloads | Activation rate, retention cohort, churn rate |
| Decision value | Cannot inform action | Drives specific experiments |
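For example, converting a vanity total into an actionable rate is a one-line change. The counts here are hypothetical:

```python
# Hypothetical event counts for one release cycle.
total_signups = 12_000   # vanity: a gross total that only ever grows
activated = 3_600        # users who completed the core action

# Actionable: a rate, comparable across cohorts and time,
# that experiments can actually move.
activation_rate = activated / total_signups
print(f"Activation rate: {activation_rate:.0%}")  # 30%
```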
Pick ONE metric per product stage (the One Metric That Matters, or OMTM). Optimizing one metric reveals the next.
| Stage | Focus | Example OMTM |
|---|---|---|
| Empathy | Problem validation | Interview pain intensity (qualitative) |
| Stickiness | Retention | Churn rate, DAU/MAU ratio |
| Virality | Organic growth | Viral coefficient, referral rate |
| Revenue | Monetization | Customer Lifetime Value, MRR |
| Scale | Growth efficiency | CAC/LTV ratio, payback period |
Good metric characteristics: a rate or ratio (not an absolute number), comparable across time, simple enough to remember, predictive, and behavior-changing.
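As an illustration, the Stickiness-stage DAU/MAU ratio satisfies these characteristics; the figures below are hypothetical:

```python
# Hypothetical monthly figures.
monthly_active_users = 10_000
avg_daily_active_users = 2_800

# DAU/MAU is a ratio, comparable month over month, simple to remember,
# and predictive of retention, satisfying the characteristics above.
stickiness = avg_daily_active_users / monthly_active_users
print(f"DAU/MAU stickiness: {stickiness:.0%}")  # 28%, i.e. active ~8.4 days/month
```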
From Maurya: model the business as a production line. Identify the bottleneck, then focus KPIs there.
| Stage | Key Question | Example Metric |
|---|---|---|
| Acquisition | Are we reaching the right people? | Visitor-to-signup conversion rate |
| Activation | Do users get the "aha moment"? | % completing core action in first session |
| Retention | Do users come back? | Week-1 return rate, DAU/MAU |
| Revenue | Do users pay? | Trial-to-paid conversion rate |
| Referral | Do users tell others? | Referral rate, viral coefficient |
Activation is causal -- it drives retention, revenue, and referral. Prioritize activation KPIs when uncertain.
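A sketch of the production-line view that locates the funnel bottleneck; the stage counts are hypothetical:

```python
# Hypothetical user counts at each AARRR stage.
funnel = {
    "Acquisition": 10_000,   # visitors
    "Activation": 2_500,     # completed core action in first session
    "Retention": 1_200,      # returned within week 1
    "Revenue": 300,          # converted trial to paid
    "Referral": 45,          # referred at least one user
}

# Stage-to-stage conversion rates, treating the funnel as a production line.
stages = list(funnel.items())
conversions = {
    f"{a} -> {b}": nb / na
    for (a, na), (b, nb) in zip(stages, stages[1:])
}

# The weakest conversion is the bottleneck; focus KPIs there.
bottleneck = min(conversions, key=conversions.get)
print(f"Bottleneck: {bottleneck} ({conversions[bottleneck]:.0%})")
```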
From Doerr (Measure What Matters): connect KPIs to strategic objectives.
Every Key Result uses the outcome formula. Doerr distinguishes two types of Key Result:
| Type | Expected Score | Resource Allocation | Failure Response |
|---|---|---|---|
| Committed | 1.0 (must deliver) | Consume most available resources | Requires explanation, replanning |
| Aspirational | 0.7 (stretch goal) | Overcommit slightly beyond capacity | Expected -- carry forward |
The sweet spot is a blended aggregate score of 0.6-0.7; consistently hitting 1.0 means the Key Results are not ambitious enough.
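The sweet-spot rule can be checked mechanically. A hypothetical scoring sketch:

```python
# Hypothetical end-of-quarter KR scores (0.0-1.0), grouped by Objective.
okr_scores = {
    "Make checkout effortless": [0.7, 0.6, 0.8],
    "Grow recommendation engagement": [1.0, 0.9],
}

all_scores = [s for scores in okr_scores.values() for s in scores]
blended = sum(all_scores) / len(all_scores)

# Apply the 0.6-0.7 sweet-spot rule.
if blended > 0.7:
    print(f"Blended {blended:.2f}: likely sandbagging, increase ambition")
elif blended < 0.6:
    print(f"Blended {blended:.2f}: overcommitted, replan")
else:
    print(f"Blended {blended:.2f}: in the sweet spot")
```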
| Anti-Pattern | Signal | Fix |
|---|---|---|
| Output-based KRs | "Launch X", "Build Y", "Ship Z" | Rewrite as behavior: "Users [do what] [by how much]" |
| Too many KRs | >5 KRs per Objective | Cut to 2-4 per Objective, max 3-5 Objectives |
| Vague KRs | No numeric target | Add baseline + target + deadline |
| Sandbagging | Consistently scoring 1.0 | Increase ambition level |
| Backlog retrofitting | OKRs match existing backlog 1:1 | OKRs filter backlog, not justify it |
- **Objective** (qualitative, inspirational, timeboxed)
  - **Key Results** (2-4 per Objective, [Who] [Does what] [By how much])
    - **Epics** (weeks of work, aligned to Key Results)
      - **User Stories** (days of work, with measurable acceptance criteria)
Every story traces back to a Key Result. Orphan stories (no KR link) are potential waste.
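An orphan-story audit can be as simple as the following sketch (story and KR identifiers are hypothetical):

```python
# Hypothetical backlog: each story should link to a Key Result ID.
stories = [
    {"id": "US-101", "kr": "KR-1"},
    {"id": "US-102", "kr": "KR-2"},
    {"id": "US-103", "kr": None},   # orphan: no KR link
]

orphans = [s["id"] for s in stories if not s["kr"]]
if orphans:
    print(f"Potential waste, stories with no Key Result link: {orphans}")
```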
Use this exact structure in outcome-kpis.md:
## Feature: {feature-name}
### Objective
{What success looks like in one sentence -- qualitative, inspirational, timeboxed}
### Outcome KPIs
| # | Who | Does What | By How Much | Baseline | Measured By | Type |
|---|-----|-----------|-------------|----------|-------------|------|
| 1 | {segment} | {behavior} | {target} | {current} | {method} | Leading/Lagging |
### Metric Hierarchy
- **North Star**: {the ONE metric that matters most for this feature}
- **Leading Indicators**: {behaviors that predict the north star}
- **Guardrail Metrics**: {metrics that must NOT degrade}
### Measurement Plan
| KPI | Data Source | Collection Method | Frequency | Owner |
|-----|------------|-------------------|-----------|-------|
### Hypothesis
We believe that {proposed solution} for {user segment} will achieve {key result}.
We will know this is true when {who} {does what} {by how much}.
Before finalizing KPIs, verify each one passes:
| Check | Question | If No |
|---|---|---|
| Measurable today? | Can you measure it with current instrumentation? | Add instrumentation to requirements |
| Rate not total? | Is it a ratio/rate, not a gross count? | Convert to rate (vanity -> actionable) |
| Outcome not output? | Does it describe user behavior, not feature delivery? | Rewrite as "[Who] [Does what] [By how much]" |
| Has baseline? | Do you know the current value? | Establish baseline before setting target |
| Team can influence? | Can the team directly affect this metric? | Decompose into more granular leading indicator |
| Has guardrails? | Are there metrics that must not degrade? | Add guardrail metrics (e.g., error rate, load time) |
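A sketch of this checklist as an automated gate, assuming a hypothetical KPI record with these field names:

```python
# Hypothetical KPI record; each check mirrors a row of the table above.
kpi = {
    "instrumented": True,        # measurable with current instrumentation?
    "is_rate": True,             # ratio/rate, not a gross count?
    "describes_behavior": True,  # user behavior, not feature delivery?
    "baseline": 0.10,            # current value known?
    "team_influenced": True,     # can the team directly affect it?
    "guardrails": ["error rate", "page load time"],  # must-not-degrade metrics
}

checks = {
    "Measurable today?": kpi["instrumented"],
    "Rate not total?": kpi["is_rate"],
    "Outcome not output?": kpi["describes_behavior"],
    "Has baseline?": kpi["baseline"] is not None,
    "Team can influence?": kpi["team_influenced"],
    "Has guardrails?": bool(kpi["guardrails"]),
}

failed = [name for name, ok in checks.items() if not ok]
print("KPI passes all checks" if not failed else f"Fix before finalizing: {failed}")
```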
The platform-architect uses the Measurement Plan in outcome-kpis.md (KPIs, data sources, collection methods, frequency) to plan instrumentation.
For deeper reading on the source frameworks:

- docs/research/running-lean-research.md
- docs/research/measure-what-matters-research.md
- docs/research/who-does-what-research.md