Define a complete product metrics framework: North Star metric selection, input metrics, dashboard design, alert thresholds, and — for AI products — a four-layer AI metrics stack (model quality, operational, product-level, business). Use when setting up metrics for a new product, choosing a North Star, designing a dashboard, or defining KPIs for an AI feature.
From pm-data-analytics. Install with:

`npx claudepluginhub tarunccet/pm-skills --plugin pm-data-analytics`

This skill uses the workspace's default tool permissions.
Design a rigorous product metrics framework from the North Star metric down to alert thresholds — with an optional AI-specific layer for products that include machine learning or generative AI features.
Metrics hierarchy: Metrics = all measurable things. KPIs = a few key quantitative metrics tracked over a longer period. North Star Metric (NSM) = a single customer-centric KPI that is a leading indicator of long-term business success.
North Star Metric IS: A single, customer-centric KPI that reflects the value customers get from the product and serves as a leading indicator of long-term business success. Revenue is never a good NSM — it's a lagging indicator.
North Star Metric is NOT: Multiple metrics, a revenue/LTV metric, an OKR (that's goal-setting), or a strategy (though choosing the right NSM is a strategic choice).
4 criteria for a good metric (Ben Yoskovitz, Lean Analytics): comparative, understandable, a ratio or rate, and behavior-changing (a good metric changes how you act).
The Three Business Games (for NSM selection): Attention (value comes from time spent in the product), Transaction (value comes from commercial transactions), and Productivity (value comes from helping users get work done).
Goodhart's Law: When a metric becomes a target, it ceases to be a good metric. Counter-metrics prevent this.
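The Goodhart check can be automated. The sketch below is illustrative (the function name, delta convention, and tolerance are assumptions, not part of the framework): it flags the classic gaming pattern where the targeted metric rises while its paired counter-metric degrades.

```python
def goodhart_check(nsm_delta: float, counter_delta: float,
                   tolerance: float = -0.05) -> str:
    """Flag the classic Goodhart pattern: the targeted NSM improves
    while its paired counter-metric degrades past tolerance.

    Deltas are fractional period-over-period changes,
    e.g. +0.10 means the metric rose 10% vs. the last period.
    """
    if nsm_delta > 0 and counter_delta < tolerance:
        return "WARNING: NSM up while counter-metric degraded; possible gaming"
    return "OK"

# DAU up 12%, but session quality score down 9%: investigate
alert = goodhart_check(nsm_delta=0.12, counter_delta=-0.09)
```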
Related skills: sql-queries, cohort-analysis, ab-test-analysis.

You are a metrics strategist and data analyst specializing in product measurement frameworks, North Star metric design, and AI product metrics.
Your task is to define a product metrics framework for $ARGUMENTS.
If the user provides existing dashboards, analytics data, OKRs, strategy docs, or feature specs, read them before proceeding.
Ask (or infer from $ARGUMENTS):
Identify which game the product plays: Attention, Transaction, or Productivity.
Propose 2-3 NSM candidates. Validate each against 7 criteria:
| Criterion | Requirement |
|---|---|
| Expresses customer value | Reflects value delivered to customers, not just activity |
| Leading indicator | Predicts future revenue growth |
| Measurable | Has a precise definition and trackable data source |
| Understandable | The whole team can explain it in one sentence |
| Actionable | Teams can directly influence it through product decisions |
| Not a vanity metric | Changing it requires real behavior change |
| Not gameable | You can't inflate it without delivering real value |
Recommend the strongest candidate with rationale and explain why other candidates were rejected.
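The validation pass above can be sketched as a simple scoring loop. The criteria keys mirror the table; the candidate names and their pass/fail marks are made-up examples, not recommendations:

```python
CRITERIA = [
    "expresses_customer_value", "leading_indicator", "measurable",
    "understandable", "actionable", "not_vanity", "not_gameable",
]

def evaluate(name: str, checks: dict) -> tuple:
    """Return (name, number of criteria passed, criteria failed)."""
    failed = [c for c in CRITERIA if not checks.get(c, False)]
    return name, len(CRITERIA) - len(failed), failed

candidates = {
    "weekly_active_teams": dict.fromkeys(CRITERIA, True),
    "total_signups": {**dict.fromkeys(CRITERIA, True),
                      "expresses_customer_value": False,
                      "not_vanity": False},
}

ranked = sorted((evaluate(n, c) for n, c in candidates.items()),
                key=lambda t: -t[1])
# ranked[0] is the strongest candidate; the rest carry their failed criteria
```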
Organize metrics into four layers:
Layer 1 — North Star Metric: The single metric capturing core customer value
Layer 2 — Input Metrics (3-5): Levers that directly drive the North Star. Each should be directly influenceable by a team and measurable more frequently than the NSM.
Layer 3 — Health Metrics (3-5): Guardrails that should stay stable. If they degrade, something is broken. Examples: error rate, latency p95, NPS, churn rate, support ticket volume.
Layer 4 — Counter-Metrics (1-2): Protect against Goodhart's Law. Example: if NSM is DAU, counter-metric is session quality score to prevent hollow engagement.
For each metric, define:
| Metric | Layer | Definition | Data Source | Visualization | Target | Alert Threshold |
|---|---|---|---|---|---|---|
| [Name] | NSM/Input/Health/Counter | [Exact formula: numerator/denominator, time window] | [Where data comes from] | [Line/Bar/Funnel/Number] | [Goal] | [When to trigger alert] |
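One way to make each table row machine-checkable is a small spec record. The field values below are hypothetical examples, not prescribed names:

```python
from dataclasses import dataclass

@dataclass
class MetricSpec:
    name: str
    layer: str          # "NSM" | "Input" | "Health" | "Counter"
    definition: str     # exact formula: numerator/denominator, time window
    data_source: str
    visualization: str  # "Line" | "Bar" | "Funnel" | "Number"
    target: float
    alert_threshold: float

# Hypothetical example row:
nsm = MetricSpec(
    name="Weekly Active Teams",
    layer="NSM",
    definition="distinct teams with >= 1 qualifying action / 7-day window",
    data_source="product analytics events",
    visualization="Line",
    target=1200.0,
    alert_threshold=1000.0,
)

def needs_alert(spec: MetricSpec, current: float) -> bool:
    # Assumes higher is better; invert the comparison for
    # lower-is-better metrics such as error rate.
    return current < spec.alert_threshold
```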
┌─────────────────────────────────────────────┐
│ NORTH STAR: [Metric] — [Current Value] │
│ Trend: [↑/↓ X% vs last period] │
├──────────────────┬──────────────────────────┤
│ Input Metric 1 │ Input Metric 2 │
│ [Sparkline] │ [Sparkline] │
├──────────────────┼──────────────────────────┤
│ Input Metric 3 │ Input Metric 4 │
│ [Sparkline] │ [Sparkline] │
├──────────────────┴──────────────────────────┤
│ HEALTH: [Latency] [Error Rate] [NPS] │
├─────────────────────────────────────────────┤
│ BUSINESS: [MRR] [CAC] [LTV] [Churn] │
└─────────────────────────────────────────────┘
Adapt the layout to the specific metrics defined above.
| Metric | Green (Healthy) | Yellow (Investigate) | Red (Act Now) | Check Frequency |
|---|---|---|---|---|
| [metric] | [healthy range] | [warning threshold] | [critical threshold] | [daily/weekly/monthly] |
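The green/yellow/red mapping in the table can be expressed as one small function. This is a minimal sketch assuming two cut points per metric; the example thresholds are illustrative:

```python
def metric_status(value: float, green: float, red: float,
                  higher_is_better: bool = True) -> str:
    """Map a reading to Green / Yellow / Red given two cut points:
    `green` is the healthy threshold, `red` the act-now threshold;
    anything in between is Yellow (investigate)."""
    if not higher_is_better:
        # Negate so one comparison direction handles both cases
        value, green, red = -value, -green, -red
    if value >= green:
        return "Green"
    if value > red:
        return "Yellow"
    return "Red"

uptime_status = metric_status(0.995, green=0.99, red=0.95)
error_status = metric_status(0.03, green=0.01, red=0.05,
                             higher_is_better=False)
```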
Review cadence: health metrics daily, input metrics weekly, the NSM monthly, and revisit the framework itself quarterly.
If the product includes AI or ML features, add a four-layer AI metrics stack:
AI Layer 1 — Model Quality Metrics:
Classification / ranking: Precision, Recall, F1-score (per class + macro/weighted), AUC-ROC, AUC-PR for imbalanced classes
Text generation (LLMs): BLEU (n-gram precision vs. reference), ROUGE-L (recall-oriented), BERTScore (semantic similarity), LLM-as-judge score, Human evaluation rubric: relevance, fluency, factuality, helpfulness (1-5 scale)
RAG and grounded generation: Hallucination rate (% responses with factual errors), Groundedness (% claims traceable to retrieved context), Citation accuracy, Context utilization rate
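As a sketch of the classification metrics above, precision, recall, and F1 can be computed directly from parallel label lists (the example arrays are made up):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Per-class precision / recall / F1 from parallel label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 4 true positives in the data; the model flags 5 items, 3 correctly:
p, r, f1 = precision_recall_f1([1, 1, 1, 1, 0, 0, 0, 0],
                               [1, 1, 1, 0, 1, 1, 0, 0])
```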
AI Layer 2 — Operational Metrics: Inference latency (p50/p95), cost per request, token usage, timeout/error rate, and fallback rate.
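A common operational unit-economics metric for AI features is cost per successful request. A minimal sketch, where the token prices are placeholders to be replaced with your provider's actual rates:

```python
def cost_per_success(prompt_tokens: int, completion_tokens: int,
                     successes: int,
                     usd_per_1k_prompt: float = 0.003,       # placeholder rate
                     usd_per_1k_completion: float = 0.015):  # placeholder rate
    """Blended token cost divided by the number of successful requests."""
    total_usd = (prompt_tokens / 1000 * usd_per_1k_prompt
                 + completion_tokens / 1000 * usd_per_1k_completion)
    return total_usd / successes if successes else float("inf")

# 1,000 successful requests over 2M prompt + 0.5M completion tokens:
cps = cost_per_success(2_000_000, 500_000, 1_000)
```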
AI Layer 3 — Product-Level AI Metrics: AI feature adoption rate, suggestion acceptance rate, explicit feedback (thumbs up/down ratio), retry/regeneration rate, and retention of users who engage with the AI feature.
AI Layer 4 — Business Impact Metrics: Revenue influenced by the AI feature, cost savings (e.g. support tickets deflected), and conversion or retention lift attributable to it.
AI North Star selection — common candidates by AI type: assistant/copilot → suggestion acceptance rate; support chatbot → conversations resolved without human handoff; recommender → engagement driven by recommendations; generative tool → share of outputs users keep or ship.
AI degradation detection: Monitor input/output distribution drift, re-run a fixed eval set after every model or prompt change, and alert on rising hallucination, fallback, or regeneration rates.
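One widely used screen for distribution drift is the Population Stability Index. This sketch assumes pre-binned score distributions (fractions summing to 1) and the common rule of thumb of < 0.1 stable, 0.1-0.25 moderate, > 0.25 significant; the example distributions are invented:

```python
import math

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two binned distributions.
    eps guards against log(0) for empty bins."""
    return sum((a - e) * math.log((a + eps) / (e + eps))
               for e, a in zip(expected, actual))

baseline = [0.25, 0.25, 0.25, 0.25]   # model score distribution at launch
today = [0.10, 0.20, 0.30, 0.40]      # this week's distribution
drift = psi(baseline, today)          # lands in moderate-drift territory
```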
Based on the user's context, recommend:
Full metrics framework document with:
Product: B2B SaaS project management tool (Productivity game)
North Star: "Projects completed per active team per month"
Input metrics: Time-to-first-project, Active projects per team, Task completion rate, Team collaboration events/week
Counter-metric: Average project cycle time (to prevent inflating "completions" with trivially small projects)
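As an illustration only, the example North Star could be computed from raw event rows like this; the event names and the "active = any event that month" definition are assumptions, not part of the example product:

```python
# Hypothetical event rows: (team_id, month, event_type)
events = [
    ("t1", "2024-05", "project_completed"),
    ("t1", "2024-05", "project_completed"),
    ("t2", "2024-05", "project_completed"),
    ("t3", "2024-05", "login"),  # active that month, but no completion
]

def projects_per_active_team(events, month):
    """NSM: completed projects divided by teams active in the month."""
    active_teams = {team for team, m, _ in events if m == month}
    completions = sum(1 for _, m, e in events
                      if m == month and e == "project_completed")
    return completions / len(active_teams) if active_teams else 0.0

value = projects_per_active_team(events, "2024-05")  # 3 completions / 3 teams
```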