Skill

ai-feature-definition

From pm-ai-product-management

Write a complete AI feature specification that defines desired model behaviour, input/output examples, confidence thresholds, fallback logic, and non-functional requirements. Use when defining requirements for an AI-powered feature, writing an AI PRD, or bridging the gap between PM intent and ML team implementation.

$ npx claudepluginhub tarunccet/pm-skills --plugin pm-ai-product-management

Popularity

Parent stars: 1

Invocation

How this skill is triggered: by the user, by Claude, or both.

Slash command: /pm-ai-product-management:ai-feature-definition

User invocable · Model invocable · Inline context · Default effort

SKILL.md

96 lines · ~1.3k tokens

Stats

Language: JavaScript

Maintenance: Good

Last commit: Mar 28, 2026

AI Feature Definition

Create a comprehensive AI feature specification that accounts for probabilistic outputs, defines desired behaviour with examples, and establishes evaluation criteria.

Context

You are writing an AI feature specification for $ARGUMENTS.

Instructions

Understand the feature context:
- What user problem does this AI feature solve?
- What are the inputs? (text, image, structured data, user history, documents)
- What are the outputs? (generated text, classification label, ranked list, structured data)
- Who are the users? What is their technical sophistication?
- What does success look like from the user's perspective?
Key differences from traditional feature specs:
- AI outputs are probabilistic, not deterministic — the same input may produce different outputs
- Specify a range of acceptable outputs, not a single correct answer
- Define quality floors (minimum acceptable), not just ideal outputs
- Failure modes are gradual degradation, not binary pass/fail
Desired AI Behaviour — input/output examples:
- Provide at least 5 example pairs: {input} → {ideal output}
- Include: happy path (typical), edge cases, difficult inputs, and known failure modes
- For each example, note: what makes this output good? What would make it unacceptable?
- Specify what the model should do when it does not know the answer (refuse, hedge, escalate)
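One way to keep the example pairs reviewable is to store them as structured records and check coverage automatically. The sketch below is illustrative only: the `BehaviourExample` record, the `ExampleKind` categories, and the `coverage_gaps` helper are hypothetical names, not part of this skill.

```python
from dataclasses import dataclass
from enum import Enum


class ExampleKind(Enum):
    HAPPY_PATH = "happy path"
    EDGE_CASE = "edge case"
    DIFFICULT = "difficult input"
    KNOWN_FAILURE = "known failure mode"


@dataclass(frozen=True)
class BehaviourExample:
    kind: ExampleKind
    input_text: str
    ideal_output: str
    why_good: str          # what makes this output good
    unacceptable_if: str   # what would make it unacceptable


def coverage_gaps(examples: list[BehaviourExample], minimum: int = 5) -> list[str]:
    """List gaps against the 'at least 5 pairs, every kind covered' guidance."""
    gaps = []
    if len(examples) < minimum:
        gaps.append(f"need at least {minimum} examples, have {len(examples)}")
    present = {example.kind for example in examples}
    for kind in ExampleKind:
        if kind not in present:
            gaps.append(f"missing {kind.value} example")
    return gaps
```

An empty list from `coverage_gaps` means the example table meets the minimum count and covers all four kinds; it says nothing about whether the examples themselves are good.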
Acceptable output range and quality criteria:
- Define a rubric: what properties must every output have? (factually accurate, concise, safe, on-topic)
- Specify hard constraints: never do X, always include Y, maximum Z words
- Define the quality floor: the quality score at or below which an output counts as a failure
- Note: document who defines "acceptable" — user research, domain expert, or business stakeholder
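The hard constraints and quality floor can be made checkable. The following is a minimal sketch, assuming the quality score comes from some external grader on a 0 to 1 scale; `OutputRubric`, its fields, and the default floor of 0.6 are placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OutputRubric:
    max_words: int                                              # "maximum Z words"
    banned_phrases: list[str] = field(default_factory=list)     # "never do X"
    required_phrases: list[str] = field(default_factory=list)   # "always include Y"
    quality_floor: float = 0.6  # assumed scale 0..1; at or below counts as failure

    def violations(self, output: str) -> list[str]:
        """Return every hard constraint the output breaks."""
        text = output.lower()
        found = []
        if len(output.split()) > self.max_words:
            found.append("exceeds maximum word count")
        for phrase in self.banned_phrases:
            if phrase.lower() in text:
                found.append(f"contains banned phrase: {phrase}")
        for phrase in self.required_phrases:
            if phrase.lower() not in text:
                found.append(f"missing required phrase: {phrase}")
        return found

    def is_failure(self, output: str, quality_score: float) -> bool:
        """An output fails on any hard-constraint violation or a score at or below the floor."""
        return bool(self.violations(output)) or quality_score <= self.quality_floor
```

Substring matching is a deliberately crude stand-in here; properties such as factual accuracy or safety usually need a human or model-based grader.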
Confidence threshold and fallback behaviour:
- Define the confidence threshold below which the AI should not act autonomously
- Graceful degradation ladder:
  1. High confidence → show AI output directly
  2. Medium confidence → show output with a caveat or confidence indicator
  3. Low confidence → show fallback (e.g., "I'm not sure — here are some options") or escalate
  4. Failure → deterministic fallback (rule-based, search, human handoff)
- Never leave the user with a blank screen or raw error message
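The degradation ladder above can be sketched as a small routing function. The threshold values (0.85 and 0.60) and the convention that `None` signals a failed model call are assumptions for illustration; real thresholds should come from evaluation data.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SHOW_DIRECTLY = "show AI output directly"
    SHOW_WITH_CAVEAT = "show output with a caveat or confidence indicator"
    SHOW_FALLBACK = "show fallback options or escalate"
    DETERMINISTIC_FALLBACK = "rule-based, search, or human handoff"


HIGH_THRESHOLD = 0.85    # assumed value, tune per feature
MEDIUM_THRESHOLD = 0.60  # assumed value, tune per feature


def choose_action(confidence: Optional[float]) -> Action:
    """Map a confidence score to a rung of the ladder.

    confidence is None when the model call failed or returned nothing,
    so the user always gets a deterministic fallback, never a blank screen.
    """
    if confidence is None:
        return Action.DETERMINISTIC_FALLBACK
    if confidence >= HIGH_THRESHOLD:
        return Action.SHOW_DIRECTLY
    if confidence >= MEDIUM_THRESHOLD:
        return Action.SHOW_WITH_CAVEAT
    return Action.SHOW_FALLBACK
```

Every branch returns an action, which is the property the last bullet asks for: there is no path that leaves the user without a response.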
Human-in-the-loop requirements:
- Which decisions require human review before acting? (e.g., financial, medical, legal)
- How does the user correct or override the AI output?
- How are corrections fed back to improve the model?
- Define review queue design if human review is required at scale
Explainability requirements:
- Does the user need to understand why the AI produced this output?
- Are there legal or regulatory explainability requirements (e.g., GDPR, EU AI Act)?
- Define the explanation format: plain language reason, feature importance, source citations
- Specify for which user segments explanations are required vs. optional
Feedback loop design:
- Implicit signals: scroll depth, time-on-page, copy-to-clipboard, follow-up actions
- Explicit signals: thumbs up/down, star rating, correction/edit, flag as incorrect
- How is feedback stored, labelled, and used for model retraining?
- Define feedback freshness requirements (how stale is too stale for training?)
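A feedback record and a freshness filter might look like the sketch below. The `FeedbackEvent` fields and the signal names are hypothetical, and the maximum age is whatever the spec decides is "too stale".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class FeedbackEvent:
    request_id: str
    signal: str                 # e.g. "thumbs_down", "edit", "copy_to_clipboard"
    explicit: bool              # True for explicit signals, False for implicit ones
    recorded_at: datetime       # timezone-aware timestamp
    corrected_output: Optional[str] = None  # filled in when the user edits the output


def fresh_events(
    events: list[FeedbackEvent],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list[FeedbackEvent]:
    """Keep only events recent enough to be used for retraining."""
    now = now or datetime.now(timezone.utc)
    return [event for event in events if now - event.recorded_at <= max_age]
```

Passing `now` explicitly keeps the filter deterministic in tests and backfills.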
Model drift monitoring requirements:
- Which distribution shifts should trigger a quality review? (input distribution, output distribution)
- Define alert thresholds: hallucination rate > X%, override rate > Y%, latency p95 > Z ms
- Specify retraining trigger conditions (schedule-based, drift-triggered, or quality-triggered)
- Document who owns the monitoring dashboard and review cadence
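The alert thresholds can be written down as data and checked mechanically. This is a minimal sketch: `AlertThresholds` and `breached_alerts` are illustrative names, and the limits stand in for the X, Y, and Z values the spec has to supply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertThresholds:
    # Placeholders for the X, Y, Z values in the spec.
    max_hallucination_rate: float  # fraction, e.g. 0.02 for 2%
    max_override_rate: float       # fraction of outputs users override
    max_latency_p95_ms: float


def breached_alerts(
    hallucination_rate: float,
    override_rate: float,
    latency_p95_ms: float,
    limits: AlertThresholds,
) -> list[str]:
    """Return the name of every metric that strictly exceeds its threshold."""
    alerts = []
    if hallucination_rate > limits.max_hallucination_rate:
        alerts.append("hallucination_rate")
    if override_rate > limits.max_override_rate:
        alerts.append("override_rate")
    if latency_p95_ms > limits.max_latency_p95_ms:
        alerts.append("latency_p95_ms")
    return alerts
```

A non-empty result is a candidate trigger for a quality review or a retraining decision; who acts on it is the ownership question in the last bullet.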
Non-functional requirements:
- Latency SLO: p50 ≤ ___ ms, p99 ≤ ___ ms (end-to-end including UI render)
- Accuracy floor: minimum acceptable score on offline eval metric (e.g., F1 ≥ 0.85)
- Cost ceiling: maximum $ per 1,000 inferences
- Throughput: minimum requests per second at peak load
- Availability: uptime SLA % and acceptable downtime window
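As a rough illustration of how the latency and cost targets could be verified from logged data, the sketch below uses a nearest-rank percentile. The function names and the choice of nearest-rank (rather than an interpolated percentile) are assumptions, and it treats one latency sample as one inference.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cost_per_1000(total_cost_usd: float, inference_count: int) -> float:
    """Average cost in USD per 1,000 inferences."""
    if inference_count <= 0:
        raise ValueError("inference_count must be positive")
    return total_cost_usd * 1000 / inference_count


def nfr_violations(
    latencies_ms: list[float],
    p50_slo_ms: float,
    p99_slo_ms: float,
    total_cost_usd: float,
    cost_ceiling_per_1000: float,
) -> list[str]:
    """Compare observed latency and cost against the spec's SLO and ceiling."""
    violations = []
    if percentile(latencies_ms, 50) > p50_slo_ms:
        violations.append("latency p50")
    if percentile(latencies_ms, 99) > p99_slo_ms:
        violations.append("latency p99")
    if cost_per_1000(total_cost_usd, len(latencies_ms)) > cost_ceiling_per_1000:
        violations.append("cost per 1,000 inferences")
    return violations
```

Latency samples should be measured end-to-end, including UI render, to match the SLO definition above.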
AI Feature Spec document structure:
- Overview: problem, users, success criteria
- Desired AI Behaviour: input/output examples table
- Acceptable output range and quality rubric
- Confidence thresholds and fallback ladder
- Human-in-the-loop design
- Explainability requirements
- Feedback loop design
- Drift monitoring requirements
- Non-functional requirements (latency, accuracy, cost, throughput, availability)
- Open questions and dependencies
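The structure above could be turned into an empty markdown skeleton before the sections are filled in. This sketch assumes a title line, second-level headings, and a `TODO` placeholder per section; none of these conventions is prescribed by the skill.

```python
SPEC_SECTIONS = [
    "Overview",
    "Desired AI Behaviour",
    "Acceptable output range and quality rubric",
    "Confidence thresholds and fallback ladder",
    "Human-in-the-loop design",
    "Explainability requirements",
    "Feedback loop design",
    "Drift monitoring requirements",
    "Non-functional requirements",
    "Open questions and dependencies",
]


def spec_skeleton(feature_name: str) -> str:
    """Build an empty markdown spec with one heading per section."""
    lines = [f"# AI Feature Spec: {feature_name}", ""]
    for title in SPEC_SECTIONS:
        lines += [f"## {title}", "", "TODO", ""]
    return "\n".join(lines)
```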

Think step by step. Save the specification as a markdown file.