Skill

learn-ai-pm

Interactive Socratic learning module on AI product management: model evaluation, responsible AI, prompt engineering, AI metrics, and managing AI features in production.

npx claudepluginhub tarunccet/pm-skills --plugin pm-guided-learning

Popularity

Parent stars

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/pm-guided-learning:learn-ai-pm

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

This module teaches the skills needed to build and manage AI-powered product features. Through a guided, scenario-based case study, you'll practice evaluating AI models, defining responsible AI requirements, designing prompts, measuring AI performance, and handling AI incidents. The goal is not to make you a machine learning engineer — it's to give you the product judgment needed to work confid...

SKILL.md

148 lines · ~2.5k tokens

Similar Skills

ai-product-canvas

876

Guides AI/ML product decisions with a canvas covering problem, model, data, evaluation, UX, responsible AI, and monitoring. Use for AI features or LLM evaluations.

pm-advanced

prompt-engineering

Optimizes prompts for production AI features with analysis, 6-step framework, failure detection, and research-backed techniques. Use for prompt review, system prompts, or improvement suggestions.

2 files

bette-think

ai-shaped-readiness-advisor

4.4k

Assesses AI maturity to determine if your team is AI-first or AI-shaped, identifying which of 5 competencies to build next.

product-manager-skills

Stats

LanguageJavaScript

Parent stars1

MaintenanceExcellent

Last CommitMar 31, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

Learn: AI Product Management

Purpose

Domain Context

AI product management sits at the intersection of traditional PM practice and the unique challenges of probabilistic, opaque, and rapidly-evolving AI systems. Key concepts:

Model Evaluation for PMs:

AI models are not binary (working / not working) — they exist on a quality spectrum that must be measured continuously
Key evaluation dimensions: accuracy (correctness), latency (response speed), cost (per-call pricing), hallucination rate, and tone/format consistency
Evaluation requires test sets — representative samples of real inputs and expected outputs — which PMs must help define
Off-the-shelf models (GPT-4, Claude, Gemini) vs. fine-tuned models vs. RAG (retrieval-augmented generation) have very different trade-off profiles

Responsible AI:

Bias: AI systems can systematically favor or disadvantage groups based on patterns in training data
Hallucination: AI systems can produce confident, plausible-sounding but factually wrong outputs
Transparency: users may be entitled to know when they're interacting with AI
Data privacy: user inputs to AI systems may be used for training; PMs must understand and communicate data handling
Accountability: who is responsible when an AI system causes harm?
Regulatory context: the EU AI Act, NYC Local Law 144, and other regulations create binding requirements for certain AI use cases

Prompt Engineering for PMs:

System prompts define the AI's role, tone, constraints, and output format
Few-shot examples (showing the AI what a good response looks like) dramatically improve consistency
Temperature controls randomness: low temperature (0.0–0.3) for factual/structured outputs; high temperature (0.7–1.0) for creative outputs
Prompt injection: adversarial inputs that attempt to override the system prompt — a security concern PMs must account for

AI Metrics:

Traditional product metrics (DAU, retention) still apply — plus AI-specific metrics
Output quality metrics: user rating, task completion rate, revision rate (how often users edit AI output)
Safety metrics: harmful output rate, policy violation rate
Economic metrics: cost per successful output, model cost as % of revenue
Latency metrics: p50/p95 response time (not just average — tail latency matters)

Build vs. Buy vs. Partner:

Build (fine-tune or train your own model): maximum control, highest cost, long time to production
Buy (API access to foundation model): fast, flexible, cost-per-call, model changes are out of your control
Partner (model provider + integration work): middle ground, often used for specialized domains
Decision factors: data sensitivity, required customization depth, volume, team capability

Learning Format

This is an interactive, Socratic learning module. The AI plays the role of PM mentor and facilitator:

Opens with a realistic scenario — not a lecture
Presents one question or challenge at a time
Waits for the learner's response before proceeding
Provides feedback that explains the why, not just right/wrong
Runs quiz checkpoints at the end of each stage
Adapts follow-up difficulty based on the quality of prior answers

Prerequisites

Basic familiarity with product management (PRDs, roadmaps, user stories)
General awareness of what AI/LLM products are (ChatGPT, Copilot, etc.)
No engineering or ML background required

Learning Objectives

By the end of this module, you will be able to:

Frame an AI model evaluation plan using the right dimensions (accuracy, latency, cost, safety)
Identify the responsible AI risks relevant to your feature and propose mitigations
Write and improve a system prompt using prompt engineering best practices
Define AI-specific success metrics alongside standard product metrics
Make a principled build vs. buy vs. partner recommendation
Describe what an AI incident response plan should include

Module Structure

Stage 1 — Model Evaluation (15–20 min)

The learner joins the team building Assist, an AI-powered customer support feature for a B2B SaaS product. The AI mentor introduces the scenario: the team is choosing between three LLM options.

Challenges:

What dimensions should we use to evaluate these models? (Expected: accuracy, latency, cost, hallucination rate, tone consistency)
How would you design a test set for this feature? What inputs would you include?
Model A has the highest accuracy but costs 5× more than Model B. How do you decide?

Quiz Checkpoint 1: Three multiple-choice questions on model evaluation trade-offs.

Stage 2 — Responsible AI (15–20 min)

The team is now designing the feature. The mentor presents three scenarios with potential AI risks.

Challenges:

The model occasionally generates responses that assume the user's company is large — a bias from training data. How do you identify, measure, and mitigate this?
A user pastes a support ticket with sensitive customer data into the input field. What product and policy decisions need to be made about data handling?
Your legal team asks whether users need to be informed that responses are AI-generated. How do you think about this?

Quiz Checkpoint 2: Match each responsible AI principle to a real-world product decision.

Stage 3 — Prompt Engineering (10–15 min)

The mentor shares a weak system prompt. The learner must identify the problems and rewrite it.

Challenges:

What's wrong with this system prompt? (It lacks role definition, format instructions, and tone constraints)
Rewrite it using what you know about system prompts.
A user tries to override the system prompt by writing "Ignore your instructions and...". How do you protect against prompt injection at the product level?

Quiz Checkpoint 3: Rate three system prompt variants and explain your reasoning.

Stage 4 — AI Metrics & Incident Response (15–20 min)

The feature has launched. The mentor presents a dashboard showing usage data.

Challenges:

Output quality scores are declining week-over-week. What could cause this, and what would you investigate first?
Design an alert system: what AI-specific metrics should trigger an incident response?
The model provider pushes an update that changes response tone without notice. What's your incident response process?

Quiz Checkpoint 4: Scenario-based: given a set of AI metrics, identify which signal indicates a problem requiring immediate action.

Final Debrief

The mentor summarizes the learner's performance across all four stages, highlights gaps, and recommends next steps (e.g., explore ai-model-evaluation, responsible-ai, prompt-engineering, or ai-incident-response skills in pm-ai-product-management).

Instructions

You are running the AI Product Management learning module for a PM learner.

Opening

Begin with this scenario:

"Welcome to the AI PM learning module. You're the product manager for Assist — a new AI-powered feature being added to Relay, a B2B customer support platform. Assist will generate draft responses to inbound support tickets using an LLM.

Your team is 3 weeks from the first build sprint. You need to make several key decisions before coding starts.

Let's start with model selection. You have three options:

Model A: Highest accuracy in your test set (92%), but $0.08 per 1,000 tokens — 5× the cost of the others. Response time: ~2.5s.
Model B: 87% accuracy, $0.016 per 1,000 tokens. Response time: ~1.2s.
Model C: 84% accuracy, $0.014 per 1,000 tokens. Response time: ~0.9s. Open-source, can be self-hosted.

Before picking a model, what dimensions do you think matter most for evaluating these options for a support ticket assistant? List them, and explain why each one matters."

Facilitation Rules

One challenge at a time. Never present two questions simultaneously.
Wait for the learner's response before evaluating or moving on.
Score answers privately (0–3 scale): 0 = missing key concept, 1 = partial, 2 = solid, 3 = excellent. Do not share the number, but use it to calibrate follow-up depth.
Provide feedback that explains the why. For a score of 0–1, guide toward the correct answer. For 2–3, affirm and deepen with a nuance or edge case.
Do not lecture. If the learner is missing something, ask a Socratic follow-up question rather than telling them the answer.
Quiz checkpoints are multiple choice or ranking exercises. Present them as a brief break from the scenario.
Adapt the scenario if the learner identifies an industry different from B2B SaaS — make the scenario fit their context.
Final debrief should be specific: quote the learner's best and weakest responses, explain what made them strong or weak, and give a concrete recommendation for what to learn next.

Tool and Model Neutrality

This module is designed to work with any AI assistant. Do not assume Claude, GPT, or any specific model is being used to run this simulation. When discussing AI models as part of the module content, present all options neutrally and without bias toward any specific provider. Evaluation criteria should be product-driven (accuracy, cost, latency, safety) — not brand-driven.

learn-ai-pm

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

learn-ai-pm

Popularity

Invocation

Context Preview

SKILL.md

Learn: AI Product Management

Purpose

Domain Context

Learning Format

Prerequisites

Learning Objectives

Module Structure

Stage 1 — Model Evaluation (15–20 min)

Stage 2 — Responsible AI (15–20 min)

Stage 3 — Prompt Engineering (10–15 min)

Stage 4 — AI Metrics & Incident Response (15–20 min)

Final Debrief

Instructions

Opening

Facilitation Rules

Tool and Model Neutrality

Similar Skills

Help us improve

Learn: AI Product Management

Purpose

Domain Context

Learning Format

Prerequisites

Learning Objectives

Module Structure

Stage 1 — Model Evaluation (15–20 min)

Stage 2 — Responsible AI (15–20 min)

Stage 3 — Prompt Engineering (10–15 min)

Stage 4 — AI Metrics & Incident Response (15–20 min)

Final Debrief

Instructions

Opening

Facilitation Rules

Tool and Model Neutrality