Master context engineering for AI features - the skill that separates AI products that work from ones that hallucinate. Use when speccing new AI features, diagnosing underperforming AI features, or doing quality checks before shipping. Helps PMs define what context AI needs, where to get it, and what to do when it fails. Based on the 4D Context Canvas framework.
Applies the 4D Context Canvas framework to spec, diagnose, and validate AI features before they fail in production.
/plugin marketplace add breethomas/pm-thought-partner
/plugin install pm-thought-partner@pm-thought-partner
This skill inherits all available tools. When active, it can use any tool Claude has access to.
references/context-layers.md
references/orchestrator-template.md
Context engineering is the art of giving AI exactly the right information to do its job.
Models are commodities—your context is your moat.
Most AI features fail before they reach the model. They fail because the context is wrong, missing, stale, or poorly structured, not because the model is weak.
This skill prevents those failures.
Teams spend 90% of their time on model selection and prompts. But 90% of AI quality comes from context quality.
When AI fails, teams blame the model. But the real causes are almost always context failures: wrong data, missing data, stale data, or poorly structured data.
Fix the context, fix the AI.
Context engineering is NOT an engineering problem. It sits at the intersection of product strategy, user understanding, and system design.
PMs own three critical layers:
Defining "intelligence" - What should the AI know? What's essential vs nice-to-have? What level of personalization without feeling creepy?
Mapping context requirements to user value - Translating "users want better suggestions" into "system needs access to past rejections, current workspace state, and team preferences"
Designing degradation strategy - When context is missing, stale, or incomplete: Block the feature? Show partial answer? Ask clarifying questions? Fall back to non-personalized?
Engineers own the implementation: Retrieval architecture, vector databases, embedding pipelines, API integrations, performance optimization.
But they need you to define the what and why before they can build the how.
When this skill is invoked, start with:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT ENGINEERING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What are you working on?
1. Spec a new AI feature
→ Define what context it needs before engineering starts
2. Diagnose an existing AI feature
→ Figure out why it's underperforming or hallucinating
3. Quick quality check
→ Validate context before shipping or during review
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Route to the appropriate path based on user selection.
Walk through four dimensions that determine whether an AI feature ships successfully or dies in production. Use BEFORE engineering starts.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SPEC NEW FEATURE — 4D Context Canvas
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
We'll walk through 4 dimensions. Most AI features fail before
they reach the model—this prevents that.
How do you want to start?
1. From a Linear issue (I'll pull the details)
2. Describe it manually
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If Linear: Use Linear MCP to pull issue details. Pre-populate what's available.
If Manual: Ask user to describe the AI feature in 1-2 sentences.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D1: DEMAND — What's the model's job?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 If you can't articulate the job precisely, the model can't do it.
"Make it smart" is not a spec. Neither is "personalized."
Questions to ask:
What should the model produce?
For whom? (User segment, role, context)
Under what assumptions? (What must be true for this to work?)
What constraints apply? (Tone, format, length, boundaries, prohibited content)
What defines success? (Measurable outcome, not "users like it")
The transformation to model:
VAGUE: "Draft a status update"
PRECISE: "Summarize the key changes in project X since the last report,
structured for stakeholder Y, using the user's preferred tone,
adhering to the product's reporting format, in under 200 words."
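A minimal sketch of how that precise version can be captured as a structured job spec engineers can build against. The field names and values are illustrative assumptions, not part of the 4D framework:

```python
from dataclasses import dataclass

@dataclass
class ModelJobSpec:
    """D1 Demand: what the model should produce, for whom, under what rules."""
    produces: str             # the artifact the model must output
    audience: str             # user segment, role, context
    assumptions: list[str]    # what must be true for this to work
    constraints: list[str]    # tone, format, length, boundaries, prohibited content
    success_metric: str       # measurable outcome, not "users like it"

# Hypothetical spec for the status-update example above
status_update = ModelJobSpec(
    produces="Summary of key changes in project X since the last report",
    audience="Stakeholder Y",
    assumptions=["Project X has recorded changes since the last report"],
    constraints=["User's preferred tone", "Product reporting format", "Under 200 words"],
    success_metric="Stakeholder gets the update without asking follow-up questions",
)
```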
Education moment:
💡 PM vs Engineer: You own the what and why. Engineers own the how.
Without this spec, they build impressive systems that feel hollow.
Capture and display D1 summary before moving on.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D2: DATA — What context does the model need?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Every piece of context costs tokens. More tokens = higher cost +
slower responses. Include only what's essential for the job.
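A rough, back-of-envelope illustration of that cost. The price and token count below are assumptions chosen for the arithmetic, not real rates:

```python
# Assumed numbers -- substitute your model's actual pricing and payload size.
price_per_million_input_tokens = 3.00   # USD, illustrative
extra_context_tokens = 4_000            # e.g., a full history dump instead of a summary

cost_per_request = extra_context_tokens / 1_000_000 * price_per_million_input_tokens
print(f"Extra cost per request: ${cost_per_request:.4f}")                   # $0.0120
print(f"Extra cost per 1M requests: ${cost_per_request * 1_000_000:,.0f}")  # $12,000
```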
Build a Context Requirements Table together:
For each piece of context needed, capture what it is, where it comes from, how reliably it's available, and how sensitive it is.
Example output:
| Data Needed | Source | Availability | Sensitivity |
|---------------------|-------------|--------------|-------------|
| User equity estimate | Internal DB | Always | PII |
| Browsing history | Analytics | Always | Internal |
| Stated goals | User input | Sometimes | Internal |
| Local market trends | API | Always | Public |
Flag problems immediately: data that doesn't exist, sources that are only sometimes available, and anything marked PII.
Education moment:
💡 Hidden dependencies live here. When you map honestly, you discover
critical data that doesn't exist, sources that are unreliable, or
assumptions that will break at scale.
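A minimal sketch of the same table as data, with an automatic pass over it to flag the problem rows. The records mirror the example table above; the flag rules are illustrative assumptions:

```python
context_requirements = [
    {"data": "User equity estimate", "source": "Internal DB", "availability": "Always",    "sensitivity": "PII"},
    {"data": "Browsing history",     "source": "Analytics",   "availability": "Always",    "sensitivity": "Internal"},
    {"data": "Stated goals",         "source": "User input",  "availability": "Sometimes", "sensitivity": "Internal"},
    {"data": "Local market trends",  "source": "API",         "availability": "Always",    "sensitivity": "Public"},
]

for req in context_requirements:
    if req["availability"] != "Always":
        print(f'FLAG: "{req["data"]}" is only sometimes available -- define the degradation path (D4)')
    if req["sensitivity"] == "PII":
        print(f'FLAG: "{req["data"]}" is PII -- confirm it can be used for this feature')
```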
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D3: DISCOVERY — How will you get the context at runtime?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Knowing what data you need ≠ knowing how to get it at runtime.
This is where "it worked in the demo" dies in production.
For each piece of context from D2:
How will the system fetch this?
What's the latency budget?
What if the source is slow or unavailable?
Discovery strategies to consider: real-time retrieval from the source of truth, or cached copies refreshed on a schedule.
Education moment:
💡 Trade-off: Real-time = fresh but slow. Cached = fast but stale.
Know which context needs which strategy.
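A minimal sketch of that trade-off in code, assuming a simple in-process cache with a per-context TTL. The function names and TTL value are hypothetical:

```python
import time

_cache: dict[str, tuple[float, object]] = {}

def get_context(key: str, fetch_live, ttl_seconds: float = 300):
    """Return a cached value if it's fresh enough, otherwise fetch live.

    ttl_seconds encodes the D3 decision: short TTLs for fast-moving context
    (environment state), long TTLs for stable context (user preferences).
    """
    now = time.time()
    if key in _cache:
        fetched_at, value = _cache[key]
        if now - fetched_at < ttl_seconds:
            return value          # cached: fast, but possibly stale
    value = fetch_live(key)       # real-time: fresh, but slower and can fail (see D4)
    _cache[key] = (now, value)
    return value
```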
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D4: DEFENSE — What happens when it fails?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 AI will fail. Context will be missing. Data will be stale. The
model will hallucinate confidently. Design for failure first.
Four defense mechanisms to define:
1. Pre-Checks (before calling model): is the required context present, fresh, and valid?
2. Post-Checks (after generation): does the output meet format, length, and grounding constraints?
3. Fallback Paths (when things break): partial answer, clarifying question, or non-personalized default?
4. Feedback Loops (how to improve): capture failures and user corrections so the system gets better.
Education moment:
💡 The best AI features degrade gracefully. Users trust systems
that know their limits.
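A minimal sketch, with hypothetical helper names, of how the four mechanisms wrap a single model call:

```python
def call_model(request: str, context: dict) -> str:
    return f"Draft answer to: {request}"        # placeholder for the real model call

def violates_constraints(draft: str) -> bool:
    return len(draft.split()) > 200             # e.g., the 200-word limit from D1

def answer_with_defenses(request: str, context: dict) -> dict:
    # 1. Pre-check: don't call the model when required context is missing
    if not context.get("user_preferences"):
        return {"status": "fallback", "text": "Non-personalized default answer"}

    draft = call_model(request, context)

    # 2. Post-check: validate the output before the user ever sees it
    if violates_constraints(draft):
        print("feedback loop: logging failure for review")   # 4. Feedback loop (sketch)
        # 3. Fallback path: ask a clarifying question instead of showing a bad draft
        return {"status": "clarify", "text": "Which project should this update cover?"}

    return {"status": "ok", "text": draft}
```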
After completing all four dimensions, generate summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4D CONTEXT CANVAS COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: [Name]
D1 Demand: [CLEAR / NEEDS WORK / BLOCKED]
D2 Data: [CLEAR / NEEDS WORK / BLOCKED]
D3 Discovery: [CLEAR / NEEDS WORK / BLOCKED]
D4 Defense: [CLEAR / NEEDS WORK / BLOCKED]
Overall: [READY FOR ENGINEERING / NEEDS WORK / BLOCKED]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If blocked or needs work, list specific items to resolve.
Output options: add the canvas to the Linear issue as a comment, or create new issues from the spec.
Figure out why an existing AI feature is underperforming, hallucinating, or behaving inconsistently. Work backwards from symptoms to root cause.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DIAGNOSE EXISTING FEATURE — Context Audit
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 When AI features fail, teams blame the model. But 90% of failures
are context failures—wrong data, missing data, stale data, or
poorly structured data.
Let's find the root cause.
How do you want to start?
1. From a Linear issue (I'll pull the details)
2. Describe the feature and symptoms manually
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ TOKEN MANAGEMENT (for Claude):
When pulling from Linear, use get_issue for a single issue ID—don't search
broadly. If searching, always use limit: 10 and get titles first before
fetching full details.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCOPE — What are we diagnosing?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 AI features often span multiple issues—a parent spec plus
implementation tasks and bug reports. Diagnosing without the
full picture leads to incomplete answers.
What's the scope?
1. Single issue — One specific problem to diagnose
2. Entire feature — A feature that spans multiple issues
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If "Entire feature":
Ask for parent/overview issue ID, then use Linear MCP to find related issues.
⚠️ IMPLEMENTATION NOTE FOR CLAUDE: Linear queries can return massive amounts of data that exceed token limits. ALWAYS follow this pattern:
1. list_issues with limit: 20 max
2. get_issue on specifically selected issues
NEVER try to read all issue details in one query. This will fail.
Found 12 related issues:
• 3 sub-issues
• 2 blocked-by relations
• 4 bugs referencing this feature
• 3 other relations
⚠️ Loading all of them may be slow and increase cost.
How do you want to proceed?
1. Smart summary — Pull titles + key details, summarize each
(faster, cheaper, usually sufficient)
2. Full context — Pull everything including comments
(slower, more expensive, use for deep dives)
3. Let me pick — Show me the list, I'll select what's relevant
Education moment:
💡 This is context engineering in action—we're deciding what's
relevant vs. what's noise. Same trade-off you'll make for
your AI features.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SYMPTOMS — What's going wrong?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What are you seeing? (Select all that apply)
□ Hallucinations — Confidently wrong facts, made-up data
□ Inconsistency — Different outputs for similar inputs
□ Generic outputs — Feels like it doesn't know the user/context
□ Wrong tone/format — Output doesn't match expectations
□ Slow responses — Taking too long
□ High costs — Token usage is out of control
□ Works in demo, fails in prod — Different behavior in real conditions
□ Other: ___
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Symptom-to-cause mapping:
| Symptom | Likely Root Cause | Focus Area |
|---|---|---|
| Hallucinations | Missing domain context, no grounding | D2, D4 |
| Inconsistency | Vague job definition, missing rules | D1, D4 |
| Generic outputs | Missing user/environment context | D2 |
| Wrong tone/format | Missing constraints, no examples | D1, D4 |
| Slow responses | Too much context, bad discovery | D2, D3 |
| High costs | Dumping everything in prompt | D2, D3 |
| Demo vs prod | Discovery strategy broken | D3, D4 |
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D1: Was the model's job clearly defined?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Vague jobs cause vague outputs. "Make it personalized" is not
a spec—it's a wish.
Diagnostic questions:
Can you articulate exactly what the model should produce?
Is there a written spec for inputs, outputs, constraints, success criteria?
Do engineers and PMs agree on what "good" looks like?
D1 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D2: Is the model getting the right context?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Most hallucinations are context failures, not model failures.
The model can only reason about what it sees.
Diagnostic questions:
What context is the model actually receiving today?
Walk through the 6 layers—what's present vs missing?
| Layer | What It Is | Present? |
|---|---|---|
| Intent | What user actually wants (not just what they typed) | ? |
| User | Preferences, patterns, history, proficiency | ? |
| Domain | Entities, rules, relationships, definitions | ? |
| Rules | Constraints, policies, formats, permissions | ? |
| Environment | Current state, time, location, recent actions | ? |
| Exposition | Structured, labeled, clean final payload | ? |
Is context structured or dumped as raw text?
Is there too much context? (Token bloat)
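As a hedged illustration of "structured, not dumped", here is one way to assemble the six layers into a labeled payload. The tags and sample values are assumptions, not a prescribed format:

```python
layers = {
    "intent":      "User wants a stakeholder-ready status update, not a raw changelog",
    "user":        "Prefers concise bullets; has rejected emoji-heavy drafts before",
    "domain":      "Project X ships in 2-week cycles; 'blocked' has a specific meaning here",
    "rules":       "Under 200 words; no customer names; follow the reporting template",
    "environment": "Today is the last day of the cycle; 3 issues moved to Done this week",
}

# Exposition layer: the final payload is labeled, not a wall of raw text.
prompt_context = "\n".join(f"<{name}>\n{content}\n</{name}>" for name, content in layers.items())
print(prompt_context)
```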
Education moment:
💡 Common failure: Teams dump everything into the prompt hoping the
model will "figure it out." It won't. Curate ruthlessly.
D2 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D3: Is context being fetched reliably at runtime?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 This is where "it worked in the demo" dies. Demo uses hardcoded
context. Production must fetch it live—and things break.
Diagnostic questions:
How is each piece of context being fetched?
What happens when a data source is unavailable?
Is there visibility into what context is being used per request?
D3 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D4: Are failures being caught and handled?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 AI will fail. The question is whether users see raw failures or
graceful degradation. Trust comes from knowing your limits.
Diagnostic questions:
Are there pre-checks before calling the model?
Are there post-checks validating output?
What's the fallback UX when things break?
Is there a feedback loop capturing failures?
D4 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT AUDIT COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: [Name]
Symptoms: [What was reported]
D1 Demand: [CLEAR / GAP / CRITICAL]
D2 Data: [CLEAR / GAP / CRITICAL]
D3 Discovery: [CLEAR / GAP / CRITICAL]
D4 Defense: [CLEAR / GAP / CRITICAL]
Primary Issue: [e.g., "Missing user context (D2) + no fallback (D4)"]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RECOMMENDED FIXES (prioritized):
1. [Highest impact fix]
2. [Second fix]
3. [Third fix]
Quick Win: [Smallest change that would improve things]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Output options: add the audit to the Linear issue as a comment, or create new issues for the recommended fixes.
Fast 5-check validation of context quality. Use before shipping, during code review, or when reviewing a prompt/payload.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
QUICK QUALITY CHECK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 All hallucinations are context failures before they're model
failures. This checklist catches problems before users do.
5 checks. 5 minutes. Use before shipping or during review.
What are you checking?
1. A prompt/context payload (paste it)
2. A feature spec (describe it)
3. A Linear issue (I'll pull it)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 1: RELEVANCE — Is everything here necessary?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 More context ≠ better. Irrelevant context confuses the model,
increases cost, and slows responses.
Relevance: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 2: FRESHNESS — Is the data current enough?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Stale context = stale outputs. A model reasoning about yesterday's
data will give yesterday's answers.
Freshness: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 3: SUFFICIENCY — Does the model have enough to reason?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Missing context forces the model to guess. Guessing = hallucinating.
If the model needs it to reason, it must be provided.
Sufficiency: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 4: STRUCTURE — Is context organized clearly?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Dumping raw text forces the model to parse meaning. Structured,
labeled sections reduce ambiguity and improve accuracy.
Structure: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 5: CONSTRAINTS — Are the rules explicit?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Prompts are suggestions. The model will eventually ignore them.
Hard rules must be enforced outside the prompt or stated as
non-negotiable constraints.
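A minimal sketch of enforcing a hard rule in code rather than trusting the prompt. The format rule and prohibited terms are illustrative assumptions:

```python
PROHIBITED_TERMS = {"guaranteed return", "confidential"}   # assumed policy, not a real list

def passes_hard_rules(model_output: str) -> bool:
    """Hard rules checked in code; a failed check routes to the D4 fallback path."""
    starts_with_summary = model_output.lstrip().lower().startswith("summary:")  # assumed format rule
    leaks_prohibited = any(term in model_output.lower() for term in PROHIBITED_TERMS)
    return starts_with_summary and not leaks_prohibited
```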
Constraints: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT QUALITY CHECK COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
┌─────────────┬──────────────┐
│ Check       │ Result       │
├─────────────┼──────────────┤
│ Relevance   │ ✓ PASS       │
│ Freshness   │ ✓ PASS       │
│ Sufficiency │ ⚠ NEEDS WORK │
│ Structure   │ ✓ PASS       │
│ Constraints │ ⚠ NEEDS WORK │
└─────────────┴──────────────┘
Overall: 3/5 PASSING — Fix issues before shipping
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ISSUES TO FIX:
[List specific issues found with concrete recommendations]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Pro tip: Run this check again after fixes to confirm resolution.
Output options: add the check results to the Linear issue as a comment, or keep them in the conversation.
When Linear MCP is available:
Pulling issues:
mcp__plugin_hb-tools_linear__get_issue to fetch issue details
mcp__plugin_hb-tools_linear__list_issues with parent filter to find related issues
Creating output:
mcp__plugin_hb-tools_linear__create_comment to add canvas/audit as comment
mcp__plugin_hb-tools_linear__create_issue to create new stories from specs
Tags: context-engineering, ai-feature
Before /context-engineering:
/four-risks - Validate the feature is worth building at all
After /context-engineering:
/ai-cost-check - Model the unit economics
/ai-health-check - Pre-launch validation
The sequence: /four-risks → /context-engineering → /ai-cost-check → /ai-health-check
Every AI system needs these layers (bottom to top): see references/context-layers.md.
Source framework: 4D Context Canvas, 6 Layers of Context, C.E.O. Framework
Authors: Aakash Gupta & Miqdad Jaffer (OpenAI)
Publication: "The Ultimate Guide to Context Engineering for PMs" - Product Growth Newsletter, 2025