From bette-think
References 4D Context Canvas for engineering context in AI products. Archived; use /spec --ai for new features, /ai-debug for diagnosis, /context-check for quality.
npx claudepluginhub breethomas/bette-think --plugin bette-think

This skill uses the workspace's default tool permissions.
> **ARCHIVED SKILL**
This skill has been integrated into the unified spec system:
- New AI features: Use /spec --ai or /spec --deep context
- Diagnose issues: Use /ai-debug
- Quality checks: Use /context-check

This file remains as a reference for the full 4D Context Canvas framework.
Context engineering is the art of giving AI exactly the right information to do its job.
Models are commodities; your context is your moat.
Most AI features fail before they reach the model. They fail because the context is wrong, missing, stale, or poorly structured.
This skill prevents those failures.
Teams spend 90% of their time on model selection and prompts. But 90% of AI quality comes from context quality.
When AI fails, teams blame the model. But the real causes are almost always in the context: wrong data, missing data, stale data, or poorly structured data.
Fix the context, fix the AI.
Context engineering is NOT just an engineering problem. It sits at the intersection of product strategy, user understanding, and system design.
PMs own three critical layers:
Defining "intelligence" - What should the AI know? What's essential vs nice-to-have? What level of personalization without feeling creepy?
Mapping context requirements to user value - Translating "users want better suggestions" into "system needs access to past rejections, current workspace state, and team preferences"
Designing degradation strategy - When context is missing, stale, or incomplete: Block the feature? Show partial answer? Ask clarifying questions? Fall back to non-personalized?
Engineers own the implementation: Retrieval architecture, vector databases, embedding pipelines, API integrations, performance optimization.
But they need you to define the what and why before they can build the how.
When this skill is invoked, start with:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT ENGINEERING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What are you working on?
1. Spec a new AI feature
   → Define what context it needs before engineering starts
2. Diagnose an existing AI feature
   → Figure out why it's underperforming or hallucinating
3. Quick quality check
   → Validate context before shipping or during review
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Route to the appropriate path based on user selection.
Walk through four dimensions that determine whether an AI feature ships successfully or dies in production. Use BEFORE engineering starts.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SPEC NEW FEATURE – 4D Context Canvas
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
We'll walk through 4 dimensions. Most AI features fail before
they reach the model; this prevents that.
How do you want to start?
1. From a Linear issue (I'll pull the details)
2. Describe it manually
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If Linear: Use Linear MCP to pull issue details. Pre-populate what's available.
If Manual: Ask user to describe the AI feature in 1-2 sentences.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D1: DEMAND – What's the model's job?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 If you can't articulate the job precisely, the model can't do it.
"Make it smart" is not a spec. Neither is "personalized."
Questions to ask:
What should the model produce?
For whom? (User segment, role, context)
Under what assumptions? (What must be true for this to work?)
What constraints apply? (Tone, format, length, boundaries, prohibited content)
What defines success? (Measurable outcome, not "users like it")
The transformation to aim for:
VAGUE: "Draft a status update"
PRECISE: "Summarize the key changes in project X since the last report,
structured for stakeholder Y, using the user's preferred tone,
adhering to the product's reporting format, in under 200 words."
Education moment:
💡 PM vs Engineer: You own the what and why. Engineers own the how.
Without this spec, they build impressive systems that feel hollow.
Capture and display D1 summary before moving on.
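The precise version above can be captured as structured fields rather than prose. A minimal sketch in Python (field names and the rendering are illustrative, not part of the framework):

```python
from dataclasses import dataclass, field

@dataclass
class JobSpec:
    """A D1 job definition: what the model should produce, for whom, under what rules."""
    output: str                                            # what the model should produce
    audience: str                                          # for whom
    assumptions: list[str] = field(default_factory=list)   # what must be true for this to work
    constraints: list[str] = field(default_factory=list)   # tone, format, length, boundaries
    success_metric: str = ""                               # measurable outcome, not "users like it"

    def to_instruction(self) -> str:
        # Render the spec as a single precise instruction for the model.
        parts = [f"Produce: {self.output}", f"Audience: {self.audience}"]
        if self.constraints:
            parts.append("Constraints: " + "; ".join(self.constraints))
        return "\n".join(parts)

spec = JobSpec(
    output="Summary of key changes in project X since the last report",
    audience="Stakeholder Y",
    assumptions=["Change log for project X is available"],
    constraints=["user's preferred tone", "product reporting format", "under 200 words"],
    success_metric="Stakeholder accepts the update without follow-up questions",
)
print(spec.to_instruction())
```

Writing the spec this way forces every D1 question to be answered explicitly before engineering starts.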
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D2: DATA – What context does the model need?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Every piece of context costs tokens. More tokens = higher cost +
slower responses. Include only what's essential for the job.
Build a Context Requirements Table together:
For each piece of context needed:
Example output:
| Data Needed | Source | Availability | Sensitivity |
|---------------------|-------------|--------------|-------------|
| User equity estimate | Internal DB | Always | PII |
| Browsing history | Analytics | Always | Internal |
| Stated goals | User input | Sometimes | Internal |
| Local market trends | API | Always | Public |
Flag problems immediately:
Education moment:
💡 Hidden dependencies live here. When you map honestly, you discover
critical data that doesn't exist, sources that are unreliable, or
assumptions that will break at scale.
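The flagging step can be mechanical. A small sketch: the rows mirror the example table above, and the two flagging rules are illustrative, not exhaustive:

```python
# Each row of the Context Requirements Table as a record.
requirements = [
    {"data": "User equity estimate", "source": "Internal DB", "availability": "Always",    "sensitivity": "PII"},
    {"data": "Browsing history",     "source": "Analytics",   "availability": "Always",    "sensitivity": "Internal"},
    {"data": "Stated goals",         "source": "User input",  "availability": "Sometimes", "sensitivity": "Internal"},
    {"data": "Local market trends",  "source": "API",         "availability": "Always",    "sensitivity": "Public"},
]

def flag_problems(rows):
    """Return rows that need a PM decision before engineering starts."""
    flags = []
    for row in rows:
        if row["availability"] != "Always":
            flags.append((row["data"], "not always available - define a degradation path (D4)"))
        if row["sensitivity"] == "PII":
            flags.append((row["data"], "PII - needs consent and handling review"))
    return flags

for name, reason in flag_problems(requirements):
    print(f"FLAG: {name}: {reason}")
```

Here the sometimes-available "Stated goals" and the PII "User equity estimate" both surface before any retrieval architecture is built.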
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D3: DISCOVERY – How will you get the context at runtime?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Knowing what data you need ≠ knowing how to get it at runtime.
This is where "it worked in the demo" dies in production.
For each piece of context from D2:
How will the system fetch this?
What's the latency budget?
What if the source is slow or unavailable?
Discovery strategies to consider:
Education moment:
💡 Trade-off: Real-time = fresh but slow. Cached = fast but stale.
Know which context needs which strategy.
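The trade-off can be made concrete: try a live fetch within a latency budget, and fall back to a cached copy when the source is slow or down. A sketch with a hypothetical fetcher (no real data source or client library is assumed):

```python
import time

# Cache seeded with a copy fetched an hour ago.
cache = {"user_prefs": {"value": {"tone": "direct"}, "fetched_at": time.time() - 3600}}

def get_context(key, fetch_fn, timeout_s=0.2, max_stale_s=86400):
    """Try a live fetch; on failure or over-budget latency, fall back to cache if fresh enough."""
    try:
        start = time.time()
        value = fetch_fn()  # in production, timeout_s would be enforced at the transport layer
        if time.time() - start > timeout_s:
            raise TimeoutError("over latency budget")
        cache[key] = {"value": value, "fetched_at": time.time()}
        return value, "live"
    except Exception:
        entry = cache.get(key)
        if entry and time.time() - entry["fetched_at"] < max_stale_s:
            return entry["value"], "cached"
        return None, "missing"  # caller must apply a D4 fallback

def failing_fetch():
    raise ConnectionError("source unavailable")

value, origin = get_context("user_prefs", failing_fetch)
print(origin)  # the live fetch failed, so the hour-old cached copy is used
```

Which contexts may fall back to cache, and how stale is acceptable for each, is exactly the per-item decision D3 asks the PM to make.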
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
D4: DEFENSE – What happens when it fails?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 AI will fail. Context will be missing. Data will be stale. The
model will hallucinate confidently. Design for failure first.
Four defense mechanisms to define:
1. Pre-Checks (before calling model):
2. Post-Checks (after generation):
3. Fallback Paths (when things break):
4. Feedback Loops (how to improve):
Education moment:
💡 The best AI features degrade gracefully. Users trust systems
that know their limits.
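The four mechanisms compose into a simple guard pipeline around the model call. A hedged sketch (the model call is a stub and the check rules are examples, not part of the framework):

```python
def pre_check(context):
    """1. Pre-checks: refuse to call the model on insufficient context."""
    return bool(context.get("user")) and bool(context.get("domain"))

def post_check(output, max_words=200):
    """2. Post-checks: validate the generation against hard constraints."""
    return 0 < len(output.split()) <= max_words

def non_personalized_fallback():
    """3. Fallback path: a safe, generic experience when context is missing or output fails."""
    return "Here's a generic status template you can fill in."

failures = []
def log_failure(context, output):
    """4. Feedback loop: capture failures so context quality can improve."""
    failures.append((context, output))

def call_model(context):
    # Stub standing in for the real model call.
    return "Project X shipped two fixes since the last report."

def generate(context):
    if not pre_check(context):
        return {"status": "fallback", "output": non_personalized_fallback()}
    output = call_model(context)
    if not post_check(output):
        log_failure(context, output)
        return {"status": "fallback", "output": non_personalized_fallback()}
    return {"status": "ok", "output": output}

print(generate({"user": {}, "domain": {}}))   # missing user context triggers the fallback
print(generate({"user": {"id": 1}, "domain": {"project": "X"}}))
```

The PM's D4 decisions (block, degrade, ask, or fall back) define what each branch does; engineering implements the branches.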
After completing all four dimensions, generate summary:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
4D CONTEXT CANVAS COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: [Name]
D1 Demand: [CLEAR / NEEDS WORK / BLOCKED]
D2 Data: [CLEAR / NEEDS WORK / BLOCKED]
D3 Discovery: [CLEAR / NEEDS WORK / BLOCKED]
D4 Defense: [CLEAR / NEEDS WORK / BLOCKED]
Overall: [READY FOR ENGINEERING / NEEDS WORK / BLOCKED]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If blocked or needs work, list specific items to resolve.
Output options:
Figure out why an existing AI feature is underperforming, hallucinating, or behaving inconsistently. Work backwards from symptoms to root cause.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DIAGNOSE EXISTING FEATURE – Context Audit
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 When AI features fail, teams blame the model. But 90% of failures
are context failures: wrong data, missing data, stale data, or
poorly structured data.
Let's find the root cause.
How do you want to start?
1. From a Linear issue (I'll pull the details)
2. Describe the feature and symptoms manually
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ TOKEN MANAGEMENT (for Claude):
When pulling from Linear, use get_issue for a single issue ID; don't search
broadly. If searching, always use limit: 10 and get titles first before
fetching full details.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCOPE – What are we diagnosing?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 AI features often span multiple issues: a parent spec plus
implementation tasks and bug reports. Diagnosing without the
full picture leads to incomplete answers.
What's the scope?
1. Single issue β One specific problem to diagnose
2. Entire feature β A feature that spans multiple issues
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
If "Entire feature":
Ask for parent/overview issue ID, then use Linear MCP to find related issues.
⚠️ IMPLEMENTATION NOTE FOR CLAUDE: Linear queries can return massive amounts of data that exceed token limits. ALWAYS follow this pattern:
1. list_issues with limit: 20 max
2. get_issue on specifically selected issues
NEVER try to read all issue details in one query. This will fail.
Found 12 related issues:
• 3 sub-issues
• 2 blocked-by relations
• 4 bugs referencing this feature
• 3 other relations
⚠️ Loading all of them may be slow and increase cost.
How do you want to proceed?
1. Smart summary – Pull titles + key details, summarize each
   (faster, cheaper, usually sufficient)
2. Full context – Pull everything including comments
   (slower, more expensive, use for deep dives)
3. Let me pick – Show me the list, I'll select what's relevant
Education moment:
💡 This is context engineering in action: we're deciding what's
relevant vs. what's noise. Same trade-off you'll make for
your AI features.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SYMPTOMS – What's going wrong?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What are you seeing? (Select all that apply)
☐ Hallucinations – Confidently wrong facts, made-up data
☐ Inconsistency – Different outputs for similar inputs
☐ Generic outputs – Feels like it doesn't know the user/context
☐ Wrong tone/format – Output doesn't match expectations
☐ Slow responses – Taking too long
☐ High costs – Token usage is out of control
☐ Works in demo, fails in prod – Different behavior in real conditions
☐ Other: ___
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Symptom-to-cause mapping:
| Symptom | Likely Root Cause | Focus Area |
|---|---|---|
| Hallucinations | Missing domain context, no grounding | D2, D4 |
| Inconsistency | Vague job definition, missing rules | D1, D4 |
| Generic outputs | Missing user/environment context | D2 |
| Wrong tone/format | Missing constraints, no examples | D1, D4 |
| Slow responses | Too much context, bad discovery | D2, D3 |
| High costs | Dumping everything in prompt | D2, D3 |
| Demo vs prod | Discovery strategy broken | D3, D4 |
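The mapping above doubles as a triage function: given the symptoms a user checked off, rank which dimensions to audit first. A minimal sketch:

```python
from collections import Counter

# Symptom -> dimensions to audit first, mirroring the table above.
SYMPTOM_FOCUS = {
    "hallucinations":  ["D2", "D4"],
    "inconsistency":   ["D1", "D4"],
    "generic_outputs": ["D2"],
    "wrong_tone":      ["D1", "D4"],
    "slow_responses":  ["D2", "D3"],
    "high_costs":      ["D2", "D3"],
    "demo_vs_prod":    ["D3", "D4"],
}

def triage(symptoms):
    """Rank dimensions by how many reported symptoms point at them."""
    counts = Counter(dim for s in symptoms for dim in SYMPTOM_FOCUS.get(s, []))
    return [dim for dim, _ in counts.most_common()]

print(triage(["hallucinations", "generic_outputs"]))  # D2 is implicated twice, so audit it first
```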
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D1: Was the model's job clearly defined?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Vague jobs cause vague outputs. "Make it personalized" is not
a spec; it's a wish.
Diagnostic questions:
Can you articulate exactly what the model should produce?
Is there a written spec for inputs, outputs, constraints, success criteria?
Do engineers and PMs agree on what "good" looks like?
D1 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D2: Is the model getting the right context?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Most hallucinations are context failures, not model failures.
The model can only reason about what it sees.
Diagnostic questions:
What context is the model actually receiving today?
Walk through the 6 layersβwhat's present vs missing?
| Layer | What It Is | Present? |
|---|---|---|
| Intent | What user actually wants (not just what they typed) | ? |
| User | Preferences, patterns, history, proficiency | ? |
| Domain | Entities, rules, relationships, definitions | ? |
| Rules | Constraints, policies, formats, permissions | ? |
| Environment | Current state, time, location, recent actions | ? |
| Exposition | Structured, labeled, clean final payload | ? |
Is context structured or dumped as raw text?
Is there too much context? (Token bloat)
Education moment:
💡 Common failure: Teams dump everything into the prompt hoping the
model will "figure it out." It won't. Curate ruthlessly.
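The "structured, not dumped" test can be shown directly: assemble the payload as labeled sections, one per layer from the table above. A sketch (section contents are illustrative):

```python
def build_payload(layers: dict) -> str:
    """Render the context layers as clearly labeled sections instead of raw text."""
    order = ["Intent", "User", "Domain", "Rules", "Environment"]
    sections = [f"## {name}\n{layers[name]}" for name in order if layers.get(name)]
    return "\n\n".join(sections)  # the Exposition layer IS this clean final payload

payload = build_payload({
    "Intent": "Draft a weekly status update for stakeholder Y.",
    "User": "Prefers a direct tone; skips pleasantries.",
    "Domain": "Project X: 2 features shipped, 1 blocked on review.",
    "Rules": "Under 200 words; no confidential financials.",
    "Environment": "Today is Friday; last report was 7 days ago.",
})
print(payload)
```

An audit can then check each labeled section for presence, instead of guessing what a wall of pasted text contains.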
D2 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D3: Is context being fetched reliably at runtime?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 This is where "it worked in the demo" dies. Demo uses hardcoded
context. Production must fetch it live, and things break.
Diagnostic questions:
How is each piece of context being fetched?
What happens when a data source is unavailable?
Is there visibility into what context is being used per request?
D3 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AUDIT D4: Are failures being caught and handled?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 AI will fail. The question is whether users see raw failures or
graceful degradation. Trust comes from knowing your limits.
Diagnostic questions:
Are there pre-checks before calling the model?
Are there post-checks validating output?
What's the fallback UX when things break?
Is there a feedback loop capturing failures?
D4 Assessment: [CLEAR / GAP FOUND / CRITICAL GAP]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT AUDIT COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: [Name]
Symptoms: [What was reported]
D1 Demand: [CLEAR / GAP / CRITICAL]
D2 Data: [CLEAR / GAP / CRITICAL]
D3 Discovery: [CLEAR / GAP / CRITICAL]
D4 Defense: [CLEAR / GAP / CRITICAL]
Primary Issue: [e.g., "Missing user context (D2) + no fallback (D4)"]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
RECOMMENDED FIXES (prioritized):
1. [Highest impact fix]
2. [Second fix]
3. [Third fix]
Quick Win: [Smallest change that would improve things]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Output options:
Fast 5-check validation of context quality. Use before shipping, during code review, or when reviewing a prompt/payload.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
QUICK QUALITY CHECK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 All hallucinations are context failures before they're model
failures. This checklist catches problems before users do.
5 checks. 5 minutes. Use before shipping or during review.
What are you checking?
1. A prompt/context payload (paste it)
2. A feature spec (describe it)
3. A Linear issue (I'll pull it)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 1: RELEVANCE – Is everything here necessary?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 More context ≠ better. Irrelevant context confuses the model,
increases cost, and slows responses.
Relevance: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 2: FRESHNESS – Is the data current enough?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Stale context = stale outputs. A model reasoning about yesterday's
data will give yesterday's answers.
Freshness: [PASS / NEEDS WORK]
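A freshness check is often a one-line guard: compare each context item's fetch timestamp against a per-item maximum age. Sketch (item names and age budgets are illustrative):

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Each context item carries when it was fetched and how stale it may be.
items = {
    "workspace_state":  {"fetched_at": now - timedelta(seconds=30), "max_age": timedelta(minutes=5)},
    "user_preferences": {"fetched_at": now - timedelta(days=2),     "max_age": timedelta(days=30)},
    "market_trends":    {"fetched_at": now - timedelta(days=10),    "max_age": timedelta(days=1)},
}

def stale_items(items, now):
    """Return the names of items older than their freshness budget."""
    return [name for name, it in items.items() if now - it["fetched_at"] > it["max_age"]]

print(stale_items(items, now))  # only market_trends exceeds its budget
```

Setting a different budget per item is the point: workspace state goes stale in minutes, stated preferences in weeks.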
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 3: SUFFICIENCY – Does the model have enough to reason?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Missing context forces the model to guess. Guessing = hallucinating.
If the model needs it to reason, it must be provided.
Sufficiency: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 4: STRUCTURE – Is context organized clearly?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Dumping raw text forces the model to parse meaning. Structured,
labeled sections reduce ambiguity and improve accuracy.
Structure: [PASS / NEEDS WORK]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CHECK 5: CONSTRAINTS – Are the rules explicit?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Prompts are suggestions. The model will eventually ignore them.
Hard rules must be enforced outside the prompt or stated as
non-negotiable constraints.
Constraints: [PASS / NEEDS WORK]
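"Enforced outside the prompt" means code, not instructions. A sketch: a word limit and a prohibited-terms list checked after generation, regardless of what the prompt said (the rules themselves are illustrative):

```python
PROHIBITED = {"guaranteed return", "legal advice"}
MAX_WORDS = 200

def enforce_constraints(output: str):
    """Reject or trim output that violates hard rules, even if the prompt asked nicely."""
    violations = [term for term in PROHIBITED if term in output.lower()]
    if violations:
        return None, f"blocked: prohibited terms {violations}"
    words = output.split()
    if len(words) > MAX_WORDS:
        return " ".join(words[:MAX_WORDS]), "trimmed to word limit"
    return output, "ok"

text, status = enforce_constraints("This plan has a guaranteed return of 12%.")
print(status)  # blocked, no matter how the prompt was worded
```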
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CONTEXT QUALITY CHECK COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
| Check       | Result        |
|-------------|---------------|
| Relevance   | ✅ PASS       |
| Freshness   | ✅ PASS       |
| Sufficiency | ⚠️ NEEDS WORK |
| Structure   | ✅ PASS       |
| Constraints | ⚠️ NEEDS WORK |

Overall: 3/5 PASSING → Fix issues before shipping
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
ISSUES TO FIX:
[List specific issues found with concrete recommendations]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 Pro tip: Run this check again after fixes to confirm resolution.
Output options:
When Linear MCP is available:
Pulling issues:
- mcp__plugin_hb-tools_linear__get_issue to fetch issue details
- mcp__plugin_hb-tools_linear__list_issues with parent filter to find related issues

Creating output:
- mcp__plugin_hb-tools_linear__create_comment to add the canvas/audit as a comment
- mcp__plugin_hb-tools_linear__create_issue to create new stories from specs

Tags: context-engineering, ai-feature

Before /context-engineering:
- /four-risks - Validate the feature is worth building at all

After /context-engineering:
- /ai-cost-check - Model the unit economics
- /ai-health-check - Pre-launch validation

The sequence:
1. Validate the feature (/four-risks)
2. Spec the context (/context-engineering)
3. Model the economics (/ai-cost-check)
4. Run pre-launch validation (/ai-health-check)

Every AI system needs these layers (bottom to top):
Source framework: 4D Context Canvas, 6 Layers of Context, C.E.O. Framework
Authors: Aakash Gupta & Miqdad Jaffer (OpenAI)
Publication: "The Ultimate Guide to Context Engineering for PMs" - Product Growth Newsletter, 2025