From bette-think
Plans a v1→v2→v3 agency ladder for AI features: maps autonomy increases over time, defines promotion criteria, and generates stakeholder artifacts. Uses the CC/CD framework.
npx claudepluginhub breethomas/bette-think --plugin bette-think
This skill uses the workspace's default tool permissions.
**AI products earn autonomy. They don't start with it.**
Every increase in AI agency means surrendering human control. This tradeoff must be intentional, not accidental.
Low Agency ←──────────────────→ High Agency
(Human decides) (AI decides)
High Control ←──────────────→ Low Control
(Predictable) (Unpredictable)
Key insight: You're not ready to give high agency until you've thoroughly tested how the AI behaves at lower autonomy levels.
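To make the tradeoff concrete, here is a minimal TypeScript sketch of the ladder as a set of control surfaces. Every name in it is hypothetical; it illustrates the shape of the tradeoff, not this skill's implementation.

```typescript
// Hypothetical sketch: each rung of the ladder is a distinct control surface.
// None of these names come from the skill; they are illustrative only.
type AgencyLevel = "v1-suggest" | "v2-partial" | "v3-autonomous";

interface ControlSurface {
  humanApprovalRequired: boolean; // high control: a human signs off on every action
  escalationPath: boolean;        // the AI can defer hard cases to a human
  overrideAvailable: boolean;     // users can always correct the AI
}

// Moving up the ladder trades approval (control) for autonomy (agency).
// Note that the override survives every promotion.
const controls: Record<AgencyLevel, ControlSurface> = {
  "v1-suggest":    { humanApprovalRequired: true,  escalationPath: false, overrideAvailable: true },
  "v2-partial":    { humanApprovalRequired: false, escalationPath: true,  overrideAvailable: true },
  "v3-autonomous": { humanApprovalRequired: false, escalationPath: true,  overrideAvailable: true },
};
```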
When this skill is invoked, start with:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AGENCY LADDER
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Every AI feature should earn autonomy, not start with it.
What are you working on?
1. Planning a new AI feature
→ Build the ladder from scratch
2. Reviewing an existing AI feature
→ Map current state, plan next level
3. Deciding whether to increase agency
→ Promotion criteria check
4. Workshop/conversation tool
→ Export ladder for stakeholder discussion
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Parse intent from context, or use the command-line shortcuts:
/agency-ladder → Show entry menu
/agency-ladder --review → Flow 2 (existing feature)
/agency-ladder --promote → Flow 3 (promotion check)
/agency-ladder --export → Flow 4 (stakeholder artifact)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 1: UNDERSTAND THE FEATURE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Before mapping agency, let's understand what the AI is doing.
Questions to ask:
• What task is the AI performing?
• Who is the user?
• What's the consequence of the AI being wrong?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 2: DEFINE V1 (LOW AGENCY)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
V1 is where you start: AI suggests, human decides.
V1 questions:
• What's the minimum capability that tests your hypothesis?
• How will users override when the AI is wrong?
• What data will you collect to inform V2?
V1 Pattern: High Control, Low Agency
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
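A minimal sketch of the V1 pattern, assuming a hypothetical support-ticket feature (all function names are illustrative stubs, not real APIs): the AI proposes, a human decides, and every decision is logged as the data that informs V2.

```typescript
// Hypothetical V1 sketch: the AI only suggests; a human decides.
type Decision = "accepted" | "edited" | "rejected";

interface Suggestion {
  text: string;
  confidence: number; // model's self-reported confidence, 0..1
}

// Stubs standing in for your model call, review UI, and analytics pipeline.
declare function suggestReply(ticket: string): Promise<Suggestion>;
declare function askHuman(ticket: string, s: Suggestion): Promise<Decision>;
declare function recordDecision(ticket: string, s: Suggestion, d: Decision): Promise<void>;

async function handleTicket(ticket: string): Promise<void> {
  const suggestion = await suggestReply(ticket);       // AI proposes
  const decision = await askHuman(ticket, suggestion); // human decides
  // Every accept/edit/reject is captured: this correction data is what
  // the V1→V2 promotion decision will be based on.
  await recordDecision(ticket, suggestion, decision);
}
```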
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 3: MAP V2 (MEDIUM AGENCY)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
V2 happens when V1 proves reliability.
V2 questions:
• What categories could AI handle autonomously?
• What still needs human judgment?
• What's the escalation path?
V2 Pattern: Medium Control, Medium Agency
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
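A sketch of the V2 pattern, continuing the same hypothetical ticket example: the AI acts alone only on categories V1 proved out, and escalates everything else. The category names and confidence floor are illustrative.

```typescript
// Hypothetical V2 sketch: autonomous only where reliability was proven at V1.
const AUTONOMOUS_CATEGORIES = new Set(["password-reset", "order-status"]);
const CONFIDENCE_FLOOR = 0.9; // tuned from V1 correction data

declare function classify(ticket: string): Promise<{ category: string; confidence: number }>;
declare function actAutonomously(ticket: string): Promise<void>;
declare function escalateToHuman(ticket: string, reason: string): Promise<void>;

async function routeTicket(ticket: string): Promise<void> {
  const { category, confidence } = await classify(ticket);
  if (AUTONOMOUS_CATEGORIES.has(category) && confidence >= CONFIDENCE_FLOOR) {
    await actAutonomously(ticket); // medium agency: act without approval
  } else {
    // Everything the AI hasn't earned yet stays with a human.
    await escalateToHuman(ticket, `category=${category}, confidence=${confidence}`);
  }
}
```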
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 4: ENVISION V3 (HIGH AGENCY)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
V3 is the end state: AI acts autonomously.
V3 questions:
• What does full autonomy look like?
• What's the human's role at V3?
• Is V3 even desirable for this feature?
V3 Pattern: Low Control, High Agency
Note: Some features should never reach V3. That's a valid outcome.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
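One possible shape for V3 guardrails (again hypothetical names, not this skill's API): the AI acts end-to-end, but autonomy stays revocable. Breaching a failure budget demotes the feature back down the ladder rather than failing open.

```typescript
// Hypothetical V3 sketch: the human role shifts to monitoring and exceptions.
declare function act(ticket: string): Promise<{ ok: boolean }>;
declare function emitMetric(name: string, value: number): void;
declare function demoteToV2(reason: string): Promise<void>;

let recentFailures = 0;       // reset per monitoring window (elided here)
const FAILURE_BUDGET = 5;     // breach triggers automatic demotion

async function handleAutonomously(ticket: string): Promise<void> {
  const result = await act(ticket);
  emitMetric("v3.success", result.ok ? 1 : 0);
  if (!result.ok && ++recentFailures > FAILURE_BUDGET) {
    // Autonomy is revocable: degrade down the ladder, don't fail open.
    await demoteToV2(`failure budget exceeded (${recentFailures} failures)`);
  }
}
```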
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
STEP 5: PROMOTION CRITERIA
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What must be true to move from V(n) to V(n+1)?
Promotion criteria categories:
□ Quality metrics stable for how long?
□ What user correction rate is acceptable?
□ What error patterns must be resolved?
□ What monitoring must be in place?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
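These criteria can be encoded as an explicit gate rather than a judgment call. A sketch, with example thresholds (4 weeks of stability, a 5% correction rate) that you would tune per feature:

```typescript
// Hypothetical promotion gate: the V(n)→V(n+1) criteria as explicit checks.
interface PromotionSnapshot {
  weeksMetricsStable: number;
  userCorrectionRate: number; // fraction of AI actions users corrected
  openErrorPatterns: number;  // documented, unresolved failure modes
  monitoringInPlace: boolean; // monitoring ready for the next agency level
}

function readyToPromote(s: PromotionSnapshot): { ready: boolean; blockers: string[] } {
  const blockers: string[] = [];
  if (s.weeksMetricsStable < 4) blockers.push("quality metrics not stable for 4+ weeks");
  if (s.userCorrectionRate > 0.05) blockers.push("user correction rate above 5%");
  if (s.openErrorPatterns > 0) blockers.push(`${s.openErrorPatterns} unresolved error patterns`);
  if (!s.monitoringInPlace) blockers.push("monitoring for next level not in place");
  return { ready: blockers.length === 0, blockers };
}
```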
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AGENCY LADDER COMPLETE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
# [Feature Name] - Agency Ladder
## Progression Overview
| Version | Capability | Control | Agency | What You're Testing |
|---------|------------|---------|--------|---------------------|
| V1 | [describe] | High | Low | [hypothesis] |
| V2 | [describe] | Medium | Medium | [hypothesis] |
| V3 | [describe] | Low | High | [hypothesis] |
## Flywheel
| Version | What You're Testing | What You Learn | What Feeds Next Loop |
|---------|---------------------|----------------|----------------------|
| V1 | [hypothesis] | [insights] | [data for V2] |
| V2 | [hypothesis] | [insights] | [data for V3] |
| V3 | [hypothesis] | [insights] | [calibration] |
## Control Handoffs
**Override mechanism:** [how users correct]
**Escalation path:** [when AI defers]
**Feedback capture:** [how corrections improve system]
## Promotion Criteria (V1→V2)
- [ ] [Metric 1] stable for [duration]
- [ ] [Metric 2] above [threshold]
- [ ] [Condition 3] resolved
- [ ] [Monitoring] in place
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
What next?
1. Export to spec (/spec --ai)
2. Set up post-launch calibration (/calibrate)
3. Share with stakeholders
4. Start building V1
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
MAPPING CURRENT STATE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Let's understand where this feature is today.
Questions to determine current level:
• What's the AI currently doing?
• How much human oversight exists?
• What's working? What's failing?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GAPS ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Based on the current state, here are potential gaps:
□ Override mechanism exists?
□ Quality metrics being tracked?
□ Error patterns documented?
□ Promotion criteria defined?
□ Next level planned?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Generate an assessment showing which of these gaps exist and what must close before planning the next level.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PROMOTION CHECK: V[n] → V[n+1]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Let's see if this feature is ready for more autonomy.
QUALITY METRICS
□ Accuracy/quality metrics stable for 4+ weeks?
□ No new error patterns emerging?
□ User corrections decreasing over time?
SAFETY & TRUST
□ Confident in all known failure modes?
□ Override mechanism working well?
□ User feedback positive?
OPERATIONAL READINESS
□ Monitoring in place for new agency level?
□ Rollback plan if quality degrades?
□ Team aligned on promotion decision?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
PROMOTION ASSESSMENT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Feature: [name]
Current Level: V[n]
Target Level: V[n+1]
VERDICT: [READY / NOT READY / NEEDS WORK]
✅ Passing:
- [criteria met]
- [criteria met]
❌ Blocking:
- [criteria not met]
- [criteria not met]
⚠️ Risks:
- [risk if promoted now]
RECOMMENDATION:
[Clear recommendation with reasoning]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Generate a presentation-ready artifact:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
AGENCY LADDER - [Feature Name]
Stakeholder Discussion Document
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
## Why This Matters
AI products need a fundamentally different approach than
traditional software. We can't "ship and forget."
This document outlines how [feature] will earn autonomy
over time, with clear gates at each level.
## The Ladder
[Flywheel table from Flow 1]
## Current State
[Where we are today]
## Next Steps
[What we're asking stakeholders to approve]
## Risks & Mitigations
[Key risks at each level and how we address them]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
"Let's just ship V3" → "What happens when it fails? How do users recover? What failure modes are you willing to discover in production?"
"Users will trust it" → "How do you know? What's your evidence? Have you tested this with real users at lower agency?"
"We'll figure out V2 later" → "What must V1 prove to inform V2? If you don't know what you're testing, how will you know when to promote?"
"It's just a small feature" → "What's the consequence of AI being wrong? Even 'small' features can have significant trust implications."
Before /agency-ladder:
/spec --ai - Define what the AI is doing
After /agency-ladder:
/calibrate - Post-launch calibration workflow
/ai-health-check - Pre-launch validation
Framework: CC/CD (Continuous Calibration/Continuous Development)
Source: Aishwarya Naresh Reganti & Kiriti Badam (Lenny's Newsletter)
Adaptation: Agency ladder progression planning