From bette-think
Optimizes prompts for production AI features with analysis, 6-step framework, failure detection, and research-backed techniques. Use for prompt review, system prompts, or improvement suggestions.
Slash command: /bette-think:prompt-engineering
Master system for creating, analyzing, and optimizing prompts for AI products using research-backed techniques and battle-tested production patterns.
When improving any prompt, follow this systematic process:
Begin with what the model CANNOT do, not what it should do.
Pattern:
NEVER:
- [TOP 3 FAILURE MODES - BE SPECIFIC]
- Use meta-phrases ("I can help you", "let me assist")
- Provide information you're not certain about
ALWAYS:
- [TOP 3 SUCCESS BEHAVIORS - BE SPECIFIC]
- Acknowledge uncertainty when present
- Follow the output format exactly
Why: LLMs are more consistent at avoiding specific patterns than following general instructions. "Never say X" is more reliable than "Always be helpful."
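The NEVER/ALWAYS pattern above can be generated from plain lists, which keeps the constraints easy to review and version. This is a minimal sketch: `build_constraints` is a hypothetical helper, not part of this skill, and the example constraint text is illustrative.

```python
def build_constraints(never: list[str], always: list[str]) -> str:
    """Render NEVER/ALWAYS lists as a hard-constraint block (hypothetical helper)."""
    if not never:
        raise ValueError("list at least one specific failure mode")
    lines = ["NEVER:"]
    lines += [f"- {item}" for item in never]
    lines.append("ALWAYS:")
    lines += [f"- {item}" for item in always]
    return "\n".join(lines)


# Illustrative values, not taken from the skill.
block = build_constraints(
    never=["Quote prices that are not in the provided catalog"],
    always=["Acknowledge uncertainty when present"],
)
print(block)
```

Keeping the failure modes as data makes it simple to add a new constraint each time testing surfaces a new failure.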
Use formatting that signals technical documentation quality:
XML-style tags (<system_constraints>, <task_instructions>)
Why: Well-structured documents trigger higher-quality training data patterns.
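Tagged sections such as <system_constraints> and <task_instructions> can be assembled programmatically so that every section is opened and closed consistently. A minimal sketch, assuming a hypothetical `tag` helper; the section contents are placeholders for illustration.

```python
def tag(name: str, body: str) -> str:
    """Wrap a section body in an XML-style tag pair."""
    return f"<{name}>\n{body.strip()}\n</{name}>"


sections = {
    "system_constraints": "NEVER:\n- Provide information you're not certain about",
    "task_instructions": "Summarize the ticket in two sentences.",  # illustrative
}
prompt = "\n".join(tag(name, body) for name, body in sections.items())
print(prompt)
```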
Don't optimize manually - let the model do it using this meta-prompt:
You are a prompt optimization specialist. Your job is to improve prompts for production AI systems.
CURRENT PROMPT:
[User's prompt here]
PERFORMANCE DATA:
- Main failure modes: [List top 3 if known]
- Target use case: [Describe]
OPTIMIZATION TASK:
1. Identify the top 3 weaknesses in this prompt
2. Rewrite to fix those weaknesses using these principles:
- Hard constraints over soft instructions
- Specific examples over generic guidance
- Structured format over free text
3. Predict the improvement percentage for each change
CONSTRAINTS:
- Must maintain core functionality
- Cannot exceed 150% of current token count
- Must include failure mode handling
OUTPUT:
Optimized prompt + rationale for each change
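The constraint "cannot exceed 150% of current token count" can be checked automatically on whatever the meta-prompt returns. The sketch below is an approximation: it counts whitespace-separated words as a stand-in for tokens, which is an assumption, and both function names are hypothetical. A real check would use the tokenizer of the target model.

```python
def rough_token_count(text: str) -> int:
    """Whitespace word count as a crude stand-in for a real tokenizer (assumption)."""
    return len(text.split())


def within_budget(current: str, optimized: str, max_ratio: float = 1.5) -> bool:
    """Check the 'cannot exceed 150% of current token count' constraint."""
    return rough_token_count(optimized) <= max_ratio * rough_token_count(current)


# Illustrative prompts only.
current_prompt = "Summarize the support ticket."
optimized_prompt = "Summarize the support ticket in brief."
print(within_budget(current_prompt, optimized_prompt))
```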
Test the prompt systematically:
Identify the top 3 failure patterns and address them explicitly in the prompt.
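Once failed test outputs have been labeled, finding the top 3 failure patterns is a simple tally. A minimal sketch, assuming each failed case has been tagged with one label; the label names and counts below are hypothetical.

```python
from collections import Counter


def top_failure_patterns(labels: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent failure labels with their counts."""
    return Counter(labels).most_common(n)


# Hypothetical labels assigned while reviewing failed test outputs.
observed = [
    "wrong_format", "hallucinated_fact", "wrong_format",
    "ignored_constraint", "wrong_format", "hallucinated_fact", "too_verbose",
]
for label, count in top_failure_patterns(observed):
    print(f"{label}: {count}")
```

The resulting labels are natural candidates for the NEVER list in the prompt.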
Define clear success metrics:
Phase 1: Climb Up for Quality
Phase 2: Descend for Cost
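One possible reading of the two phases is: first find the highest quality any configuration reaches, then pick the cheapest configuration that stays close to it. The sketch below follows that reading only. The `Candidate` fields, the `tolerance` parameter, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical prompt/model configuration label
    quality: float        # eval pass rate, 0.0 to 1.0
    cost_per_call: float  # any consistent unit


def climb_then_descend(candidates: list[Candidate], tolerance: float = 0.02) -> Candidate:
    """Phase 1: find the best quality. Phase 2: cheapest candidate within tolerance of it."""
    best_quality = max(c.quality for c in candidates)
    good_enough = [c for c in candidates if c.quality >= best_quality - tolerance]
    return min(good_enough, key=lambda c: c.cost_per_call)


# Illustrative numbers only.
options = [
    Candidate("large-long-prompt", 0.94, 10.0),
    Candidate("large-short-prompt", 0.93, 6.0),
    Candidate("small-short-prompt", 0.85, 1.0),
]
print(climb_then_descend(options).name)
```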
Use this battle-tested template structure:
<system_role>
You are [SPECIFIC ROLE], not a general AI assistant.
You [CORE FUNCTION] for [TARGET USER].
</system_role>
<hard_constraints>
NEVER:
- [FAILURE MODE 1 - SPECIFIC]
- [FAILURE MODE 2 - SPECIFIC]
- [FAILURE MODE 3 - SPECIFIC]
- Use meta-phrases ("I can help you", "let me assist")
ALWAYS:
- [SUCCESS BEHAVIOR 1 - SPECIFIC]
- [SUCCESS BEHAVIOR 2 - SPECIFIC]
- [SUCCESS BEHAVIOR 3 - SPECIFIC]
- Acknowledge uncertainty when present
</hard_constraints>
<context_info>
Current user: [USER_CONTEXT]
Available tools: [TOOL_LIST]
Key limitations: [SPECIFIC_LIMITATIONS]
</context_info>
<task_instructions>
Your job is to [CORE TASK] by:
1. [STEP 1 - SPECIFIC ACTION]
2. [STEP 2 - SPECIFIC ACTION]
3. [STEP 3 - SPECIFIC ACTION]
If [EDGE_CASE_1], then [SPECIFIC_RESPONSE].
If [EDGE_CASE_2], then [SPECIFIC_RESPONSE].
If [EDGE_CASE_3], then [SPECIFIC_RESPONSE].
</task_instructions>
<output_format>
Respond using this exact structure:
[SECTION_1]: [DESCRIPTION]
[SECTION_2]: [DESCRIPTION]
Requirements:
- [FORMAT_REQUIREMENT_1]
- [FORMAT_REQUIREMENT_2]
</output_format>
<examples>
Example 1 - Happy Path:
Input: [TYPICAL_INPUT]
Output: [IDEAL_RESPONSE]
Example 2 - Edge Case:
Input: [EDGE_CASE_INPUT]
Output: [EDGE_CASE_RESPONSE]
Example 3 - Complex:
Input: [COMPLEX_SCENARIO]
Output: [COMPLEX_RESPONSE]
</examples>
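A template like the one above is easy to ship with a placeholder still unfilled. The sketch below fills underscore-style placeholders such as [USER_CONTEXT] and raises an error if any are left. It is a hypothetical helper, and it deliberately ignores free-text placeholders such as [SPECIFIC ROLE], which need to be rewritten by hand.

```python
import re

# Matches placeholders written as [UPPER_CASE_NAME].
PLACEHOLDER = re.compile(r"\[([A-Z0-9_]+)\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace [NAME] placeholders; raise if any placeholder has no value."""
    missing = sorted({m for m in PLACEHOLDER.findall(template) if m not in values})
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


snippet = (
    "<context_info>\n"
    "Current user: [USER_CONTEXT]\n"
    "Available tools: [TOOL_LIST]\n"
    "</context_info>"
)
# Illustrative values.
filled = fill_template(
    snippet, {"USER_CONTEXT": "trial account", "TOOL_LIST": "search, calculator"}
)
print(filled)
```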
Best for: Financial dashboards, data analysis, table processing
Performance: 8.69% improvement on table tasks
How: Make the AI manipulate the table structure step by step rather than reason about tables in text
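The idea of manipulating the table step by step can be illustrated with a small set of table operations applied in sequence. This is a simplified sketch, not the published method. The operation names are hypothetical, the data is made up, and the plan is hard-coded here where the model would choose each next operation after seeing the intermediate table.

```python
Table = list[dict]


def select_columns(table: Table, columns: list[str]) -> Table:
    return [{c: row[c] for c in columns} for row in table]


def filter_rows(table: Table, column: str, minimum: float) -> Table:
    return [row for row in table if row[column] >= minimum]


def sort_rows(table: Table, column: str) -> Table:
    return sorted(table, key=lambda row: row[column], reverse=True)


OPERATIONS = {
    "select_columns": select_columns,
    "filter_rows": filter_rows,
    "sort_rows": sort_rows,
}


def run_chain(table: Table, plan: list[tuple[str, dict]]) -> Table:
    """Apply each operation to the output of the previous one."""
    for op_name, kwargs in plan:
        table = OPERATIONS[op_name](table, **kwargs)
    return table


# Made-up data for illustration.
revenue = [
    {"region": "EMEA", "quarter": "Q1", "revenue": 120.0},
    {"region": "APAC", "quarter": "Q1", "revenue": 95.0},
    {"region": "AMER", "quarter": "Q1", "revenue": 210.0},
]
plan = [
    ("filter_rows", {"column": "revenue", "minimum": 100.0}),
    ("sort_rows", {"column": "revenue"}),
    ("select_columns", {"columns": ["region", "revenue"]}),
]
result = run_chain(revenue, plan)
print(result)
```

The final, much smaller table is what the model answers from, instead of reasoning over the full table in prose.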
Best for: Arithmetic reasoning, logic puzzles, formal reasoning
Limitations: Only works on 100B+ parameter models; minimal benefit for content generation
When NOT to use: Classification, content generation, most business tasks
When it helps: The task requires a specific style, or format examples improve output
When it hurts: Advanced reasoning tasks (o1, DeepSeek R1 models)
Best practice: Test systematically; few-shot has the highest variability of any technique
Best for: Customer support, sales conversations, multi-turn interactions
How: Show entire conversation flows, not isolated examples
Benefit: Teaches conversation patterns, not just individual responses
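Showing entire conversation flows can be as simple as rendering each example conversation, turn by turn, into the prompt's examples block. A minimal sketch, assuming a hypothetical `render_examples` helper and a made-up support conversation.

```python
Turn = tuple[str, str]  # (speaker, text)


def render_examples(conversations: list[list[Turn]]) -> str:
    """Render whole conversations, not isolated pairs, into an <examples> block."""
    blocks = []
    for number, turns in enumerate(conversations, start=1):
        lines = [f"{speaker}: {text}" for speaker, text in turns]
        blocks.append(f"Example {number}:\n" + "\n".join(lines))
    return "<examples>\n" + "\n\n".join(blocks) + "\n</examples>"


# Illustrative support conversation; the wording is made up for this sketch.
refund_flow = [
    ("Customer", "I was charged twice this month."),
    ("Agent", "Sorry about that. Can you confirm the last four digits of the card?"),
    ("Customer", "4242."),
    ("Agent", "Thanks. I have refunded the duplicate charge."),
]
examples_block = render_examples([refund_flow])
print(examples_block)
```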
Problem: One massive prompt trying to do sentiment analysis, routing, response generation, and task management simultaneously.
Fix: Break into specialized prompts:
Each prompt does ONE thing exceptionally well.
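Splitting one massive prompt into specialized ones might look like the pipeline below, where each function sends one narrow prompt and passes its result to the next. This is a sketch under stated assumptions: `ModelFn` is a hypothetical signature for whatever client is in use, the prompt wording is illustrative, and `fake_model` is a stand-in so the example runs without an API.

```python
from typing import Callable

ModelFn = Callable[[str], str]  # hypothetical: takes a prompt, returns the model's text


def classify_sentiment(call_model: ModelFn, message: str) -> str:
    prompt = f"Classify the sentiment as positive, neutral, or negative.\nMessage: {message}"
    return call_model(prompt).strip().lower()


def route(call_model: ModelFn, message: str, sentiment: str) -> str:
    prompt = (
        "Pick one queue: billing, technical, general.\n"
        f"Sentiment: {sentiment}\nMessage: {message}"
    )
    return call_model(prompt).strip().lower()


def draft_reply(call_model: ModelFn, message: str, queue: str) -> str:
    return call_model(f"Write a reply for the {queue} queue.\nMessage: {message}")


def handle(call_model: ModelFn, message: str) -> dict[str, str]:
    """Run the three single-purpose prompts in sequence."""
    sentiment = classify_sentiment(call_model, message)
    queue = route(call_model, message, sentiment)
    return {"sentiment": sentiment, "queue": queue, "reply": draft_reply(call_model, message, queue)}


def fake_model(prompt: str) -> str:
    """Stand-in so the sketch runs without an API; replace with a real client."""
    if prompt.startswith("Classify"):
        return "negative"
    if prompt.startswith("Pick"):
        return "billing"
    return "Sorry about the double charge. A refund is on its way."


result = handle(fake_model, "I was charged twice this month.")
print(result)
```

Because each step has its own prompt, each can be evaluated and improved separately.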
Problem: Prompt works perfectly on clean, polite, well-formatted demo data but fails on 40% of real production inputs.
Fix: Build eval suite from real chaos:
Problem: Shipping a prompt and never updating it as the business evolves, user needs change, and new edge cases emerge.
Fix: Build continuous optimization:
Shorter, structured prompts have major advantages:
Example comparison:
Benefits of compression:
When to use longer prompts: Complex tasks requiring extensive context, edge case handling, or when that 88% cost increase delivers proportional value.
When user provides a prompt to improve:
Identify Current State
Analyze Against Framework
Provide Specific Recommendations
Offer Complete Rewrite
Suggest Testing Strategy
Conciseness Matters - Context window is shared. Only include what Claude doesn't already know.
Structure = Quality - XML for Claude, JSON for GPT-3.5, Markdown for docs. Format signals quality.
Hard Constraints Over Soft - "Never do X" is more reliable than "Be helpful."
Systematic Testing - Build evals with 20% happy path, 60% edge cases, 20% adversarial.
Continuous Optimization - Prompts decay as business evolves. Build iteration into workflow.
Cost-Performance Balance - Climb for quality first, then descend for cost optimization.
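The 20% happy path, 60% edge cases, 20% adversarial split from the Systematic Testing principle translates directly into case counts for an eval suite of a given size. A minimal sketch; the function and category names are hypothetical, and sending the rounding remainder to edge cases is a design choice of this example.

```python
def eval_mix(total: int) -> dict[str, int]:
    """Split a test-case budget into 20% happy path, 60% edge cases, 20% adversarial."""
    happy = total // 5
    adversarial = total // 5
    edge = total - happy - adversarial  # remainder goes to edge cases so counts sum to total
    return {"happy_path": happy, "edge_case": edge, "adversarial": adversarial}


print(eval_mix(50))
```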
Use Chain-of-Table when:
Use Chain-of-Thought when:
Use Few-Shot when:
Use Multi-Shot when:
Use Nested Prompting when:
When providing prompt improvements, always:
npx claudepluginhub breethomas/bette-think --plugin bette-think