From godmode
Designs, tests, versions, and optimizes prompts for LLMs using patterns like zero-shot, few-shot, CoT, ReAct; covers injection prevention, evaluation, and A/B testing.
npx claudepluginhub arbazkhan971/godmode
How this skill is triggered (by the user, by Claude, or both):
Slash command
/godmode:prompt
The summary Claude sees in its skill listing, used to decide when to auto-load this skill:
- `/godmode:prompt`, "design a prompt", "test prompt"
Designs, optimizes, and evaluates LLM prompts — generating templates, structured output schemas, evaluation rubrics, and test suites. Use for prompt refactoring, chain-of-thought, or system prompt design.
Analyzes LLM prompt failure modes, generates variants (zero-shot, few-shot, CoT), designs evaluation rubrics, and produces test suites for optimization.
Optimizes prompts for LLMs using constitutional AI, chain-of-thought reasoning, and model-specific techniques. Transforms basic instructions into production-ready prompts to improve accuracy, reduce hallucinations, and cut costs.
/godmode:prompt, "design a prompt", "test prompt"
Task: <what the prompt must accomplish>
Model: <target model>
Input/Output: <format and constraints>
Quality: accuracy target, safety, consistency
Budget: max tokens, latency, cost per call
| Pattern | Best for |
|---|---|
| Zero-shot | Simple tasks model handles well |
| Few-shot | Tasks needing format/style examples |
| Chain-of-thought | Reasoning, math, multi-step |
| ReAct | Tool-using agents, search+reason |
| Tree-of-thought | Exploring alternatives |
| Self-consistency | High-stakes, multiple paths |
| Structured output | JSON, XML, typed schema |
IF simple classification: zero-shot. IF format matters: few-shot with 2-5 examples.
Structure: 1) Role, 2) Task, 3) Input Format, 4) Output Format, 5) Constraints, 6) Examples, 7) Edge Cases.
Place the most important instructions at the beginning and end of the prompt (primacy/recency effect). Examples usually steer the output more reliably than instructions do.
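The seven-part structure can be sketched as a small assembler. This is a minimal illustration, not the skill's own implementation: `build_prompt`, the `##` section headers, and the closing "Reminder" block are all assumptions made for the example. Restating the task at the end is one way to apply the recency effect.

```python
# Section order follows the seven-part structure; names are illustrative.
SECTION_ORDER = (
    "Role", "Task", "Input Format", "Output Format",
    "Constraints", "Examples", "Edge Cases",
)


def build_prompt(sections: dict[str, str], repeat_task_at_end: bool = True) -> str:
    """Join the non-empty sections in a fixed order (hypothetical helper)."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    parts = [
        f"## {name}\n{sections[name].strip()}"
        for name in SECTION_ORDER
        if sections.get(name, "").strip()
    ]
    # Restate the task last so the key instruction sits at both ends of the prompt.
    if repeat_task_at_end and sections.get("Task", "").strip():
        parts.append(f"## Reminder\n{sections['Task'].strip()}")
    return "\n\n".join(parts)
```

Sections that are missing or empty are skipped, so a zero-shot prompt can use the same helper with no `Examples` entry.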
Cover: common case, edge case, output format. 2-5 examples typical. Track token overhead.
Options: JSON mode, function calling, prompt-based, constrained decoding. Validate against schema. Retry on failure (max 2-3 attempts).
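A validate-and-retry loop for the prompt-based option might look like the sketch below. The schema (`label`, `confidence`), the `call_model` callable, and the wording of the correction message are hypothetical; a real setup would plug in its own client and schema validator.

```python
import json
from typing import Callable

# Hypothetical schema: required keys and the Python types accepted for each.
REQUIRED_KEYS: dict[str, tuple[type, ...]] = {
    "label": (str,),
    "confidence": (int, float),
}


def validate(raw: str) -> dict:
    """Parse JSON and check required keys and types; raise ValueError on mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, types in REQUIRED_KEYS.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], types):
            raise ValueError(f"wrong type for key: {key}")
    return data


def call_with_retry(call_model: Callable[[str], str], prompt: str,
                    max_attempts: int = 3) -> dict:
    """Call the model until its output validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return validate(raw)
        except ValueError as exc:
            last_error = exc
            # Feed the validation error back so the next attempt can correct it.
            prompt = (f"{prompt}\n\nYour previous output was invalid ({exc}). "
                      "Return valid JSON only.")
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Capping the attempts keeps worst-case cost and latency bounded, which matters when the budget is tracked per call.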
Wrap user input in <user_input> tags. IF user input enters prompt: injection defense required. IF output contains system prompt text: leak detected.
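Both checks can be sketched in a few lines. The helpers below are illustrative assumptions: `wrap_user_input` escapes angle brackets so the input cannot close the tag early, and `leaks_system_prompt` uses a simple sliding-window substring match (the 40-character window is an arbitrary choice). Neither is a complete defense on its own.

```python
def wrap_user_input(text: str) -> str:
    """Wrap untrusted text in <user_input> tags, neutralizing embedded tags."""
    # Escape angle brackets so the input cannot inject or close tags.
    escaped = text.replace("<", "&lt;").replace(">", "&gt;")
    return f"<user_input>\n{escaped}\n</user_input>"


def leaks_system_prompt(output: str, system_prompt: str, window: int = 40) -> bool:
    """Return True if any window-sized slice of the system prompt appears in the output."""
    # Normalize whitespace and case so trivial reformatting does not hide a leak.
    norm_out = " ".join(output.split()).lower()
    norm_sys = " ".join(system_prompt.split()).lower()
    if len(norm_sys) <= window:
        return bool(norm_sys) and norm_sys in norm_out
    return any(
        norm_sys[i:i + window] in norm_out
        for i in range(len(norm_sys) - window + 1)
    )
```

A substring check like this catches verbatim leaks only; paraphrased leaks would need a different detector.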
Categories: golden set, edge cases, format compliance, safety, injection resistance, consistency. Metrics: accuracy (target), format (100%), safety (>99%), injection resistance (>95%), latency, cost.
Track: version, accuracy, latency, cost, status. A/B test with traffic split, significance testing (alpha=0.05). Minimum 100 samples per variant.
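One way to run the significance check is a two-proportion z-test on the accuracy of each variant. The sketch below uses only the standard library; the function names and the choice of a pooled z-test are assumptions, and other tests (for example a chi-squared test) would also fit.

```python
import math


def two_proportion_p_value(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two success rates (pooled z-test)."""
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # both variants all-pass or all-fail: no evidence of a difference
    z = (success_b / n_b - success_a / n_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def ab_decision(success_a: int, n_a: int, success_b: int, n_b: int,
                alpha: float = 0.05, min_samples: int = 100) -> str:
    """Apply the minimum-sample rule first, then the significance threshold."""
    if min(n_a, n_b) < min_samples:
        return "keep collecting"
    if two_proportion_p_value(success_a, n_a, success_b, n_b) >= alpha:
        return "no significant difference"
    return "B wins" if success_b / n_b > success_a / n_a else "A wins"
```

The sample-size check runs before the test so that an early lucky streak cannot declare a winner.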
prompts/<task>/: prompt-spec.yaml, system-prompt.md, examples.yaml, tests.yaml, eval-results.md.
# Test prompt templates
curl -X POST http://localhost:8080/api/chat -H 'Content-Type: application/json' -d '{"prompt":"test"}'
pytest tests/test_prompts.py -v
WHILE iteration < 5 AND accuracy < target:
1. DIAGNOSE failures (format, wrong, hallucination)
2. GENERATE ONE change targeting top failure
3. EVALUATE on same golden set
4. COMPARE: accept if improved + no regression
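The loop above can be sketched in Python as follows. `evaluate` and `propose_change` are hypothetical callables standing in for the golden-set evaluation and the diagnose-and-generate steps; the regression check is reduced here to "accuracy must strictly improve", which is a simplification.

```python
from typing import Callable


def optimize(
    prompt: str,
    evaluate: Callable[[str], float],      # accuracy on the same golden set each time
    propose_change: Callable[[str], str],  # returns the prompt with ONE targeted change
    target: float,
    max_iterations: int = 5,
) -> tuple[str, float, list[tuple[int, float, str]]]:
    """Iterate until the target is met or the iteration budget is spent."""
    best_prompt, best_acc = prompt, evaluate(prompt)
    history: list[tuple[int, float, str]] = []
    iteration = 0
    while iteration < max_iterations and best_acc < target:
        candidate = propose_change(best_prompt)
        acc = evaluate(candidate)
        if acc > best_acc:
            # Accept: the change improved accuracy on the golden set.
            best_prompt, best_acc = candidate, acc
            history.append((iteration, acc, "keep"))
        else:
            # Reject: continue from the last accepted prompt.
            history.append((iteration, acc, "discard"))
        iteration += 1
    return best_prompt, best_acc, history
```

Changing one thing per iteration keeps each accuracy delta attributable to a single edit.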
Append .godmode/prompt-results.tsv:
timestamp version model accuracy_pct latency_ms injection_safe status
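Appending a row to the results file could look like this. The column names come from the line above; the header row, the UTC ISO timestamp, and the `append_result` helper are assumptions for the sketch.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["timestamp", "version", "model", "accuracy_pct",
           "latency_ms", "injection_safe", "status"]


def append_result(path: Path, version: str, model: str, accuracy_pct: float,
                  latency_ms: int, injection_safe: bool, status: str) -> None:
    """Append one tab-separated row, writing a header if the file is new or empty."""
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        if write_header:
            writer.writerow(COLUMNS)
        writer.writerow([
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            version, model, f"{accuracy_pct:.1f}", latency_ms,
            str(injection_safe).lower(), status,
        ])
```

For example, `append_result(Path(".godmode/prompt-results.tsv"), "v3", "<target model>", 92.5, 410, True, "keep")` adds one line per evaluated version.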
KEEP if: accuracy improved/maintained AND injection tests pass AND output parseable.
DISCARD if: accuracy dropped OR injection bypass OR format breaks.
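The keep/discard rule reduces to a single predicate. In this sketch the `EvalResult` fields and the `decide` function are hypothetical names; the three conditions mirror the rule as stated.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    accuracy: float             # fraction correct on the golden set
    injection_tests_pass: bool  # True only if no injection case got through
    output_parseable: bool      # True only if every output matched the format


def decide(baseline: EvalResult, candidate: EvalResult) -> str:
    """KEEP only when all three conditions hold; any single failure means DISCARD."""
    if (candidate.accuracy >= baseline.accuracy
            and candidate.injection_tests_pass
            and candidate.output_parseable):
        return "KEEP"
    return "DISCARD"
```

Because DISCARD is the logical negation of KEEP, one predicate covers both rules and no candidate can fall between them.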
STOP when ALL of:
- Accuracy meets target on test suite
- Injection hardening passes all cases
- Output format consistent and parseable
- Latency within budget
On failure: git reset --hard HEAD~1. Never pause.
| Failure | Action |
|---|---|
| Inconsistent output | JSON mode, temperature=0, examples |
| Injection bypasses | Sanitize input, isolate system prompt |
| Model refuses valid requests | Rephrase, explicit context setting |
| Scores drop after edit | Compare diffs, A/B test old vs new |