Install:

```shell
npx claudepluginhub arbazkhan971/godmode
```

This skill uses the workspace's default tool permissions.
Designs and optimizes AI agent action spaces, tool definitions, observation formats, error recovery, and context for higher task completion rates.
Triggers: `/godmode:prompt`, "design a prompt", "test prompt"

Prompt spec template:

```
Task: <what the prompt must accomplish>
Model: <target model>
Input/Output: <format and constraints>
Quality: accuracy target, safety, consistency
Budget: max tokens, latency, cost per call
```
| Pattern | Best for |
|---|---|
| Zero-shot | Simple tasks model handles well |
| Few-shot | Tasks needing format/style examples |
| Chain-of-thought | Reasoning, math, multi-step |
| ReAct | Tool-using agents, search+reason |
| Tree-of-thought | Exploring alternatives |
| Self-consistency | High-stakes, multiple paths |
| Structured output | JSON, XML, typed schema |
IF simple classification: zero-shot. IF format matters: few-shot with 2-5 examples.
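The IF-rules and the pattern table above can be sketched as a small dispatch helper. This is a minimal sketch; the `task` dict and its boolean trait names are illustrative assumptions, not part of the skill:

```python
def choose_pattern(task: dict) -> str:
    """Pick a prompting pattern from task traits (heuristics from the table above).

    The `task` keys are hypothetical, not a fixed schema.
    """
    if task.get("uses_tools"):
        return "react"                # tool-using agents, search+reason
    if task.get("multi_step_reasoning"):
        return "chain-of-thought"     # reasoning, math, multi-step
    if task.get("format_sensitive"):
        return "few-shot"             # add 2-5 format/style examples
    return "zero-shot"                # simple tasks the model handles well
```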
Structure: 1) Role, 2) Task, 3) Input Format, 4) Output Format, 5) Constraints, 6) Examples, 7) Edge Cases.
Place the most important instructions at the beginning and end (primacy/recency effect). Examples > instructions.
Cover: common case, edge case, output format. 2-5 examples typical. Track token overhead.
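Assembling a few-shot prompt from an examples list can be sketched as follows; the function name and the `input`/`output` example keys are assumptions for illustration:

```python
def build_few_shot(instructions: str, examples: list, query: str) -> str:
    """Assemble a few-shot prompt; the caller supplies 2-5 examples
    covering the common case, an edge case, and the output format."""
    parts = [instructions]
    for ex in examples[:5]:  # cap at 5 to bound token overhead
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)
```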
Options: JSON mode, function calling, prompt-based, constrained decoding. Validate against schema. Retry on failure (max 2-3 attempts).
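The validate-and-retry loop can be sketched with stdlib JSON parsing plus a required-keys check standing in for full schema validation; `call_model` is a hypothetical callable returning raw model text:

```python
import json

MAX_ATTEMPTS = 3  # retry budget from the guideline above

def call_with_schema(call_model, prompt, required_keys):
    """Call the model, validate the JSON output, retry on failure.

    `call_model` is a hypothetical hook: (prompt) -> raw text.
    """
    last_error = None
    for _ in range(MAX_ATTEMPTS):
        raw = call_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = f"invalid JSON: {e}"
            continue
        missing = [k for k in required_keys if k not in data]
        if missing:
            last_error = f"missing keys: {missing}"
            continue
        return data
    raise ValueError(f"validation failed after {MAX_ATTEMPTS} attempts: {last_error}")
```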
Wrap untrusted input in `<user_input>` tags. IF user input enters the prompt: injection defense required. IF output contains system prompt text: leak detected.
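A minimal sketch of both checks follows. Stripping the tag strings is only the simplest escape, and the 40-character leak threshold is an assumption; production defenses do more:

```python
def wrap_user_input(text: str) -> str:
    """Wrap untrusted text so the prompt can treat it as data only.

    Removing the tag strings prevents a trivial break-out of the wrapper.
    """
    safe = text.replace("<user_input>", "").replace("</user_input>", "")
    return f"<user_input>\n{safe}\n</user_input>"

def leaked_system_prompt(output: str, system_prompt: str) -> bool:
    """Flag a leak when the output echoes a long verbatim run of the system prompt."""
    return len(system_prompt) >= 40 and system_prompt[:40] in output
```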
Categories: golden set, edge cases, format compliance, safety, injection resistance, consistency. Metrics: accuracy (target), format (100%), safety (>99%), injection resistance (>95%), latency, cost.
Track: version, accuracy, latency, cost, status. A/B test with traffic split, significance testing (alpha=0.05). Minimum 100 samples per variant.
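The A/B gate above can be sketched as a stdlib two-proportion z-test (one common choice; the skill does not name a specific test), enforcing the 100-sample minimum and alpha=0.05:

```python
import math

def ab_significant(success_a, n_a, success_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test; requires >=100 samples per variant."""
    if min(n_a, n_b) < 100:
        raise ValueError("need at least 100 samples per variant")
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False  # identical all-pass/all-fail variants: no evidence
    z = abs(p_a - p_b) / se
    # Two-sided p-value from the normal CDF (math.erf is stdlib).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return p_value < alpha
```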
prompts/<task>/: prompt-spec.yaml, system-prompt.md, examples.yaml, tests.yaml, eval-results.md.
```shell
# Test prompt templates
curl -X POST http://localhost:8080/api/chat -d '{"prompt":"test"}'
pytest tests/test_prompts.py -v
```
WHILE iteration < 5 AND accuracy < target:
1. DIAGNOSE failures (format error, wrong answer, hallucination)
2. GENERATE ONE change targeting the top failure
3. EVALUATE on the same golden set
4. COMPARE: accept only if improved with no regression
Append to `.godmode/prompt-results.tsv`:

```
timestamp	version	model	accuracy_pct	latency_ms	injection_safe	status
```
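Appending one row (with a header on first use) can be sketched as below; the `log_result` name and keyword interface are assumptions, while the path and column names come from the text above:

```python
import time
from pathlib import Path

COLUMNS = ["timestamp", "version", "model", "accuracy_pct",
           "latency_ms", "injection_safe", "status"]

def log_result(path=".godmode/prompt-results.tsv", **fields):
    """Append one TSV row, writing the header line on first use."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    fields.setdefault("timestamp", time.strftime("%Y-%m-%dT%H:%M:%S"))
    new_file = not p.exists()
    with p.open("a") as f:
        if new_file:
            f.write("\t".join(COLUMNS) + "\n")
        f.write("\t".join(str(fields.get(c, "")) for c in COLUMNS) + "\n")
```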
KEEP if: accuracy improved or maintained AND injection tests pass AND output parseable.
DISCARD if: accuracy dropped OR injection bypass OR format breaks.
STOP when ALL of:
- Accuracy meets target on test suite
- Injection hardening passes all cases
- Output format consistent and parseable
- Latency within budget
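The stop conditions above combine as a single all-must-hold gate; this sketch assumes the caller measures each value, and the parameter names are illustrative:

```python
def should_stop(accuracy, target, injection_all_pass, format_consistent,
                latency_ms, latency_budget_ms):
    """Stop only when ALL gates hold; any single failure keeps iterating."""
    return (accuracy >= target
            and injection_all_pass
            and format_consistent
            and latency_ms <= latency_budget_ms)
```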
On failure: `git reset --hard HEAD~1`. Never pause.
| Failure | Action |
|---|---|
| Inconsistent output | JSON mode, temperature=0, examples |
| Injection bypasses | Sanitize input, isolate system prompt |
| Model refuses valid requests | Rephrase, explicit context setting |
| Scores drop after edit | Compare diffs, A/B test old vs new |