From erpaval
Audits and rewrites prompts for Claude Opus 4.7 instruction-following best practices. Produces a scored audit and rewritten prompt for system prompts, CLAUDE.md, agent definitions, slash commands, SKILL.md, Jinja2 templates, and Pydantic/Zod schemas.
npx claudepluginhub theagenticguy/erpaval --plugin erpaval
How this skill is triggered: slash command
/erpaval:meta-prompt-optimizer
- evals/evals.json
- evals/fixtures/bad-agent.md
- evals/fixtures/bad-claude-md.md
- evals/fixtures/bad-jinja-template.j2
- evals/fixtures/bad-zod-tools.ts
- evals/fixtures/okayish-system-prompt.txt
- evals/trigger-eval-reviewed.json
- evals/trigger-eval.json
- references/opus-4-7-principles.md
- references/rewriting-patterns.md
- references/rubric.md
- references/target-types.md
| File | When to load |
|---|---|
| references/opus-4-7-principles.md | Core prompting principles and the 4.7-specific behavior shifts |
| references/rubric.md | Scoring rubric, shared with the prompt-critic agent |
| references/rewriting-patterns.md | Before/after transformation patterns with worked examples |
| references/target-types.md | Type-specific guidance (CLAUDE.md, agents, commands, Jinja2, Pydantic/Zod) |
The prompt-critic agent shares the same rubric and is the right tool when the user wants a scored assessment without a rewrite.
Audit and rewrite prompts so they work well on Claude Opus 4.7. Every run produces two deliverables: a scored audit (what's wrong and why) and a rewritten prompt that applies the fixes.
The guidance is grounded in Anthropic's official prompting best-practices doc for the Claude 4.x family, with emphasis on the Opus 4.7-specific behavior shifts (literal instruction following, adaptive thinking, effort-driven calibration, dialed-back emphasis language, positive framing).
The user's request determines which mode to run.
| Mode | Trigger | Output |
|---|---|---|
| Audit only | "review this prompt", "critique this", "is this any good?", "score this" | Structured audit report (pass/fail per rubric item + notes) |
| Audit + rewrite | "improve this", "rewrite for 4.7", "optimize this", "fix this prompt" | Audit report + revised prompt + changelog |
| Draft from intent | "write a system prompt that…", "draft an agent for…", "give me a CLAUDE.md for…" | New prompt written against the rubric from the start |
If the mode is ambiguous, ask once. Default to audit + rewrite when the user hands over an existing prompt.
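The mode table and the default rule can be sketched as a small lookup. This is only an illustration: substring matching, the mode identifiers, and the function name `select_mode` are assumptions, and the skill itself decides from the conversation rather than from code. Reading "ask once" as "return `None` so the caller asks" is also an assumption.

```python
from typing import Optional

# Trigger phrases copied from the mode table. Matching by substring is an
# assumption made for this sketch, not how the skill itself decides.
MODE_TRIGGERS = {
    "audit_only": ["review this prompt", "critique this", "is this any good", "score this"],
    "audit_and_rewrite": ["improve this", "rewrite for 4.7", "optimize this", "fix this prompt"],
    "draft_from_intent": ["write a system prompt that", "draft an agent for", "give me a claude.md for"],
}


def select_mode(request: str, has_existing_prompt: bool) -> Optional[str]:
    """Return a mode name, or None when the caller should ask the user once."""
    text = request.lower()
    matches = [
        mode
        for mode, phrases in MODE_TRIGGERS.items()
        if any(phrase in text for phrase in phrases)
    ]
    if len(matches) == 1:
        return matches[0]
    # No match or several matches: fall back to audit + rewrite only when
    # the user handed over an existing prompt.
    if has_existing_prompt:
        return "audit_and_rewrite"
    return None
```

A request such as "Can you review this prompt?" resolves to `audit_only`, while a pasted prompt with no recognizable trigger falls back to `audit_and_rewrite`.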
Different prompt types need different treatment. Pick the matching section of references/target-types.md and load it before proceeding:
- … @path are an option.
- Agent definitions (.md with YAML frontmatter): description is the triggering mechanism, body is the system prompt.
- Slash commands (.md with YAML frontmatter): arguments, bash execution, may compose with other skills.
If you can't tell what type it is from context, ask.
Always load:
- references/opus-4-7-principles.md: the principles you're grading against.
- references/rubric.md: the scoring rubric.
Load selectively:
- references/rewriting-patterns.md: when rewriting (modes 2 and 3).
- references/target-types.md: the section matching the detected type.
For each rubric item, mark it pass / needs work / rethink and note a concrete finding. Cite line numbers or specific phrases in the source prompt when calling out issues. Audits without evidence are not useful.
The rubric has four categories: directness & clarity, framing & tone, structure, and type-specific fitness.
A full rubric walkthrough lives in references/rubric.md. Use it verbatim — do not improvise new rubric items mid-audit.
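One way to keep an audit from drifting off the rubric is to validate each score before rendering it. The sketch below is hypothetical: the `Score` class and `render_scores` helper are not part of the skill, and the category names are taken from the audit output template rather than from references/rubric.md, which remains the authority.

```python
from dataclasses import dataclass

# Category and verdict names as they appear in the audit output template.
CATEGORIES = ("Directness & clarity", "Framing & tone", "Structure", "Type-specific fitness")
VERDICTS = ("Pass", "Needs work", "Rethink")


@dataclass(frozen=True)
class Score:
    category: str
    verdict: str
    note: str  # should cite a line number or phrase from the source prompt

    def __post_init__(self):
        # Reject improvised rubric items and empty, evidence-free findings.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown rubric category: {self.category!r}")
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict!r}")
        if not self.note.strip():
            raise ValueError("every score needs a concrete finding")


def render_scores(scores):
    """Render validated scores as a markdown table."""
    rows = ["| Category | Score | Notes |", "|---|---|---|"]
    rows += [f"| {s.category} | {s.verdict} | {s.note} |" for s in scores]
    return "\n".join(rows)
```

Constructing a `Score` with a category outside the four listed raises `ValueError`, which is the code-level analogue of not improvising rubric items mid-audit.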
Apply the transformations from references/rewriting-patterns.md. The most common ones:
- … **IMPORTANT:** / **YOU MUST:** for one or two high-cost rules. Opus 4.x over-triggers on aggressive language that was tuned for older models.
- … <instructions>, <context>, <examples>, <frontend_aesthetics>.
Output format depends on mode:
Audit only:
## Audit: [prompt name or summary]
### Scores
| Category | Score | Notes |
| --------------------- | --------------------------- | ------------------ |
| Directness & clarity | Pass / Needs work / Rethink | [specific finding] |
| Framing & tone | … | … |
| Structure | … | … |
| Type-specific fitness | … | … |
### Findings
1. **[Finding title]** — [what's wrong, with line reference]. Impact: [why it matters].
2. …
### Top priorities to fix
[numbered list of the 2-3 highest-leverage changes]
Audit + rewrite:
Same audit section, then:
### Rewritten prompt
[the full revised prompt]
### Changelog
- [Change 1] — [which rubric item it addresses]
- [Change 2] — …
Draft from intent:
## Drafted prompt: [name]
[the prompt]
### Rationale
- [Decision 1] — [why]
- [Decision 2] — …
After producing the deliverable, offer to run the prompt-critic agent on the result for an independent second opinion. The critic uses the same rubric from an unbiased starting point — useful for checking that the rewrite actually improved the scores rather than shuffling them.
Invoke it with the Agent tool, subagent_type: "prompt-critic", passing the rewritten prompt and the prompt's target type. Don't auto-run it — ask first, because users iterating quickly may not want the extra round trip. (Note: the prompt-critic agent is not bundled in this plugin — it lives in the upstream personal-plugins repo. If unavailable, the rewrite is the deliverable.)
For Jinja2 templates, audit the template skeleton and the rendered output separately. Silent invalidators (per-request timestamps, user IDs, unsorted dict serialization) inside the template change the rendered text on every request and so break prompt caching. Flag them. See references/target-types.md → "Jinja2 templates".
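A rough way to see why silent invalidators matter is to render the same prompt twice with different request-level values and compare fingerprints. The sketch below uses a plain Python function as a stand-in for a template render (it is not Jinja2), and `render_prompt`, `stable_json`, and the prompt text are all hypothetical.

```python
import hashlib
import json


def stable_json(value):
    """Serialize with sorted keys so dict ordering cannot change the text."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def render_prompt(tools, request_id, stable=True):
    """Hypothetical stand-in for a template render, not Jinja2 itself."""
    tools_text = stable_json(tools) if stable else json.dumps(tools)
    # The unstable variant embeds a per-request value in the prompt text.
    header = "" if stable else f"Request: {request_id}\n"
    return f"{header}You are a support agent.\nTools: {tools_text}\n"


def fingerprint(text):
    """Hash the rendered text; equal hashes mean byte-identical prompts."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same tools, built in a different key order, on two different requests.
first = render_prompt({"search": 1, "lookup": 2}, "req-1")
second = render_prompt({"lookup": 2, "search": 1}, "req-2")
cache_friendly = fingerprint(first) == fingerprint(second)

noisy_first = render_prompt({"search": 1, "lookup": 2}, "req-1", stable=False)
noisy_second = render_prompt({"lookup": 2, "search": 1}, "req-2", stable=False)
cache_hostile = fingerprint(noisy_first) != fingerprint(noisy_second)
```

With sorted serialization and no per-request header the two renders are byte-identical, and the unstable variant differs on both counts.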
Tool descriptions are terse by nature. The rubric applies differently: short is good here, but the description must answer three questions: what does this tool do, when should the model call it, and what do the parameters mean. See references/target-types.md → "Tool descriptions".
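As an illustration of a short description that still answers the three questions, here is a hypothetical tool definition in a JSON-Schema-like dict, with a mechanical check for the third question only. The tool name, fields, wording, and the `undocumented_parameters` helper are invented for this sketch; whether the description really says what the tool does and when to call it still needs a human or model read.

```python
# Hypothetical tool definition. The name, fields, and wording are invented
# for illustration and do not come from any real API.
SEARCH_TOOL = {
    "name": "search_orders",
    "description": (
        "Looks up customer orders by email address. "  # what it does
        "Call it when the user asks about order status or delivery dates."  # when
    ),
    "parameters": {
        "email": {"type": "string", "description": "Customer email used at checkout."},
        "limit": {"type": "integer", "description": "Maximum number of orders to return."},
    },
}


def undocumented_parameters(tool):
    """Return the names of parameters with a missing or blank description."""
    return [
        name
        for name, spec in tool.get("parameters", {}).items()
        if not spec.get("description", "").strip()
    ]
```

An empty result from `undocumented_parameters(SEARCH_TOOL)` only shows that every parameter has some description, not that the description is good.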
If the user is migrating a prompt from an older model (e.g., Sonnet 3.7 → Opus 4.7), the rewrite should remove obsolete parameter guidance (sampling parameters, budget_tokens), dial back the shouting that older models needed, and flip negative framings. Note the migration explicitly in the changelog.
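A migration audit can start with a mechanical pass that lists candidate lines for the changelog. This is a minimal sketch under stated assumptions: the pattern lists are guesses at what older prompts tend to contain (only `budget_tokens`, IMPORTANT, and YOU MUST are named in this document), and `migration_findings` is a hypothetical helper. It does not detect negative framings, which need a reading of the prompt.

```python
import re

# Patterns are assumptions for this sketch: parameter names often found in
# prompts written for older models, plus shouted emphasis markers.
MIGRATION_PATTERNS = {
    "obsolete parameter guidance": re.compile(r"\b(budget_tokens|temperature|top_p|top_k)\b"),
    "shouted emphasis": re.compile(r"\b(IMPORTANT|YOU MUST|NEVER|ALWAYS)\b"),
}


def migration_findings(prompt: str):
    """Return (line_number, label, matched_text) tuples, 1-indexed by line."""
    findings = []
    for number, line in enumerate(prompt.splitlines(), start=1):
        for label, pattern in MIGRATION_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((number, label, match.group(0)))
    return findings
```

Each tuple carries a line number, so the finding can be cited in the audit and named explicitly in the changelog.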
This skill is a prompt-optimizer — its output is held to the same standard it grades against.
- … references/rubric.md. The rubric is the contract.