From prompt-writer
Write, review, and improve prompts for any LLM — Claude, GPT, Gemini, Llama, DeepSeek, Mistral, Cohere, Qwen, Grok, Nova, and more. Use when the user asks to "write a system prompt", "improve this prompt", "review my prompt", "make a prompt for", "optimize my prompt", "fix my prompt", "why isn't my prompt working", or wants help writing better prompts for any AI model. Also use when building agents, chatbots, or AI assistants that need system-level instructions, or when the user has a bad prompt they want rewritten. Covers system prompts, task prompts, tool descriptions, and general prompt improvement across all major model families.
npx claudepluginhub taradepan/prompt-writerThis skill uses the workspace's default tool permissions.
Write and improve production-grade prompts for any LLM, grounded in official documentation, peer-reviewed research, and patterns from production systems like Claude Code, Cursor, Manus, and Devin.
Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.
Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).
Generates original PNG/PDF visual art via design philosophy manifestos for posters, graphics, and static designs on user request.
Share bugs, ideas, or general feedback.
Write and improve production-grade prompts for any LLM, grounded in official documentation, peer-reviewed research, and patterns from production systems like Claude Code, Cursor, Manus, and Devin.
Before starting, identify the task:
| User intent | Mode | What to do |
|---|---|---|
| "Write a prompt for..." | Write | Follow Steps 1-5 below |
| "Improve/fix/optimize this prompt" | Improve | See Improve Mode below, then references/prompt-improvement.md for full workflow |
| "Review this prompt" | Review | Run Step 4 (Review and Harden) on their prompt |
| "Write a tool description" | Tool | Jump to the Tool Descriptions section in Step 3 |
Current as of March 2026. Model guidance reflects Claude 4.6, GPT-5.4, Gemini 3, Llama 4, DeepSeek R1/V3. Verify against official docs for newer releases.
When the user has an existing prompt that isn't working:
For the full workflow with detailed patterns and examples, see references/prompt-improvement.md.
This skill covers writing and improving prompts. It does not cover:
Before writing, gather:
If the user hasn't specified these, ask before writing. A prompt without context is a prompt that will fail.
Universal structure adapted from Anthropic, OpenAI, and Google's official guides:
1. Role & Identity — Who is this agent?
2. Core Instructions — What are the behavioral rules?
3. Tool/Capability Guide — How to use available tools
4. Reasoning Guidance — When and how to think step-by-step
5. Output Format — What shape should responses take?
6. Examples — 2-5 demonstrations of desired behavior
7. Context/Reference — Domain knowledge, loaded last
8. Final Reminders — Closing behavioral anchors
Not every prompt needs all sections. A simple chatbot needs 1, 2, 5, 6. A complex agent needs all eight. A task prompt may need only 2, 5, 6.
One to three sentences. Anchors all subsequent behavior.
You are a senior backend engineer specializing in Python and Django.
You help developers debug production issues by analyzing logs,
tracing errors, and suggesting minimal fixes.
Audience framing amplifies this — stating who the output is for improves relevance by up to 100% in benchmarks.
Structure with markdown headers or XML tags. Key principles:
Explain the why, not just the what. Models generalize better from motivation than bare commands. Instead of "Never use ellipses", write "Avoid ellipses because the TTS engine cannot pronounce them."
Use positive framing. "Write in complete sentences" outperforms "Don't use fragments." Tell the agent what TO do. Research confirms positive instructions actively boost desired token probabilities, while negatives only weakly suppress.
Be specific enough to verify. "Use 2-space indentation" is testable. "Format code properly" is not.
Resolve contradictions. Models waste reasoning tokens reconciling conflicts. Add explicit priority: "If rule A and rule B conflict, rule A takes precedence."
Match language intensity to model — this is critical:
| Model Family | Guidance Style |
|---|---|
| Claude 4.x | Conversational. Explain reasoning. Avoid CAPS/MUST/NEVER — causes overtriggering. Use "prefer", "because". |
| GPT-5.x | Direct but not aggressive. Naturally thorough — don't over-prompt for thoroughness. |
| GPT-4.1 | More literal than predecessors. A single clarifying sentence corrects behavior. |
| Gemini 3 | Short, direct specs only. Logic over persuasion. No flowery language. |
| DeepSeek R1 | NO system prompt. Put everything in user message. No few-shot. No "think step by step." |
| Mistral | Hierarchical markdown sections. Explicit step-by-step helps. |
| Cohere | Minimal preamble. Pass docs via API, not prompt. RAG-first. |
| Qwen/QwQ | Standard. For QwQ: non-greedy sampling required (temp=0.6, top_p=0.95). |
| Grok | Verbose, detailed system prompts work best. XML/MD both fine. |
| Nova | CAPS DO/MUST/DO NOT encouraged (opposite of Claude!). Numbered steps. |
| Smaller/Mini | Most literal. Critical rules FIRST. Numbered steps. Full execution order. |
For full model-specific guidance, see references/model-specific-tuning.md.
Techniques effective for mid-tier models can actively harm frontier models. As capability increases, simpler prompts work better. Constraints that prevent common-sense errors in weaker models induce hyper-literalism in stronger ones.
Rule: always calibrate prompt complexity to model capability. When in doubt, start simple.
Tool descriptions are prompts — 97.1% of tool descriptions in production have quality defects. Each tool needs a 3-4 sentence contract:
search_codebase: Searches the repository for code matching a query.
Returns matching lines with file paths and line numbers. Use when
you need to find function definitions, usage patterns, or specific
snippets. Do not use for reading entire files — use read_file instead.
Query supports regex. Results are limited to 50 matches.
The six essentials (from MCP tool description research, 2026):
Match reasoning instructions to the model:
The Think Tool pattern (+54% on complex policy tasks): Give agents a no-op tool called think — a callable scratchpad between tool calls. Different from extended thinking. Include domain-specific examples showing HOW to use it.
Chain-of-Draft (cost-efficient alternative to CoT): When reasoning is needed but cost/latency matters, limit each reasoning step to ~5 words. Matches CoT accuracy at 7.6% of the tokens: "Think step by step, but keep only a minimum draft for each step, 5 words at most."
For agents, three reminders consistently improve performance (+20%):
Be explicit. Without format guidance, models add prose, markdown fences, or unnecessary structure.
The CTCO pattern (from OpenAI's GPT-5.4 guide — most reliable anti-hallucination structure):
Respond with a JSON object containing:
- "summary": one-sentence description
- "severity": "low" | "medium" | "high" | "critical"
- "fix": the recommended code change as a unified diff
Examples are the highest-impact technique — up to 90% accuracy improvement.
The 2-5 rule: Major gains after 2; diminishing returns after 5. More can paradoxically degrade performance (the "few-shot dilemma" — Gemma 7B dropped from 77.9% to 39.9%).
Quality over quantity: TF-IDF-based example selection outperforms random sampling by 1%+ while avoiding degradation. Choose examples relevant to the actual use case.
Contrastive examples dramatically improve reasoning: showing both a wrong approach (with why it's wrong) and the correct approach. GSM8K: 35.9% → 88.8% with GPT-4 using this pattern.
Order matters: Alternate positive and negative examples to avoid bias.
Format by model:
<examples><example><user>...</user><assistant>...</assistant></example></examples>Place long-form reference material AFTER instructions and examples. Queries placed after long context improve quality by up to 30%.
Lost-in-the-middle: Models attend more to the beginning and end of prompts (architectural, not training artifact). Place highest-priority content at the start and end. Never bury critical instructions in the middle.
For Claude, use XML tags: <context>, <documents>, <reference_material>.
Close with 1-3 critical behavioral anchors. Models weight content at the beginning and end more heavily.
Structural quality:
Language quality:
Robustness:
Efficiency:
See references/security-patterns.md for defense patterns.
For systematic prompt improvement, see references/prompt-improvement.md.
Quick reference for choosing prompting strategy by model:
| Model | System Prompt | Delimiter Style | Thinking | Key Differentiator |
|---|---|---|---|---|
| Claude 4.x | Separate role | XML tags | Extended thinking | Soft language, no CAPS |
| GPT-5.x | System + instructions | Markdown | o-series reasoning | CTCO pattern, reasoning_effort |
| GPT-4.1 | System | Markdown | Manual CoT | Ultra-literal, sandwich method |
| Gemini 3 | System instruction | Hierarchy/outline | thinking_level | Temp must stay 1.0, concise |
| Llama 4 | Special tokens | Special tokens | None | Exact template required |
| DeepSeek R1 | NO system prompt | Plain text in user msg | Auto | Zero-shot only, no examples |
| DeepSeek V3 | Yes | Standard | None | Use for speed, R1 for depth |
| Mistral | Prepended to user | Markdown sections | None | Prefix injection, JSON dual-instruct |
| Cohere | Preamble (optional) | Documents API | None | RAG-first, auto-citations |
| Qwen/QwQ | apply_chat_template | ChatML | enable_thinking | Non-greedy required for QwQ |
| Grok 3 | Separate (verbose) | XML or Markdown | reasoning_effort | Real-time X data, cache-stable |
| Nova | Separate | Markdown | None | CAPS encouraged, temp=0 for tools |
For detailed per-model guidance: references/model-specific-tuning.md
# Role
[1-3 sentences: who, what domain, what tone]
# Instructions
[Behavioral rules, organized by category]
[Each rule with motivation: "because..."]
# Tools
[For each tool: what it does, when to use, when not to use, limitations]
# Output Format
[Explicit format specification]
# Examples
[2-5 diverse, relevant demonstrations]
# Context
[Reference material, loaded last]
# Reminders
[1-3 critical behavioral anchors]
| Scenario | Key Sections | Notes |
|---|---|---|
| Simple chatbot | Role, Instructions, Format, Examples | Skip tools and reasoning |
| Coding agent | All 8 sections | Add persistence + planning reminders, think tool |
| Data analyst | Role, Instructions, Tools, Format | Emphasize structured output |
| Content writer | Role, Instructions, Examples, Format | Heavy on examples and tone |
| Customer support | Role, Instructions, Examples, Context | Load FAQ as context |
| Multi-agent orchestrator | Role, Instructions, Tools, Reasoning | Define delegation rules, effort budgets |
| Task prompt (one-shot) | Instructions, Format, Examples | No role needed, be specific |
| Prompt improvement | See references/prompt-improvement.md | Diagnose → fix → evaluate |
For models that support vision (Claude, GPT-4o+, Gemini, Llama 4, Grok):
Image placement matters:
Key patterns:
Token costs vary significantly:
Each model handles structured output differently:
| Model | How to enable | Key requirement |
|---|---|---|
| Claude | Request JSON in instructions + use XML output tags | Specify exact schema in prompt |
| GPT | response_format: {type: "json_object"} or Structured Outputs with JSON Schema | Must mention "JSON" in prompt |
| Gemini | responseMimeType: "application/json" + responseSchema | Define required arrays explicitly (avoids nulls) |
| DeepSeek | response_format: {type: "json_object"} | Include "json" in prompt + set adequate max_tokens |
| Mistral | response_format: {type: "json_object"} | Must ALSO instruct JSON in prompt text (dual requirement) |
| Llama | Instruct in prompt; validate output | No native JSON mode — always validate server-side |
| Cohere | Instruct in prompt | Use document grounding for factual JSON |
| Qwen | Instruct in prompt | Use apply_chat_template |
| Grok | Pydantic/JSON Schema via Structured Outputs | Use schema definitions |
| Nova | Instruct in prompt | Set max_tokens high enough to avoid truncation |
Universal best practice: Always provide the exact JSON schema with field types, not just "return JSON". Show one complete example of the expected output.
Before: "Summarize this article" After: "Summarize this article in 3 bullet points for a product manager. Each bullet: one sentence, starts with an action verb. Focus on decisions needed, not background."
Before: "You MUST ALWAYS follow these CRITICAL rules: NEVER use markdown. ALWAYS respond in JSON." After: "Respond in JSON format because the output feeds directly into our parser. Avoid markdown formatting since the parser doesn't handle it. Here's the expected schema: {example}"
Before: "Read this codebase, find all security vulnerabilities, fix them, write tests, update the docs, and create a PR." After (prompt 1 of 3): "Scan this codebase for security vulnerabilities. For each finding, output: file path, line number, vulnerability type (OWASP category), severity (high/medium/low), and a one-line fix description. Output as JSON array."
Before (system prompt): "You are a code reviewer." After (user message — no system prompt): "Task: Review this Python function for bugs and performance issues. Requirements: Focus on correctness first, then performance. Flag any edge cases that would cause exceptions. Output format: For each issue: {line, issue_type, severity, fix}"
Before: "Classify customer feedback as positive, negative, or neutral." After: "Classify customer feedback as positive, negative, or neutral.
INCORRECT approach: 'The product works but the delivery was late' → positive (wrong: mixed sentiment defaults to negative when a complaint is present) CORRECT approach: 'The product works but the delivery was late' → negative (delivery complaint outweighs neutral product statement)
Classify: {input}"
Prompt engineering is now context engineering — curating the minimal high-signal token set the model needs. Every instruction should earn its place in the context window.
Four strategies (Anthropic):
Context rot degrades quality in long conversations:
Prompt caching changes how prompts should be structured:
| Anti-Pattern | Why It Fails | Fix |
|---|---|---|
| ALL-CAPS MUST/NEVER | Claude 4.x overtriggers; GPT-5 wastes tokens (except Nova) | Explain the why; use CAPS only for Nova |
| "Be thorough" | Modern models are already thorough; causes overengineering | Specify exact scope and boundaries |
| Contradictory rules | Models attempt both, degrading quality | Add explicit priority ordering |
| 20+ examples | Burns context; paradoxically degrades performance | Use 2-5 diverse, TF-IDF-selected examples |
| Vague instructions | Not testable, not actionable | Make each rule verifiable |
| Negative-only framing | "Don't X" is weaker than "Do Y" | Reframe as positive instructions |
| Same prompt across models | Each family responds differently (Inversion effect) | Tune per model; start simple for frontier |
| No examples at all | Removes highest-impact technique | Add 2+ demonstrations (except DeepSeek R1) |
| Monolithic wall of text | Hard to parse, sections blur | Use headers, XML tags, or delimiters |
| CoT on reasoning models | 35-600% more latency, marginal gains | Use self-verification instead |
| One-sentence tool descriptions | 97% of tool descriptions have defects | Use 3-4 sentence contracts with limitations |
| Critical info in the middle | Lost-in-the-middle: models attend to start/end | Put critical content at beginning and end |
For agents with tool access or external data:
See references/security-patterns.md for the full defense hierarchy and architectural patterns.
references/templates.md — Ready-to-use prompt templates: coding agent, chatbot, data extraction, content writer, orchestrator, per-model skeletons (Claude/GPT/Gemini/DeepSeek/Llama/Cohere/Nova), meta-prompts for rewriting/diagnosing/optimizing, tool description templatereferences/model-specific-tuning.md — Per-model guidance for all 10 families: language patterns, structural preferences, tool calling, anti-patterns, migration tipsreferences/prompt-improvement.md — Systematic workflow for improving existing prompts: 6-dimension defect taxonomy, diagnostic checklist, rewriting patterns, meta-prompting, evaluationreferences/security-patterns.md — Defense hierarchy, spotlighting, OWASP Top 10 for LLMs, architectural patterns, trust boundariesreferences/research-evidence.md — Key papers with measured improvements: technique rankings, prompting inversion, CoT findings, few-shot dilemma, context engineering, automated optimizationreferences/production-prompt-anatomy.md — Structural analysis of Claude Code, Cursor, Manus, Devin, v0: universal patterns, conditional assembly, three eras, caching architecture