Provides universal prompt engineering techniques like XML structuring, output constraints, scope controls, and ambiguity handling to craft, optimize, or review LLM prompts.
`npx claudepluginhub codealive-ai/ai-driven-development --plugin ai-driven-development`

This skill uses the workspace's default tool permissions.
Files:
- README.md
- references/claude-family-prompting.md
- references/evaluation-redteaming.md
- references/failure-taxonomy.md
- references/gemini3-family-prompting.md
- references/gpt5-family-prompting.md
- references/gpt5-prompting-guide.md
- references/mistakes-context.md
- references/mistakes-debt.md
- references/mistakes-hallucinations.md
- references/mistakes-security.md
- references/mistakes-structure.md
- references/prompt-audit-checklist.md
- references/prompting-introduction.md
- references/prompting-risks.md
- references/prompting-techniques.md

Guides crafting effective LLM prompts using techniques like chain-of-thought, XML tags, role prompting, multishot examples, and self-verification. Use for improving output quality or debugging responses.
Crafts advanced LLM prompts with chain-of-thought, constitutional AI, meta-prompting, and optimization techniques. Use for AI features, agent performance, system prompts.
Optimizes prompts for production AI features with analysis, 6-step framework, failure detection, and research-backed techniques. Use for prompt review, system prompts, or improvement suggestions.
Universal techniques for crafting effective prompts across any LLM.
Use XML tags to create clear, parseable prompts:
<context>Background information here</context>
<instructions>
1. First step
2. Second step
</instructions>
<examples>Sample inputs/outputs</examples>
<output_format>Expected structure</output_format>
Benefits:
- Clarity: cleanly separates context, instructions, examples, and data
- Accuracy: reduces the chance of the model misreading one section as another
- Parseability: tagged output (e.g., <answer>) is easy to extract in post-processing
Best practices:
- Be consistent: use the same tag names throughout (e.g., always <instructions>, not sometimes <steps>)
- Refer to tags by name in the instructions ("Using the contract in <context> tags...")
- Nest tags for hierarchical content: <examples><example id="1">...</example></examples>
- Combine with other techniques: <thinking> for chain-of-thought, <answer> for final output

Specify explicit constraints on length, format, and structure:
<output_spec>
- Default: 3-6 sentences or ≤5 bullets
- Simple yes/no questions: ≤2 sentences
- Complex multi-step tasks:
  - 1 short overview paragraph
  - ≤5 bullets: What changed, Where, Risks, Next steps, Open questions
- Use Markdown with headers, bullets, tables when helpful
- Avoid long narrative paragraphs; prefer compact structure
</output_spec>
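Constraints like these can also be checked after generation and fed back to the model when violated. A minimal sketch, assuming the reply is plain Markdown and using the default limits above; the sentence count is a rough heuristic, not a precise parse:

```python
import re

def check_output_spec(reply: str, max_bullets: int = 5, max_sentences: int = 6) -> list[str]:
    """Flag violations of the default output_spec limits (illustrative, not exhaustive)."""
    problems = []
    lines = reply.splitlines()
    bullets = [l for l in lines if l.lstrip().startswith(("-", "*"))]
    if len(bullets) > max_bullets:
        problems.append(f"{len(bullets)} bullets (limit {max_bullets})")
    # Rough sentence count over non-bullet prose only
    prose = " ".join(l for l in lines if not l.lstrip().startswith(("-", "*")))
    sentences = [s for s in re.split(r"[.!?]+\s+", prose) if s.strip()]
    if len(sentences) > max_sentences:
        problems.append(f"{len(sentences)} sentences (limit {max_sentences})")
    return problems
```

A common pattern is to re-prompt once with any violations appended as feedback rather than silently truncating the reply.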
Explicitly constrain what the model should NOT do:
<constraints>
- Implement EXACTLY and ONLY what is requested
- No extra features, components, or embellishments
- If ambiguous, choose the simplest valid interpretation
- Do NOT invent values, make assumptions, or add unrequested elements
</constraints>
Prevent hallucinations and overconfidence:
<uncertainty_handling>
- If the question is ambiguous:
  - Ask 1-3 precise clarifying questions, OR
  - Present 2-3 plausible interpretations with labeled assumptions
- When facts may have changed: answer in general terms, state uncertainty
- Never fabricate exact figures or references when uncertain
- Prefer "Based on the provided context..." over absolute claims
</uncertainty_handling>
For inputs >10k tokens, add re-grounding instructions:
<long_context_handling>
- First, produce a short internal outline of key sections relevant to the request
- Re-state user constraints explicitly before answering
- Anchor claims to sections ("In the 'Data Retention' section...")
- Quote or paraphrase fine details (dates, thresholds, clauses)
</long_context_handling>
For agents that call tools, set explicit usage rules:
<tool_usage>
- Prefer tools over internal knowledge for:
  - Fresh or user-specific data (tickets, orders, configs)
  - Specific IDs, URLs, or document references
- Parallelize independent reads when possible
- After write operations, restate: what changed, where, any validation performed
</tool_usage>
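The "parallelize independent reads" rule maps naturally onto concurrent execution on the orchestration side. A minimal sketch using asyncio; `fetch_ticket` and `fetch_config` are hypothetical read-only tool wrappers, not part of any particular SDK:

```python
import asyncio

async def fetch_ticket(ticket_id: str) -> dict:
    # Hypothetical read-only tool wrapper; replace with your actual tool call.
    return {"id": ticket_id}

async def fetch_config(service: str) -> dict:
    # Hypothetical read-only tool wrapper; replace with your actual tool call.
    return {"service": service}

async def gather_context(ticket_id: str, service: str) -> dict:
    # Independent reads run concurrently; write operations should still be sequenced explicitly.
    ticket, config = await asyncio.gather(fetch_ticket(ticket_id), fetch_config(service))
    return {"ticket": ticket, "config": config}
```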
For long-running agent tasks, constrain progress updates:
<user_updates>
- Send brief updates (1-2 sentences) only when:
  - Starting a new major phase
  - Discovering something that changes the plan
- Avoid narrating routine operations
- Each update must include a concrete outcome ("Found X", "Updated Y")
- Do not expand scope beyond what was asked
</user_updates>
<self_check>
Before finalizing answers in sensitive contexts (legal, financial, safety):
- Re-scan for unstated assumptions
- Check for ungrounded numbers or claims
- Soften overly strong language ("always", "guaranteed")
- Explicitly state assumptions
</self_check>
For data extraction tasks, always provide a schema:
<extraction_spec>
Extract data into this exact schema (no extra fields):
{
  "field_name": "string",
  "optional_field": "string | null",
  "numeric_field": "number | null"
}
- If a field is not present in source, set to null (don't guess)
- Re-scan source for missed fields before returning
</extraction_spec>
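Pairing the schema with a post-generation check catches extra fields and guessed values before they propagate. A minimal sketch, assuming the model returns a JSON string matching the schema above:

```python
import json

# Mirrors the extraction_spec above: exact field set, null allowed where the spec says so.
SCHEMA = {
    "field_name": (str,),
    "optional_field": (str, type(None)),
    "numeric_field": (int, float, type(None)),
}

def validate_extraction(raw: str) -> dict:
    """Parse model output and reject extra fields, missing fields, or wrong types."""
    data = json.loads(raw)
    extra, missing = set(data) - set(SCHEMA), set(SCHEMA) - set(data)
    if extra or missing:
        raise ValueError(f"extra fields: {sorted(extra)}, missing fields: {sorted(missing)}")
    for key, allowed in SCHEMA.items():
        if not isinstance(data[key], allowed):
            raise ValueError(f"{key}: unexpected type {type(data[key]).__name__}")
    return data
```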
For agents that can browse the web:
<research_guidelines>
- Browse the web for: time-sensitive topics, recommendations, navigational queries, ambiguous terms
- Include citations after paragraphs with web-derived claims
- Use multiple sources for key claims; prioritize primary sources
- Research until additional searching won't materially change the answer
- Structure output with Markdown: headers, bullets, tables for comparisons
</research_guidelines>
Without structure:
You're a financial analyst. Generate a Q2 report for investors. Include Revenue, Margins, Cash Flow. Use this data: {{DATA}}. Make it professional and concise.
With structure:
You're a financial analyst at AcmeCorp generating a Q2 report for investors.
<context>
AcmeCorp is a B2B SaaS company. Investors value transparency and actionable insights.
</context>
<data>
{{DATA}}
</data>
<instructions>
1. Include sections: Revenue Growth, Profit Margins, Cash Flow
2. Highlight strengths and areas for improvement
3. Use concise, professional tone
</instructions>
<output_format>
- Use bullet points with metrics and YoY changes
- Include "Action:" items for areas needing improvement
- End with 2-3 bullet Outlook section
</output_format>
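One benefit of tag-delimited sections is that the prompt can be assembled from independent parts at run time. A minimal sketch of building the structured version above; the section texts are placeholders taken from the example, and `quarterly_data` stands in for whatever feeds {{DATA}}:

```python
def build_report_prompt(quarterly_data: str) -> str:
    """Assemble the tag-delimited Q2 report prompt from independent sections."""
    sections = {
        "context": (
            "AcmeCorp is a B2B SaaS company. Investors value transparency "
            "and actionable insights."
        ),
        "data": quarterly_data,  # stands in for the {{DATA}} placeholder
        "instructions": (
            "1. Include sections: Revenue Growth, Profit Margins, Cash Flow\n"
            "2. Highlight strengths and areas for improvement\n"
            "3. Use concise, professional tone"
        ),
        "output_format": (
            "- Use bullet points with metrics and YoY changes\n"
            '- Include "Action:" items for areas needing improvement\n'
            "- End with 2-3 bullet Outlook section"
        ),
    }
    body = "\n".join(f"<{tag}>\n{text}\n</{tag}>" for tag, text in sections.items())
    return "You're a financial analyst at AcmeCorp generating a Q2 report for investors.\n\n" + body
```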
When adapting prompts across models or versions, start from these universal patterns, then layer in the model-specific guidance below:
| Technique | Tag Pattern | Use Case |
|---|---|---|
| Separate sections | <context>, <instructions>, <data> | Any complex prompt |
| Control length | <output_spec> with word/bullet limits | Prevent verbosity |
| Prevent drift | <constraints> with explicit "do NOT" | Feature creep |
| Handle uncertainty | <uncertainty_handling> | Factual queries |
| Chain of thought | <thinking>, <answer> | Reasoning tasks |
| Extraction | <extraction_spec> with JSON schema | Data parsing |
| Research | <research_guidelines> | Web-enabled agents |
| Self-check | <self_check> | High-risk domains |
| Tool usage | <tool_usage> | Agentic systems |
| Eagerness control | <persistence>, <context_gathering> | Agent autonomy |
| Persona | <role> + behavioral constraints | Tone & style |
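When a prompt asks for <thinking> plus <answer>, downstream code should consume only the answer tag. A minimal sketch; it assumes the model reliably closes the tag, which is worth verifying in practice:

```python
import re

def extract_answer(completion: str) -> str | None:
    """Return the contents of the first <answer>...</answer> block, or None if absent."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    return match.group(1).strip() if match else None
```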
Comprehensive catalog of prompting techniques. Full details, examples, and academic references in references/prompting-techniques.md.
| Technique | Use Case |
|---|---|
| Zero-Shot Prompting | Direct task execution without examples; classification, translation, summarization |
| Few-Shot Prompting | In-context learning via exemplars; format control, label calibration, style matching |
| Chain-of-Thought (CoT) | Step-by-step reasoning; arithmetic, logic, commonsense reasoning tasks |
| Meta Prompting | LLM as orchestrator delegating to specialized expert prompts; complex multi-domain tasks |
| Self-Consistency | Sample multiple CoT paths, pick majority answer; boost accuracy on math & reasoning |
| Generated Knowledge | Generate relevant knowledge first, then answer; commonsense & factual QA |
| Prompt Chaining | Break complex tasks into sequential subtasks; document analysis, multi-step workflows |
| Tree of Thoughts (ToT) | Explore multiple reasoning branches with lookahead/backtracking; planning, puzzles |
| RAG | Retrieve external documents before generating; knowledge-intensive tasks, fresh data |
| ART (Automatic Reasoning and Tool-use) | Auto-select and orchestrate tools with CoT; tasks requiring calculation, search, APIs |
| APE (Automatic Prompt Engineer) | LLM generates and scores candidate prompts; prompt optimization at scale |
| Active-Prompt | Identify uncertain examples, annotate selectively for CoT; adaptive few-shot |
| Directional Stimulus | Add a hint/keyword to guide generation direction; summarization, dialogue |
| PAL (Program-Aided LM) | Generate code instead of text for reasoning; math, data manipulation, symbolic tasks |
| ReAct | Interleave reasoning traces with tool actions; search, QA, decision-making agents |
| Reflexion | Agent self-reflects on failures with verbal feedback; iterative improvement, debugging |
| Multimodal CoT | Two-stage: rationale generation then answer with text+image; visual reasoning tasks |
| Graph Prompting | Structured graph-based prompts; node classification, relation extraction, graph tasks |
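Of these, self-consistency is especially simple to wire up around any completion API: sample several chain-of-thought completions at non-zero temperature and keep the majority answer. A minimal sketch; `generate` is a placeholder for your model call, and answers are assumed to be extractable as short strings (e.g., via the <answer> tag pattern above):

```python
from collections import Counter
from typing import Callable, Optional

def self_consistency(prompt: str,
                     generate: Callable[[str], str],
                     extract_answer: Callable[[str], Optional[str]],
                     n_samples: int = 5) -> str:
    """Sample several reasoning paths and keep the most common final answer."""
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt)        # placeholder for your model call; sample with temperature > 0
        answer = extract_answer(completion)  # e.g., pull the <answer> tag contents as shown earlier
        if answer:
            answers.append(answer.strip())
    if not answers:
        raise ValueError("no parsable answers returned")
    return Counter(answers).most_common(1)[0][0]
```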
LLM settings, prompt elements, formatting, and practical examples — see references/prompting-introduction.md.
Adversarial attacks, factuality issues, and bias mitigation — see references/prompting-risks.md.
When asked to audit, review, or improve a prompt, follow this workflow. Full checklist with per-check references: prompt-audit-checklist.md.
| # | Dimension | What to Check |
|---|---|---|
| 1 | Clarity & Specificity | Task definition, success criteria, audience, output format, conflicting constraints |
| 2 | Structure & Formatting | Section separation (XML tags), prompt smells (monolithic, mixed layers, negative bias) |
| 3 | Safety & Security | Control/data separation, secrets in prompt, injection resilience, tool permissions |
| 4 | Hallucination & Factuality | Role framing, grounding, citation-without-sources, uncertainty handling |
| 5 | Context Management | Info placement (not buried in middle), context size, RAG doc count, re-grounding |
| 6 | Maintainability & Debt | Hardcoded values, regenerated logic, model pinning, testability |
| 7 | Model-Specific Fit | Model-specific params and gotchas (see Model-Specific Guides below) |
| 8 | Evaluation Readiness | Eval criteria, adversarial test cases, schema enforcement, monitoring |
Three complementary layers — use the one matching your need:
Deep-dives by category — root causes, mechanisms, prevention checklists (from "The Architecture of Instruction", 2026):
| Mistake Category | Key Issues | Reference |
|---|---|---|
| Hallucinations & Logic | Ambiguity-induced confabulation, automation bias, overloaded prompts, logical failures in verification tasks, no role framing | mistakes-hallucinations.md |
| Structural Fragility | Formatting sensitivity (up to 76pp variance), reproducibility crisis, prompt smells catalog (6 anti-patterns), deliberation ladder | mistakes-structure.md |
| Context Rot | "Lost in the middle" U-shaped attention, RAG over-retrieval, naive data loading, context engineering shift | mistakes-context.md |
| Prompt Debt | Token tax of regenerative code, debt taxonomy (prompt/hyperparameter/framework/cost), multi-agent solutions, automated repair | mistakes-debt.md |
| Security | Direct/indirect injection, jailbreaking, system prompt leakage (OWASP LLM07:2025), RAG poisoning, multimodal injection, adversarial suffixes | mistakes-security.md |
Quick reference — 18-category taxonomy with MRPs, risk scores, case studies, action items: failure-taxonomy.md. Start here for an overview or to prioritize which categories to address first. Covers: control-plane vs data-plane model, heuristic risk scoring, real-world incidents (EchoLeak CVE-2025-32711, Mata v. Avianca, Samsung shadow AI).
How to measure & test — eval metrics, CI gating, red-teaming, tooling: evaluation-redteaming.md. Covers: TruthfulQA, FActScore, SelfCheckGPT, PromptBench, AILuminate, LLM-as-judge pitfalls, guardrail libraries, open research questions.
Each model family has unique parameters, gotchas, and patterns. Consult the reference for your target model:
- claude-family-prompting.md: effort with new xhigh on 4.7, task_budget agentic-loop ceiling, legacy thinking.budget_tokens 400-error on 4.7, new tokenizer (~1.35× text, ~3× images), tool under-triggering on 4.7 (vs 4.6 over-triggering), more literal instruction-following, server-side compaction beta, Managed Agents memory beta, Cyber Verification gate, prefill deprecation, Structured Outputs, prompt caching, citations, context engineering, vision crop tool, migration paths 4.5 → 4.6 → 4.7
- gpt5-family-prompting.md: reasoning_effort (last-mile knob in 5.4/5.5), text.verbosity, named tools (apply_patch), agentic eagerness templates, completeness/verification contracts, compaction API, phase field, outcome-first prompts, personality vs collaboration style, retrieval budgets, mini/nano guidance, migration paths
- gemini3-family-prompting.md: thinking_budget vs thinking_level, constraint placement (end of prompt), persona priority, function calling, structured output, multimodal, image generation