Design effective prompts for LLM agents with structured input/output formats, chain-of-thought reasoning, few-shot examples, and system prompt architecture. Covers Claude-specific patterns and multi-turn conversation design. Triggers on prompt design, LLM interaction patterns, or system prompt architecture requests.
npx claudepluginhub organvm-iv-taxis/a-i--skills --plugin document-skills
Design prompts that produce reliable, structured, high-quality outputs from language models.
┌─ Identity & Role ──────────────────┐
│ Who the model is, what it does     │
├─ Context & Constraints ────────────┤
│ Domain knowledge, guardrails       │
├─ Output Format ────────────────────┤
│ Structure, length, style           │
├─ Examples (Few-Shot) ──────────────┤
│ Input/output pairs                 │
├─ Instructions ─────────────────────┤
│ Step-by-step task guidance         │
└────────────────────────────────────┘
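These sections can be assembled in code. The sketch below is illustrative: `build_system_prompt` is a hypothetical helper, and the section titles simply mirror the diagram above.

```python
def build_system_prompt(
    identity: str,
    context: str,
    output_format: str,
    examples: str,
    instructions: str,
) -> str:
    """Assemble the five sections into a single system prompt string."""
    sections = [
        ("Identity & Role", identity),
        ("Context & Constraints", context),
        ("Output Format", output_format),
        ("Examples", examples),
        ("Instructions", instructions),
    ]
    # Skip empty sections so the prompt stays compact
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections if body)

prompt = build_system_prompt(
    identity="You are a code review assistant.",
    context="The codebase is a TypeScript monorepo.",
    output_format="Return findings as a bulleted list.",
    examples="",
    instructions="Review only the changed files.",
)
```

Keeping assembly in one function makes it easy to vary a single section during prompt iteration while holding the others fixed.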
When instructions conflict, models typically give precedence to the system prompt over user messages, and to explicit instructions over patterns implied by examples.
<system>
Analyze the given code and return findings in this exact format:
<analysis>
<summary>One-sentence overall assessment</summary>
<findings>
<finding severity="high|medium|low">
<location>file:line</location>
<issue>Description</issue>
<fix>Recommended fix</fix>
</finding>
</findings>
<score>1-10</score>
</analysis>
</system>
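An output contract like this is only useful if it is checked. A minimal parser, using only the standard library and assuming the reply contains exactly one well-formed `<analysis>` block, might look like:

```python
import xml.etree.ElementTree as ET


def parse_analysis(reply: str) -> dict:
    """Extract the <analysis> block from a model reply and return its fields."""
    # Slice out the analysis block in case the model adds surrounding prose
    start = reply.index("<analysis>")
    end = reply.index("</analysis>") + len("</analysis>")
    root = ET.fromstring(reply[start:end])
    return {
        "summary": root.findtext("summary"),
        "score": int(root.findtext("score")),
        "findings": [
            {
                "severity": f.get("severity"),
                "location": f.findtext("location"),
                "issue": f.findtext("issue"),
                "fix": f.findtext("fix"),
            }
            for f in root.iter("finding")
        ],
    }
```

If parsing fails, the failure itself is a useful signal: the prompt's format instructions need tightening.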
Before answering, think through the problem step by step:
1. Identify the core question
2. List relevant constraints
3. Consider 2-3 approaches
4. Evaluate tradeoffs
5. Recommend the best approach with reasoning
Show your reasoning in <thinking> tags, then give your final answer.
Classify the following commit messages by type.
Examples:
- "Add user authentication with JWT" → feat
- "Fix null pointer in dashboard render" → fix
- "Update README with API documentation" → docs
- "Refactor database connection pooling" → refactor
Now classify:
- "Implement rate limiting for API endpoints" →
You are a senior security engineer reviewing code for a financial services application.
Your priorities are:
1. Authentication and authorization flaws
2. Data exposure risks
3. Input validation gaps
4. Dependency vulnerabilities
Review with the paranoia appropriate for systems handling financial data.
Generate a Python function with these constraints:
- No external dependencies (stdlib only)
- Must handle the empty input case
- Must include type hints
- Maximum 20 lines
- Must include a docstring
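A function satisfying every constraint above might look like the following. This is an illustrative answer, not part of the original prompt:

```python
def running_mean(values: list[float]) -> list[float]:
    """Return the cumulative mean at each position of `values`.

    Returns an empty list for empty input.
    """
    means: list[float] = []
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        means.append(total / i)
    return means
```

Stdlib only, type-hinted, documented, under 20 lines, and the empty-input case falls out naturally: the loop body never runs and the empty list is returned.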
Break complex tasks into sequential sub-prompts:
Step 1: Analyze the current code structure
Step 2: Identify the specific change needed
Step 3: Write the minimal diff
Step 4: Verify the change doesn't break existing behavior
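The steps above can be chained so each sub-prompt sees the previous step's result. In this sketch, `call_model` is a hypothetical stand-in for a real API call:

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"[model output for: {prompt[:40]}]"

steps = [
    "Analyze the current code structure",
    "Identify the specific change needed",
    "Write the minimal diff",
    "Verify the change doesn't break existing behavior",
]

context = ""
for step in steps:
    # Feed each step's result into the next prompt
    context = call_model(f"{step}\n\nPrevious result:\n{context}")
```

Each call stays small and focused, which tends to be more reliable than asking for all four steps in a single response.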
After generating your response:
1. Re-read the original question
2. Check that every requirement is addressed
3. Verify any code compiles/runs mentally
4. Flag any assumptions you made
Specify what NOT to do:
Important:
- Do NOT add error handling beyond what was requested
- Do NOT refactor surrounding code
- Do NOT add comments explaining obvious operations
- Do NOT change the function signature
Claude responds well to XML-tagged sections:
<context>
Repository: a-i--skills
Organ: IV (Orchestration)
Current branch: feature/governance-aware-skill-taxonomy
</context>
<task>
Create a new skill following the existing frontmatter format.
</task>
<constraints>
- Match the YAML frontmatter schema exactly
- Name must match directory name
- Include governance metadata fields
</constraints>
For complex reasoning tasks, allocate thinking budget:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[{"role": "user", "content": prompt}],
)
Define tools for structured interaction:
tools = [{
    "name": "create_skill",
    "description": "Create a new skill file",
    "input_schema": {
        "type": "object",
        "required": ["name", "category", "description"],
        "properties": {
            "name": {"type": "string", "pattern": "^[a-z][a-z0-9-]*$"},
            "category": {"type": "string"},
            "description": {"type": "string", "maxLength": 600},
        },
    },
}]
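The schema's `pattern` and `maxLength` constraints can also be checked locally before a tool call is accepted. A minimal sketch using only the standard library, rather than a full JSON Schema validator:

```python
import re


def validate_skill_input(data: dict) -> list[str]:
    """Return violations of the create_skill schema; an empty list means valid."""
    errors = []
    for field in ("name", "category", "description"):
        if field not in data:
            errors.append(f"missing required field: {field}")
    name = data.get("name", "")
    if name and not re.fullmatch(r"[a-z][a-z0-9-]*", name):
        errors.append("name must match ^[a-z][a-z0-9-]*$")
    if len(data.get("description", "")) > 600:
        errors.append("description exceeds 600 characters")
    return errors
```

Local validation gives the agent a concrete error message to feed back into the next turn instead of a silent failure downstream.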
Conversation budget allocation:
- System prompt: ~2K tokens (fixed)
- Conversation history: ~50K tokens (growing)
- Current task context: ~10K tokens (variable)
- Response space: ~4K tokens (reserved)
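A simple guard can decide when history has outgrown its budget. In this sketch, the four-characters-per-token estimate is a rough heuristic, not an exact tokenizer:

```python
def needs_summary(history: list[str], budget_tokens: int = 50_000) -> bool:
    """Roughly estimate history size and flag when it exceeds the budget."""
    # ~4 characters per token is a common rough estimate for English text
    estimated_tokens = sum(len(turn) for turn in history) // 4
    return estimated_tokens > budget_tokens
```

When this returns `True`, the older turns are candidates for the summarization pattern below.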
When context grows large, summarize earlier turns:
<conversation_summary>
In previous messages, we:
1. Identified the bug in auth middleware (missing token refresh)
2. Agreed on fix approach (add refresh check before expiry)
3. Implemented the fix in src/auth/middleware.ts
</conversation_summary>
Now continuing with testing...
| Criterion | Test Method |
|---|---|
| Correctness | Compare output against known-good answers |
| Consistency | Run same prompt 5x, check variance |
| Format compliance | Validate output structure programmatically |
| Edge cases | Test with empty input, long input, adversarial input |
| Robustness | Rephrase prompt, check output stability |
async def evaluate_prompts(prompts: list[str], test_cases: list[dict]) -> dict:
    results = {}
    for i, prompt in enumerate(prompts):
        scores = []
        for case in test_cases:
            output = await generate(prompt, case["input"])
            score = evaluate(output, case["expected"])
            scores.append(score)
        results[f"prompt_{i}"] = sum(scores) / len(scores)
    return results
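With stub implementations of `generate` and `evaluate` (both hypothetical; real versions would call a model and a scorer), the harness can be exercised end to end:

```python
import asyncio


async def generate(prompt: str, input_text: str) -> str:
    # Hypothetical stub: a real version would call the model API
    return input_text.upper()


def evaluate(output: str, expected: str) -> float:
    # Hypothetical stub scorer: 1.0 for an exact match, else 0.0
    return 1.0 if output == expected else 0.0


async def evaluate_prompts(prompts: list[str], test_cases: list[dict]) -> dict:
    # Same harness as above, reproduced so this sketch runs standalone
    results = {}
    for i, prompt in enumerate(prompts):
        scores = []
        for case in test_cases:
            output = await generate(prompt, case["input"])
            scores.append(evaluate(output, case["expected"]))
        results[f"prompt_{i}"] = sum(scores) / len(scores)
    return results


scores = asyncio.run(evaluate_prompts(
    ["Summarize:", "TL;DR:"],
    [{"input": "abc", "expected": "ABC"}, {"input": "x", "expected": "y"}],
))
```

Swapping the stubs for real calls turns this into an A/B test: each candidate prompt gets a mean score over the same test cases, and the differences are directly comparable.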