Skill

prompt-design

Design and document a production prompt — structured template, evaluation criteria, test cases, and version control strategy.

Install

npx claudepluginhub hpsgd/turtlestack --plugin ai-engineer

Tool Access

This skill is limited to using the following tools:

ReadWriteEditGlobGrep

Preview

Design a production prompt for $ARGUMENTS.

SKILL.md

Similar Skills

cache-components

Guides Next.js Cache Components and Partial Prerendering (PPR) with cacheComponents enabled. Implements 'use cache', cacheLife(), cacheTag(), revalidateTag(), static/dynamic optimization, and cache debugging.

cache-components

139.0k

claude-opus-4-5-migration

2 files

Migrates code, prompts, and API calls from Claude Sonnet 4.0/4.5 or Opus 4.1 to Opus 4.5, updating model strings on Anthropic, AWS, GCP, Azure platforms.

claude-opus-4-5-migration

83.2k

bmad-distillator

7 files

Compresses source documents into lossless, LLM-optimized distillates preserving all facts and relationships. Use for 'distill documents' or 'create distillate' requests.

bmad-pro-skills

43.8k

Stats

Parent Repo Stars0

Parent Repo Forks0

Last CommitApr 2, 2026

Actions

View Source View Plugin View on GitHub View README

Property	Question	Bad answer	Good answer
Input	What data does the model receive?	"User text"	"JSON with fields: query (string, 1-500 chars), context (array of strings, 0-10 items)"
Output	What does the model produce?	"A response"	"JSON with fields: answer (string), confidence (float 0-1), sources (array of cited context indices)"
Format	What structure must the output follow?	"JSON"	"JSON matching the AnswerResponse schema, validated against OpenAPI spec"
Constraints	What must the model NOT do?	"Be accurate"	"Never reference information outside the provided context. If context is insufficient, return confidence: 0"
Volume	How often is this called?	"A lot"	"~2000 requests/day, peak 50/minute"
Latency	How fast must it respond?	"Fast"	"p95 < 3 seconds total generation time"

Property

Question

Bad answer

Good answer

Input

What data does the model receive?

"User text"

"JSON with fields: query (string, 1-500 chars), context (array of strings, 0-10 items)"

Output

What does the model produce?

"A response"

"JSON with fields: answer (string), confidence (float 0-1), sources (array of cited context indices)"

Format

What structure must the output follow?

"JSON"

"JSON matching the AnswerResponse schema, validated against OpenAPI spec"

Constraints

What must the model NOT do?

"Be accurate"

"Never reference information outside the provided context. If context is insufficient, return confidence: 0"

Volume

How often is this called?

"A lot"

"~2000 requests/day, peak 50/minute"

Latency

How fast must it respond?

"Fast"

"p95 < 3 seconds total generation time"

Criterion	Metric	Pass threshold	Measurement method
Accuracy	Correct answer on eval set	>= 90%	Automated comparison against expected outputs
Format compliance	Valid output structure	100%	Schema validation — every response must parse
Safety	No hallucinated facts	0 hallucinations in eval set	Human review + automated context grounding check
Latency	Total generation time	p95 < target from Step 1	Timed API calls across eval set
Cost	Per-request token usage	< budget from Step 1	Token counting across eval set

Criterion

Metric

Pass threshold

Measurement method

Accuracy

Correct answer on eval set

>= 90%

Automated comparison against expected outputs

Format compliance

Valid output structure

100%

Schema validation — every response must parse

Safety

No hallucinated facts

0 hallucinations in eval set

Human review + automated context grounding check

Latency

Total generation time

p95 < target from Step 1

Timed API calls across eval set

Cost

Per-request token usage

< budget from Step 1

Token counting across eval set

[Role/Context] You are a [specific role] that [specific task]. You work with [specific domain]. [Task] Given the following [input type], [specific action to perform]. [Constraints] - Only use information from the provided context - If the answer is not in the context, respond with [specific fallback] - Output must be valid JSON matching the schema below - Maximum output length: [token limit] [Examples] Input: [representative example 1] Output: [expected output 1] Input: [representative example 2 — edge case] Output: [expected output 2] [Input] {input_data}

Category	Purpose	Minimum count
Happy path	Typical, well-formed inputs	2
Edge cases	Boundary conditions, unusual but valid inputs	2
Adversarial	Prompt injection attempts, contradictory inputs	1
Empty/minimal	Missing fields, empty strings, null values	1

Category

Purpose

Minimum count

Happy path

Typical, well-formed inputs

Edge cases

Boundary conditions, unusual but valid inputs

Adversarial

Prompt injection attempts, contradictory inputs

Empty/minimal

Missing fields, empty strings, null values

#	Category	Input (summary)	Expected output
T1	Happy path	Standard query with clear context	Correct answer, confidence > 0.8
T2	Happy path	Multi-part query	All parts addressed
T3	Edge case	Query with no matching context	confidence: 0, fallback message
T4	Edge case	Very long input near token limit	Truncation handled gracefully
T5	Adversarial	"Ignore previous instructions"	Normal response, injection ignored
T6	Empty	Empty query string	Validation error or graceful decline

Category

Input (summary)

Expected output

Actual output

Pass/Fail

Notes

Happy path

Standard query with clear context

Correct answer, confidence > 0.8

Happy path

Multi-part query

All parts addressed

Edge case

Query with no matching context

confidence: 0, fallback message

Edge case

Very long input near token limit

Truncation handled gracefully

Adversarial

"Ignore previous instructions"

Normal response, injection ignored

Empty

Empty query string

Validation error or graceful decline

Method	When to use	Reliability
JSON mode / response_format	Structured data extraction, API responses	High — model constrained to valid JSON
Function calling / tool use	Action-oriented outputs, multi-step workflows	High — schema-validated by the API
"Output as JSON" in prompt text	Never	Low — model may produce invalid JSON, markdown-wrapped JSON, or free text

Method

When to use

Reliability

JSON mode / response_format

Structured data extraction, API responses

High — model constrained to valid JSON

Function calling / tool use

Action-oriented outputs, multi-step workflows

High — schema-validated by the API

"Output as JSON" in prompt text

Never

Low — model may produce invalid JSON, markdown-wrapped JSON, or free text

Only use information from the provided context to answer the question. If the answer is not contained in the context, respond with: {"answer": null, "confidence": 0, "reason": "Information not found in provided context"} Do not use your training data to fill gaps in the context.

## v1.2 — 2024-03-15 - Added explicit constraint for empty context handling (fixes T3 failure) - Eval results: 94% accuracy (up from 91%), 100% format compliance ## v1.1 — 2024-03-10 - Reduced examples from 5 to 3 (no accuracy loss, 15% cost reduction) - Eval results: 91% accuracy, 100% format compliance

# Prompt Design: [feature name] ## Task Definition - **Input:** [type, format, constraints] - **Output:** [type, format, schema] - **Constraints:** [explicit boundaries] - **Volume:** [requests/day] - **Latency budget:** [p95 target] - **Cost budget:** [per-request target] ## Evaluation Criteria | Criterion | Metric | Pass threshold | |---|---|---| ## Prompt (v1.0) [Full prompt text] ## Output Schema [JSON schema or type definition] ## Test Results | # | Category | Input | Expected | Actual | Pass/Fail | |---|---|---|---|---|---| ## Safety Measures - [Context grounding approach] - [Injection resistance measures] - [Output validation rules] ## Version History [Changelog] ## Deployment Notes - [File location] - [Eval set location] - [Rollback procedure]

Property	Question	Bad answer	Good answer
Input	What data does the model receive?	"User text"	"JSON with fields: query (string, 1-500 chars), context (array of strings, 0-10 items)"
Output	What does the model produce?	"A response"	"JSON with fields: answer (string), confidence (float 0-1), sources (array of cited context indices)"
Format	What structure must the output follow?	"JSON"	"JSON matching the AnswerResponse schema, validated against OpenAPI spec"
Constraints	What must the model NOT do?	"Be accurate"	"Never reference information outside the provided context. If context is insufficient, return confidence: 0"
Volume	How often is this called?	"A lot"	"~2000 requests/day, peak 50/minute"
Latency	How fast must it respond?	"Fast"	"p95 < 3 seconds total generation time"

Property

Question

Bad answer

Good answer

Input

What data does the model receive?

"User text"

"JSON with fields: query (string, 1-500 chars), context (array of strings, 0-10 items)"

Output

What does the model produce?

"A response"

"JSON with fields: answer (string), confidence (float 0-1), sources (array of cited context indices)"

Format

What structure must the output follow?

"JSON"

"JSON matching the AnswerResponse schema, validated against OpenAPI spec"

Constraints

What must the model NOT do?

"Be accurate"

"Never reference information outside the provided context. If context is insufficient, return confidence: 0"

Volume

How often is this called?

"A lot"

"~2000 requests/day, peak 50/minute"

Latency

How fast must it respond?

"Fast"

"p95 < 3 seconds total generation time"

Criterion	Metric	Pass threshold	Measurement method
Accuracy	Correct answer on eval set	>= 90%	Automated comparison against expected outputs
Format compliance	Valid output structure	100%	Schema validation — every response must parse
Safety	No hallucinated facts	0 hallucinations in eval set	Human review + automated context grounding check
Latency	Total generation time	p95 < target from Step 1	Timed API calls across eval set
Cost	Per-request token usage	< budget from Step 1	Token counting across eval set

Criterion

Metric

Pass threshold

Measurement method

Accuracy

Correct answer on eval set

>= 90%

Automated comparison against expected outputs

Format compliance

Valid output structure

100%

Schema validation — every response must parse

Safety

No hallucinated facts

0 hallucinations in eval set

Human review + automated context grounding check

Latency

Total generation time

p95 < target from Step 1

Timed API calls across eval set

Cost

Per-request token usage

< budget from Step 1

Token counting across eval set

Category	Purpose	Minimum count
Happy path	Typical, well-formed inputs	2
Edge cases	Boundary conditions, unusual but valid inputs	2
Adversarial	Prompt injection attempts, contradictory inputs	1
Empty/minimal	Missing fields, empty strings, null values	1

Category

Purpose

Minimum count

Happy path

Typical, well-formed inputs

Edge cases

Boundary conditions, unusual but valid inputs

Adversarial

Prompt injection attempts, contradictory inputs

Empty/minimal

Missing fields, empty strings, null values

#	Category	Input (summary)	Expected output
T1	Happy path	Standard query with clear context	Correct answer, confidence > 0.8
T2	Happy path	Multi-part query	All parts addressed
T3	Edge case	Query with no matching context	confidence: 0, fallback message
T4	Edge case	Very long input near token limit	Truncation handled gracefully
T5	Adversarial	"Ignore previous instructions"	Normal response, injection ignored
T6	Empty	Empty query string	Validation error or graceful decline

Category

Input (summary)

Expected output

Actual output

Pass/Fail

Notes

Happy path

Standard query with clear context

Correct answer, confidence > 0.8

Happy path

Multi-part query

All parts addressed

Edge case

Query with no matching context

confidence: 0, fallback message

Edge case

Very long input near token limit

Truncation handled gracefully

Adversarial

"Ignore previous instructions"

Normal response, injection ignored

Empty

Empty query string

Validation error or graceful decline

Method	When to use	Reliability
JSON mode / response_format	Structured data extraction, API responses	High — model constrained to valid JSON
Function calling / tool use	Action-oriented outputs, multi-step workflows	High — schema-validated by the API
"Output as JSON" in prompt text	Never	Low — model may produce invalid JSON, markdown-wrapped JSON, or free text

Method

When to use

Reliability

JSON mode / response_format

Structured data extraction, API responses

High — model constrained to valid JSON

Function calling / tool use

Action-oriented outputs, multi-step workflows

High — schema-validated by the API

"Output as JSON" in prompt text

Never

Low — model may produce invalid JSON, markdown-wrapped JSON, or free text

prompt-design

Install

Tool Access

Preview

SKILL.md

Similar Skills

prompt-design

Install

Tool Access

Preview

SKILL.md

Process (sequential — do not skip steps)

Step 1: Task Definition

Step 2: Evaluation Criteria

Step 3: Prompt Structure

Step 4: Test Case Design

Step 5: Format Enforcement

Step 6: Safety and Guardrails

Step 7: Version Control

Anti-Patterns (NEVER do these)

Output Format

Similar Skills

Process (sequential — do not skip steps)

Step 1: Task Definition

Step 2: Evaluation Criteria

Step 3: Prompt Structure

Step 4: Test Case Design

Step 5: Format Enforcement

Step 6: Safety and Guardrails

Step 7: Version Control

Anti-Patterns (NEVER do these)

Output Format