Master prompt design, optimization techniques, and effective LLM interaction patterns
Designs effective LLM prompts using advanced techniques like few-shot and chain-of-thought prompting.
```
/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-engineer
/plugin install pluginagentmarketplace-ai-engineer-plugin@pluginagentmarketplace/custom-plugin-ai-engineer
```

Master the art and science of designing effective prompts for LLMs to achieve optimal results.
```yaml
input:
  type: object
  required: [task_description]
  properties:
    task_description:
      type: string
      description: What the prompt should accomplish
    target_model:
      type: string
      enum: [gpt-4, gpt-3.5-turbo, claude-3, llama, mistral]
      default: gpt-4
    examples:
      type: array
      items:
        type: object
        properties:
          input: string
          output: string
    constraints:
      type: object
      properties:
        max_tokens: integer
        output_format: string
        tone: string

output:
  type: object
  properties:
    system_prompt:
      type: string
      description: System message for the model
    user_template:
      type: string
      description: Template with placeholders
    examples:
      type: array
      description: Few-shot examples if applicable
    token_estimate:
      type: integer
    optimization_notes:
      type: string
```
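For illustration only, here is a request and response shaped like the schema above (all values are invented):

```python
# Illustrative only: an input conforming to the schema and the shape of the output.
request = {
    "task_description": "Summarize customer support tickets",
    "target_model": "gpt-4",
    "constraints": {"max_tokens": 300, "output_format": "json", "tone": "neutral"},
}

response = {
    "system_prompt": "You are a support analyst. Summarize tickets factually and concisely.",
    "user_template": "Summarize the following ticket in JSON:\n\n{ticket_text}",
    "examples": [],
    "token_estimate": 180,
    "optimization_notes": "Few-shot examples omitted; the task is simple enough for zero-shot.",
}
```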
```yaml
error_patterns:
  - error: "Output format not followed"
    cause: Weak formatting instructions
    solution: |
      1. Add explicit format examples
      2. Use XML/JSON structure
      3. Add "Return ONLY valid JSON" constraint
      4. Use structured output mode
    fallback: Post-process with regex
  - error: "Hallucinated information"
    cause: Prompt lacks grounding
    solution: |
      1. Add context/source material
      2. Include "Only use provided information"
      3. Request citations
      4. Lower temperature
    fallback: Add verification step
  - error: "Response too verbose"
    cause: No length constraints
    solution: |
      1. Add word/sentence limits
      2. Request bullet points
      3. Add "Be concise" instruction
    fallback: Summarize output
  - error: "Inconsistent responses"
    cause: High temperature or vague prompt
    solution: |
      1. Lower temperature to 0.1-0.3
      2. Add more specific constraints
      3. Use self-consistency sampling
    fallback: Majority voting on N samples
```
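For the first pattern, the regex fallback might look like the sketch below; the function name is illustrative, and the strict "Return ONLY valid JSON" instruction still belongs in the prompt itself:

```python
import json
import re

def parse_json_response(raw: str):
    """Try strict JSON first; fall back to extracting the first {...} block."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fallback: pull the first JSON-looking object out of a chatty response
        match = re.search(r"\{.*\}", raw, re.DOTALL)
        if match:
            return json.loads(match.group(0))
        raise ValueError("No JSON object found in model output")
```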
```yaml
fallback_chain:
  strategy: progressive_simplification
  steps:
    - name: complex_prompt
      max_attempts: 2
      on_failure: simplify
    - name: simplified_prompt
      action: Remove advanced techniques
      max_attempts: 2
      on_failure: decompose
    - name: decomposed_tasks
      action: Break into subtasks
      max_attempts: 3
      on_failure: basic_template
    - name: basic_template
      action: Use minimal proven template
      always_succeeds: true
```
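A minimal sketch of the chain in code, assuming the caller supplies a `model` with a `generate()` method, an ordered list of prompt builders (complex first, basic template last), and a validation function; the decomposition step is omitted for brevity:

```python
def run_with_fallbacks(task, model, prompt_builders, is_valid, max_attempts_per_step=2):
    """Walk a progressive-simplification chain: try each builder in order,
    a few times each, until the output passes validation.

    prompt_builders: ordered list of (name, build_fn) pairs, e.g.
        [("complex_prompt", build_complex),
         ("simplified_prompt", build_simple),
         ("basic_template", lambda t: f"Task: {t}\nAnswer briefly.")]
    """
    response = None
    for name, build in prompt_builders:
        for _ in range(max_attempts_per_step):
            response = model.generate(build(task))
            if is_valid(response):
                return response
    # The last builder plays the role of basic_template: accept its output anyway.
    return response
```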
```yaml
optimization:
  prompt_compression:
    techniques:
      - Remove redundant words
      - Use abbreviations
      - Compress examples
      - Remove filler phrases
    examples:
      - before: "Please provide a detailed explanation of"
        after: "Explain:"
        savings: 6 tokens
      - before: "Based on the information provided above"
        after: "[context]"
        savings: 5 tokens
  caching_strategy:
    system_prompt: cache_indefinitely
    few_shot_examples: cache_per_task
    user_input: no_cache
  model_selection:
    simple_tasks: gpt-3.5-turbo   # $0.002/1K
    complex_reasoning: gpt-4      # $0.06/1K
    creative_writing: claude-3    # Variable
  batch_processing:
    enabled: true
    max_batch_size: 20
    concurrent_requests: 5
```
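As a sketch of the compression step, the substitutions from the table above can be applied mechanically and the savings measured with tiktoken (the rule table here is illustrative, not exhaustive):

```python
import tiktoken

# Illustrative substitutions mirroring the compression examples above
COMPRESSION_RULES = {
    "Please provide a detailed explanation of": "Explain:",
    "Based on the information provided above": "[context]",
}

def compress_prompt(prompt: str, model: str = "gpt-4"):
    """Apply phrase substitutions and report how many tokens were saved."""
    enc = tiktoken.encoding_for_model(model)
    before = len(enc.encode(prompt))
    for verbose, short in COMPRESSION_RULES.items():
        prompt = prompt.replace(verbose, short)
    saved = before - len(enc.encode(prompt))
    return prompt, saved
```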
```yaml
logging:
  prompt_versioning:
    enabled: true
    track_changes: true
    compare_performance: true
    metrics:
      - prompt_id
      - version
      - tokens_used
      - response_quality_score
      - format_compliance_rate
      - latency_ms
  a_b_testing:
    enabled: true
    traffic_split: 50/50
    min_samples: 100
    significance_level: 0.05
```
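A minimal sketch of how the A/B gate above might be evaluated, assuming each variant logs a binary success signal (for example, format compliance); the sample-size and significance thresholds mirror the config:

```python
from math import sqrt, erf

def ab_test_significant(successes_a, n_a, successes_b, n_b,
                        min_samples=100, alpha=0.05):
    """Two-proportion z-test on per-variant success rates."""
    if n_a < min_samples or n_b < min_samples:
        return None  # not enough traffic yet
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_value < alpha
```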
### Debugging Checklist

1. [ ] Test prompt with temperature=0
   - Eliminates randomness for debugging
2. [ ] Check token count
   ```python
   import tiktoken
   enc = tiktoken.encoding_for_model("gpt-4")
   tokens = len(enc.encode(prompt))
   ```
3. [ ] Validate output format (see the validation sketch below)
4. [ ] Compare with baseline
5. [ ] Review for ambiguity
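For item 3, a small helper along these lines can gate the checklist (the required keys are whatever your prompt promises to return):

```python
import json

def output_format_ok(raw: str, required_keys: set) -> bool:
    """Return True if the response parses as JSON and contains the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and required_keys.issubset(data.keys())
```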
### Common Failure Modes
| Symptom | Root Cause | Fix |
|---------|------------|-----|
| Wrong format | Vague instructions | Add explicit format example |
| Off-topic response | Scope not defined | Add boundaries |
| Refuses to answer | Safety trigger | Rephrase request |
| Too creative | High temperature | Lower to 0-0.3 |
| Ignores constraints | Too many rules | Prioritize top 3-5 |
### Prompt Testing Framework
```python
# test_prompt.py
class PromptTest:
    def __init__(self, prompt_template):
        self.template = prompt_template
        self.test_cases = []

    def add_case(self, input_data, expected_output):
        self.test_cases.append({
            "input": input_data,
            "expected": expected_output
        })

    def evaluate(self, output, expected):
        # Simple token-overlap score; swap in an embedding or LLM-judge
        # comparison for production use.
        output_tokens = set(output.lower().split())
        expected_tokens = set(expected.lower().split())
        if not expected_tokens:
            return 0.0
        return len(output_tokens & expected_tokens) / len(expected_tokens)

    def run_tests(self, model):
        results = []
        for case in self.test_cases:
            output = model.generate(
                self.template.format(**case["input"])
            )
            score = self.evaluate(output, case["expected"])
            results.append(score)
        return {
            "pass_rate": sum(r > 0.8 for r in results) / len(results),
            "avg_score": sum(results) / len(results)
        }
```
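A usage sketch for the framework above; `EchoModel` is a stand-in for a real LLM client exposing `generate(prompt) -> str`:

```python
class EchoModel:
    """Stand-in for a real LLM client; replace generate() with an API call."""
    def generate(self, prompt: str) -> str:
        return "LLMs generate text by predicting the next token."

test = PromptTest("Summarize in one sentence: {text}")
test.add_case(
    {"text": "Large language models produce text one predicted token at a time."},
    "LLMs generate text by predicting the next token.",
)
print(test.run_tests(EchoModel()))  # e.g. {'pass_rate': 1.0, 'avg_score': 1.0}
```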
## Prompt Templates

### Task Decomposition

Break down this task into clear steps:

Task: {task}

Output as numbered list:
1. [First step]
2. [Second step]
...
### Code Review

Review this code for:
- Bugs and errors
- Security issues
- Performance problems
- Best practices

Code:
```{language}
{code}
```

Format response as:
### Data Extraction
Extract the following from the text: {fields_to_extract}
Text: {text}
Return as JSON: { "field1": "value", "field2": "value" }
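One practical note: if you fill this template with Python's `str.format()`, the literal JSON braces must be doubled, as in this illustrative snippet:

```python
DATA_EXTRACTION_TEMPLATE = (
    "Extract the following from the text: {fields_to_extract}\n"
    "Text: {text}\n"
    'Return as JSON: {{ "field1": "value", "field2": "value" }}'
)

prompt = DATA_EXTRACTION_TEMPLATE.format(
    fields_to_extract="name, year",
    text="Ada Lovelace published her notes in 1843.",
)
```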
## Example Prompts
- "Help me design a prompt for code review"
- "Implement chain of thought for complex reasoning"
- "Create a system prompt for a customer service bot"
- "Optimize this prompt for fewer tokens"
- "Convert this prompt to use structured output"
## Dependencies
```yaml
skills:
  - prompt-engineering (PRIMARY)
agents:
  - 01-llm-fundamentals (model understanding)
  - 06-ai-agents (agent prompts)
external:
  - tiktoken (token counting)
  - promptfoo (testing)
  - langfuse (observability)
```