A defensive pattern in which inputs and outputs are inspected by dedicated safety agents or rules to prevent malicious use, jailbreaks, and harmful content. Use when the user asks to "add safety checks", "set up guardrails", or "prevent harmful outputs", or mentions agent boundaries, output validation, or content filtering.
npx claudepluginhub lauraflorentin/skills-marketplace --plugin agentic-skills
This skill uses the workspace's default tool permissions.
Guardrails are the firewall of an AI system. They sit on both sides of the agent: an input guardrail screens user requests before they reach the agent, and an output guardrail screens the agent's responses before they reach the user. They enforce policy, security, and tone. Unlike the main agent, which tries to be helpful, the guardrail tries to be safe and compliant.
def guarded_execution(user_input):
    # Layer 1: Input Guardrail
    # Check for prompt injection or policy violations
    if not safety_agent.check_input(user_input).safe:
        return "I cannot answer that request."

    # Layer 2: Main Execution
    response = main_agent.run(user_input)

    # Layer 3: Output Guardrail
    # Check for PII or harmful content in the response
    if not safety_agent.check_output(response).safe:
        log_violation(user_input, response)
        return "Response withheld due to safety policy."

    return response
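The snippet above depends on `safety_agent`, `main_agent`, and `log_violation`, which are not defined on this page. As a minimal, self-contained sketch of the same three layers, the version below fills them in with hypothetical stand-ins: `RuleBasedSafetyAgent`, `EchoAgent`, the `Verdict` type, and both regex rules are illustrative assumptions, not part of the skill. It also fails closed when a guardrail raises, and, unlike the original, logs blocked inputs as well as withheld outputs.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    safe: bool
    reason: str = ""


# Hypothetical rules for illustration only; a real deployment would more
# likely use a tuned classifier or a policy engine.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.I)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class RuleBasedSafetyAgent:
    def check_input(self, text: str) -> Verdict:
        if INJECTION_PATTERN.search(text):
            return Verdict(False, "possible prompt injection")
        return Verdict(True)

    def check_output(self, text: str) -> Verdict:
        if EMAIL_PATTERN.search(text):
            return Verdict(False, "possible PII (email address)")
        return Verdict(True)


class EchoAgent:
    """Stand-in for the main agent; it only echoes its input."""

    def run(self, user_input: str) -> str:
        return f"You said: {user_input}"


violations = []  # in-memory log, assumed here in place of real logging


def log_violation(user_input, response, reason):
    violations.append({"input": user_input, "response": response, "reason": reason})


def guarded_execution(user_input, safety_agent, main_agent):
    # Layer 1: input guardrail. An exception counts as "unsafe" (fail closed).
    try:
        verdict = safety_agent.check_input(user_input)
    except Exception as exc:
        verdict = Verdict(False, f"input guardrail error: {exc}")
    if not verdict.safe:
        log_violation(user_input, None, verdict.reason)
        return "I cannot answer that request."

    # Layer 2: main execution.
    response = main_agent.run(user_input)

    # Layer 3: output guardrail, also failing closed on errors.
    try:
        verdict = safety_agent.check_output(response)
    except Exception as exc:
        verdict = Verdict(False, f"output guardrail error: {exc}")
    if not verdict.safe:
        log_violation(user_input, response, verdict.reason)
        return "Response withheld due to safety policy."

    return response
```

Calling `guarded_execution("hello", RuleBasedSafetyAgent(), EchoAgent())` passes both checks and returns the echoed text, while an input containing "ignore previous instructions" is refused before the main agent runs.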
| Problem | Cause | Fix |
|---|---|---|
| Guardrail blocks legitimate requests | Over-broad pattern matching | Tune guardrail thresholds using a labeled test set; track false positive rate |
| Agent bypasses guardrails | Prompt injection in user input | Apply guardrails before injecting user content into agent context |
| Guardrail adds too much latency | Synchronous pre-call check | Run guardrail in parallel with the first LLM call; cancel if flagged |
| Silent failures | Guardrail raises exception but agent continues | Treat guardrail exceptions as hard stops; log and escalate |
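The latency fix in the table (run the guardrail in parallel with the first LLM call and cancel if flagged) can be sketched with `asyncio`. The functions `check_input` and `run_agent` below are hypothetical stand-ins with simulated delays, and the substring rule is only a placeholder for a real check. A guardrail exception is treated as a hard stop, matching the last row of the table.

```python
import asyncio

REFUSAL = "I cannot answer that request."


async def check_input(user_input: str) -> bool:
    """Hypothetical input guardrail; returns True when the input looks safe."""
    await asyncio.sleep(0.01)  # simulated classifier latency
    return "ignore previous instructions" not in user_input.lower()


async def run_agent(user_input: str) -> str:
    """Hypothetical stand-in for the first LLM call."""
    await asyncio.sleep(0.05)  # simulated model latency
    return f"Answer to: {user_input}"


async def guarded_parallel(user_input: str) -> str:
    # Start the agent call immediately so the guardrail adds no extra wait
    # on the happy path.
    agent_task = asyncio.create_task(run_agent(user_input))

    try:
        safe = await check_input(user_input)
    except Exception:
        safe = False  # guardrail failure is a hard stop, not a pass

    if not safe:
        # Cancel the in-flight agent call and wait for the cancellation
        # to finish before returning the refusal.
        agent_task.cancel()
        try:
            await agent_task
        except asyncio.CancelledError:
            pass
        return REFUSAL

    return await agent_task
```

This trade-off only seems reasonable when the agent call has no side effects (such as tool calls) before the guardrail verdict arrives, since the flagged input has already reached the model by the time the call is cancelled. Where that matters, the sequential check shown earlier is the safer default.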