Ensure agent safety - guardrails, content filtering, monitoring, and compliance
Adds input/output guardrails, content filtering, and rate limiting for safe agent deployment. Use when implementing PII detection, prompt injection protection, or compliance requirements like GDPR and SOC2.
/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-agents/plugin install custom-plugin-ai-agents@pluginagentmarketplace-ai-agentsThis skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/config.yamlassets/schema.jsonreferences/GUIDE.mdreferences/PATTERNS.mdscripts/safety_guardrails.pyscripts/validate.pyImplement safety systems for responsible AI agent deployment.
Invoke this skill when:
| Parameter | Type | Required | Description | Default |
|---|---|---|---|---|
task | string | Yes | Safety goal | - |
risk_level | enum | No | strict, moderate, permissive | strict |
filters | list | No | Filter types to enable | ["injection", "pii", "toxicity"] |
from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter
guard = Guard.from_validators([
ToxicLanguage(threshold=0.8, on_fail="exception"),
PIIFilter(on_fail="fix")
])
# Validate output
validated = guard.validate(llm_response)
# Prompt injection detection
INJECTION_PATTERNS = [
r"ignore (previous|all) instructions",
r"you are now",
r"forget everything"
]
# Content filtering
filters = [
ToxicityFilter(),
PIIRedactor(),
HallucinationDetector()
]
class RateLimiter:
def __init__(self, rpm=60, tpm=100000):
self.rpm = rpm
self.tpm = tpm
def check(self, user_id, tokens):
# Token bucket algorithm
pass
| Issue | Solution |
|---|---|
| False positives | Tune thresholds |
| Injection bypass | Add LLM-based detection |
| PII leakage | Add secondary validation |
| Performance hit | Cache filter results |
tool-calling - Input validationllm-integration - API securitymulti-agent - Per-agent permissionsThis skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.