agent-safety | ai-agents-assistant | ClaudePluginHub

Skill

Community

agent-safety

From ai-agents-assistant

1

Description

Ensure agent safety - guardrails, content filtering, monitoring, and compliance

AI Summary

Adds input/output guardrails, content filtering, and rate limiting for safe agent deployment. Use when implementing PII detection, prompt injection protection, or compliance requirements like GDPR and SOC2.

Install

1

Add the repository(one-time)

$

/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-agents

2

Install the plugin

$

/plugin install custom-plugin-ai-agents@pluginagentmarketplace-ai-agents

Tool Access

This skill inherits all available tools. When active, it can use any tool Claude has access to.

Supporting Assets

View in Repository

assets/config.yaml

assets/schema.json

references/GUIDE.md

references/PATTERNS.md

scripts/safety_guardrails.py

scripts/validate.py

Skill Content

Agent Safety

Implement safety systems for responsible AI agent deployment.

When to Use This Skill

Invoke this skill when:

Adding input/output guardrails
Implementing content filtering
Setting up rate limiting
Ensuring compliance (GDPR, SOC2)

Parameter Schema

Parameter	Type	Required	Description	Default
`task`	string	Yes	Safety goal	-
`risk_level`	enum	No	`strict`, `moderate`, `permissive`	`strict`
`filters`	list	No	Filter types to enable	`["injection", "pii", "toxicity"]`

Quick Start

from guardrails import Guard
from guardrails.validators import ToxicLanguage, PIIFilter

guard = Guard.from_validators([
    ToxicLanguage(threshold=0.8, on_fail="exception"),
    PIIFilter(on_fail="fix")
])

# Validate output
validated = guard.validate(llm_response)

Guardrail Types

Input Guardrails

# Prompt injection detection
INJECTION_PATTERNS = [
    r"ignore (previous|all) instructions",
    r"you are now",
    r"forget everything"
]

Output Guardrails

# Content filtering
filters = [
    ToxicityFilter(),
    PIIRedactor(),
    HallucinationDetector()
]

Rate Limiting

class RateLimiter:
    def __init__(self, rpm=60, tpm=100000):
        self.rpm = rpm
        self.tpm = tpm

    def check(self, user_id, tokens):
        # Token bucket algorithm
        pass

Troubleshooting

Issue	Solution
False positives	Tune thresholds
Injection bypass	Add LLM-based detection
PII leakage	Add secondary validation
Performance hit	Cache filter results

Best Practices

Defense in depth (multiple layers)
Fail-safe defaults (deny by default)
Audit everything
Regular red team testing

Compliance Checklist

Related Skills

tool-calling - Input validation
llm-integration - API security
multi-agent - Per-agent permissions

References

Links

GitHub Stats

0 forks

Updated 9 days ago

Similar Skills

Command Development

This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.

53.4k

Agent Development

This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.

53.4k

Hook Development

This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.

53.4k