From prodsec-skills
Deploys runtime guardrails for bidirectional prompt and response filtering in AI systems. Use when designing or reviewing AI architectures needing prompt injection protection, content filtering, or input/output safety controls.
npx claudepluginhub redhatproductsecurity/prodsec-skills --plugin prodsec-skills
Slash command: /prodsec-skills:bidirectional-filtering
A guardrails component SHOULD be deployed between the users/applications (or API gateway) and the models. This component acts as a gateway or proxy that inspects and acts on data flowing in **both directions**.
This skill refers to runtime guardrails (a deployed component), not model-level safety training.
Incoming prompts are raw or "tainted" input. The guardrails component analyzes them and applies rule-based actions:
| Action | Description |
|---|---|
| Block | Discard the prompt entirely, preventing it from reaching the model |
| Mask | Redact or obfuscate sensitive data (PII, credentials) before forwarding |
| Modify | Rewrite the prompt to remove dangerous patterns while preserving intent |
| Pass | Allow the prompt through unchanged |
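The four inbound actions can be sketched as a small rule-based filter. This is a minimal illustration, not the skill's implementation: the `Action` enum, the `Verdict` type, the `filter_prompt` function, and all three regex rule sets are hypothetical names and patterns chosen for the example, and the function stops at the first rule that matches rather than combining actions.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    BLOCK = "block"
    MASK = "mask"
    MODIFY = "modify"
    PASS = "pass"


@dataclass
class Verdict:
    action: Action
    prompt: Optional[str]  # None when the prompt is blocked


# Hypothetical example rules; a real deployment would load these from policy.
BLOCK_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
STRIP_PATTERNS = [re.compile(r"<script\b.*?</script>", re.I | re.S)]


def filter_prompt(prompt: str) -> Verdict:
    """Return the first applicable action for a tainted prompt."""
    # Block: the prompt never reaches the model.
    if any(p.search(prompt) for p in BLOCK_PATTERNS):
        return Verdict(Action.BLOCK, None)

    # Mask: redact sensitive data before forwarding.
    masked = EMAIL.sub("[EMAIL]", prompt)
    if masked != prompt:
        return Verdict(Action.MASK, masked)

    # Modify: remove dangerous patterns but keep the rest of the request.
    modified = prompt
    for pattern in STRIP_PATTERNS:
        modified = pattern.sub("", modified)
    if modified != prompt:
        return Verdict(Action.MODIFY, modified.strip())

    # Pass: forward unchanged.
    return Verdict(Action.PASS, prompt)
```

Regex rules are used here only because they are easy to read; a production guardrail would likely combine them with classifiers or a dedicated detection service.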
Objectives:
Model responses are inspected before delivery to the user or application:
| Action | Description |
|---|---|
| Block | Suppress the response if it contains harmful or policy-violating content |
| Mask | Redact sensitive data the model may have included in its response |
| Modify | Remove or rewrite problematic portions of the response |
| Pass | Deliver the response unchanged |
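A corresponding outbound check might look like the sketch below. Everything in it is an assumption made for illustration: the `filter_response` function, the denylist entry, the placeholder text returned for a blocked response, and the two patterns (an AWS-style access key ID and a Markdown image link). Unlike a first-match filter, this version applies masking and modification in sequence and reports every action taken.

```python
import re
from typing import List, Tuple

# Hypothetical rules for illustration only.
DENYLIST = ("BEGIN RSA PRIVATE KEY",)
SECRET = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\([^)]*\)")
WITHHELD = "The response was withheld by policy."


def filter_response(response: str) -> Tuple[List[str], str]:
    """Return the actions applied and the text to deliver."""
    # Block: suppress the whole response and deliver a placeholder instead.
    if any(term in response for term in DENYLIST):
        return ["block"], WITHHELD

    actions: List[str] = []

    # Mask: redact sensitive data the model included.
    text = SECRET.sub("[REDACTED]", response)
    if text != response:
        actions.append("mask")

    # Modify: remove problematic portions, keep the remainder.
    cleaned = IMAGE_LINK.sub("", text)
    if cleaned != text:
        actions.append("modify")

    # Pass: nothing matched, deliver unchanged.
    return (actions or ["pass"]), cleaned
```

Returning the list of actions alongside the text makes it straightforward to log what the guardrail did to each response.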
Objectives:
User/App → API Gateway → Guardrails → Inference Engine → Model
↕ (inspects both directions)
User/App ← API Gateway ← Guardrails ← Inference Engine ← Model
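The flow above can be approximated in code as a wrapper that sits between the caller and the model. This is a sketch under stated assumptions: `guarded_call`, its parameters, and the refusal message are hypothetical, the model is represented by any callable that maps a prompt string to a response string, and each filter signals a block by returning `None`.

```python
from typing import Callable, Optional

# A filter returns the (possibly rewritten) text, or None to block it.
TextFilter = Callable[[str], Optional[str]]


def guarded_call(
    prompt: str,
    model: Callable[[str], str],
    check_prompt: TextFilter,
    check_response: TextFilter,
    refusal: str = "Request blocked by policy.",
) -> str:
    """Inspect traffic in both directions around a single model call."""
    # Inbound: the tainted prompt is inspected before the model sees it.
    safe_prompt = check_prompt(prompt)
    if safe_prompt is None:
        return refusal  # blocked prompts never reach the model

    raw_response = model(safe_prompt)

    # Outbound: the response is inspected before the caller sees it.
    safe_response = check_response(raw_response)
    return refusal if safe_response is None else safe_response
```

Keeping both checks in one wrapper means no code path can reach the model, or return its output, without passing through the guardrails.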