From the `vision-one-api` plugin.
Evaluate AI prompts and conversations against Trend Micro Vision One AI security policies. This skill helps detect and block harmful content, prompt injection attacks, sensitive data exposure, and other AI-specific threats in LLM applications.
```
npx claudepluginhub trendmicro/vision-one-skills --plugin vision-one-api
```

This skill uses the workspace's default tool permissions.
When the user wants to evaluate prompts, check for harmful content, or validate AI inputs/outputs, use this skill to apply AI guardrails.
1. **Identify the request type**: Determine whether you're evaluating:
   - A single text prompt (`SimpleRequestGuard`)
   - An OpenAI-style chat completion request (`OpenAIChatCompletionRequestV1`)
   - An OpenAI-style chat completion response (`OpenAIChatCompletionResponseV1`)
2. **Provide application context**: Always specify the `applicationName` parameter to identify which AI application's prompts are being evaluated.
3. **Choose the response detail level**: Use the `prefer` parameter to control output verbosity:
   - `return=representation` - Full evaluation with harmful content details, sensitive info, and prompt attack analysis
   - `return=minimal` - Concise response with just the action and reasons
4. **Interpret the results**: The tool returns an action (`Allow` or `Block`) and the reasons for the decision, as sketched below.
5. **Handle blocked content**: When content is blocked, explain which policies were violated and suggest alternatives.
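For illustration, a `return=minimal` Block decision might look like the sketch below. Only the action and reasons are documented for this skill; the exact field names are an assumption.

```json
{
  "action": "Block",
  "reasons": [
    "Prompt attack detected: role manipulation attempt"
  ]
}
```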
This skill uses the following Vision One MCP tools:
| Tool | Purpose |
|---|---|
| `aisecurity_guardrails_apply` | Evaluate prompts against AI guard policies and return Allow/Block recommendations |
**SimpleRequestGuard**: use for evaluating a single text prompt (max 1024 characters):

```json
{
  "applicationName": "my-ai-app",
  "requestType": "SimpleRequestGuard",
  "prompt": "User's prompt text here",
  "prefer": "return=representation"
}
```
**OpenAIChatCompletionRequestV1**: use for evaluating OpenAI-style chat messages:

```json
{
  "applicationName": "my-ai-app",
  "requestType": "OpenAIChatCompletionRequestV1",
  "model": "gpt-4",
  "messages": [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "User message"},
    {"role": "assistant", "content": "Assistant response"}
  ],
  "prefer": "return=representation"
}
```
**OpenAIChatCompletionResponseV1**: use for evaluating AI-generated responses before returning them to users.
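As a rough sketch, assuming the payload mirrors the request example above and OpenAI's chat completion response shape; the `choices` and `message` fields are assumptions, not a documented schema:

```json
{
  "applicationName": "my-ai-app",
  "requestType": "OpenAIChatCompletionResponseV1",
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "AI-generated response text"},
      "finish_reason": "stop"
    }
  ],
  "prefer": "return=representation"
}
```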
Call the tool with the request type that matches the content:

- Single prompt: `aisecurity_guardrails_apply` with the `SimpleRequestGuard` request type
- Chat request: `aisecurity_guardrails_apply` with the `OpenAIChatCompletionRequestV1` request type
- AI response: `aisecurity_guardrails_apply` with the `OpenAIChatCompletionResponseV1` request type

When presenting guardrail evaluation results:
## AI Guardrails Evaluation
**Application**: [Application Name]
**Request Type**: [SimpleRequestGuard/OpenAIChatCompletionRequestV1/OpenAIChatCompletionResponseV1]
### Decision
**Action**: [Allow/Block]
### Policy Evaluation
[If blocked or issues detected:]
- **Harmful Content**: [Detected/Not Detected] - [Details]
- **Sensitive Information**: [Detected/Not Detected] - [Details]
- **Prompt Attacks**: [Detected/Not Detected] - [Details]
### Reasons
[List of reasons for the decision]
### Recommendations
[Suggested actions if content was blocked]
The AI guardrails evaluate content for:
| Category | Description |
|---|---|
| Harmful Content | Violence, hate speech, self-harm, illegal activities |
| Sensitive Information | PII, credentials, financial data, health records |
| Prompt Attacks | Injection attempts, jailbreaks, role manipulation |
| Policy Violations | Custom organization-specific policy breaches |
Best practices:

- Use full evaluations (`return=representation`) during development and testing
- Use minimal responses (`return=minimal`) in production for efficiency
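For example, the `SimpleRequestGuard` payload from above tightened for production, changing only the documented `prefer` value:

```json
{
  "applicationName": "my-ai-app",
  "requestType": "SimpleRequestGuard",
  "prompt": "User's prompt text here",
  "prefer": "return=minimal"
}
```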