ClaudePluginHub

Community directory for discovering and installing Claude Code plugins.

© 2025 ClaudePluginHub

Community Maintained · Not affiliated with Anthropic

Skill

red-team-eval-authoring

From security-guardrails

Guides authoring and reviewing red-team eval plugins, attack templates, grader rubrics, safety fixtures, and model-risk test metadata.

Popularity

Parent stars: 5

Parent forks: 1

Shared by: 3

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/security-guardrails:red-team-eval-authoring

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

- Adding a new red-team plugin or grader.

Supporting Files

references/redteam-grader-checklist.md

SKILL.md

40 lines · ~501 tokens

Stats

Language: TypeScript

Parent stars: 5

Parent forks: 1

Maintenance: Excellent

Last commit: May 6, 2026

Tags

attack-templates

Red-Team Eval Authoring

When To Use

Adding a new red-team plugin or grader.
Editing attack templates, rubric tags, or plugin metadata.
Reviewing multimodal or tool-use safety evals for false positives/negatives.

Requirements / Checks

Confirm the target eval framework and repo layout before editing.
Prefer deterministic checks of template shape, and run them before any model-graded rubric.
Ask before running networked evals, paid model graders, or large red-team suites.
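A deterministic template-shape check can gate a suite before any model-graded rubric runs. The sketch below is illustrative only: it assumes the generator emits one prompt per line and marks variables as `{{name}}`, and `checkTemplateOutput` is a hypothetical helper, not part of any eval framework's API.

```typescript
// Hypothetical result shape for a deterministic template-shape check.
interface ShapeCheckResult {
  ok: boolean;
  prompts: string[];
  errors: string[];
}

// Assumed placeholder syntax {{name}}; adjust to whatever the host framework uses.
const UNRESOLVED_PLACEHOLDER = /\{\{\s*[\w.]+\s*\}\}/;

// Hypothetical helper: validates generator output expected to hold one prompt per line.
function checkTemplateOutput(raw: string, expectedCount?: number): ShapeCheckResult {
  const errors: string[] = [];
  // Split on LF or CRLF, trim each line, and ignore blank lines.
  const prompts = raw
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  if (prompts.length === 0) {
    errors.push("generator produced no prompts");
  }
  if (expectedCount !== undefined && prompts.length !== expectedCount) {
    errors.push(`expected ${expectedCount} prompts, got ${prompts.length}`);
  }

  const seen = new Set<string>();
  prompts.forEach((prompt, i) => {
    // A leftover {{name}} means a template variable was never substituted.
    if (UNRESOLVED_PLACEHOLDER.test(prompt)) {
      errors.push(`prompt ${i + 1} has an unresolved template variable`);
    }
    // Exact duplicates usually point to a templating bug rather than extra coverage.
    if (seen.has(prompt)) {
      errors.push(`prompt ${i + 1} duplicates an earlier prompt`);
    }
    seen.add(prompt);
  });

  return { ok: errors.length === 0, prompts, errors };
}
```

Because the check needs no network access or model calls, it can run on every edit without the approval that networked or paid evals require.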

Workflow

Define the target harm class, the expected safe behavior, and an explicit pass/fail boundary.
Standardize grader inputs before writing rubrics: user query, system purpose, model output, allowed entities.
Write attack prompt templates that emit one prompt per line or one machine-parseable record per case.
Keep rubric output structured: { reason, pass, score }.
Add registration metadata wherever the host framework expects it: plugin listing, risk category, aliases, and grader binding.
Add focused tests for template variables, grader parsing, and one benign over-refusal case.
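The standardized grader inputs and the structured { reason, pass, score } output from the workflow can be pinned down with types and a strict parser. This is a minimal sketch under assumptions: `GraderInput`, `RubricResult`, `parseRubricOutput`, and `renderGraderContext` are hypothetical names, the 0 to 1 score range is an assumed convention, and a real host framework will define its own types.

```typescript
// Hypothetical standardized inputs passed to every grader.
interface GraderInput {
  userQuery: string;
  systemPurpose: string;
  modelOutput: string;
  allowedEntities: string[];
}

// Structured rubric result: a reason is required, not only a score.
interface RubricResult {
  reason: string;
  pass: boolean;
  score: number;
}

// Hypothetical parser: throws on malformed grader output instead of guessing.
function parseRubricOutput(raw: string): RubricResult {
  // Graders often wrap JSON in prose or code fences; take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("grader output contains no JSON object");
  }
  const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("grader output is not an object");
  }
  const { reason, pass, score } = parsed as Record<string, unknown>;
  if (typeof reason !== "string" || reason.trim() === "") {
    throw new Error("missing or empty reason");
  }
  if (typeof pass !== "boolean") {
    throw new Error("pass must be a boolean");
  }
  // Assumed convention: score is a finite number in [0, 1].
  if (typeof score !== "number" || !Number.isFinite(score) || score < 0 || score > 1) {
    throw new Error("score must be a finite number in [0, 1]");
  }
  return { reason, pass, score };
}

// Hypothetical renderer for the text a rubric prompt would see; the layout is illustrative.
function renderGraderContext(input: GraderInput): string {
  return [
    `System purpose: ${input.systemPurpose}`,
    `User query: ${input.userQuery}`,
    `Allowed entities: ${input.allowedEntities.join(", ") || "(none)"}`,
    `Model output: ${input.modelOutput}`,
  ].join("\n");
}
```

Throwing on a missing reason, rather than defaulting it, keeps score-only results from slipping through as if they were reasoned verdicts.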

Safety Constraints

Do not paste real secrets, private prompts, or customer data into attack templates.
Avoid storing base64 image payloads in text-only grader variables; use a text-only prompt field when available.
Do not broaden a plugin from one risk class to another without updating docs, metadata, and tests.
Do not run harmful prompt generation against production systems without explicit approval.
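Some of these constraints can be checked mechanically before templates or fixtures are committed. The sketch below flags inline base64 image payloads, long base64 runs, and two common credential shapes; `lintTemplate` and its rule list are hypothetical, the patterns are illustrative rather than exhaustive, and a maintained secret scanner is still the better tool for real repositories.

```typescript
interface LintFinding {
  line: number;
  rule: string;
}

// Hypothetical, illustrative rule list; not a complete secret-scanning ruleset.
const RULES: { rule: string; pattern: RegExp }[] = [
  // Inline image payloads that should not sit in text-only grader variables.
  { rule: "base64-data-uri", pattern: /data:image\/[a-z+.-]+;base64,/i },
  // Long unbroken base64 runs; the 200-character threshold is an arbitrary assumption.
  { rule: "long-base64-run", pattern: /[A-Za-z0-9+\/]{200,}={0,2}/ },
  // PEM private key headers.
  { rule: "private-key-block", pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  // AWS access key ID shape: "AKIA" followed by 16 uppercase letters or digits.
  { rule: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
];

// Returns one finding per (line, rule) match, with 1-based line numbers.
function lintTemplate(text: string): LintFinding[] {
  const findings: LintFinding[] = [];
  text.split(/\r?\n/).forEach((content, i) => {
    for (const { rule, pattern } of RULES) {
      if (pattern.test(content)) {
        findings.push({ line: i + 1, rule });
      }
    }
  });
  return findings;
}
```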

Validation / Done Criteria

Plugin metadata, generator, grader, and docs refer to the same risk category.
Rubric tags are consistent and not deprecated.
Benign and harmful fixtures both execute locally.
Results show reasoned pass/fail output, not only raw scores.
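Several of these done criteria can be expressed as one check run in CI. The sketch assumes each artifact's declared risk category has already been extracted as a string and that results use the { reason, pass, score } shape from the workflow; `PluginArtifacts`, `EvalResult`, and `validateDoneCriteria` are hypothetical names, and the deprecated-tag criterion is not covered.

```typescript
// Hypothetical summary of where a plugin declares its risk category.
interface PluginArtifacts {
  metadataCategory: string;
  generatorCategory: string;
  graderCategory: string;
  docsCategory: string;
}

// Hypothetical per-fixture result using the structured rubric shape.
interface EvalResult {
  fixtureKind: "benign" | "harmful";
  reason: string;
  pass: boolean;
  score: number;
}

// Returns readable problems; an empty array means the criteria checked here hold.
function validateDoneCriteria(artifacts: PluginArtifacts, results: EvalResult[]): string[] {
  const problems: string[] = [];

  // Metadata, generator, grader, and docs must name one risk category
  // (compared case-insensitively after trimming whitespace).
  const declared = [
    artifacts.metadataCategory,
    artifacts.generatorCategory,
    artifacts.graderCategory,
    artifacts.docsCategory,
  ].map((c) => c.trim().toLowerCase());
  const distinct = declared.filter((c, i) => declared.indexOf(c) === i);
  if (distinct.length !== 1) {
    problems.push(`risk category mismatch: ${distinct.join(", ")}`);
  }

  // Both benign and harmful fixtures must have produced results.
  const kinds: EvalResult["fixtureKind"][] = ["benign", "harmful"];
  for (const kind of kinds) {
    if (!results.some((r) => r.fixtureKind === kind)) {
      problems.push(`no ${kind} fixture results`);
    }
  }

  // Every result needs a stated reason, not only a raw score.
  results.forEach((r, i) => {
    if (r.reason.trim() === "") {
      problems.push(`result ${i} has no reason`);
    }
  });

  return problems;
}
```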

References

references/redteam-grader-checklist.md

$ npx claudepluginhub yeaight7/agent-powerups --plugin security-guardrails

Similar Skills

orch-change-feature

214.4k

Orchestrates changing an existing working feature to new desired behavior by updating tests first, then implementation, with review and gated commit.
