By mega-edo
Run automated red-team security audits on LLM prompts and agent pipelines, then harden them through iterative optimization loops that fix vulnerabilities, validate against regression, and produce audit-grade reports.
npx claudepluginhub mega-edo/mega-security --plugin mega-security
Curates the Red Team simulation suite for mega-security (the "attack" half of dual-axis evaluation) from public benchmarks, respecting activated threat-simulation tiers, product profile, and contamination weighting. Outputs JSONL probe sets with train/val split + manifest. The val split is distributionally distant (multilingual / rephrased / novel-entity) from train to enforce defense generalization and prevent cherry-picking.
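To make the "distributionally distant val split" idea concrete, here is a minimal sketch, not the plugin's actual implementation. It assumes each probe is a dict with hypothetical `lang` and `variant` fields, sends non-English or rephrased probes to val and everything else to train, then writes `train.jsonl`, `val.jsonl`, and a small `manifest.json`. The real suite may use a different schema and a richer notion of distance (for example novel entities).

```python
import json
from collections import Counter
from pathlib import Path


def split_probes(probes: list[dict]) -> tuple[list[dict], list[dict]]:
    """Route 'distant' probes (non-English or rephrased) to val, the rest to train.

    The 'lang' and 'variant' fields are hypothetical placeholders for
    whatever metadata the real suite records about each probe.
    """
    train, val = [], []
    for probe in probes:
        is_distant = probe["lang"] != "en" or probe["variant"] == "rephrased"
        (val if is_distant else train).append(probe)
    return train, val


def write_suite(probes: list[dict], out_dir: Path) -> dict:
    """Write one JSON object per line for each split, plus a manifest."""
    out_dir.mkdir(parents=True, exist_ok=True)
    train, val = split_probes(probes)
    for name, rows in (("train", train), ("val", val)):
        with open(out_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
    manifest = {
        "counts": {"train": len(train), "val": len(val)},
        # Language mix of the val split, to show how far it sits from train.
        "val_langs": dict(Counter(p["lang"] for p in val)),
    }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Keeping the split rule in one small function makes it easy to swap in a different distance criterion without touching the JSONL writing.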
Curates the Blue Team (legitimate-use / usability-regression) suite for mega-security — the second axis of dual-axis evaluation that catches Red Team hardening collapsing legitimate product flows. Prefers to reuse the user's existing data-eval (audited), falls back to domain-tied synthesis or general benchmarks. Outputs JSONL stratified by intent and edge-case proximity.
Audits judge verdict correctness on a sample of failed/refused traces from an iter-0 baseline before agent-optimize enters its main loop. Computes false-positive rate per axis (attack DSR, benign FRR) and gates loop entry.
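As an illustration of the judge audit, the sketch below computes a false-positive rate per axis over a reviewed sample of failed/refused traces and gates loop entry on it. The `AuditedTrace` record, the axis labels, and the 0.10 threshold are all hypothetical; the plugin's own audit may weigh or sample traces differently.

```python
from dataclasses import dataclass


@dataclass
class AuditedTrace:
    axis: str              # hypothetical label: "attack" (feeds DSR) or "benign" (feeds FRR)
    verdict_correct: bool  # reviewer agrees with the judge's failed/refused verdict


def false_positive_rates(traces: list[AuditedTrace]) -> dict[str, float]:
    """Per axis, the share of sampled judge verdicts the reviewer rejected."""
    rates: dict[str, float] = {}
    for axis in ("attack", "benign"):
        sampled = [t for t in traces if t.axis == axis]
        if sampled:
            wrong = sum(not t.verdict_correct for t in sampled)
            rates[axis] = wrong / len(sampled)
    return rates


def gate_loop_entry(traces: list[AuditedTrace], max_fp_rate: float = 0.10) -> bool:
    """Enter the optimization loop only if every audited axis is under the threshold.

    An empty audit returns False, i.e. the gate fails closed.
    """
    rates = false_positive_rates(traces)
    return bool(rates) and all(r <= max_fp_rate for r in rates.values())
```

Failing closed on an empty sample mirrors the idea that an unaudited judge should not be trusted to drive the loop.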
Reviews code changes via git diff: verifies intent vs. reality, runs syntactic and runtime checks, and applies caller-supplied review criteria.
Commit current changes on the MEGA branch with a context-aware message. Use after completing a MEGA pipeline step.
Internal report writer. Auto-invoked by agent-optimize at loop completion; not a user-facing entry point. Reads loop history from .mega_security/feedback/ and writes .mega_security/MEGA_SECURITY.md (final audit-grade hardening report) plus .mega_security/meta/security-learnings.md.
Lightweight security check for a single chat system prompt (no agent loop, no tools, no RAG). Runs 100 attack tests across 4 attack types (prompt injection, jailbreak, PII disclosure, system prompt leak) as the held-out scoring set, plus a parallel tuning set of 100 used only by the optimizer. The attacks ship with the tool as a vetted set: each one previously got past, or barely held against, a capable baseline AI, so meaningful differences between models actually surface. It also runs 16+16 legitimate-use tests to detect over-blocking. Writes MEGA_PROMPT_CHECK.md with block rate per attack type, failure examples, and weakness analysis.
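The report's headline numbers are simple ratios. This minimal sketch shows one way to compute block rate per attack type and an over-blocking rate on the legitimate-use tests; the `(attack_type, blocked)` tuple shape and the attack-type strings are hypothetical, not the plugin's actual data format.

```python
from collections import defaultdict


def block_rate_by_type(results: list[tuple[str, bool]]) -> dict[str, float]:
    """Fraction of attacks blocked, grouped by (hypothetical) attack-type label."""
    totals: dict[str, int] = defaultdict(int)
    blocked: dict[str, int] = defaultdict(int)
    for attack_type, was_blocked in results:
        totals[attack_type] += 1
        blocked[attack_type] += int(was_blocked)
    return {t: blocked[t] / totals[t] for t in totals}


def over_block_rate(legit_refused: list[bool]) -> float:
    """Fraction of legitimate-use tests that the prompt wrongly refused."""
    return sum(legit_refused) / len(legit_refused) if legit_refused else 0.0
```

Reporting both numbers side by side is what exposes a prompt that "wins" on attacks simply by refusing everything.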
Matches all tools
Hooks run on every tool call, not just specific ones
Uses power tools
Uses Bash, Write, or Edit tools
Security controls for AI agents — deterministic policy enforcement, OWASP ASI10 scanning, and audit trails.
GoPlus AgentGuard — AI agent security guard. Blocks dangerous commands, prevents data leaks, protects secrets. 20 detection rules, runtime action evaluation, trust registry.
Skeptical-reading and prompt-injection defense for AI coding agents. Trust nothing. Ship safely.
Safety for Agents - Agent Detection & Response (ADR) for Claude Code
Automated OWASP security checks — Web Top 10:2025, LLM Top 10:2025, API Security Top 10:2023
APort Agent Guardrails — security policy enforcement for every tool call. Intercepts tool use, evaluates against your passport policy, and blocks unauthorized actions.
The evaluation-driven approach to LLM system-prompt and agent security.
Define the attack surface, measure it, harden to pass — for chat prompts and full agent pipelines.
[!WARNING] Routing through OpenClaw, Hermes, LiteLLM, or OpenRouter? Your system prompt runs on whichever model the router picks at request time, and defense rates swing from 0.50 to 0.91 across vendors. Untuned, you ship the worst case.
[!IMPORTANT] Your system prompt is your trust asset. In production, system prompts have broken repeatedly: EchoLeak (zero-click M365 Copilot exfiltration), the Gap chatbot jailbreak, the Chevy "$1 Tahoe" persona override, and 7+ vendor system prompts now public on GitHub. A static prompt is no longer enough, and once tools, RAG, and memory enter the picture, the attack surface widens beyond what any single prompt can hold.
The common pain points teams hit shipping LLM products:
mega-security is an example of evaluation-driven development applied to LLM security. It ships four Claude Code commands that diagnose and harden chat system prompts and full agent pipelines. The commands are fail-closed and reproducible, and they never modify your code without your explicit approval.
Inside any Claude Code session:
/plugin marketplace add https://github.com/mega-edo/mega-security
/plugin install mega-security@mega-edo
That's it. Commands become available immediately:
Chat system prompts — single prompt.txt / system-message scope:
/prompt-check # 5–10 min diagnosis of a single system prompt
/prompt-optimize # iterative hardening with no-regression guarantees
Full agent pipelines — products with tools, RAG, memory, or multi-archetype orchestration:
/agent-check # static OWASP review + Red/Blue Team baseline (~10–20 min)
/agent-optimize # source-level hardening loop with Pareto acceptance gates
To pull updates later: /plugin upgrade mega-security.
[!TIP] Not sure which one you want? If your product has tools, a vector store, or rendered output, run /agent-check. If it's a pure text-in/text-out chat with one system prompt, /prompt-check is faster and ships the same defensive posture for that scope.
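/agent-optimize is described as a hardening loop with Pareto acceptance gates. One common reading of such a gate, sketched here as an assumption rather than the plugin's exact rule, is that a candidate is accepted only if it regresses on neither axis (Red Team defense rate, Blue Team pass rate) and strictly improves at least one. The `Scores` fields are hypothetical names.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    dsr: float          # hypothetical: attack defense success rate (higher is better)
    benign_pass: float  # hypothetical: 1 - false refusal rate on legitimate use (higher is better)


def pareto_accept(baseline: Scores, candidate: Scores) -> bool:
    """Accept only if the candidate Pareto-dominates the baseline."""
    no_regression = all(c >= b for c, b in zip(candidate, baseline))
    strictly_better = any(c > b for c, b in zip(candidate, baseline))
    return no_regression and strictly_better
```

Under this rule, a change that blocks more attacks but starts refusing legitimate requests is rejected rather than traded off.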
To load the plugin from a local clone instead:
git clone https://github.com/mega-edo/mega-security ~/mega-agent-security
claude --plugin-dir ~/mega-agent-security
--plugin-dir is session-scoped and additive. To load multiple plugins in one session, repeat the flag. After editing plugin files mid-session, run /reload-plugins to refresh.