From cybersecurity
Assesses AI/LLM application security including prompt injection, jailbreak resistance, OWASP LLM Top 10 (2025), RAG/agent security, and model supply chain risks. Maps findings to MITRE ATLAS and recommends mitigations.
How this skill is triggered — by the user, by Claude, or both
Slash command
/cybersecurity:16-ai-llm-securityThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Enable Claude to assess the security of AI/LLM-powered applications — chatbots, RAG pipelines, autonomous agents, and tool-using systems. Claude maps findings to the **OWASP Top 10 for LLM Applications (2025)** and the **MITRE ATLAS** adversarial-ML knowledge base, builds reproducible attack cases, and recommends concrete mitigations (input/output guardrails, least-privilege tool scopes, conten...
Enable Claude to assess the security of AI/LLM-powered applications — chatbots, RAG pipelines, autonomous agents, and tool-using systems. Claude maps findings to the OWASP Top 10 for LLM Applications (2025) and the MITRE ATLAS adversarial-ML knowledge base, builds reproducible attack cases, and recommends concrete mitigations (input/output guardrails, least-privilege tool scopes, content provenance).
Authorization Required: Only test AI systems you own or are explicitly authorized to assess. Prompt-injection and data-exfiltration testing against third-party AI services may violate their terms of service and local law. Confirm written scope before proceeding.
This skill activates when the user asks about:
pickle, model registries)pip install requests pyyaml rich
Optional enhanced capabilities:
garak — LLM vulnerability scanner (NVIDIA)promptfoo — prompt/red-team evaluation harnessmodelscan / picklescan — ML model file safety scanningWhen asked to threat-model an AI application, map the system against each category and record exposure:
| ID | Risk | What to look for |
|---|---|---|
| LLM01 | Prompt Injection | Untrusted text reaching the prompt (direct & indirect via RAG/web/email) |
| LLM02 | Sensitive Information Disclosure | PII/secrets in prompts, outputs, or training data; system-prompt leakage |
| LLM03 | Supply Chain | Untrusted models, LoRA adapters, datasets, plugins, pickle deserialization |
| LLM04 | Data & Model Poisoning | Tainted training/fine-tune/RAG data; backdoors |
| LLM05 | Improper Output Handling | LLM output passed unsanitized to SQL, shell, browser (XSS), or eval |
| LLM06 | Excessive Agency | Over-broad tool scopes, autonomous side effects, no human-in-the-loop |
| LLM07 | System Prompt Leakage | Secrets/authz logic embedded in the system prompt |
| LLM08 | Vector & Embedding Weaknesses | RAG access-control bypass, embedding inversion, cross-tenant leakage |
| LLM09 | Misinformation | Hallucinations relied on for security/safety decisions |
| LLM10 | Unbounded Consumption | Cost/DoS via token floods, model extraction, wallet-drain |
Produce a per-category table: Exposure (Yes/No/Partial) → Evidence → Severity → Mitigation.
Direct injection — user input that overrides instructions. Test families:
Indirect injection — payload arrives via retrieved/processed content (web page, PDF, email, RAG doc, tool output). This is the highest-impact class for agents. Test that retrieved text cannot issue commands, exfiltrate context, or trigger tools.
For every test record: payload, channel (direct/indirect), goal (override / exfiltrate / tool-abuse), and result (blocked / partial / success). Use scripts/prompt_injection_tester.py to run a corpus and score outcomes.
Refusal-quality note: a single refusal is not a pass. Re-test the same goal across ≥3 phrasings and obfuscations before marking a control effective.
When reviewing a RAG pipeline:
The agent is a confused deputy: it holds privileges the user may not. Review:
execute_shell/http_request to arbitrary hostspickle/.pt/.bin can execute code on load. Prefer safetensors. Run scripts/model_supply_chain.py or modelscan.eval, SQL, shell, or innerHTML. Encode/parameterize at the sink (LLM05).Produce a structured AI security assessment:
# AI/LLM Security Assessment — [Application]
Date: [Date] | Scope: [Endpoints/Models] | Model: [name/version] | Analyst: [Name]
## Executive Summary
[2-3 sentences: overall posture, highest risks]
## OWASP LLM Top 10 Coverage
| ID | Risk | Exposure | Severity | Evidence |
|----|------|----------|----------|----------|
| LLM01 | Prompt Injection | Yes | High | [repro] |
...
## Confirmed Findings
### [F-01] Indirect Prompt Injection via RAG → Tool Abuse (Critical)
- ATLAS: AML.T0051 / OWASP LLM01+LLM06
- Repro: [payload, channel, steps]
- Impact: [data exfil / unauthorized action]
- Mitigation: [least-privilege tool scope + retrieved-content isolation + HITL]
## Guardrail Bypass Matrix
| Goal | Direct | Encoded | Multi-turn | Indirect | Result |
## Recommendations (Prioritized)
1. ...
prompt_injection_tester.py# Run the built-in injection/jailbreak corpus against an endpoint
python scripts/prompt_injection_tester.py --url https://app.test/api/chat --field message --output results.json
# Use a custom payload corpus and a refusal-detection keyword set
python scripts/prompt_injection_tester.py --url ... --corpus payloads.txt --judge-keywords refusals.txt
model_supply_chain.py# Scan a model directory/file for unsafe pickle opcodes and risky imports
python scripts/model_supply_chain.py --path ./models/model.pt
python scripts/model_supply_chain.py --path ./models/ --recursive --output scan.json
| Next Step | Condition | Target Skill |
|---|---|---|
| Web/API vuln testing of the app shell | App exposes web/API surface | → Skill 09 |
| Cloud/infra hosting the model | Model served on AWS/Azure/GCP/K8s | → Skill 10 |
| Detection rules for prompt-injection attempts | Need SIEM coverage | → Skill 12 |
| Dependency/model-package CVEs | ML libs in use | → Skill 02 |
| Red team narrative incorporating AI abuse | Full engagement | → Skill 14 |
npx claudepluginhub masriyan/claude-code-cybersecurity-skill --plugin cybersecurityAudit applications for AI prompt injection, agent security, and LLM permission boundary vulnerabilities. Use when securing AI features or agents.
Reviews AI/LLM applications for security risks including prompt injection, RAG security, agent permissioning, jailbreaks, data leakage, and model supply chain threats.
Tests LLM applications for OWASP Top 10 vulnerabilities using 10 specialized agents. Integrates with pentest workflows for comprehensive AI security assessments.