Help us improve
Share bugs, ideas, or general feedback.
From cybersecurity-skills
Audit applications for AI prompt injection, agent security, and LLM permission boundary vulnerabilities. Use when securing AI features or agents.
npx claudepluginhub briiirussell/cybersecurity-skills --plugin cybersecurity-skillsHow this skill is triggered — by the user, by Claude, or both
Slash command
/cybersecurity-skills:prompt-injectionThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Audit applications that use AI features, LLM integrations, or AI agents for prompt injection, privilege escalation, and authorization bypass vulnerabilities.
Tests LLM applications for OWASP Top 10 vulnerabilities using 10 specialized agents. Integrates with pentest workflows for comprehensive AI security assessments.
Mitigate prompt injection risks in LLM-based systems. Use when designing, building, or reviewing AI systems that accept user prompts, or when evaluating model safety for deployment.
Runs 100 attack tests for prompt injection, jailbreak, PII disclosure, and system prompt leak to evaluate a chat system prompt's security. Writes a report with block rates and weakness analysis.
Share bugs, ideas, or general feedback.
Audit applications that use AI features, LLM integrations, or AI agents for prompt injection, privilege escalation, and authorization bypass vulnerabilities.
Cross-references: threat-modeling for design-time AI risk modeling on new AI features (before this skill applies); owasp-audit for the XSS / output-rendering patterns that overlap when LLM output reaches the browser (sanitize on render, JSON-LD breakout); api-audit for the API surface that LLM tools and MCP servers expose; ai-risk-management for the broader governance frame this skill sits within — prompt injection is the security slice of AI risk; AI RMF covers the rest (fairness, robustness, transparency, drift, lifecycle).
Prompt injection is the #1 vulnerability in LLM-integrated applications (OWASP Top 10 for LLMs, LLM01). It occurs when untrusted input influences the instructions an LLM follows, causing it to ignore its system prompt, leak secrets, or take unauthorized actions.
Three attack classes:
Identify every place the application uses AI. This includes direct LLM API calls AND higher-level AI features:
Grep for LLM API calls:
- openai, anthropic, cohere, replicate, ollama
- ChatCompletion, messages.create, generate, complete
- langchain, llamaindex, autogen, crewai
Also look for AI features that may not be obvious LLM calls:
- AI-powered search or recommendations
- AI content generation (summaries, descriptions, emails)
- AI chatbots or copilots embedded in the app
- AI-assisted form completion or auto-fill
- AI moderation or classification
- AI-driven workflow automation
- MCP (Model Context Protocol) servers and tool registrations
For each AI integration, document:
Check how prompts are assembled. Look for:
Unsanitized interpolation:
# VULNERABLE — user input directly in prompt
prompt = f"Summarize this: {user_input}"
# VULNERABLE — external data injected without marking
prompt = f"Answer based on this context: {rag_results}"
Missing input/output boundaries:
# BETTER — clear delimiters separating instructions from data
prompt = f"""Summarize the text between the <document> tags.
<document>
{user_input}
</document>"""
Secrets in system prompts:
# VULNERABLE — API keys, database credentials, or internal URLs in system prompt
system = f"You are a helper. Use API key {API_KEY} to call..."
Check for these patterns:
Check what happens with LLM responses:
Rendered as HTML (XSS via LLM):
// VULNERABLE — LLM output rendered as raw HTML
<div dangerouslySetInnerHTML={{ __html: llmResponse }} />
If the LLM can be tricked into outputting <script> tags or event handlers, and the output is rendered unsanitized, this is XSS.
Executed as code:
# VULNERABLE — LLM output passed to eval/exec
exec(llm_response)
Used in database queries:
# VULNERABLE — LLM output used in raw SQL
cursor.execute(f"SELECT * FROM {llm_response}")
Passed to another LLM (chained injection): If LLM A's output becomes input to LLM B, an attacker can inject instructions that propagate through the chain.
If the LLM has access to tools, function calls, or operates as an autonomous agent:
Tool inventory and validation:
# VULNERABLE — LLM can call any tool without validation
result = execute_tool(tool_name=llm_choice, args=llm_args)
# BETTER — allowlist + argument validation + confirmation for destructive actions
if tool_name not in ALLOWED_TOOLS:
raise ValueError("Tool not permitted")
validated_args = validate_tool_args(tool_name, llm_args)
if tool_name in DESTRUCTIVE_TOOLS:
require_user_confirmation(tool_name, validated_args)
AI agent-specific risks:
Check for autonomous agent patterns (agent loops, multi-agent orchestration, agent frameworks):
Test whether the system prompt can be extracted:
Common extraction attempts:
Check if the application:
This is critical for apps with role-based access, multi-tenant data, or tiered permissions.
Confused deputy — does the AI inherit the right permissions?
# VULNERABLE — AI queries database with admin-level service account
results = db.query(ai_generated_sql) # Bypasses row-level security
# BETTER — AI queries execute under the requesting user's permissions
results = db.query(ai_generated_sql, user_context=request.user)
Privilege escalation through AI:
Multi-tenant data leakage:
Cross-privilege injection:
Permission check checklist for AI features:
| Check | Status | Notes |
|---|---|---|
| AI tool calls go through the same auth middleware as user actions | ||
| AI database queries are scoped to the requesting user's permissions | ||
| RAG retrieval is filtered by tenant/user access level | ||
| AI cannot access admin APIs on behalf of non-admin users | ||
| Shared data consumed by AI is treated as untrusted input | ||
| AI feature access itself is gated by user role where appropriate |
Check what defenses are in place and whether they're sufficient:
| Defense | Present? | Notes |
|---|---|---|
| Input validation/sanitization | Strip or escape control characters, limit length | |
| Prompt delimiters | Clear boundaries between instructions and data | |
| Output validation | Check LLM output before rendering/executing/storing | |
| Tool call validation | Allowlist tools, validate arguments, gate destructive actions | |
| Privilege separation | LLM operates with minimum necessary permissions | |
| User-scoped AI queries | AI data access filtered by requesting user's role/tenant | |
| Agent loop limits | Max iterations, token budgets, timeouts for autonomous agents | |
| Agent memory isolation | Untrusted data cannot poison agent memory/state | |
| MCP server auth | MCP tools authenticated and scoped per user | |
| Rate limiting | Prevent automated injection attempts | |
| Monitoring/logging | Log prompts, completions, and tool calls for anomaly detection | |
| Human-in-the-loop | Require approval for high-risk actions |
# Prompt Injection Audit Report
## Application: [name]
## Date: [date]
### LLM Integration Map
| Integration | Model | User Input? | External Data? | Tools? | Output Usage |
|-------------|-------|-------------|----------------|--------|-------------|
### Findings
#### [SEVERITY] [Title]
**File:** `path/to/file:line`
**Category:** Direct Injection / Indirect Injection / Cross-Privilege Injection / Prompt Leaking / Insecure Output / Tool Abuse / Agent Security / Permission Bypass
**Description:** [What the vulnerability is]
**Attack scenario:** [How an attacker could exploit this]
**Vulnerable code:**
[code snippet]
**Remediation:**
[Fixed code with explanation]
---
### Defense Assessment
| Defense Layer | Status | Recommendation |
|--------------|--------|----------------|
### Prioritized Remediation
1. [Critical — permission bypass, privilege escalation, or multi-tenant data leakage through AI]
2. [Critical — exploitable injection paths with tool/agent access]
3. [High — unsanitized user input in prompts, agent memory poisoning]
4. [Medium — missing output validation, unbounded agent loops]
5. [Low — defense-in-depth improvements, monitoring gaps]