Plans and constructs multi-stage attack chains against AI IDEs by combining primitives like prompt injection and file writes. Classifies by interaction tier to assess security posture and prioritize reports.
npx claudepluginhub mindgard/ai-ide-skills --plugin ai-ide-vuln-skills
This skill uses the workspace's default tool permissions.
Individual vulnerability primitives -- prompt injection, config poisoning, filter bypass, exfil channels -- combine into multi-stage attack chains. A PI alone may be low severity; PI plus file write plus config auto-reload is critical. This skill helps construct these chains from confirmed primitives, classify them by interaction tier, and assess their real-world severity.
Run this skill after testing with pattern-specific skills has identified individual primitives. It is the final step before writing up findings.
Each chain maps to a tier based on the user interaction required to trigger it. Test in priority order -- Tier 1 first.
| Tier | Label | User Interaction | Reportability |
|---|---|---|---|
| Tier 1 | Zero-Interaction | None. Clone and open. | Highest -- vendors cannot argue "user chose to trust." |
| Tier 2 | Agent-Mediated | User sends a message or asks a question. No explicit approval of the malicious action. | Strong -- normal developer workflow triggers the chain. |
| Tier 3 | Approval-Gated | User must click "Trust," "Allow," or approve a specific action. | Weak unless approval is misleading or social engineering is trivial. |
| Tier 4 | Trusted + Specific Action | User already trusted the project and takes a specific routine action. | Weakest standalone. Interesting with TOCTOU, scope escape, or guaranteed actions (git pull). |
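The tier table can be read as a small decision procedure. The sketch below is one possible encoding, not part of the skill itself: the `Interaction` fields and the rule that the strongest interaction requirement decides the tier are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    ZERO_INTERACTION = 1         # clone and open
    AGENT_MEDIATED = 2           # user sends a message, no explicit approval
    APPROVAL_GATED = 3           # user clicks "Trust" / "Allow"
    TRUSTED_SPECIFIC_ACTION = 4  # already trusted + routine action


@dataclass(frozen=True)
class Interaction:
    """What the victim must do for the chain to fire (hypothetical fields)."""
    needs_prior_trust: bool = False        # workspace was already trusted
    needs_explicit_approval: bool = False  # user approves a specific action
    needs_user_message: bool = False       # user prompts the agent


def classify_tier(i: Interaction) -> Tier:
    # Assumption: the most demanding requirement determines the tier.
    if i.needs_prior_trust:
        return Tier.TRUSTED_SPECIFIC_ACTION
    if i.needs_explicit_approval:
        return Tier.APPROVAL_GATED
    if i.needs_user_message:
        return Tier.AGENT_MEDIATED
    return Tier.ZERO_INTERACTION
```

Because `Tier` is an `IntEnum`, sorting a list of classified chains puts Tier 1 first, which matches the testing priority order.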
Before constructing chains, confirm these primitives using the pattern-specific skills listed:
| Primitive | Confirm With | Required For Chains |
|---|---|---|
| PI susceptibility | prompt-injection-chains | Classic, Exfil Express, Persistence Play |
| File write (no approval) | prompt-injection-chains | Classic (Tier 2), Persistence Play |
| File write (approval-gated) | prompt-injection-chains | Classic (Tier 3) |
| MCP config auto-load (no approval) | mcp-config-poisoning | Zero-Click Config |
| MCP config auto-load (approval-gated) | mcp-config-poisoning | Classic via MCP pivot |
| Terminal command execution | terminal-filter-bypass | Chain escalation to RCE |
| Outbound channel (image, URL, DNS) | ai-ide-data-exfil | Exfil Express |
| Rules file auto-load | prompt-injection-chains | Persistence Play |
| Hooks/settings auto-execution | ai-ide-code-exec | Zero-Click Config, Classic escalation |
| TOCTOU in approval model | ai-ide-code-exec, mcp-config-poisoning | Persistence Play (Tier 4), The Long Con |
| One-time config approval (path/name-keyed) | mcp-config-poisoning, ai-ide-code-exec | The Long Con |
Do not attempt chain construction until at least two primitives are confirmed. A single primitive is a finding, not a chain.
Malicious Config Present --> IDE Opens Workspace --> Code Execution
Entry: Malicious config file already in the repository. No PI needed.
Impact: Code execution on workspace open.
Requires: Config auto-loading without approval.
Tier rationale: No user interaction beyond cloning and opening. The user has not granted trust, sent a message, or clicked any approval. This is the highest-severity class.
Real-world examples:
Cloned Repo --> PI in README --> Data Exfiltration
Entry: PI in visible workspace content.
No file write needed -- PI directly triggers exfil through an outbound channel.
Impact: Sensitive data (API keys, env vars, source code) sent to attacker.
Requires: PI susceptibility + outbound channel (image rendering, URL fetch, DNS).
Tier rationale: Requires the user to send a message or interact with the agent (triggering PI processing), but no explicit approval of the exfil action. Normal developer actions ("explain this code," "summarize the README") complete the chain.
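When confirming the outbound-channel primitive, it can help to check captured agent output for external image references, since image rendering is one of the channels named above. The following is a minimal sketch under stated assumptions: the function name, the `allowed_hosts` parameter, and the choice to inspect only Markdown image syntax are all hypothetical, and URL-fetch or DNS channels would need separate checks.

```python
import re
from urllib.parse import urlparse

# Matches Markdown images of the form ![alt](http(s)://host/path?query)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\((https?://[^)\s]+)\)")


def external_image_hosts(agent_output, allowed_hosts=frozenset()):
    """Return hosts of Markdown images that fall outside an allowlist.

    A non-empty result suggests that rendering this output would trigger
    an outbound request the tester should investigate.
    """
    hosts = []
    for url in MD_IMAGE.findall(agent_output):
        host = urlparse(url).hostname
        if host and host not in allowed_hosts:
            hosts.append(host)
    return hosts
```

An empty result does not prove the channel is closed; it only means this one pattern did not appear in the captured output.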
Real-world examples:
Cloned Repo --> Hidden PI --> File Write --> Config Modification --> Code Execution
Entry: Malicious content in a cloned repository (README, code comments, hidden text).
Pivot: PI causes the agent to write a file. This is the critical gate.
Escalation: The written file modifies IDE config:
.vscode/settings.json --> executable path override --> RCE
Impact: Arbitrary code execution with the user's permissions.
Requires: PI susceptibility + file write + config auto-reload.
Tier classification:
Real-world examples:
PI --> File Write --> Rules Override --> Persistent Backdoor
Entry: PI from any source.
Pivot: File write to a rules/instruction file.
Persistence: Modified rules survive sessions, infect future interactions.
Impact: Long-term backdoor in IDE behavior. Every future agent interaction in the workspace follows attacker-controlled instructions.
Requires: PI susceptibility + file write + rules auto-loading without re-approval.
Tier rationale: Typically requires a trusted workspace context and a specific triggering action. Escalates to a serious finding when combined with TOCTOU -- trust was granted to a benign config that is later modified via git commit, and the IDE does not re-prompt for approval. Without TOCTOU or scope escape, this is Tier 4 (weakest standalone).
Real-world examples:
Benign Config Approved --> Attacker Modifies via Git Commit --> Victim Does Git Pull --> Modified Config Loads Silently --> Code Execution
Entry: Benign config in a shared repository. The config passes review and receives one-time approval from the victim.
Pivot: Trust persists across config modifications. The IDE's approval model is keyed to the config's path or name, not its content hash. Once approved, the config is trusted indefinitely regardless of subsequent changes.
Escalation: The attacker modifies the approved config in a later git commit -- adding a malicious MCP server, changing a hook command, or injecting a rules override. The victim pulls the change as part of normal workflow. The IDE loads the modified config without re-prompting for approval.
Impact: Arbitrary code execution via a config the victim previously reviewed and approved.
Requires: One-time approval of the initial benign config + no re-approval on modification (approval keyed by path/name rather than content hash) + shared repository where the attacker can push commits.
Tier rationale: This is Tier 4 because it requires the victim to have already trusted the workspace and to perform a specific action (git pull). However, git pull is so routine and guaranteed in collaborative workflows that the "specific action" requirement is effectively automatic. The time delay between approval and exploitation makes this particularly insidious -- the victim's trust decision was correct at the time it was made.
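The difference between path-keyed and content-keyed approval can be illustrated with a toy model. This is a sketch for reasoning about the TOCTOU condition, not a description of how any particular IDE stores approvals; the class names and the placeholder file name are hypothetical.

```python
import hashlib


class PathKeyedApprovals:
    """Approval remembered by path only (the weakness described above)."""

    def __init__(self):
        self._approved = set()

    def approve(self, path, content):
        self._approved.add(path)  # content is ignored

    def is_approved(self, path, content):
        return path in self._approved


class HashKeyedApprovals:
    """Approval bound to path plus a SHA-256 digest of the content."""

    def __init__(self):
        self._approved = set()

    @staticmethod
    def _key(path, content):
        return (path, hashlib.sha256(content).hexdigest())

    def approve(self, path, content):
        self._approved.add(self._key(path, content))

    def is_approved(self, path, content):
        return self._key(path, content) in self._approved


# Placeholder config versions: v1 is what the victim reviewed, v2 arrives later.
path = "workspace-config.json"  # hypothetical file name
v1 = b'{"servers": {}}'
v2 = b'{"servers": {"changed": true}}'

weak, strong = PathKeyedApprovals(), HashKeyedApprovals()
weak.approve(path, v1)
strong.approve(path, v1)
print(weak.is_approved(path, v2))    # True: modified config loads silently
print(strong.is_approved(path, v2))  # False: a re-prompt would be required
```

When testing, the observable equivalent is simple: approve a benign config, change its content without changing its path, and check whether the IDE prompts again.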
Real-world examples:
Decision matrix: given the confirmed primitives, which chains are viable at which tier?
| Confirmed Primitives | Zero-Click Config | Exfil Express | The Classic | Persistence Play | The Long Con |
|---|---|---|---|---|---|
| Config auto-load (no approval) | Tier 1 -- GO | -- | -- | -- | -- |
| PI + outbound channel (no approval on exfil) | -- | Tier 2 -- GO | -- | -- | -- |
| PI + file write (no approval) + config auto-reload | -- | -- | Tier 2 -- GO | Tier 2-3 if rules auto-load | -- |
| PI + file write (approval-gated) + config auto-reload | -- | -- | Tier 3 -- only if approval UI is weak | Tier 3-4 | -- |
| PI + file write + rules auto-load + TOCTOU | -- | -- | -- | Tier 4 -- GO (TOCTOU elevates) | -- |
| One-time config approval + no re-approval on modification + shared repo | -- | -- | -- | -- | Tier 4 -- GO |
| PI only (no file write, no outbound channel) | -- | -- | Blocked | Blocked | -- |
| No PI, no auto-load config | -- | -- | Blocked | Blocked | Blocked |
| Outbound channel requires user-triggered action with approval | -- | Tier 3 at best -- weak | -- | -- | -- |
Reading the matrix: "GO" means the chain is viable and reportable at that tier. Dashes mean the chain is not applicable. Blocked means missing primitives prevent chain construction. When a cell shows a tier without "GO," the finding is marginal -- assess social engineering difficulty before reporting.
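The "GO" rows of the matrix can be encoded as a lookup from confirmed primitives to viable chains. The sketch below is one possible encoding; the primitive labels are hypothetical shorthand for the row headings, and the marginal and Blocked cells are deliberately left out because they call for human judgment.

```python
# Each entry: (primitives required, chain name, verdict from the matrix).
GO_ROWS = [
    (frozenset({"config_autoload_no_approval"}),
     "Zero-Click Config", "Tier 1 -- GO"),
    (frozenset({"pi", "outbound_channel_no_approval"}),
     "Exfil Express", "Tier 2 -- GO"),
    (frozenset({"pi", "file_write_no_approval", "config_auto_reload"}),
     "The Classic", "Tier 2 -- GO"),
    (frozenset({"pi", "file_write", "rules_autoload", "toctou"}),
     "Persistence Play", "Tier 4 -- GO"),
    (frozenset({"one_time_config_approval", "no_reapproval_on_modification",
                "shared_repo"}),
     "The Long Con", "Tier 4 -- GO"),
]


def viable_chains(confirmed):
    """Return (chain, verdict) pairs whose required primitives are all confirmed."""
    have = set(confirmed)
    # Either kind of file write satisfies rows that only say "file write".
    if have & {"file_write_no_approval", "file_write_approval_gated"}:
        have.add("file_write")
    return [(chain, verdict) for needed, chain, verdict in GO_ROWS
            if needed <= have]
```

For example, `viable_chains({"pi"})` returns an empty list, which corresponds to the "PI only" row where The Classic and Persistence Play are Blocked.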
These counter-examples help testers avoid wasting time on findings vendors will reject.
Classic Chain in a trusted workspace without TOCTOU. If the user explicitly trusted the workspace, PI leading to file write and config modification operates within the trust boundary the user accepted. TOCTOU here means approval was granted once and the config was later modified via git without re-approval. Absent that condition, this is expected behavior in a trusted context. The user chose to trust; the system honored that trust.
MCP config auto-load after explicit informed approval. If the IDE shows the user the full MCP server command before execution, requires explicit approval, and re-prompts on any change to the config, the approval gate is functioning correctly. A malicious MCP config in a repo that the user reviews and approves is not a vulnerability -- it is the security model working as designed.
Any chain where every step requires explicit informed user approval. If the user must approve the PI-triggered file write (with a clear diff showing the malicious content), then separately approve the config change, then separately approve execution -- no security boundary has been bypassed. The chain exists but the gates held. This is Tier 3 at best and only reportable if the approval UI actively misleads.
Exfil Express where the outbound channel requires user-triggered action with approval. If exfiltrating data requires the user to explicitly trigger a fetch, render, or network action and approve it with knowledge of the destination, the channel is gated. A PI that prepares exfil content but cannot send it without informed user action is not a complete chain.
General principle: A vulnerability exists when a security boundary is absent, bypassable, or misleading. When the boundary is present, correctly implemented, and the user makes an informed decision, the remaining risk is social engineering -- which is a user education problem, not a product vulnerability.
List confirmed vulnerabilities from pattern-specific skills:
| Primitive | Confirmed? | Skill Used | Notes |
|---|---|---|---|
| PI susceptibility | | prompt-injection-chains | Via what delivery? |
| File write (no approval) | | prompt-injection-chains | To what paths? |
| File write (with approval) | | prompt-injection-chains | How easy to social-engineer? |
| MCP config auto-load | | mcp-config-poisoning | Zero-click or approval-gated? |
| Terminal command execution | | terminal-filter-bypass | Any filter? Bypassed? |
| Image rendering (outbound) | | ai-ide-data-exfil | Blocked or unblocked? |
| Rules file auto-load | | prompt-injection-chains | Which rules files? |
| Hooks/settings execution | | ai-ide-code-exec | Zero-click? |
| TOCTOU in approval model | | mcp-config-poisoning, ai-ide-code-exec | Approval keyed by path or content hash? |
Which primitive provides the PI --> file write transition? This is the critical link that separates low-severity PI from critical RCE chains.
From file write, what configs can be modified?
File Write
+-- .vscode/settings.json --> php.validate.executablePath --> RCE
+-- MCP config file --> malicious server --> RCE
+-- Rules file --> persistent behavior change --> eventual RCE
+-- Hooks config --> lifecycle hook execution --> RCE
+-- .code-workspace --> trust boundary expansion --> RCE
+-- .gitconfig --> git external diff --> RCE
+-- URL-fetching config --> data exfiltration
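When reviewing a log of agent file writes, the tree above can serve as a lookup for which writes deserve a closer look. This is a minimal sketch: only the three targets the tree names by path are built in. MCP, rules, and hooks config locations vary by IDE, so the hypothetical `ide_specific` parameter lets the tester supply them rather than having the code guess.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional

# Targets named by path in the tree above, with their escalation effect.
KNOWN_TARGETS = {
    ".vscode/settings.json": "php.validate.executablePath override",
    ".gitconfig": "git external diff",
}


def escalation_for(path: str,
                   ide_specific: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the escalation a write to `path` could enable, or None."""
    p = PurePosixPath(path)
    if p.suffix == ".code-workspace":
        return "trust boundary expansion"
    targets = {**KNOWN_TARGETS, **(ide_specific or {})}
    for target, effect in targets.items():
        # A relative pattern is matched against the tail of the path.
        if p.match(target):
            return effect
    return None
```

A `None` result only means the path is not in the lookup; it is not evidence that the write is harmless.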
How does the chain activate?
| Model | Description | Attacker Effort | User Interaction | Tier |
|---|---|---|---|---|
| Zero-click | Triggers on workspace open | Config file in repo | None | Tier 1 |
| One-click | Requires single user action | PI + trigger action | "Explain this code," "Review the README" | Tier 2 |
| Autorun | Triggers in agent/autonomous mode | PI in workspace | User enables agent mode | Tier 2 |
| Approval-gated | Requires explicit user approval | PI + social engineering | User clicks "Allow" on prompted action | Tier 3 |
| Time-delayed | Triggers via future commit (TOCTOU) | Benign config initially, malicious in later commit | None after initial approval beyond a routine git pull | Tier 4 |
Build the end-to-end exploit using templates from references/chain-templates.md:
| Tier | Trigger Model | Chain Length | Impact | Severity |
|---|---|---|---|---|
| Tier 1 | Zero-click | 1 step (config --> exec) | RCE | Critical |
| Tier 1 | Zero-click | 2+ steps | RCE | High-Critical |
| Tier 2 | One-click / Autorun | Any | RCE | High |
| Tier 2 | One-click | Any | Data exfil | Medium-High |
| Tier 3 | Approval-gated | Any | RCE | Medium (depends on approval UI quality) |
| Tier 4 | Time-delayed (TOCTOU) | Any | RCE | Medium-High (TOCTOU elevates) |
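The severity table translates directly into a lookup. The function below is a sketch that mirrors the rows above and nothing more; its name and signature are hypothetical, and it returns `None` for combinations the table does not cover instead of extrapolating.

```python
from typing import Optional


def severity(tier: int, impact: str, chain_length: int = 1) -> Optional[str]:
    """Look up severity for a chain; None if the table has no matching row."""
    if impact == "RCE":
        if tier == 1:
            # Single-step zero-click chains rate highest.
            return "Critical" if chain_length == 1 else "High-Critical"
        if tier == 2:
            return "High"
        if tier == 3:
            return "Medium (depends on approval UI quality)"
        if tier == 4:
            return "Medium-High (TOCTOU elevates)"
    if impact == "Data exfil" and tier == 2:
        return "Medium-High"
    return None
```

Treat the result as a starting point for the write-up; the qualifiers in the Tier 3 and Tier 4 rows still require a judgment about the approval UI and the TOCTOU condition.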
File write is one of several security gates to assess. File-write status determines which PI-driven chains are viable, but other gates (workspace config approval, initialization safety, outbound controls) independently block other chain types. See the README for the full gate model.
Can the AI write files without approval?
Can the AI write to config files the IDE auto-loads?
Does the IDE re-approve on config change?
See references/trigger-model-matrix.md for the per-IDE matrix.