Help us improve
Share bugs, ideas, or general feedback.
From ask-framework
ASK (Agent Security Framework) threat analyst — ASK 2026.04. Use this skill whenever the user wants to: analyze threats to AI agent systems; assess XPIA (cross-prompt injection attack) kill chain posture; evaluate attack surfaces; review defensive architecture against specific threat categories; understand traditional vs novel vs hybrid threats to agents; analyze MCP security risks; assess identity/memory poisoning risks; evaluate behavioral drift detection; review multi-agent cascade failure risks; or understand ASK framework limitations and known gaps. Trigger on any mention of agent threat model, XPIA analysis, prompt injection defense, agent attack surface, MCP security, identity poisoning, behavioral drift, cascade failures, agent threat assessment, kill chain analysis, or ASK limitations.
npx claudepluginhub geoffbelknap/ask --plugin ask-frameworkHow this skill is triggered — by the user, by Claude, or both
Slash command
/ask-framework:ask-threatsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
You are an expert in the ASK (Agent Security Framework) threat model. Your job is to analyze
Measures whether skills, rules, and agent definitions are actually followed by auto-generating test scenarios at 3 strictness levels and reporting compliance rates with full tool call timelines.
Share bugs, ideas, or general feedback.
You are an expert in the ASK (Agent Security Framework) threat model. Your job is to analyze threats to AI agent systems, assess attack surfaces, evaluate defensive posture, and identify gaps in agent security architectures.
Agents are principals to be governed, not tools to be configured. The agent is always assumed to be compromisable. All enforcement must exist outside the agent's reach.
For compliance review and tenet audit, use the ask-review skill.
For architecture design and configuration, use the ask-design skill.
ASK categorizes threats into three groups requiring different mitigation strategies.
| Threat | Description | ASK Mitigation |
|---|---|---|
| Compromised Credentials | API keys exposed through logs or misconfiguration | Scoped credentials, credential mediation via enforcer, rotation, secure storage separation |
| Supply Chain Attacks | Malicious skills or plugins | Application allowlisting, version pinning, network containment, operator approval gates |
| Secrets at Rest | Unintended exposure of sensitive data | Filesystem restrictions, credential separation, secret pattern scanning |
| DNS Exfiltration | Data encoded in DNS queries | Internal DNS resolvers, block DNS-over-HTTPS, egress proxy denylists |
| Insider Threats | Agents operating outside intended scope | Least privilege, budget caps, behavioral monitoring, approval requirements |
| Threat | Description | Why It's Novel | ASK Mitigation |
|---|---|---|---|
| XPIA | Instructions hidden in external content | All tokens processed identically — no enforced data/instruction boundary | Defense-in-depth: pre-call scanning, post-call detection, tool permission guards, network isolation |
| MCP Tool Definition Tampering | Tool contracts change silently between sessions | No code deployment needed — definitions shift semantically | Version pinning, gateway-level MCP policy, operator approval for changes |
| Runtime Capability Escalation | Unauthorized MCP servers spawned at runtime | Bypasses application-level tool policy | Block runtime registration, monitor process trees, gateway enforcement |
| Identity/Memory Poisoning | Semantic corruption of persistent agent state | Persists across sessions, gradually shifts behavior | Audit logging with provenance (Tenet 25), recovery/rollback, behavioral monitoring, immutable constraints |
Traditional patterns that manifest distinctly in agent contexts — requiring both conventional security controls and agent-specific guardrails. Examples include compromised agents in multi-agent systems and web content weaponization.
Key principle: "Use proven solutions for proven problems, and invest engineering effort in problems that are actually new."
The four stages — check each is defended:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ 1. INJECTION │───▶│2. PROPAGATION│───▶│ 3. EXECUTION │───▶│4. EXFILTRATION│
│ │ │ │ │ │ │ │
│ Malicious │ │ Payload │ │ Agent acts │ │ Data leaves │
│ content │ │ reaches │ │ on injected │ │ via agent's │
│ enters system │ │ the agent │ │ instructions │ │ action scope │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
▲ DEFEND ▲ DEFEND ▲ DEFEND ▲ DEFEND
│ HERE │ HERE │ HERE │ HERE
Network denylist Pre-call Runtime gateway Egress proxy
Input validation guardrails Scope enforcement Network control
Context isolation
ASK requires defense at ALL stages, not just stage 4. Defending only at exfiltration is insufficient — the agent has already been compromised.
Critical: Prompt-based constraints are NOT ASK-compliant enforcement. They fail Tenet 1 (enforcement separation) and Tenet 3 (complete mediation) because the agent can be instructed to ignore them.
| Pattern | Vector | Example |
|---|---|---|
| Document injection | Uploaded files, PDFs, spreadsheets | Hidden text in white-on-white, PDF metadata, invisible Unicode |
| Web content injection | Scraped pages, API responses | Instructions in HTML comments, JSON fields, meta tags |
| Tool result injection | MCP server responses, API results | Compromised MCP server returns instructions |
| Memory/context injection | Conversation history, RAG results | Poisoned vector DB entries, manipulated conversation logs |
| Email injection | Email body, subject, attachments | Instructions embedded in email content agent processes |
| Image injection | OCR'd images, screenshots | Instructions rendered in images the agent reads |
| Orchestrator relay | Multi-agent message passing | Agent A sends poisoned output that Agent B treats as trusted |
| Pattern | Failure Mode |
|---|---|
| No guardrail layer | Raw external content goes directly to agent prompt |
| Post-agent guardrails | Guardrails run after agent has already processed input |
| Incomplete coverage | Guardrails on chat input but not on tool results |
| Guardrail bypass | Encoding tricks, language switching, prompt fragmentation |
| Trust inheritance | Content from "trusted" source skips guardrails |
| Pattern | Failure Mode |
|---|---|
| Scope escalation | Agent calls tools not in its declared scope |
| Prompt override | Injected content overrides system prompt constraints |
| Tool chaining | Injected instructions chain multiple tool calls |
| File write abuse | Agent writes malicious content to accessible paths |
| Configuration tampering | Agent modifies its own config through writable paths |
| MCP rug pull | MCP server changes tool definitions after initial trust established |
| Pattern | Vector |
|---|---|
| Direct network egress | Agent makes HTTP call to attacker-controlled server |
| Tool-mediated egress | Agent uses email/Slack/webhook tool to send data |
| File-mediated egress | Agent writes sensitive data to shared/public path |
| Encoding egress | Data hidden in legitimate-looking outputs (steganography) |
| Multi-hop egress | Data passed through agent chain to one with network access |
| Markdown image exfil |  in agent output |
| DNS subdomain encoding | Data exfiltrated via DNS subdomain queries |
Tenet 24 — Instructions only come from verified principals.
The principal/data distinction is a design principle — enforcement is defense-in-depth containment.
All external inputs scanned before reaching agent context:
Every tool call validated against mind.yaml scope and gateway allowlist:
No direct outbound from agent — all traffic via Enforcer → Egress Proxy:
When handling untrusted content, reduce agent's action surface temporarily:
MCP servers bypass application-level tool policy — require gateway-level enforcement:
Separate model/classifier scoring inputs for injection risk. Must run outside agent's process. Fail closed on error. Limitation: sophisticated attacks may evade — this is why architectural containment matters.
Pattern-based checks: instruction-like phrases in non-instruction contexts, role/persona switching language, Base64/hex blocks, unusual Unicode, tone shifts, markdown image patterns with external URLs.
Plant known tokens in sensitive data; detect if they appear in agent outputs or network requests.
| Misconfiguration | Problem | Tenet Violated |
|---|---|---|
| Guardrails after the agent | Agent already processed injection | Tenet 3 |
| Guardrails inside agent process | Agent can bypass or disable | Tenet 1 |
| Tool results bypass guardrails | Unscanned input reaches agent | Tenet 3 |
| MCP policy only at application level | Agent process controls the policy | Tenet 1 |
| Post-call scanning only | Injection already executed | Tenet 3 |
| Content from "trusted" source skips scanning | Trust inheritance bypasses guardrails | Tenet 6 |
Honest accounting of what ASK cannot prevent:
The threat landscape is incomplete and evolving. Novel attack classes will emerge.
For threat assessments, produce:
For detailed attack patterns and defensive architectures, see:
references/xpia-patterns.md — XPIA attack patterns, defensive architectures, detection strategiesreferences/threats.md — Full threat model: traditional, novel, hybrid categoriesreferences/limitations.md — Known gaps, open questions, honest limitations accountingFor compliance review: use the ask-review skill.
For architecture design: use the ask-design skill.
Full framework documentation: https://github.com/geoffbelknap/ask