From auto-claude-skills
Assesses architectural risk for designs involving autonomous agents, focusing on private data, untrusted input, and outbound actions (lethal trifecta). Recommends blast-radius mitigations.
How this skill is triggered — by the user, by Claude, or both
Slash command
/auto-claude-skills:agent-safety-review
The summary Claude sees in its skill listing, used to decide when to auto-load this skill
Architectural risk assessment for designs and implementations that involve autonomous agent behavior. Separate from security-scanner (which runs deterministic static analysis).
Activates during the DESIGN phase when the prompt involves autonomous agents, unattended operation, private data processing with external input, or outbound actions. It is also co-selected during the REVIEW phase when autonomy-related triggers match alongside requesting-code-review.
For the proposed design or implementation, evaluate each field:
| Field | Question | Examples |
|---|---|---|
| private_data | Does the agent access information that should not be shared with all parties? | User email, credentials, internal logs, PII, private repos, API keys, session tokens |
| untrusted_input | Can an external party inject instructions the agent will process? | Email content, web pages, user-uploaded files, API responses from third parties, webhook payloads |
| outbound_action | Can the agent send data or take actions visible outside its sandbox? | Sending emails, posting to Slack, pushing to git, making API calls, writing to shared filesystems, creating PRs |
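For each field, state its status (Present, Absent, or Unknown) and the specific evidence for it. Then classify the design by the number of fields present: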
| Fields present | Classification | Action |
|---|---|---|
| All 3 | Lethal trifecta — High risk | Require mitigation before proceeding |
| 2 of 3 | Elevated risk | Note which leg is missing. Recommend not adding the third without mitigation. |
| 0-1 | Standard risk | No special action required |
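The classification table above can be sketched as a small function. This is a minimal illustration, not part of the skill itself: the names `RiskFields`, `classify`, and `missing_legs` are hypothetical, and the sketch assumes each field has already been resolved to present or absent (the table does not say how an Unknown status should be counted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskFields:
    """Hypothetical container for the three fields; True means 'present'."""
    private_data: bool
    untrusted_input: bool
    outbound_action: bool


def classify(fields: RiskFields) -> tuple[str, str]:
    """Map the number of present fields to (classification, action)."""
    present = sum(
        [fields.private_data, fields.untrusted_input, fields.outbound_action]
    )
    if present == 3:
        return ("Lethal trifecta", "Require mitigation before proceeding")
    if present == 2:
        return (
            "Elevated",
            "Note which leg is missing; do not add the third without mitigation",
        )
    return ("Standard", "No special action required")


def missing_legs(fields: RiskFields) -> list[str]:
    """Return the names of the fields that are absent, in declaration order."""
    return [name for name, value in vars(fields).items() if not value]
```

For a design with two fields present, `missing_legs` names the leg that should not be added without mitigation.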
The primary mitigation is blast-radius control — cutting at least one leg of the trifecta. Improved detection scores are NOT proof of safety.
Cut private_data:
Cut untrusted_input:
Cut outbound_action:
Output a structured assessment:
## Agent Safety Assessment
**Design:** <what is being evaluated>
**Date:** YYYY-MM-DD
### Risk Fields
| Field | Status | Evidence |
|-------|--------|----------|
| private_data | Present/Absent/Unknown | <specific evidence> |
| untrusted_input | Present/Absent/Unknown | <specific evidence> |
| outbound_action | Present/Absent/Unknown | <specific evidence> |
### Classification
**Risk level:** Lethal trifecta / Elevated / Standard
### Mitigation (if required)
**Recommended approach:** <which leg to cut and how>
**Trade-off:** <what capability is reduced by the mitigation>
**Residual risk:** <what remains after mitigation>
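As a rough sketch of how the template above could be filled in programmatically, the following function renders the assessment as markdown. The function name `render_assessment` and its parameters are assumptions made for illustration; the skill itself only specifies the output layout.

```python
from datetime import date

FIELD_NAMES = ("private_data", "untrusted_input", "outbound_action")


def render_assessment(design, fields, risk_level, mitigation=None, today=None):
    """Render the assessment template as a markdown string.

    fields: dict mapping each field name to a (status, evidence) pair,
            where status is "Present", "Absent", or "Unknown".
    mitigation: optional (approach, trade_off, residual_risk) triple.
    today: optional date, injectable so the output is reproducible.
    """
    today = today or date.today()
    lines = [
        "## Agent Safety Assessment",
        f"**Design:** {design}",
        f"**Date:** {today.isoformat()}",
        "### Risk Fields",
        "| Field | Status | Evidence |",
        "|-------|--------|----------|",
    ]
    for name in FIELD_NAMES:
        status, evidence = fields[name]
        lines.append(f"| {name} | {status} | {evidence} |")
    lines += ["### Classification", f"**Risk level:** {risk_level}"]
    if mitigation is not None:
        approach, trade_off, residual = mitigation
        lines += [
            "### Mitigation (if required)",
            f"**Recommended approach:** {approach}",
            f"**Trade-off:** {trade_off}",
            f"**Residual risk:** {residual}",
        ]
    return "\n".join(lines)
```

The mitigation section is emitted only when a mitigation triple is supplied, mirroring the "(if required)" qualifier in the template.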
test-driven-development. Detection added after the behavior exists is not a substitute: a feature that has never failed its safety cases has never been shown to pass them.
npx claudepluginhub damianpapadopoulos/auto-claude-skills