From soundcheck
Flags vulnerable patterns in autonomous LLM agents that allow irreversible actions without oversight. Suggests fixes such as impact classification, tool allowlists, pre-dispatch auditing, and structured parameters for safe workflows.
npx claudepluginhub thejefflarson/soundcheck --plugin soundcheck
This skill uses the workspace's default tool permissions.
Prevents autonomous agents from taking irreversible or high-impact actions without human oversight. When an LLM can directly write files, send emails, or modify databases, a single compromised or hallucinated step can cause unrecoverable damage.
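As an illustration of that failure mode, here is a minimal sketch of the vulnerable shape: the model's plan is dispatched straight to a side-effecting tool. The Action type, plan_from_llm stub, and delete_record stub are hypothetical stand-ins, not part of the skill itself:

```python
from dataclasses import dataclass, field

# Hypothetical planner output; a real agent framework would produce something similar.
@dataclass
class Action:
    name: str
    args: dict = field(default_factory=dict)

def plan_from_llm(task: str) -> Action:
    # Stub: in a real agent this comes from the model and is untrusted input.
    return Action(name="delete_record", args={"record_id": 42})

def delete_record(record_id: int) -> None:
    print(f"deleted record {record_id}")  # stands in for an irreversible write

def handle(task: str) -> None:
    action = plan_from_llm(task)
    if action.name == "delete_record":
        # Dispatched immediately: no impact check, no allowlist,
        # no human approval, and no audit trail before the side effect.
        delete_record(**action.args)

handle("clean up stale records")
```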
The vulnerable pattern: calling send_email() or delete_record() immediately on an LLM instruction, with no confirmation. Flag the vulnerable code and explain the risk. Then suggest a fix that establishes properties such as impact classification, a tool allowlist, pre-dispatch auditing, and structured, scoped tool parameters.
A generic run_sql that accepts arbitrary queries violates this; a tool named archive_record(id) that only issues a scoped update does not.
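For the scoped-tool contrast, a minimal sketch assuming a SQLite table named records with an archived column; the table and connection are illustrative, and the point is the parameter surface each tool exposes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, archived INTEGER DEFAULT 0)")
conn.execute("INSERT INTO records (id) VALUES (42)")

# Too broad: the model can pass any SQL, including destructive statements.
def run_sql(query: str) -> list:
    return conn.execute(query).fetchall()

# Scoped: one structured parameter, one bounded and reversible effect.
def archive_record(record_id: int) -> None:
    conn.execute("UPDATE records SET archived = 1 WHERE id = ?", (record_id,))
    conn.commit()

archive_record(42)
print(run_sql("SELECT id, archived FROM records"))  # [(42, 1)]
```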
Anchor — shape, not implementation:
```
action = plan_from_llm(task)
require(action.name in TOOL_ALLOWLIST)
if impact(action) == HIGH:
    require(human_approves(action))  # blocks until approved
audit_log(action)  # before dispatch, not after
dispatch(action)
```
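For concreteness, a runnable Python sketch that fills in that shape; the impact classifier, approver, audit sink, and the single archive_record tool are simplified assumptions, not the skill's prescribed implementation:

```python
from enum import Enum
from typing import Callable

class Impact(Enum):
    LOW = "low"
    HIGH = "high"

# Allowlisted, scoped tools: names map to narrow functions with structured parameters.
def archive_record(record_id: int) -> str:
    return f"archived record {record_id}"

TOOL_ALLOWLIST = {"archive_record": archive_record}

# Simplified impact classification: any tool that mutates external state is HIGH.
HIGH_IMPACT_TOOLS = {"archive_record"}

def impact(name: str) -> Impact:
    return Impact.HIGH if name in HIGH_IMPACT_TOOLS else Impact.LOW

def audit_log(name: str, args: dict) -> None:
    print(f"AUDIT: requested {name} with {args}")  # recorded before dispatch

def dispatch(name: str, args: dict, approver: Callable[[str, dict], bool]) -> str:
    if name not in TOOL_ALLOWLIST:
        raise PermissionError(f"tool {name!r} is not allowlisted")
    if impact(name) is Impact.HIGH and not approver(name, args):
        raise PermissionError(f"tool {name!r} was not approved")
    audit_log(name, args)  # before the side effect, not after
    return TOOL_ALLOWLIST[name](**args)

# The approver stands in for a real human channel (CLI prompt, ticket, review UI).
print(dispatch("archive_record", {"record_id": 42}, approver=lambda n, a: True))
```

In a real agent the approver would block on a human decision, mirroring the require(human_approves(action)) line in the anchor above.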
Confirm these properties hold regardless of language or framework: