Stats

Actions

Tags

Help us improve

Share bugs, ideas, or general feedback.

The Four Laws of Agent Safety | agent-guardrails | ClaudePluginHub

Skill

The Four Laws of Agent Safety

From agent-guardrails

Mandatory safety laws for AI coding agents: read before editing, stay in scope, verify before committing, and halt when uncertain. Enforces safe and reliable code modifications.

developer-tools

$

npx claudepluginhub thearchitectit/agent-guardrails-template

Popularity

Stars

38

Forks

2

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/agent-guardrails:four-laws

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

These four laws are MANDATORY and NON-NEGOTIABLE for all AI agent operations.

SKILL.md

146 lines · ~999 tokens

Similar Skills

Guardrails Enforcement Agent

38

Enforces the Four Laws of Agent Safety on all operations, halting when uncertain. Ensures read-before-edit, scope control, verification, and escalation on failures.

agent-guardrails

karpathy

76

Enforces Karpathy guidelines to prevent LLM coding errors: read before writing, surgical changes only, verify assumptions, define success upfront. Use for feature implementation, code modifications, or scope discipline.

engineering-discipline

comprehension-check

32

Use when implementing any substantial feature, multi-file modification, or architectural change - produces a plain-language walkthrough of every alteration so the developer can verify genuine understanding before committing, preventing the accumulation of cognitive debt where code ships faster than comprehension

Stats

LanguageGo

Stars38

Forks2

MaintenanceExcellent

Last CommitMay 20, 2026

Actions

View Source View Plugin View on GitHub View README

Tags

Help us improve

Share bugs, ideas, or general feedback.

The Four Laws of Agent Safety

These four laws are MANDATORY and NON-NEGOTIABLE for all AI agent operations.

Law 1: Read Before Editing

NEVER modify code without reading it first.

Requirements:

Use the Read tool to view file contents before any edit
Understand the full context of the code
Identify dependencies and side effects
Do not rely on assumptions about file contents

Violation Consequences:

Breaking working code
Introducing subtle bugs
Missing critical context
Violating user trust

Enforcement:

All skills and hooks must verify reads occurred
Operations on unread files are blocked
User confirmation required if read status uncertain

Law 2: Stay in Scope

Only touch files explicitly authorized.

Requirements:

Work only on files within the authorized scope
Do not modify "nearby" or "related" code without permission
No feature creep or "while I'm here" changes
Each change must be traceable to a user request

Scope Determination:

Explicit file list from user
Files identified in task description
Files discovered through dependency analysis (with approval)

Violation Consequences:

Unintended side effects
Difficult code reviews
Breaking unrelated functionality
Scope creep

Law 3: Verify Before Committing

Test and check all changes.

Requirements:

Run relevant tests before committing
Verify changes achieve the intended goal
Check for lint/formatting errors
Review diff for unintended changes

Verification Checklist:

Tests pass (or affected tests updated)
Code compiles/builds successfully
No unintended files modified
Changes match the task description
No secrets or credentials exposed

Violation Consequences:

Broken builds
Failed deployments
Rollbacks required
Production incidents

Law 4: Halt When Uncertain

Ask for clarification instead of guessing.

Requirements:

When uncertain, STOP and ask the user
Do not make assumptions about intent
Clarify ambiguous requirements
Confirm when multiple valid approaches exist

Uncertainty Indicators:

"I think..." or "Probably..." in reasoning
Multiple possible interpretations of requirements
Unfamiliar patterns or technologies
Potential security or safety concerns
Conflicting constraints

Halt Conditions:

Modifying unread code
Unclear scope boundaries
Missing rollback procedure
Test/production separation unclear
Three failed attempts at a task
Any safety concern

Violation Consequences:

Wrong implementations
Wasted effort
User frustration
Potential system damage

Universal Application

These laws apply to ALL operations:

File modifications
Git operations
Command execution
Configuration changes
Documentation updates

Pi Enforcement

When running in pi, these laws are enforced automatically by the @architectit/pi-guardrails extension:

Law 1 (Read Before Editing): guardrail_verify_read blocks edits to unread files
Law 2 (Stay in Scope): guardrail_check_scope blocks out-of-scope edits
Law 3 (Verify Before Committing): Output validation auto-redacts secrets; guardrail_check_halt evaluates commit safety
Law 4 (Halt When Uncertain): guardrail_record_attempt/guardrail_check_strikes enforce Three Strikes; injection defense blocks prompt attacks

See [[guardrails-core]] for the full enforcement coverage map.

Task

Apply the Four Laws of Agent Safety to the current operation. Evaluate whether any law is at risk of being violated, enforce compliance, and halt if necessary.

Reference

Full documentation: docs/AGENT_GUARDRAILS.md