From wicked-garden
Trust, safety, and control patterns for production agentic systems with human-in-the-loop gates and guardrails. Use when: "agent safety", "guardrails", "human-in-the-loop", "agent trust", "prompt injection defense"
npx claudepluginhub mikeparcewski/wicked-garden --plugin wicked-gardenThis skill uses the workspace's default tool permissions.
Essential patterns for building safe, trustworthy, production-ready agentic systems.
Provides Ktor server patterns for routing DSL, plugins (auth, CORS, serialization), Koin DI, WebSockets, services, and testApplication testing.
Conducts multi-source web research with firecrawl and exa MCPs: searches, scrapes pages, synthesizes cited reports. For deep dives, competitive analysis, tech evaluations, or due diligence.
Provides demand forecasting, safety stock optimization, replenishment planning, and promotional lift estimation for multi-location retailers managing 300-800 SKUs.
Essential patterns for building safe, trustworthy, production-ready agentic systems.
Always require approval for:
Consider approval for:
async def execute_with_approval(action, threshold=0.8):
if action.confidence < threshold or action.is_high_stakes():
approval = await request_human_approval(action)
if not approval.approved:
raise ApprovalDenied(approval.reason)
return await action.execute()
Synchronous Approval: Block until human responds (for urgent decisions) Asynchronous Approval: Queue for later review (for batch operations) Escalation Chains: Route to higher authority if primary approver unavailable Timeout Handling: Define what happens if no approval received
See refs/guardrails-input-output.md, refs/guardrails-actions.md, and refs/guardrails-resources.md for detailed implementation patterns.
Force outputs into validated schemas using Pydantic or similar.
Check outputs before acting on them:
Cross-Validation: Multiple agents check same fact Source Verification: Verify claims against ground truth Confidence Thresholds: Reject low-confidence outputs Fact Checking: Use retrieval to verify factual claims
See refs/guardrails-input-output.md for code examples.
Safer than blacklisting. Define allowed commands/actions explicitly.
Isolate agent execution:
Prevent runaway resource usage:
See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation details.
Clean user inputs before passing to LLM. Remove instruction-like patterns.
Use clear delimiters to separate system instructions from user input.
Separate instruction and data contexts using role-based message formatting.
See refs/guardrails-input-output.md for defense patterns and code examples.
Regex patterns for email, SSN, credit cards, phone numbers, etc.
Replace detected PII with [REDACTED_TYPE] tokens.
See refs/guardrails-input-output.md for detection and redaction code.
Retrieval-Augmented Generation (RAG): Retrieve facts before generation Citation Requirements: Require source citations for all claims
Multi-Agent Verification: Independent verification by multiple agents Confidence Calibration: Require confidence scores, reject low-confidence outputs
See refs/guardrails-actions.md and refs/guardrails-resources.md for implementation patterns.
Kill Switch: Emergency stop that halts all operations and alerts administrators. Circuit Breaker: Opens circuit after threshold failures to prevent cascading failures. Rate Limiting: Limits requests per user/time window to prevent abuse.
See refs/guardrails-actions.md and refs/guardrails-resources.md for complete implementations.
See refs/safety-checklist-core.md for the full pre-deployment checklist covering human gates, validation, whitelisting, resource limits, PII, prompt injection, hallucination, circuit breakers, rate limiting, kill switches, audit logging, and rollback. See refs/safety-checklist-advanced.md for monitoring, incidents, testing, and ops checklists.
refs/safety-checklist-core.md - Core safety checklist (input, output, action, auth, privacy)refs/safety-checklist-advanced.md - Advanced safety checklist (monitoring, incidents, testing, ops)refs/guardrails-input-output.md - Input validation, sanitization, prompt injection, output filteringrefs/guardrails-actions.md - Action whitelisting, approvals, sandboxed executionrefs/guardrails-resources.md - Resource limiting, monitoring, complete guardrail architecture