Security best practices for deploying AI/ML models to production environments
Implements defense-in-depth security for AI/ML model deployment. Claude uses this when deploying models to production, triggering pre-deployment vulnerability scans, secure container configuration, and staged rollout with automatic rollback.
/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-red-teaming
/plugin install pluginagentmarketplace-ai-red-teaming-plugin@pluginagentmarketplace/custom-plugin-ai-red-teaming
This skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/deployment-checklist.yaml
references/DEPLOYMENT-SECURITY.md
scripts/security-checklist.py
Deploy AI/ML models securely with defense-in-depth strategies and zero-trust architecture.
Skill: secure-deployment
Agent: 06-api-security-tester
OWASP: LLM03 (Supply Chain), LLM06 (Excessive Agency)
NIST: Govern, Manage
Use Case: Secure production deployment
Model Training → [Security Scan] → [Signing] → [Encrypted Storage]
                                                           ↓
[Canary Deploy] ← [Staged Rollout] ← [Integrity Check] ← [Pull]
      ↓
[Production] → [Continuous Monitoring]
Security Scans:
- model_vulnerability_scan
- dependency_audit
- bias_evaluation
- adversarial_robustness_test
- pii_leak_detection
- license_compliance
- secrets_detection
class PreDeploymentChecker:
    """Runs the pre-deployment security checks against a model artifact.

    Assumes `dependency_scanner`, `secret_scanner`, `CheckResult`, and the
    remaining check methods are provided elsewhere.
    """

    def run_all_checks(self, model_path):
        results = []
        # Dependency audit
        results.append(self.audit_dependencies(model_path))
        # Secrets detection
        results.append(self.scan_for_secrets(model_path))
        # PII leak detection
        results.append(self.detect_pii_leakage(model_path))
        # Adversarial robustness
        results.append(self.test_robustness(model_path))
        # Bias evaluation
        results.append(self.evaluate_bias(model_path))
        return results

    def audit_dependencies(self, path):
        """Check for vulnerable dependencies"""
        vulns = self.dependency_scanner.scan(path)
        # Only CRITICAL findings fail the check
        critical = [v for v in vulns if v.severity == 'CRITICAL']
        if critical:
            return CheckResult("dependencies", "FAIL", critical)
        return CheckResult("dependencies", "PASS")

    def scan_for_secrets(self, path):
        """Detect hardcoded secrets"""
        secrets = self.secret_scanner.scan(path)
        if secrets:
            return CheckResult("secrets", "FAIL", secrets)
        return CheckResult("secrets", "PASS")
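The checker above returns a list of results but does not show how `CheckResult` is defined or how the results gate a deployment. The following is a minimal sketch, assuming `CheckResult` is a simple record with a name, a status, and optional findings. The field names and the `failed_checks` and `deployment_allowed` helpers are hypothetical and are not taken from the skill's scripts.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class CheckResult:
    """Assumed shape of the result object used by PreDeploymentChecker."""
    name: str
    status: str  # "PASS" or "FAIL"
    findings: List[Any] = field(default_factory=list)


def failed_checks(results: List[CheckResult]) -> List[str]:
    """Return the names of every check that did not pass."""
    return [r.name for r in results if r.status != "PASS"]


def deployment_allowed(results: List[CheckResult]) -> bool:
    """Fail closed: an empty result list or any non-PASS status blocks deployment."""
    return bool(results) and not failed_checks(results)


results = [
    CheckResult("dependencies", "PASS"),
    CheckResult("secrets", "FAIL", ["hardcoded API key in config.py"]),
]
print(deployment_allowed(results), failed_checks(results))
```

Treating an empty result list as a failure is a design choice in this sketch: a pipeline that ran no checks should not be mistaken for one that passed them all.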
Container Security:
  base_image: distroless/python3
  user: nonroot (UID 65532)
  filesystem: read-only
  capabilities: drop ALL
  seccomp: runtime/default
Network Security:
  ingress: API gateway only
  egress: allowlist only
  mtls: required
  network_policy: strict
Secrets Management:
  provider: HashiCorp Vault
  injection: sidecar
  rotation: 24 hours
  never_in_env: true
Model Storage:
  encryption: AES-256-GCM
  signing: RSA-4096
  integrity: SHA-256 hash
  access: RBAC enforced
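The `integrity: SHA-256 hash` setting implies that the serving side recomputes the digest of the pulled artifact and compares it with a recorded value before loading. A minimal sketch using only the standard library follows. The function names are hypothetical, and where the expected digest comes from (for example a signed manifest) is left as an assumption.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file so large model artifacts are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_integrity(path: Path, expected_hex: str) -> bool:
    """Compare digests in constant time; True only on an exact match."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# Usage with a throwaway file standing in for a model artifact.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.bin"
    artifact.write_bytes(b"example weights")
    expected = hashlib.sha256(b"example weights").hexdigest()
    print(verify_model_integrity(artifact, expected))   # True
    print(verify_model_integrity(artifact, "0" * 64))   # False
```

A hash only detects modification relative to the recorded value, so the recorded digest itself needs to come from a trusted source, which is the role the signing step plays in the settings above.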
# Kubernetes deployment security
# Fragment only: metadata, selector, and the `model` and `tmp` volume
# definitions referenced by volumeMounts must be supplied separately.
SECURE_DEPLOYMENT = """
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
        fsGroup: 65532
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: model-server
          image: distroless/python3:nonroot
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          resources:
            limits:
              cpu: "4"
              memory: "16Gi"
              nvidia.com/gpu: "1"
            requests:
              cpu: "2"
              memory: "8Gi"
          volumeMounts:
            - name: model
              mountPath: /model
              readOnly: true
            - name: tmp
              mountPath: /tmp
"""
Isolation:
  runtime: gvisor
  network: namespace isolated
  process: pid namespace
Monitoring:
  logging: structured JSON
  metrics: Prometheus
  tracing: OpenTelemetry
  alerts: PagerDuty
Resource Protection:
  cpu_limit: enforced
  memory_limit: enforced
  gpu_memory: enforced
  timeout: 30 seconds
class RuntimeProtection:
    """Wraps inference with rate limiting, a timeout, and a memory cap.

    RateLimiter, RateLimitError, timeout, and memory_limit are assumed to be
    provided elsewhere.
    """

    def __init__(self):
        self.timeout = 30  # seconds
        self.max_memory = 16 * 1024**3  # 16 GiB
        self.rate_limiter = RateLimiter()

    def protected_inference(self, model, input_data, user_id):
        # Rate limiting
        if not self.rate_limiter.allow(user_id):
            raise RateLimitError()
        # Timeout protection
        with timeout(self.timeout):
            # Memory monitoring
            with memory_limit(self.max_memory):
                result = model.infer(input_data)
        # Log the request
        self.log_inference(user_id, input_data, result)
        return result
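`RuntimeProtection` relies on a `RateLimiter` with an `allow(user_id)` method that is not shown. One possible shape is a per-user token bucket, sketched below. The capacity, refill rate, and injectable `clock` parameter are illustrative assumptions rather than values from this skill, and the sketch is in-memory and not thread-safe.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Per-user token bucket. Capacity and refill rate are illustrative defaults."""

    def __init__(self, capacity: int = 10, refill_per_second: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # user_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, user_id: str) -> bool:
        now = self.clock()
        tokens, last = self._buckets.get(user_id, (float(self.capacity), now))
        # Refill in proportion to elapsed time, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens < 1.0:
            self._buckets[user_id] = (tokens, now)
            return False
        # Spend one token for this request.
        self._buckets[user_id] = (tokens - 1.0, now)
        return True


limiter = RateLimiter(capacity=2, refill_per_second=0.5)
print([limiter.allow("alice") for _ in range(3)])
```

A multi-replica deployment would typically need shared state (for example a central store) instead of a per-process dictionary. That choice is outside what the source specifies.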
Rollout Strategy:
  canary:
    initial_percentage: 5%
    increment: 10%
    interval: 1 hour
    success_criteria:
      - error_rate < 0.1%
      - latency_p99 < 5s
      - no_security_alerts
  rollback:
    automatic: true
    triggers:
      - error_rate > 1%
      - security_alert
      - latency_p99 > 10s
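The thresholds above can be read as a decision function evaluated once per interval. The sketch below is one interpretation: the `CanaryMetrics` type and function names are hypothetical, and the "hold" outcome for metrics that fall between the success criteria and the rollback triggers is an assumption, since the source does not say what happens in that gap.

```python
from dataclasses import dataclass


@dataclass
class CanaryMetrics:
    error_rate: float      # fraction of failed requests, e.g. 0.001 == 0.1%
    latency_p99_s: float   # 99th-percentile latency in seconds
    security_alerts: int   # alerts raised during the interval


def evaluate_canary(m: CanaryMetrics) -> str:
    """Return 'rollback', 'promote', or 'hold' using the thresholds listed above."""
    # Rollback triggers are checked first so they always win.
    if m.error_rate > 0.01 or m.latency_p99_s > 10.0 or m.security_alerts > 0:
        return "rollback"
    # All success criteria met: safe to increase the traffic share.
    if m.error_rate < 0.001 and m.latency_p99_s < 5.0:
        return "promote"
    # Assumed behaviour for the gap between the two sets of thresholds.
    return "hold"


def next_percentage(current: int, decision: str, increment: int = 10) -> int:
    """Traffic share for the next interval, given the decision."""
    if decision == "rollback":
        return 0
    if decision == "promote":
        return min(100, current + increment)
    return current


decision = evaluate_canary(CanaryMetrics(0.0005, 3.2, 0))
print(decision, next_percentage(5, decision))  # promote 15
```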
Pre-Deployment:
- [ ] Dependencies scanned and patched
- [ ] Secrets removed from codebase
- [ ] PII leak testing passed
- [ ] Adversarial robustness validated
- [ ] Model signed and verified
- [ ] Access controls configured
Deployment:
- [ ] Non-root container
- [ ] Read-only filesystem
- [ ] Resource limits set
- [ ] Network policies applied
- [ ] Secrets via vault
- [ ] TLS/mTLS enabled
Runtime:
- [ ] Monitoring enabled
- [ ] Alerting configured
- [ ] Logging comprehensive
- [ ] Rate limiting active
- [ ] Rollback tested
# .github/workflows/secure-deploy.yml
# $IMAGE is expected to be set as an environment variable.
name: Secure Deployment
on:
  workflow_dispatch:  # assumed trigger; adjust as needed
jobs:
  security-scan:
    runs-on: ubuntu-latest  # assumed runner
    steps:
      - name: Dependency Audit
        run: pip-audit --strict
      - name: Secret Scan
        run: gitleaks detect
      - name: Container Scan
        run: trivy image $IMAGE
      - name: SBOM Generation
        run: syft $IMAGE -o spdx-json
  deploy:
    needs: security-scan
    runs-on: ubuntu-latest  # assumed runner
    steps:
      - name: Sign Image
        run: cosign sign $IMAGE
      - name: Verify Signature
        run: cosign verify $IMAGE
      - name: Deploy Canary
        run: kubectl apply -f canary.yaml
CRITICAL:
- Secrets in codebase
- Critical vulnerabilities
- No authentication
HIGH:
- Root container
- Missing encryption
- No rate limiting
MEDIUM:
- Missing resource limits
- Incomplete logging
- Outdated dependencies
LOW:
- Non-optimal configs
- Missing SBOM
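The severity tiers above can drive an automated gate. The sketch below maps each listed issue to a tier and reports which findings should block a deployment. The finding identifiers, the default blocking threshold of HIGH, and the choice to treat unknown findings as CRITICAL are all assumptions made for illustration; the source only lists the tiers.

```python
from enum import IntEnum
from typing import Iterable, List, Tuple


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical identifiers for the issues listed above.
SEVERITY_BY_FINDING = {
    "secrets_in_codebase": Severity.CRITICAL,
    "critical_vulnerabilities": Severity.CRITICAL,
    "no_authentication": Severity.CRITICAL,
    "root_container": Severity.HIGH,
    "missing_encryption": Severity.HIGH,
    "no_rate_limiting": Severity.HIGH,
    "missing_resource_limits": Severity.MEDIUM,
    "incomplete_logging": Severity.MEDIUM,
    "outdated_dependencies": Severity.MEDIUM,
    "non_optimal_configs": Severity.LOW,
    "missing_sbom": Severity.LOW,
}


def blocking_findings(findings: Iterable[str],
                      threshold: Severity = Severity.HIGH) -> List[Tuple[str, Severity]]:
    """Return findings at or above the threshold, most severe first."""
    rated = [
        # Unknown findings fail closed by being rated CRITICAL (an assumption).
        (name, SEVERITY_BY_FINDING.get(name, Severity.CRITICAL))
        for name in findings
    ]
    blocking = [pair for pair in rated if pair[1] >= threshold]
    return sorted(blocking, key=lambda pair: (-pair[1], pair[0]))


for name, severity in blocking_findings(
        ["missing_sbom", "root_container", "secrets_in_codebase"]):
    print(f"{severity.name}: {name}")
```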
Issue: Deployment failing security scan
Solution: Update dependencies, remove secrets, fix configs
Issue: Container won't start (read-only FS)
Solution: Use tmpfs for temp files, volume for model
Issue: High latency after security layers
Solution: Optimize validation, use caching, async logging
| Component | Purpose |
|---|---|
| Agent 06 | Security testing |
| Agent 08 | CI/CD automation |
| /test api | Pre-deploy testing |
| ArgoCD | GitOps deployment |
Deploy AI models securely with defense-in-depth practices.