Securing AI/ML infrastructure including model storage, API endpoints, and compute resources
Secures AI/ML infrastructure by hardening API endpoints, protecting model storage, and isolating compute resources. Claude uses this when deploying models or reviewing infrastructure configurations to prevent unauthorized access and resource abuse.
/plugin marketplace add pluginagentmarketplace/custom-plugin-ai-red-teaming
/plugin install pluginagentmarketplace-ai-red-teaming-plugin@pluginagentmarketplace/custom-plugin-ai-red-teaming
This skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/security-baseline.yaml
references/HARDENING-GUIDE.md
scripts/audit-infrastructure.sh
Protect AI/ML infrastructure from attacks targeting model storage, APIs, and compute resources.
Skill: infrastructure-security
Agent: 06-api-security-tester
OWASP: LLM03 (Supply Chain), LLM10 (Unbounded Consumption)
NIST: Govern, Manage
Use Case: Secure AI deployment infrastructure
[External Threats]
↓
[API Gateway] → [Load Balancer]   → [Inference Servers]
      ↓                ↓                     ↓
[Rate Limit]    [DDoS Protection]    [Model Storage]
      ↓                ↓                     ↓
[Auth/AuthZ]    [TLS Termination]    [Secrets Manager]
Authentication:
  methods:
    - API keys (rotation: 90 days)
    - OAuth 2.0 / OIDC
    - mTLS for service-to-service
  requirements:
    - Strong key generation
    - Secure transmission
    - Revocation capability

Rate Limiting:
  per_user: 100 req/min
  per_ip: 1000 req/min
  burst: 50
  cost_based: true  # Token-aware limiting

Input Validation:
  max_length: 4096 tokens
  content_type: application/json
  schema_validation: strict
  encoding: UTF-8 normalized
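The `cost_based: true` setting implies that a request is charged by its token count rather than counted as a single unit. One way this might look is a token bucket whose per-request cost is supplied by the caller. This is a minimal sketch, not the skill's actual implementation: the class name `TokenBucket`, its parameters, and the injectable clock are all assumptions made for illustration.

```python
import time


class TokenBucket:
    """Token bucket whose cost per request is caller-supplied (e.g. LLM token count).

    Hypothetical helper: names and defaults are illustrative assumptions.
    """

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity              # maximum budget (burst size)
        self.refill_per_sec = refill_per_sec  # budget restored per second
        self.clock = clock                    # injectable for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self, cost=1):
        """Return True and deduct `cost` if enough budget remains, else False."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per user or per IP and pass the estimated prompt token count as `cost`, so a few very large prompts exhaust the budget as quickly as many small ones.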
# API Security Configuration
class APISecurityConfig:
    def __init__(self):
        self.auth_config = {
            'type': 'oauth2',
            'token_expiry': 3600,  # seconds
            'refresh_enabled': True,
        }
        self.rate_limits = {
            # 'window' is in seconds
            'default': {'requests': 100, 'window': 60},
            'premium': {'requests': 1000, 'window': 60},
            'burst_multiplier': 2,
        }
        self.input_validation = {
            'max_tokens': 4096,
            'blocked_patterns': self._load_blocked_patterns(),
            'sanitization': True,
        }

    def _load_blocked_patterns(self):
        # Placeholder: the pattern source is not specified here. Load the
        # deny-list from your own file or configuration service.
        return []
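The input validation settings (JSON content type, UTF-8 normalization, a 4096-token cap, blocked patterns) could be enforced along the following lines. This is a sketch under stated assumptions: the function name `validate_request`, the `prompt` field, the example blocked pattern, and the whitespace-based token count are all hypothetical stand-ins, and a real deployment would use the model's own tokenizer and schema.

```python
import json
import re
import unicodedata

MAX_TOKENS = 4096
# Hypothetical example pattern; the source does not list its blocked patterns.
BLOCKED_PATTERNS = [re.compile(r"<script\b", re.IGNORECASE)]


def validate_request(raw: bytes, content_type: str) -> str:
    """Return the normalized prompt text, or raise ValueError on rejection."""
    # Ignore parameters such as "; charset=utf-8" when comparing the media type.
    if content_type.split(";")[0].strip().lower() != "application/json":
        raise ValueError("unsupported content type")
    try:
        payload = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("body must be UTF-8 encoded JSON") from exc
    # Assumed schema: an object with a string field named "prompt".
    if not isinstance(payload, dict) or not isinstance(payload.get("prompt"), str):
        raise ValueError("expected a JSON object with a string 'prompt' field")
    # NFC normalization so visually identical inputs compare equal.
    text = unicodedata.normalize("NFC", payload["prompt"])
    # Whitespace splitting is a stand-in for the model's real tokenizer.
    if len(text.split()) > MAX_TOKENS:
        raise ValueError("input exceeds token limit")
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(text):
            raise ValueError("input matches a blocked pattern")
    return text
```

Normalizing before pattern matching matters: otherwise a blocked string can slip through in a different Unicode composition.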
Storage Security:
  encryption: AES-256-GCM
  access_control: RBAC
  audit_logging: enabled
  backup: encrypted, offsite

Theft Prevention:
  query_limits: 10000/day per user
  output_perturbation: enabled
  watermarking: model and output
  access_logging: all queries
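The `query_limits: 10000/day per user` setting can be illustrated with a per-user counter that resets each day. The sketch below is an assumption-laden illustration rather than the skill's implementation: `DailyQuota` is a hypothetical name, counts are held in memory only, and the day boundary is taken in UTC. A production system would more likely keep the counters in a shared store.

```python
from collections import defaultdict
from datetime import datetime, timezone

DAILY_QUERY_LIMIT = 10_000  # value taken from the Theft Prevention settings


class DailyQuota:
    """In-memory per-user query counter that resets each UTC day (illustrative)."""

    def __init__(self, limit=DAILY_QUERY_LIMIT, today=None):
        self.limit = limit
        # `today` is injectable so the reset logic can be tested.
        self._today = today or (lambda: datetime.now(timezone.utc).date())
        self._day = self._today()
        self._counts = defaultdict(int)

    def record(self, user_id):
        """Count one query; return False once the user is over the limit."""
        day = self._today()
        if day != self._day:
            # New day: drop all counters.
            self._day = day
            self._counts.clear()
        self._counts[user_id] += 1
        return self._counts[user_id] <= self.limit
```

Counting rejected queries as well (as done here) keeps a record of how hard a user is pushing against the limit, which can feed extraction detection.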
class RateLimitError(Exception):
    """Raised when a user exceeds the configured query limit."""


class ModelProtection:
    # Helper methods (_generate_watermark, log_query, exceeds_limit,
    # add_perturbation, apply_watermark) are deployment-specific and not shown.
    def __init__(self, model):
        self.model = model
        self.watermark = self._generate_watermark()

    def protected_inference(self, input_data, user_id):
        # Log the query
        self.log_query(user_id, input_data)

        # Check query limits
        if self.exceeds_limit(user_id):
            raise RateLimitError("Query limit exceeded")

        # Run inference
        output = self.model(input_data)

        # Add output perturbation (anti-extraction)
        output = self.add_perturbation(output)

        # Apply watermark
        output = self.apply_watermark(output)

        return output
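The `add_perturbation` step is not defined in the class above. For a classifier that returns a probability vector, one possible form is to add small random noise and round the result, which reduces the precision an attacker can harvest per query. The function below is a hypothetical sketch: the name `perturb_probabilities`, the noise scale, and the rounding precision are assumptions, and other output types (text, embeddings) would need a different approach.

```python
import random


def perturb_probabilities(probs, noise=0.01, decimals=2, rng=None):
    """Add small noise to a probability vector, then renormalize and round.

    Illustrative only. The originally top-ranked class keeps the largest
    pre-rounding value; rounding can still produce ties between close classes.
    """
    rng = rng or random.Random()
    top = max(range(len(probs)), key=probs.__getitem__)
    noisy = [max(0.0, p + rng.uniform(-noise, noise)) for p in probs]
    # Restore the original winner if noise flipped the ranking.
    best = max(range(len(noisy)), key=noisy.__getitem__)
    if best != top:
        noisy[top], noisy[best] = noisy[best], noisy[top]
    total = sum(noisy)
    if total == 0:
        return list(probs)  # degenerate input: leave unchanged
    return [round(p / total, decimals) for p in noisy]
```

There is a trade-off: more noise and coarser rounding make extraction harder but also degrade the output for legitimate users who rely on calibrated scores.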
Network Configuration:
  internal_only: true
  vpc_isolation: enabled
  firewall_rules:
    - allow: internal_services
    - deny: all_external (except API gateway)

TLS Configuration:
  version: "1.3"
  cipher_suites: [TLS_AES_256_GCM_SHA384]
  certificate_rotation: 90 days
  mtls: service_to_service
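For a Python inference service, the TLS settings above map roughly onto the standard-library `ssl` module as sketched below. The function name `make_mtls_server_context` and its parameters are hypothetical, and certificate paths are left to the caller. Note that Python's `SSLContext.set_ciphers()` does not control TLS 1.3 cipher suites, so pinning `TLS_AES_256_GCM_SHA384` would have to be done elsewhere, for example in the terminating proxy.

```python
import ssl


def make_mtls_server_context(cert_file=None, key_file=None, client_ca_file=None):
    """Build a server-side context that accepts only TLS 1.3 and requires client certs.

    Illustrative sketch; file paths are supplied by the caller.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse TLS 1.2 and older.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    # mTLS: the handshake fails unless the client presents a trusted certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # CA bundle used to verify client certificates.
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

The returned context can be passed to `ctx.wrap_socket(sock, server_side=True)` or to a server framework that accepts an `SSLContext`.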
Container Security:
  base_image: distroless
  user: non-root
  filesystem: read-only
  capabilities: minimal
  seccomp: enabled

Resource Limits:
  cpu: 4 cores max
  memory: 16GB max
  gpu_memory: 24GB max
  disk: ephemeral only

Isolation:
  runtime: gvisor
  network: namespace isolated
  secrets: mounted, not in env
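A baseline like this is easiest to enforce when it is checked automatically. The sketch below audits a container spec expressed as a plain dictionary. Every field name (`user`, `read_only_filesystem`, `cpu_cores`, `memory_gb`, `env`) and the secret-name heuristic are assumptions for illustration; they do not correspond to any particular orchestrator's schema.

```python
def audit_container_spec(spec):
    """Return a list of baseline violations for a container spec given as a dict.

    Field names are hypothetical; adapt them to your orchestrator's schema.
    """
    violations = []
    # A missing user is treated as root, the usual container default.
    if spec.get("user", "root") in ("root", "0", 0):
        violations.append("container runs as root")
    if not spec.get("read_only_filesystem", False):
        violations.append("root filesystem is writable")
    # A missing limit is treated as unbounded.
    if spec.get("cpu_cores", float("inf")) > 4:
        violations.append("CPU limit missing or above 4 cores")
    if spec.get("memory_gb", float("inf")) > 16:
        violations.append("memory limit missing or above 16GB")
    # Heuristic: environment variable names that look like credentials.
    secret_suffixes = ("_KEY", "_TOKEN", "_SECRET", "PASSWORD")
    if any(name.upper().endswith(secret_suffixes) for name in spec.get("env", {})):
        violations.append("secret-like value passed via environment variable")
    return violations
```

For example, `audit_container_spec({"user": "root", "env": {"API_KEY": "x"}})` reports all five violations, because missing limits and a missing read-only flag are treated as failures.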
API Layer:
- [ ] Strong authentication (OAuth2/mTLS)
- [ ] Rate limiting implemented
- [ ] Input validation enabled
- [ ] Error messages sanitized
- [ ] Logging comprehensive
Storage Layer:
- [ ] Encryption at rest
- [ ] Access controls configured
- [ ] Audit logging enabled
- [ ] Backup encryption
Network Layer:
- [ ] TLS 1.3 enforced
- [ ] Internal VPC only
- [ ] Firewall rules configured
- [ ] DDoS protection enabled
Compute Layer:
- [ ] Non-root containers
- [ ] Resource limits set
- [ ] Secrets in vault
- [ ] Immutable infrastructure
from dataclasses import dataclass

import requests


@dataclass
class Finding:
    name: str
    severity: str


class InfrastructureSecurityTester:
    def test_api_security(self, endpoint):
        results = []
        # Test authentication bypass
        results.append(self.test_auth_bypass(endpoint))
        # Test rate limiting
        results.append(self.test_rate_limits(endpoint))
        # Test input validation
        results.append(self.test_input_validation(endpoint))
        # Test error handling
        results.append(self.test_error_disclosure(endpoint))
        return results

    def test_auth_bypass(self, endpoint):
        # Each header set should be rejected by a correctly secured endpoint.
        payloads = [
            {'Authorization': ''},
            {'Authorization': 'Bearer invalid'},
            {'Authorization': 'Bearer ' + 'a' * 1000},
        ]
        for payload in payloads:
            response = requests.get(endpoint, headers=payload, timeout=10)
            # 401 (or 403) means the request was rejected; anything else
            # suggests the endpoint accepted invalid credentials.
            if response.status_code not in (401, 403):
                return Finding("auth_bypass", "CRITICAL")
        return None
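`test_error_disclosure` is called in the tester above but its body is not shown. One plausible core for it is a scan of the error-response body for signs of leaked internals. The helper below is a sketch only: `find_error_disclosure` and the pattern list are hypothetical, tuned here to Python stack traces and a few database identifiers, and would need extending for other stacks.

```python
import re

# Hypothetical indicators of leaked internals; extend for the stack in use.
LEAK_PATTERNS = [
    re.compile(r"Traceback \(most recent call last\)"),
    re.compile(r'File "[^"]+", line \d+'),
    re.compile(r"\b(?:SQLSTATE|psycopg2|sqlalchemy)\b", re.IGNORECASE),
]


def find_error_disclosure(body):
    """Return the leak patterns that match an error-response body."""
    return [p.pattern for p in LEAK_PATTERNS if p.search(body)]
```

A tester method could send a deliberately malformed request, pass `response.text` to this helper, and raise a MEDIUM finding (verbose error messages) when the returned list is non-empty.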
CRITICAL:
  - Authentication bypass
  - Model theft possible
  - Data exposure
HIGH:
  - Rate limiting bypassable
  - Weak encryption
  - Insufficient logging
MEDIUM:
  - Missing input validation
  - Verbose error messages
  - Outdated dependencies
LOW:
  - Non-optimal configurations
  - Minor policy gaps
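The severity levels above can drive report ordering and release gating. The sketch below sorts findings by severity and flags whether a deployment should be blocked. The function names and the rule that CRITICAL and HIGH findings block a release are assumptions for illustration; the source does not state a gating policy.

```python
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def triage(findings):
    """Sort (name, severity) pairs so the most severe come first."""
    return sorted(findings, key=lambda finding: SEVERITY_ORDER[finding[1]])


def blocks_release(findings, blocking=("CRITICAL", "HIGH")):
    """Return True if any finding has a blocking severity (assumed policy)."""
    return any(severity in blocking for _, severity in findings)
```

Because `sorted` is stable, findings of equal severity keep the order in which they were reported.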
Issue: API rate limiting not effective
Solution: Implement token-based limits, add IP reputation
Issue: Model extraction detected
Solution: Lower query limits, add output perturbation
Issue: High latency from security layers
Solution: Optimize validation, use caching, async logging
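The "async logging" suggestion can be realized in Python with the standard library's `QueueHandler` and `QueueListener`, which move the slow write (file, network, SIEM forwarder) off the request path. The sketch below is a minimal illustration: the function name `start_async_logging` and the logger name `audit` are assumptions.

```python
import logging
import logging.handlers
import queue


def start_async_logging(target_handler):
    """Route 'audit' logger records through a queue drained by a background thread.

    Returns the logger and the listener; call listener.stop() at shutdown
    so queued records are flushed.
    """
    log_queue = queue.Queue(-1)  # unbounded queue
    logger = logging.getLogger("audit")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # The request thread only enqueues the record, which is cheap.
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    # The listener thread forwards records to the slow handler.
    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener
```

An unbounded queue never blocks the request path but can grow without limit if the target handler stalls; a bounded queue trades that risk for possible dropped or blocked log calls.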
| Component | Purpose |
|---|---|
| Agent 06 | Security testing |
| Agent 08 | CI/CD security gates |
| /test api | Security scanning |
| SIEM | Security monitoring |
Protect AI infrastructure with defense-in-depth security.