Use when designing infrastructure self-service portals, IaC templates, or automated provisioning systems. Covers Terraform modules, Pulumi, environment provisioning, and infrastructure guardrails.
Design infrastructure self-service portals and IaC templates that let developers provision resources in minutes without tickets. Use this when building reusable Terraform modules, Pulumi components, or automated provisioning systems with policy guardrails.
/plugin marketplace add melodic-software/claude-code-plugins
/plugin install systems-design@melodic-software
Patterns for enabling developers to provision infrastructure without tickets, while maintaining governance and control.
Self-Service Infrastructure:
Enabling developers to provision and manage infrastructure
directly, without filing tickets or waiting for ops teams.
Traditional Model:
┌─────────────────────────────────────────────────────────────┐
│ Developer → Ticket → Ops Review → Manual Provision → Done │
│ │
│ Timeline: Days to weeks │
│ Bottleneck: Ops team capacity │
│ Result: Shadow IT, workarounds, frustration │
└─────────────────────────────────────────────────────────────┘
Self-Service Model:
┌─────────────────────────────────────────────────────────────┐
│ Developer → Portal/API → Automatic Provision → Done │
│ │
│ Timeline: Minutes to hours │
│ Bottleneck: None (automated) │
│ Result: Speed, consistency, compliance │
└─────────────────────────────────────────────────────────────┘
Self-Service Spectrum:
├── Fully Managed: Click a button, get a database
├── Template-Based: Customize from approved templates
├── Policy-Constrained: Write IaC within guardrails
└── Full Freedom: Any infrastructure (risky)
Sweet Spot: Template-Based with Policy Guardrails
Self-Service Benefits:
For Developers:
├── Speed: Minutes instead of days
├── Autonomy: Provision when needed
├── Consistency: Same infrastructure every time
├── Learning: Understand infrastructure better
└── Ownership: More responsibility, more control
For Operations:
├── Scale: Handle more requests without more people
├── Consistency: Enforce standards automatically
├── Focus: Work on platform, not tickets
├── Audit: Clear trail of who provisioned what
└── Compliance: Built-in policy enforcement
For Organization:
├── Velocity: Faster time to market
├── Cost: Reduced ops overhead
├── Governance: Better compliance posture
├── Security: Consistent security controls
└── Efficiency: Resources provisioned when needed
Self-Service Infrastructure Architecture:
┌─────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Portal │ │ CLI │ │ API │ │
│ │ (Web UI) │ │ (Terraform) │ │ (REST/gRPC)│ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ └────────────────┼────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATION LAYER │ │
│ │ ├── Request validation │ │
│ │ ├── Policy evaluation (OPA/Sentinel) │ │
│ │ ├── Cost estimation │ │
│ │ ├── Approval workflow (if needed) │ │
│ │ └── Execution orchestration │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ TEMPLATE LIBRARY │ │
│ │ ├── Database modules (RDS, Cloud SQL) │ │
│ │ ├── Compute modules (EKS, GKE, VMs) │ │
│ │ ├── Storage modules (S3, GCS) │ │
│ │ ├── Network modules (VPC, subnets) │ │
│ │ └── Composite modules (full environments) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ EXECUTION ENGINE │ │
│ │ ├── Terraform Cloud/Enterprise │ │
│ │ ├── Pulumi Service │ │
│ │ ├── Crossplane │ │
│ │ └── Cloud-native (CDK, ARM, Deployment Manager) │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │
├──────────────────────────┼───────────────────────────────────┤
│ ▼ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ CLOUD PROVIDERS │ │
│ │ AWS │ GCP │ Azure │ Kubernetes │ Others │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Self-Service Request Flow:
┌─────────────────────────────────────────────────────────────┐
│ 1. REQUEST │
│ Developer: "I need a PostgreSQL database for staging" │
│ └── Via portal, CLI, or API │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 2. VALIDATION │
│ ├── User has permission? ✓ Team member │
│ ├── Request well-formed? ✓ Valid config │
│ ├── Within quotas? ✓ Under team limit │
│ └── Meets policy? ✓ Allowed instance type│
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 3. ENRICHMENT │
│ ├── Apply defaults db.t3.medium │
│ ├── Generate names myapp-staging-db │
│ ├── Assign network staging-vpc │
│ ├── Configure monitoring Datadog integration │
│ └── Estimate cost ~$50/month │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 4. APPROVAL (if required) │
│ ├── Auto-approve: staging, dev ✓ Auto-approved │
│ ├── Manual approve: production (Would need approval) │
│ └── Cost threshold: >$500/month (Would need approval) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 5. EXECUTION │
│ ├── Generate Terraform Based on template │
│ ├── Plan Preview changes │
│ ├── Apply Create resources │
│ └── Verify Health checks │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ 6. DELIVERY │
│ ├── Connection string → Vault │
│ ├── Notification → Slack/email │
│ ├── Documentation → Auto-generated │
│ └── Registration → Service catalog │
└─────────────────────────────────────────────────────────────┘
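The validation, enrichment, and approval steps above can be sketched as a small pipeline. This is a minimal illustration rather than a reference implementation: the `Request` type, the allow-list, the price table, and the thresholds are hypothetical values chosen to mirror the example in the diagram.

```python
from dataclasses import dataclass, field

# Hypothetical policy data; real values would come from platform configuration.
ALLOWED_INSTANCE_CLASSES = {"db.t3.medium", "db.r5.xlarge"}
MONTHLY_COST_USD = {"db.t3.medium": 50, "db.r5.xlarge": 600}
AUTO_APPROVE_ENVS = {"dev", "staging"}
APPROVAL_COST_THRESHOLD_USD = 500


@dataclass
class Request:
    app: str
    environment: str
    team_members: set
    requester: str
    instance_class: str = "db.t3.medium"  # default applied when omitted
    details: dict = field(default_factory=dict)


def validate(req: Request) -> list:
    """Step 2: return a list of validation errors (empty means valid)."""
    errors = []
    if req.requester not in req.team_members:
        errors.append("requester is not a team member")
    if req.instance_class not in ALLOWED_INSTANCE_CLASSES:
        errors.append(f"instance class {req.instance_class} is not allowed")
    return errors


def enrich(req: Request) -> Request:
    """Step 3: fill in the generated name, network, and cost estimate."""
    req.details = {
        "name": f"{req.app}-{req.environment}-db",
        "network": f"{req.environment}-vpc",
        "estimated_cost_usd": MONTHLY_COST_USD[req.instance_class],
    }
    return req


def needs_approval(req: Request) -> bool:
    """Step 4: manual approval for production or expensive requests."""
    too_expensive = req.details["estimated_cost_usd"] > APPROVAL_COST_THRESHOLD_USD
    return req.environment not in AUTO_APPROVE_ENVS or too_expensive


def process(req: Request) -> str:
    """Run steps 2 to 4 and report the outcome before execution starts."""
    errors = validate(req)
    if errors:
        return "rejected: " + "; ".join(errors)
    enrich(req)
    return "pending-approval" if needs_approval(req) else "auto-approved"
```

Keeping each step a separate function makes it straightforward to swap the hard-coded tables for a policy engine or a pricing service later.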
Terraform Module Structure:
Organization-Wide Module Library:
terraform-modules/
├── databases/
│ ├── rds-postgres/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ ├── outputs.tf
│ │ ├── versions.tf
│ │ ├── README.md
│ │ └── examples/
│ │ ├── simple/
│ │ └── production/
│ └── elasticache-redis/
├── compute/
│ ├── eks-cluster/
│ └── ecs-service/
├── storage/
│ └── s3-bucket/
└── network/
└── vpc/
Module Design Principles:
1. Opinionated Defaults
# variables.tf
variable "instance_class" {
  type        = string
  default     = "db.t3.medium" # Sensible default
  description = "RDS instance type"

  validation {
    # The trailing "\\." anchors the family, so db.r5b.large or db.m5d.large are rejected
    condition     = can(regex("^db\\.(t3|r5|m5)\\.", var.instance_class))
    error_message = "Only approved instance families (t3, r5, m5) are allowed."
  }
}
2. Minimal Required Inputs
# Only require what can't be defaulted
variable "name" {
  type        = string
  description = "Database identifier"
}

variable "environment" {
  type        = string
  description = "Environment (dev, staging, prod)"
}
3. Complete Outputs
# outputs.tf
output "endpoint" {
  description = "Database connection endpoint"
  value       = aws_db_instance.main.endpoint
}

output "connection_secret_arn" {
  description = "ARN of secret with credentials"
  value       = aws_secretsmanager_secret.db_credentials.arn
}
4. Built-in Best Practices
# Security hardened by default (excerpt; engine, sizing, and credentials omitted)
resource "aws_db_instance" "main" {
  # Encryption always on
  storage_encrypted = true

  # No public access
  publicly_accessible = false

  # Automated backups: 30 days in prod, 7 days elsewhere
  backup_retention_period = var.environment == "prod" ? 30 : 7

  # Enhanced monitoring every 60 seconds (also requires monitoring_role_arn)
  monitoring_interval = 60
}
Module Versioning Strategy:
Semantic Versioning:
├── MAJOR: Breaking changes (new required inputs, removed outputs)
├── MINOR: New features (new optional inputs, new outputs)
└── PATCH: Bug fixes (no interface changes)
Version Constraints:
# Allow patch updates automatically
# Private registry sources take the form <HOSTNAME>/<NAMESPACE>/<NAME>/<PROVIDER>
module "database" {
  source  = "terraform.company.com/modules/rds-postgres/aws"
  version = "~> 2.1.0" # >=2.1.0, <2.2.0
}

# Alternative: pin to exact version (production)
module "database" {
  source  = "terraform.company.com/modules/rds-postgres/aws"
  version = "= 2.1.3"
}
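To make the `~>` constraint concrete, the check it performs can be sketched in a few lines. This is an illustrative helper, not Terraform's own resolver; it assumes plain numeric `X.Y.Z` module versions with no pre-release suffixes, and the function names are made up for this example.

```python
def parse(version: str) -> tuple:
    """Turn '2.1.3' into (2, 1, 3)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check a full X.Y.Z version against a '~> X.Y' or '~> X.Y.Z' constraint.

    Only the rightmost component named in the constraint may increase.
    """
    base = parse(constraint.replace("~>", "").strip())
    if len(base) < 2:
        raise ValueError("constraint needs at least two components")
    candidate = parse(version)
    # Upper bound: bump the second-to-last component and drop the last one.
    upper = base[:-2] + (base[-2] + 1,)
    return base <= candidate and candidate[: len(upper)] < upper
```

With this helper, `satisfies_pessimistic("2.1.3", "~> 2.1.0")` is true while `"2.2.0"` is rejected, which matches the `>=2.1.0, <2.2.0` range noted in the comment above.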
Deprecation Policy:
┌─────────────────────────────────────────────────────────────┐
│ Module Version Lifecycle │
├─────────────────────────────────────────────────────────────┤
│ Current (v2.x): Supported, new features │
│ Previous (v1.x): Supported, security fixes only │
│ Deprecated (v0.x): Warning on use, no support │
│ Removed: Will not work │
│ │
│ Notification: │
│ ├── Slack announcement when version deprecated │
│ ├── Warning in terraform plan output │
│ ├── Dashboard showing deprecated module usage │
│ └── Migration guide provided │
└─────────────────────────────────────────────────────────────┘
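The lifecycle above lends itself to a simple lookup that a dashboard or CI check could use. The sketch below assumes the major-version mapping shown in the box and a hypothetical usage inventory of `(service, module, version)` tuples; how that inventory is collected is left open.

```python
from collections import defaultdict

# Major version -> lifecycle stage, mirroring the box above.
LIFECYCLE_BY_MAJOR = {2: "current", 1: "previous", 0: "deprecated"}


def lifecycle_status(version: str) -> str:
    """Classify a module version by its major component."""
    major = int(version.split(".")[0])
    return LIFECYCLE_BY_MAJOR.get(major, "unknown")


def deprecated_usage(usages) -> dict:
    """Group consuming services by the deprecated module version they use."""
    report = defaultdict(list)
    for service, module, version in usages:
        if lifecycle_status(version) == "deprecated":
            report[f"{module}@{version}"].append(service)
    return dict(report)
```

The resulting mapping can feed both the usage dashboard and the targeted notifications listed in the box.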
Policy as Code Options:
1. HashiCorp Sentinel (Terraform Enterprise)
# Require encryption for all storage
import "tfplan/v2" as tfplan

s3_buckets = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_s3_bucket" and
  rc.mode is "managed" and
  (rc.change.actions contains "create" or
    rc.change.actions contains "update")
}

encryption_enabled = rule {
  all s3_buckets as _, bucket {
    bucket.change.after.server_side_encryption_configuration is not null
  }
}

main = rule { encryption_enabled }
2. Open Policy Agent (OPA)
# Rego policy for Kubernetes
package kubernetes.admission

# Classic rule syntax; OPA 1.0+ expects the head as: deny contains msg if { ... }
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.securityContext.runAsNonRoot
  msg := "Containers must run as non-root"
}
3. Cloud-Native Policies
# AWS Service Control Policy: deny object uploads that do not request SSE-S3 encryption
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "RequireEncryption",
    "Effect": "Deny",
    "Action": ["s3:PutObject"],
    "Resource": "*",
    "Condition": {
      "StringNotEquals": {
        "s3:x-amz-server-side-encryption": "AES256"
      }
    }
  }]
}
Infrastructure Guardrails:
1. Security Guardrails
├── Encryption required (at-rest, in-transit)
├── No public access by default
├── Required security groups
├── IAM role requirements
└── Vulnerability scanning
2. Cost Guardrails
├── Instance type restrictions
├── Storage size limits
├── Required cost tags
├── Budget thresholds
└── Approval for large resources
3. Compliance Guardrails
├── Allowed regions (data residency)
├── Required logging
├── Backup requirements
├── Retention policies
└── Audit trail requirements
4. Operational Guardrails
├── Naming conventions
├── Required tags (owner, cost-center)
├── Resource quotas per team
├── Monitoring requirements
└── Deletion protection
Guardrail Implementation:
┌─────────────────────────────────────────────────────────────┐
│ Guardrail Timing │
├─────────────────────────────────────────────────────────────┤
│ │
│ Pre-Plan (fastest feedback): │
│ ├── Validate terraform files │
│ ├── Static analysis (tfsec, checkov) │
│ └── Module version checks │
│ │
│ Post-Plan (resource-aware): │
│ ├── OPA/Sentinel policy evaluation │
│ ├── Cost estimation │
│ └── Blast radius assessment │
│ │
│ Post-Apply (verification): │
│ ├── Configuration validation │
│ ├── Security scanning │
│ └── Compliance audit │
│ │
└─────────────────────────────────────────────────────────────┘
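A post-plan check can be approximated by walking the JSON that `terraform show -json` produces for a saved plan. The sketch below is a simplified stand-in for an OPA or Sentinel policy: the `resource_changes` layout follows Terraform's JSON plan output, while the required tags and the list of taggable types are assumptions made for this example.

```python
# Hypothetical policy inputs for this sketch.
REQUIRED_TAGS = {"owner", "cost-center"}
TAGGABLE_TYPES = {"aws_db_instance", "aws_s3_bucket"}


def violations(plan: dict) -> list:
    """Return human-readable guardrail violations for a parsed JSON plan."""
    found = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        # Only resources being created or updated are evaluated.
        if "create" not in actions and "update" not in actions:
            continue
        after = rc["change"].get("after") or {}
        address = rc["address"]

        if rc["type"] in TAGGABLE_TYPES:
            missing = REQUIRED_TAGS - set(after.get("tags") or {})
            if missing:
                found.append(f"{address}: missing tags {sorted(missing)}")

        if rc["type"] == "aws_db_instance":
            if not after.get("storage_encrypted"):
                found.append(f"{address}: storage must be encrypted")
            if after.get("publicly_accessible"):
                found.append(f"{address}: public access is not allowed")
    return found
```

In a pipeline, a non-empty result would fail the run before `terraform apply` is reached.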
Environment Provisioning:
Environment Types:
┌─────────────────────────────────────────────────────────────┐
│ Development Environment │
│ ├── Purpose: Individual developer testing │
│ ├── Lifetime: Hours to days │
│ ├── Resources: Minimal (smallest instances) │
│ ├── Data: Synthetic or anonymized │
│ └── Approval: None (within quota) │
├─────────────────────────────────────────────────────────────┤
│ Staging Environment │
│ ├── Purpose: Integration testing, QA │
│ ├── Lifetime: Persistent per service │
│ ├── Resources: Production-like (scaled down) │
│ ├── Data: Sanitized production subset │
│ └── Approval: None (within quota) │
├─────────────────────────────────────────────────────────────┤
│ Production Environment │
│ ├── Purpose: Live customer traffic │
│ ├── Lifetime: Permanent │
│ ├── Resources: Full capacity │
│ ├── Data: Real customer data │
│ └── Approval: Required (security review) │
└─────────────────────────────────────────────────────────────┘
Environment Template:
# environment/main.tf
module "network" {
  source      = "../modules/vpc"
  environment = var.environment
  cidr_block  = var.network_cidr
}

module "kubernetes" {
  source      = "../modules/eks"
  environment = var.environment
  vpc_id      = module.network.vpc_id
  node_count  = var.environment == "prod" ? 5 : 2
}

module "database" {
  source         = "../modules/rds"
  environment    = var.environment
  vpc_id         = module.network.vpc_id
  instance_class = var.environment == "prod" ? "db.r5.xlarge" : "db.t3.medium"
  multi_az       = var.environment == "prod"
}

module "cache" {
  source      = "../modules/elasticache"
  environment = var.environment
  vpc_id      = module.network.vpc_id
  node_type   = var.environment == "prod" ? "cache.r5.large" : "cache.t3.micro"
}
Ephemeral/Preview Environments:
Use Cases:
├── PR preview environments
├── Feature branch testing
├── Demo environments
├── Load testing environments
└── Incident reproduction
Lifecycle:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  PR Created ──► Environment Created ──► Tests Run           │
│      │                   │                  │               │
│      │                   ▼                  ▼               │
│      │              Preview URL        PR Updated           │
│      │             Posted to PR             │               │
│      │                                      │               │
│      ▼                                      ▼               │
│  PR Merged ───────────────────────► Environment Destroyed   │
│                                                             │
│  Timeout: Auto-destroy after 7 days of inactivity           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Implementation:
# .github/workflows/preview.yml
# Cloud credentials and backend configuration are omitted for brevity.
name: Preview Environment
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  pull-requests: write # needed to comment on the PR
jobs:
  deploy-preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Create/Update Environment
        run: |
          terraform init -input=false
          terraform workspace select pr-${{ github.event.pull_request.number }} || \
            terraform workspace new pr-${{ github.event.pull_request.number }}
          terraform apply -auto-approve
      - name: Comment Preview URL
        uses: actions/github-script@v6
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: '🚀 Preview: https://pr-${{ github.event.pull_request.number }}.preview.company.com'
            })
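The workflow above only creates and updates environments; the 7-day inactivity timeout from the lifecycle needs a separate scheduled job. One way to pick the candidates is sketched below. The `last_activity` mapping (workspace name to last activity time) is a hypothetical input that a scheduler would have to assemble, for example from PR update timestamps.

```python
from datetime import datetime, timedelta, timezone

INACTIVITY_LIMIT = timedelta(days=7)


def stale_workspaces(last_activity: dict, now: datetime) -> list:
    """Return preview workspaces that have been idle longer than the limit."""
    return sorted(
        name
        for name, seen in last_activity.items()
        # Only PR preview workspaces are eligible; shared ones are left alone.
        if name.startswith("pr-") and now - seen > INACTIVITY_LIMIT
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    activity = {
        "pr-101": now - timedelta(days=8),
        "pr-102": now - timedelta(days=2),
    }
    for workspace in stale_workspaces(activity, now):
        print(f"would destroy {workspace}")
```

Each returned name would then be selected with `terraform workspace select` and torn down with `terraform destroy`.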
Platform Comparison:
1. Terraform Cloud/Enterprise
├── Native Terraform experience
├── Policy as Code (Sentinel)
├── Private module registry
├── Cost estimation
└── Enterprise features (SSO, audit)
2. Pulumi
├── Real programming languages
├── Strong typing and IDE support
├── Policy as Code (CrossGuard)
└── Automation API
3. Crossplane
├── Kubernetes-native
├── GitOps workflow
├── Composition for modules
└── Multi-cloud abstraction
4. Backstage + Terraform
├── Unified developer portal
├── Software templates
├── Plugin ecosystem
└── Service catalog integration
5. Port/Cortex/OpsLevel
├── Commercial developer portals
├── Quick to implement
├── Built-in integrations
└── Self-service workflows
Selection Criteria:
┌────────────────────────────────────────────────────────────┐
│ Factor │ Best Fit │
├──────────────────────┼─────────────────────────────────────┤
│ Existing Terraform │ Terraform Cloud/Enterprise │
│ Kubernetes-first │ Crossplane │
│ Developer portal │ Backstage or commercial │
│ Programming language │ Pulumi │
│ Quick start │ Commercial (Port, OpsLevel) │
│ Maximum control │ Build custom │
└────────────────────────────────────────────────────────────┘
Cost Management in Self-Service:
1. Cost Visibility
├── Estimated cost shown before provisioning
├── Cost tags automatically applied
├── Per-team/project dashboards
└── Anomaly detection and alerts
2. Cost Guardrails
├── Instance type restrictions
├── Budget thresholds by team
├── Approval required above threshold
└── Auto-shutdown of unused resources
3. Cost Optimization
├── Right-sizing recommendations
├── Reserved instance suggestions
├── Spot instance for non-production
└── Scheduled scaling
Cost Estimation Flow:
┌─────────────────────────────────────────────────────────────┐
│ Request: PostgreSQL database for staging │
├─────────────────────────────────────────────────────────────┤
│ │
│ Cost Estimate: │
│ ├── Compute (db.t3.medium): $30/month │
│ ├── Storage (100GB gp3): $10/month │
│ ├── Backup storage: ~$5/month │
│ └── Data transfer: ~$5/month │
│ ───────── │
│ Estimated Total: ~$50/month │
│ │
│ ✓ Within team budget ($500/month quota) │
│ ✓ No approval required │
│ │
│ [Proceed] [Modify] [Cancel] │
└─────────────────────────────────────────────────────────────┘
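The arithmetic behind the estimate above is simple enough to sketch directly. The line items and the $500 figures come from the example; the function name and the `current_spend_usd` parameter are assumptions for illustration, and real prices would come from a pricing API or a cost-estimation tool.

```python
APPROVAL_THRESHOLD_USD = 500  # requests above this monthly cost need approval


def estimate(line_items: dict, team_budget_usd: float, current_spend_usd: float = 0) -> dict:
    """Sum monthly line items and compare against budget and approval threshold."""
    total = sum(line_items.values())
    return {
        "total_usd": total,
        "within_budget": current_spend_usd + total <= team_budget_usd,
        "needs_approval": total > APPROVAL_THRESHOLD_USD,
    }


staging_db = {
    "compute (db.t3.medium)": 30,
    "storage (100GB gp3)": 10,
    "backup storage": 5,
    "data transfer": 5,
}
result = estimate(staging_db, team_budget_usd=500)
```

Here `result["total_usd"]` is 50, the request fits the team budget, and no approval is required, matching the box above.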
Self-Service Infrastructure Best Practices:
1. Start Small, Expand Gradually
├── Begin with 2-3 common resources
├── Add based on demand
├── Iterate on feedback
└── Don't try to cover everything on day one
2. Balance Autonomy and Governance
├── Guardrails, not gates
├── Automate approvals where safe
├── Clear escalation paths
└── Trust but verify
3. Optimize for Developer Experience
├── Minimal required inputs
├── Sensible defaults
├── Clear error messages
└── Fast feedback loops
4. Maintain Module Quality
├── Automated testing
├── Documentation requirements
├── Versioning strategy
└── Deprecation process
5. Monitor and Improve
├── Track provisioning success rate
├── Measure time to provision
├── Gather user feedback
└── Identify automation opportunities
6. Handle Edge Cases
├── What if provisioning fails?
├── How to handle orphaned resources?
├── What about existing resources?
└── How to migrate between versions?
Self-Service Anti-Patterns:
1. "Self-Service Everything"
❌ Every possible configuration option
✓ Curated set of approved patterns
2. "Security Theater"
❌ Manual approvals that don't add value
✓ Automated policy enforcement
3. "Configuration Explosion"
❌ 50 parameters per resource
✓ Sensible defaults with few overrides
4. "Ignore Cost"
❌ No visibility into provisioned cost
✓ Cost estimation and budgets
5. "Build vs Buy Wrong"
❌ Building everything from scratch
✓ Use existing tools where appropriate
6. "No Escape Hatch"
❌ Blocking legitimate exceptions
✓ Process for justified deviations
Related Skills:
├── internal-developer-platform - Platform engineering overview
├── golden-paths - Standardized workflows
├── container-orchestration - Kubernetes infrastructure
└── serverless-patterns - Serverless infrastructure