Provides patterns for self-service infrastructure including portals, IaC templates with Terraform/Pulumi modules, automated provisioning systems, and guardrails balancing developer autonomy with governance.
Patterns for enabling developers to provision infrastructure without tickets, while maintaining governance and control.
Self-Service Infrastructure:
Enabling developers to provision and manage infrastructure
directly, without filing tickets or waiting for ops teams.
Traditional Model:
┌─────────────────────────────────────────────────────────────┐
│ Developer → Ticket → Ops Review → Manual Provision → Done   │
│                                                             │
│ Timeline: Days to weeks                                     │
│ Bottleneck: Ops team capacity                               │
│ Result: Shadow IT, workarounds, frustration                 │
└─────────────────────────────────────────────────────────────┘
Self-Service Model:
┌─────────────────────────────────────────────────────────────┐
│ Developer → Portal/API → Automatic Provision → Done         │
│                                                             │
│ Timeline: Minutes to hours                                  │
│ Bottleneck: None (automated)                                │
│ Result: Speed, consistency, compliance                      │
└─────────────────────────────────────────────────────────────┘
Self-Service Spectrum:
├── Fully Managed: Click a button, get a database
├── Template-Based: Customize from approved templates
├── Policy-Constrained: Write IaC within guardrails
└── Full Freedom: Any infrastructure (risky)
Sweet Spot: Template-Based with Policy Guardrails
Self-Service Benefits:
For Developers:
├── Speed: Minutes instead of days
├── Autonomy: Provision when needed
├── Consistency: Same infrastructure every time
├── Learning: Understand infrastructure better
└── Ownership: More responsibility, more control
For Operations:
├── Scale: Handle more requests without more people
├── Consistency: Enforce standards automatically
├── Focus: Work on platform, not tickets
├── Audit: Clear trail of who provisioned what
└── Compliance: Built-in policy enforcement
For Organization:
├── Velocity: Faster time to market
├── Cost: Reduced ops overhead
├── Governance: Better compliance posture
├── Security: Consistent security controls
└── Efficiency: Resources provisioned when needed
Self-Service Infrastructure Architecture:
┌─────────────────────────────────────────────────────────────┐
│                       USER INTERFACE                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │   Portal    │  │     CLI     │  │     API     │          │
│  │  (Web UI)   │  │ (Terraform) │  │ (REST/gRPC) │          │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘          │
│         └────────────────┼────────────────┘                 │
│                          │                                  │
├──────────────────────────┼──────────────────────────────────┤
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                 ORCHESTRATION LAYER                 │    │
│  │  ├── Request validation                             │    │
│  │  ├── Policy evaluation (OPA/Sentinel)               │    │
│  │  ├── Cost estimation                                │    │
│  │  ├── Approval workflow (if needed)                  │    │
│  │  └── Execution orchestration                        │    │
│  └─────────────────────────────────────────────────────┘    │
│                          │                                  │
├──────────────────────────┼──────────────────────────────────┤
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                  TEMPLATE LIBRARY                   │    │
│  │  ├── Database modules (RDS, Cloud SQL)              │    │
│  │  ├── Compute modules (EKS, GKE, VMs)                │    │
│  │  ├── Storage modules (S3, GCS)                      │    │
│  │  ├── Network modules (VPC, subnets)                 │    │
│  │  └── Composite modules (full environments)          │    │
│  └─────────────────────────────────────────────────────┘    │
│                          │                                  │
├──────────────────────────┼──────────────────────────────────┤
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                  EXECUTION ENGINE                   │    │
│  │  ├── Terraform Cloud/Enterprise                     │    │
│  │  ├── Pulumi Service                                 │    │
│  │  ├── Crossplane                                     │    │
│  │  └── Cloud-native (CDK, ARM, Deployment Manager)    │    │
│  └─────────────────────────────────────────────────────┘    │
│                          │                                  │
├──────────────────────────┼──────────────────────────────────┤
│                          ▼                                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │                   CLOUD PROVIDERS                   │    │
│  │       AWS │ GCP │ Azure │ Kubernetes │ Others       │    │
│  └─────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
Self-Service Request Flow:
┌─────────────────────────────────────────────────────────────┐
│ 1. REQUEST                                                  │
│    Developer: "I need a PostgreSQL database for staging"    │
│    └── Via portal, CLI, or API                              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. VALIDATION                                               │
│    ├── User has permission?      ✓ Team member              │
│    ├── Request well-formed?      ✓ Valid config             │
│    ├── Within quotas?            ✓ Under team limit         │
│    └── Meets policy?             ✓ Allowed instance type    │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 3. ENRICHMENT                                               │
│    ├── Apply defaults            db.t3.medium               │
│    ├── Generate names            myapp-staging-db           │
│    ├── Assign network            staging-vpc                │
│    ├── Configure monitoring      Datadog integration        │
│    └── Estimate cost             ~$50/month                 │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 4. APPROVAL (if required)                                   │
│    ├── Auto-approve: staging, dev  ✓ Auto-approved          │
│    ├── Manual approve: production  (Would need approval)    │
│    └── Cost threshold: >$500/month (Would need approval)    │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 5. EXECUTION                                                │
│    ├── Generate Terraform        Based on template          │
│    ├── Plan                      Preview changes            │
│    ├── Apply                     Create resources           │
│    └── Verify                    Health checks              │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│ 6. DELIVERY                                                 │
│    ├── Connection string → Vault                            │
│    ├── Notification      → Slack/email                      │
│    ├── Documentation     → Auto-generated                   │
│    └── Registration      → Service catalog                  │
└─────────────────────────────────────────────────────────────┘
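The validation, enrichment, and approval steps of this flow can be sketched in a few functions. This is a minimal illustration, not a reference implementation: the `DatabaseRequest` type, the function names, the quota and membership lookups, and the allowed instance prefixes are all assumptions made for the example. Only the default instance class, the `myapp-staging-db` naming shape, and the $500/month and production approval rules come from the flow above.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Set

# Hypothetical policy values for illustration.
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}
AUTO_APPROVE_ENVIRONMENTS = {"dev", "staging"}
ALLOWED_INSTANCE_PREFIXES = ("db.t3.", "db.r5.", "db.m5.")
DEFAULT_INSTANCE_CLASS = "db.t3.medium"
COST_APPROVAL_THRESHOLD = 500.0  # USD per month


@dataclass(frozen=True)
class DatabaseRequest:
    user: str
    team: str
    app: str
    environment: str
    instance_class: Optional[str] = None
    name: Optional[str] = None
    estimated_monthly_cost: float = 0.0


def validate(request: DatabaseRequest,
             team_members: Dict[str, Set[str]],
             remaining_quota: Dict[str, int]) -> List[str]:
    """Step 2: return a list of problems; an empty list means the request is valid."""
    errors: List[str] = []
    if request.user not in team_members.get(request.team, set()):
        errors.append("user is not a member of the team")
    if request.environment not in ALLOWED_ENVIRONMENTS:
        errors.append("unknown environment")
    if remaining_quota.get(request.team, 0) <= 0:
        errors.append("team quota exhausted")
    if request.instance_class and not request.instance_class.startswith(
            ALLOWED_INSTANCE_PREFIXES):
        errors.append("instance class not allowed")
    return errors


def enrich(request: DatabaseRequest, estimated_monthly_cost: float) -> DatabaseRequest:
    """Step 3: fill in defaults, a generated name, and the cost estimate."""
    return replace(
        request,
        instance_class=request.instance_class or DEFAULT_INSTANCE_CLASS,
        name=f"{request.app}-{request.environment}-db",
        estimated_monthly_cost=estimated_monthly_cost,
    )


def needs_approval(request: DatabaseRequest) -> bool:
    """Step 4: production or expensive requests go to a human."""
    return (request.environment not in AUTO_APPROVE_ENVIRONMENTS
            or request.estimated_monthly_cost > COST_APPROVAL_THRESHOLD)
```

Keeping each step a pure function makes the policy easy to unit test before any Terraform is generated.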
Terraform Module Structure:
Organization-Wide Module Library:
terraform-modules/
├── databases/
│   ├── rds-postgres/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   ├── versions.tf
│   │   ├── README.md
│   │   └── examples/
│   │       ├── simple/
│   │       └── production/
│   └── elasticache-redis/
├── compute/
│   ├── eks-cluster/
│   └── ecs-service/
├── storage/
│   └── s3-bucket/
└── network/
    └── vpc/
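A layout like this is easier to keep consistent when a small check runs in CI. The sketch below assumes the two-level `category/module` structure shown above and treats the five files listed under `rds-postgres/` as required for every module; both the function names and that requirement are assumptions for illustration.

```python
from pathlib import Path
from typing import Dict, List

# Files every module is assumed to need, taken from the rds-postgres example.
REQUIRED_FILES = ("main.tf", "variables.tf", "outputs.tf", "versions.tf", "README.md")


def missing_module_files(module_dir: Path) -> List[str]:
    """Return the required files that are absent from one module directory."""
    return [name for name in REQUIRED_FILES if not (module_dir / name).is_file()]


def audit_library(root: Path) -> Dict[str, List[str]]:
    """Map 'category/module' to its missing files; complete modules are omitted."""
    report: Dict[str, List[str]] = {}
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        for module in sorted(p for p in category.iterdir() if p.is_dir()):
            missing = missing_module_files(module)
            if missing:
                report[f"{category.name}/{module.name}"] = missing
    return report
```

Calling `audit_library(Path("terraform-modules"))` and failing the build on a non-empty result is one way to enforce the documentation and versioning requirements mechanically.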
Module Design Principles:
1. Opinionated Defaults
# variables.tf
variable "instance_class" {
  type        = string
  default     = "db.t3.medium" # Sensible default
  description = "RDS instance type"

  validation {
    # The trailing "\\." keeps similarly named families (db.m5d, db.r5b) out
    condition     = can(regex("^db\\.(t3|r5|m5)\\.", var.instance_class))
    error_message = "Only approved instance families allowed."
  }
}
2. Minimal Required Inputs
# Only require what can't be defaulted
variable "name" {
  type        = string
  description = "Database identifier"
}

variable "environment" {
  type        = string
  description = "Environment (dev, staging, prod)"
}
3. Complete Outputs
# outputs.tf
output "endpoint" {
  description = "Database connection endpoint"
  value       = aws_db_instance.main.endpoint
}

output "connection_secret_arn" {
  description = "ARN of secret with credentials"
  value       = aws_secretsmanager_secret.db_credentials.arn
}
4. Built-in Best Practices
# Security hardened by default
# (fragment: engine, instance_class, and other required arguments omitted)
resource "aws_db_instance" "main" {
  # Encryption always on
  storage_encrypted = true

  # No public access
  publicly_accessible = false

  # Automated backups
  backup_retention_period = var.environment == "prod" ? 30 : 7

  # Enhanced monitoring (a non-zero interval also needs monitoring_role_arn)
  monitoring_interval = 60
}
Module Versioning Strategy:
Semantic Versioning:
├── MAJOR: Breaking changes (new required inputs, removed outputs)
├── MINOR: New features (new optional inputs, new outputs)
└── PATCH: Bug fixes (no interface changes)
Version Constraints:
# Private registry sources take the form <HOSTNAME>/<NAMESPACE>/<NAME>/<PROVIDER>
# Allow patch updates automatically
module "database" {
  source  = "terraform.company.com/modules/rds-postgres/aws"
  version = "~> 2.1.0" # >= 2.1.0, < 2.2.0
}

# Alternative: pin to an exact version (production)
module "database" {
  source  = "terraform.company.com/modules/rds-postgres/aws"
  version = "= 2.1.3"
}
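The `~>` (pessimistic) constraint is easy to misread, so here is a small sketch of its semantics. It assumes plain numeric versions with no pre-release or build suffixes, and the function names are made up for the example; Terraform's own resolver handles more cases than this.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.1.3' into (2, 1, 3). Numeric components only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pessimistic(version: str, constraint: str) -> bool:
    """Check `version` against `~> constraint`.

    Only the rightmost component of the constraint may increase:
    '~> 2.1.0' allows >= 2.1.0 and < 2.2.0; '~> 2.1' allows >= 2.1 and < 3.0.
    """
    lower = parse_version(constraint)
    if len(lower) < 2:
        raise ValueError("constraint needs at least two components")
    # Drop the last component and bump the one before it to get the upper bound.
    upper = lower[:-2] + (lower[-2] + 1,)
    return lower <= parse_version(version) < upper
```

For example, `satisfies_pessimistic("2.1.3", "2.1.0")` is true, while `"2.2.0"` falls outside the same constraint.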
Deprecation Policy:
┌─────────────────────────────────────────────────────────────┐
│ Module Version Lifecycle                                    │
├─────────────────────────────────────────────────────────────┤
│ Current (v2.x):    Supported, new features                  │
│ Previous (v1.x):   Supported, security fixes only           │
│ Deprecated (v0.x): Warning on use, no support               │
│ Removed:           Will not work                            │
│                                                             │
│ Notification:                                               │
│ ├── Slack announcement when version deprecated              │
│ ├── Warning in terraform plan output                        │
│ ├── Dashboard showing deprecated module usage               │
│ └── Migration guide provided                                │
└─────────────────────────────────────────────────────────────┘
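A dashboard or plan-time warning needs some rule for mapping a module version to one of these lifecycle states. One possible rule, sketched below, derives the state from the major version relative to the current one; the function name, the explicit `removed` set, and the rule itself are assumptions for illustration.

```python
from typing import FrozenSet


def lifecycle_status(version: str, current_major: int,
                     removed: FrozenSet[str] = frozenset()) -> str:
    """Classify a module version as current, previous, deprecated, or removed."""
    if version in removed:
        return "removed"
    major = int(version.split(".")[0])
    if major > current_major:
        raise ValueError(f"{version} is newer than the current major v{current_major}")
    if major == current_major:
        return "current"      # supported, new features
    if major == current_major - 1:
        return "previous"     # supported, security fixes only
    return "deprecated"       # warning on use, no support
```

With `current_major=2`, version `1.4.0` is classified as `previous` and `0.9.2` as `deprecated`, matching the table above.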
Policy as Code Options:
1. HashiCorp Sentinel (Terraform Enterprise)
# Require encryption for all storage
import "tfplan/v2" as tfplan

s3_buckets = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_s3_bucket" and
  rc.mode is "managed" and
  (rc.change.actions contains "create" or
   rc.change.actions contains "update")
}

encryption_enabled = rule {
  all s3_buckets as _, bucket {
    bucket.change.after.server_side_encryption_configuration is not null
  }
}

main = rule { encryption_enabled }
2. Open Policy Agent (OPA)
# Rego policy for Kubernetes
package kubernetes.admission

# rego.v1 enables the "contains"/"if" keywords that OPA 1.0 requires
import rego.v1

deny contains msg if {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.securityContext.runAsNonRoot
  msg := "Containers must run as non-root"
}
3. Cloud-Native Policies
# AWS Service Control Policy
# The s3:x-amz-server-side-encryption condition key applies to object
# uploads (s3:PutObject), not to s3:CreateBucket.
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "RequireEncryption",
    "Effect": "Deny",
    "Action": ["s3:PutObject"],
    "Resource": "*",
    "Condition": {
      "StringNotEquals": {
        "s3:x-amz-server-side-encryption": "AES256"
      }
    }
  }]
}
Infrastructure Guardrails:
1. Security Guardrails
├── Encryption required (at-rest, in-transit)
├── No public access by default
├── Required security groups
├── IAM role requirements
└── Vulnerability scanning
2. Cost Guardrails
├── Instance type restrictions
├── Storage size limits
├── Required cost tags
├── Budget thresholds
└── Approval for large resources
3. Compliance Guardrails
├── Allowed regions (data residency)
├── Required logging
├── Backup requirements
├── Retention policies
└── Audit trail requirements
4. Operational Guardrails
├── Naming conventions
├── Required tags (owner, cost-center)
├── Resource quotas per team
├── Monitoring requirements
└── Deletion protection
Guardrail Implementation:
┌─────────────────────────────────────────────────────────────┐
│ Guardrail Timing                                            │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Pre-Plan (fastest feedback):                                │
│ ├── Validate terraform files                                │
│ ├── Static analysis (tfsec, checkov)                        │
│ └── Module version checks                                   │
│                                                             │
│ Post-Plan (resource-aware):                                 │
│ ├── OPA/Sentinel policy evaluation                          │
│ ├── Cost estimation                                         │
│ └── Blast radius assessment                                 │
│                                                             │
│ Post-Apply (verification):                                  │
│ ├── Configuration validation                                │
│ ├── Security scanning                                       │
│ └── Compliance audit                                        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
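Post-plan guardrails work on the planned resource changes rather than on source files. As a rough sketch of what such a check looks like outside OPA or Sentinel, the function below inspects a plan in Terraform's JSON representation (`resource_changes`, each with `change.actions` and `change.after`). The specific rules and the `find_violations` name are assumptions for illustration, and values unknown until apply are not handled.

```python
from typing import Any, Dict, List

# Required tags taken from the operational guardrails list above.
REQUIRED_TAGS = {"owner", "cost-center"}


def find_violations(plan: Dict[str, Any]) -> List[str]:
    """Return human-readable guardrail violations for a parsed plan JSON."""
    violations: List[str] = []
    for rc in plan.get("resource_changes", []):
        change = rc.get("change", {})
        if rc.get("mode") != "managed":
            continue
        # Only resources being created or updated are evaluated.
        if not {"create", "update"} & set(change.get("actions", [])):
            continue
        after = change.get("after") or {}
        address = rc.get("address", "<unknown>")

        if rc.get("type") == "aws_db_instance":
            if after.get("storage_encrypted") is not True:
                violations.append(f"{address}: storage_encrypted must be true")
            if after.get("publicly_accessible") is True:
                violations.append(f"{address}: publicly_accessible must be false")

        # Tag check applies to any resource that exposes a tags attribute.
        if "tags" in after:
            missing = REQUIRED_TAGS - set(after["tags"] or {})
            if missing:
                violations.append(
                    f"{address}: missing required tags: {', '.join(sorted(missing))}")
    return violations
```

The input can be produced with `terraform plan -out=tfplan` followed by `terraform show -json tfplan`, then loaded with `json.load`.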
Environment Provisioning:
Environment Types:
┌─────────────────────────────────────────────────────────────┐
│ Development Environment                                     │
│ ├── Purpose: Individual developer testing                   │
│ ├── Lifetime: Hours to days                                 │
│ ├── Resources: Minimal (smallest instances)                 │
│ ├── Data: Synthetic or anonymized                           │
│ └── Approval: None (within quota)                           │
├─────────────────────────────────────────────────────────────┤
│ Staging Environment                                         │
│ ├── Purpose: Integration testing, QA                        │
│ ├── Lifetime: Persistent per service                        │
│ ├── Resources: Production-like (scaled down)                │
│ ├── Data: Sanitized production subset                       │
│ └── Approval: None (within quota)                           │
├─────────────────────────────────────────────────────────────┤
│ Production Environment                                      │
│ ├── Purpose: Live customer traffic                          │
│ ├── Lifetime: Permanent                                     │
│ ├── Resources: Full capacity                                │
│ ├── Data: Real customer data                                │
│ └── Approval: Required (security review)                    │
└─────────────────────────────────────────────────────────────┘
Environment Template:
# environment/main.tf
module "network" {
  source      = "../modules/vpc"
  environment = var.environment
  cidr_block  = var.network_cidr
}

module "kubernetes" {
  source      = "../modules/eks"
  environment = var.environment
  vpc_id      = module.network.vpc_id
  node_count  = var.environment == "prod" ? 5 : 2
}

module "database" {
  source         = "../modules/rds"
  environment    = var.environment
  vpc_id         = module.network.vpc_id
  instance_class = var.environment == "prod" ? "db.r5.xlarge" : "db.t3.medium"
  multi_az       = var.environment == "prod"
}

module "cache" {
  source      = "../modules/elasticache"
  environment = var.environment
  vpc_id      = module.network.vpc_id
  node_type   = var.environment == "prod" ? "cache.r5.large" : "cache.t3.micro"
}
Ephemeral/Preview Environments:
Use Cases:
├── PR preview environments
├── Feature branch testing
├── Demo environments
├── Load testing environments
└── Incident reproduction
Lifecycle:
┌─────────────────────────────────────────────────────────────┐
│                                                             │
│  PR Created ──► Environment Created ──► Tests Run           │
│       │                  │                  │               │
│       │                  ▼                  ▼               │
│       │             Preview URL         PR Updated          │
│       │             Posted to PR            │               │
│       │                                     │               │
│       ▼                                     ▼               │
│  PR Merged ────────────────────► Environment Destroyed      │
│                                                             │
│  Timeout: Auto-destroy after 7 days of inactivity           │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Implementation:
# .github/workflows/preview.yml
# Cloud credentials and Terraform backend configuration are omitted.
name: Preview Environment
on:
  pull_request:
    types: [opened, synchronize]
permissions:
  contents: read
  pull-requests: write # needed to post the preview comment
jobs:
  deploy-preview:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Create/Update Environment
        run: |
          terraform init
          terraform workspace select pr-${{ github.event.pull_request.number }} || \
            terraform workspace new pr-${{ github.event.pull_request.number }}
          terraform apply -auto-approve
      - name: Comment Preview URL
        uses: actions/github-script@v6
        with:
          script: |
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body: 'Preview: https://pr-${{ github.event.pull_request.number }}.preview.company.com'
            })
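The workflow above only creates and updates environments; the lifecycle also calls for destroying them when the PR closes or after 7 days of inactivity. A scheduled cleanup job could pick its targets with logic like the sketch below. The `pr-<number>` workspace naming follows the workflow, while the function name and the shape of its inputs (last-activity timestamps and a set of open PR numbers) are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set

INACTIVITY_LIMIT = timedelta(days=7)


def workspaces_to_destroy(last_activity: Dict[str, datetime],
                          open_prs: Set[int],
                          now: datetime) -> List[str]:
    """Select preview workspaces whose PR is closed or that have gone stale."""
    selected: List[str] = []
    for workspace, last_seen in last_activity.items():
        suffix = workspace[3:]
        # Ignore anything that is not a pr-<number> workspace (e.g. "default").
        if not workspace.startswith("pr-") or not suffix.isdigit():
            continue
        pr_closed = int(suffix) not in open_prs
        inactive = now - last_seen > INACTIVITY_LIMIT
        if pr_closed or inactive:
            selected.append(workspace)
    return sorted(selected)
```

Each selected workspace would then be passed to `terraform workspace select` and `terraform destroy` by the cleanup job.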
Platform Comparison:
1. Terraform Cloud/Enterprise
├── Native Terraform experience
├── Policy as Code (Sentinel)
├── Private module registry
├── Cost estimation
└── Enterprise features (SSO, audit)
2. Pulumi
├── Real programming languages
├── Strong typing and IDE support
├── Policy as Code (CrossGuard)
└── Automation API
3. Crossplane
├── Kubernetes-native
├── GitOps workflow
├── Composition for modules
└── Multi-cloud abstraction
4. Backstage + Terraform
├── Unified developer portal
├── Software templates
├── Plugin ecosystem
└── Service catalog integration
5. Port/Cortex/OpsLevel
├── Commercial developer portals
├── Quick to implement
├── Built-in integrations
└── Self-service workflows
Selection Criteria:
┌──────────────────────┬─────────────────────────────────────┐
│ Factor               │ Best Fit                            │
├──────────────────────┼─────────────────────────────────────┤
│ Existing Terraform   │ Terraform Cloud/Enterprise          │
│ Kubernetes-first     │ Crossplane                          │
│ Developer portal     │ Backstage or commercial             │
│ Programming language │ Pulumi                              │
│ Quick start          │ Commercial (Port, OpsLevel)         │
│ Maximum control      │ Build custom                        │
└──────────────────────┴─────────────────────────────────────┘
Cost Management in Self-Service:
1. Cost Visibility
├── Estimated cost shown before provisioning
├── Cost tags automatically applied
├── Per-team/project dashboards
└── Anomaly detection and alerts
2. Cost Guardrails
├── Instance type restrictions
├── Budget thresholds by team
├── Approval required above threshold
└── Auto-shutdown of unused resources
3. Cost Optimization
├── Right-sizing recommendations
├── Reserved instance suggestions
├── Spot instance for non-production
└── Scheduled scaling
Cost Estimation Flow:
┌─────────────────────────────────────────────────────────────┐
│ Request: PostgreSQL database for staging                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│ Cost Estimate:                                              │
│ ├── Compute (db.t3.medium):    $30/month                    │
│ ├── Storage (100GB gp3):       $10/month                    │
│ ├── Backup storage:            ~$5/month                    │
│ └── Data transfer:             ~$5/month                    │
│                                ─────────                    │
│ Estimated Total:               ~$50/month                   │
│                                                             │
│ ✓ Within team budget ($500/month quota)                     │
│ ✓ No approval required                                      │
│                                                             │
│ [Proceed] [Modify] [Cancel]                                 │
└─────────────────────────────────────────────────────────────┘
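The arithmetic behind this screen is simple enough to sketch directly. The example below sums the line items and applies the two checks shown in the box; the function name and return shape are assumptions, and reusing $500/month as the approval threshold follows the request flow earlier in this document. Real prices would come from a pricing API or a cost estimation tool, not from hard-coded numbers.

```python
from typing import Dict


def summarize_estimate(line_items: Dict[str, float],
                       team_budget_remaining: float,
                       approval_threshold: float = 500.0) -> Dict[str, object]:
    """Total the monthly line items and decide what the portal should show."""
    total = sum(line_items.values())
    return {
        "total": total,
        "within_budget": total <= team_budget_remaining,
        "needs_approval": total > approval_threshold,
    }


# Line items from the staging PostgreSQL example above (USD per month).
STAGING_POSTGRES = {
    "compute (db.t3.medium)": 30.0,
    "storage (100GB gp3)": 10.0,
    "backup storage": 5.0,
    "data transfer": 5.0,
}
```

`summarize_estimate(STAGING_POSTGRES, team_budget_remaining=500.0)` reports a total of 50.0, within budget, with no approval needed.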
Self-Service Infrastructure Best Practices:
1. Start Small, Expand Gradually
├── Begin with 2-3 common resources
├── Add based on demand
├── Iterate on feedback
└── Don't try to cover everything day 1
2. Balance Autonomy and Governance
├── Guardrails not gates
├── Automate approvals where safe
├── Clear escalation paths
└── Trust but verify
3. Optimize for Developer Experience
├── Minimal required inputs
├── Sensible defaults
├── Clear error messages
└── Fast feedback loops
4. Maintain Module Quality
├── Automated testing
├── Documentation requirements
├── Versioning strategy
└── Deprecation process
5. Monitor and Improve
├── Track provisioning success rate
├── Measure time to provision
├── Gather user feedback
└── Identify automation opportunities
6. Handle Edge Cases
├── What if provisioning fails?
├── How to handle orphaned resources?
├── What about existing resources?
└── How to migrate between versions?
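The two metrics named under "Monitor and Improve" (provisioning success rate and time to provision) can be computed from a simple log of runs. The sketch below assumes each run is recorded as a `(succeeded, duration_seconds)` pair; that record shape and the function name are assumptions for illustration.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def provisioning_metrics(runs: List[Tuple[bool, float]]) -> Dict[str, Optional[float]]:
    """Compute success rate and median time to provision for successful runs."""
    if not runs:
        return {"success_rate": None, "median_seconds": None}
    successful_durations = [duration for ok, duration in runs if ok]
    return {
        "success_rate": len(successful_durations) / len(runs),
        # Failed runs are excluded so fast failures do not flatter the number.
        "median_seconds": median(successful_durations) if successful_durations else None,
    }
```

The median is used rather than the mean so that a single slow run does not dominate the reported provisioning time.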
Self-Service Anti-Patterns:
1. "Self-Service Everything"
   ✗ Every possible configuration option
   ✓ Curated set of approved patterns
2. "Security Theater"
   ✗ Manual approvals that don't add value
   ✓ Automated policy enforcement
3. "Configuration Explosion"
   ✗ 50 parameters per resource
   ✓ Sensible defaults with few overrides
4. "Ignore Cost"
   ✗ No visibility into provisioned cost
   ✓ Cost estimation and budgets
5. "Build vs Buy Wrong"
   ✗ Building everything from scratch
   ✓ Use existing tools where appropriate
6. "No Escape Hatch"
   ✗ Blocking legitimate exceptions
   ✓ Process for justified deviations
internal-developer-platform - Platform engineering overview
golden-paths - Standardized workflows
container-orchestration - Kubernetes infrastructure
serverless-patterns - Serverless infrastructure