From cloud-foundation-principles
This skill should be used when the user is designing Terraform modules, wrapping community modules, implementing conditional resource creation, structuring module variables and outputs, setting up pre-commit quality gates, versioning custom modules, building reusable infrastructure components, or reviewing module code for maintainability. Covers ten production-proven module design patterns, quality gate configuration, and version pinning strategies.
npx claudepluginhub oborchers/fractional-cto --plugin cloud-foundation-principlesThis skill uses the workspace's default tool permissions.
A Terraform module is a contract between the team that builds infrastructure and the teams that consume it. Bad modules expose too many knobs, force consumers to understand implementation details, and break silently when upstream dependencies change. Good modules encode organizational standards, provide sensible defaults, validate inputs at the boundary, and evolve on their own release cycle wi...
Generates design tokens/docs from CSS/Tailwind/styled-components codebases, audits visual consistency across 10 dimensions, detects AI slop in UI.
Records polished WebM UI demo videos of web apps using Playwright with cursor overlay, natural pacing, and three-phase scripting. Activates for demo, walkthrough, screen recording, or tutorial requests.
Delivers idiomatic Kotlin patterns for null safety, immutability, sealed classes, coroutines, Flows, extensions, DSL builders, and Gradle DSL. Use when writing, reviewing, refactoring, or designing Kotlin code.
A Terraform module is a contract between the team that builds infrastructure and the teams that consume it. Bad modules expose too many knobs, force consumers to understand implementation details, and break silently when upstream dependencies change. Good modules encode organizational standards, provide sensible defaults, validate inputs at the boundary, and evolve on their own release cycle without breaking consumers.
These ten patterns come from production infrastructure managing multiple environments and hundreds of resources. Every pattern exists because its absence caused an incident, a bottleneck, or a maintenance burden.
Never rewrite what the community has already built and battle-tested. Before building any module, research what already exists -- terraform-aws-modules, terraform-google-modules, and cloud provider-maintained modules (especially from AWS) are production-hardened and cover most use cases. Use prebuilt modules wherever possible. Only build what is absolutely necessary: your domain-specific wrapper logic on top.
# Good: Wrap the community ECS module with domain logic
module "ecs_service" {
source = "terraform-aws-modules/ecs/aws//modules/service"
version = "~> 5.0"
# Bridge: calculate total CPU/memory from container definitions
cpu = sum([for c in var.containers : c.cpu])
memory = sum([for c in var.containers : c.memory])
tags = var.tags
}
# Bad: 400 lines reimplementing ECS service from scratch
resource "aws_ecs_service" "this" { ... }
resource "aws_ecs_task_definition" "this" { ... }
resource "aws_iam_role" "execution" { ... }
resource "aws_iam_role" "task" { ... }
resource "aws_iam_role_policy_attachment" "execution" { ... }
# ...20 more resources that terraform-aws-modules already handles
Why: Community modules handle edge cases you have not encountered yet and receive security patches from hundreds of contributors.
Create a labels module as your first Terraform module. Import it once per project. Use it for every resource name and tag. Never construct names or tags manually. The labels module's interface, naming pattern, and cost center validation are defined in the naming-and-labeling-as-code skill -- this pattern is about why it matters at the module design level.
# Import ONCE per project (interface defined in naming-and-labeling-as-code skill)
module "labels" {
source = "git::https://github.com/myorg/tf-module-labels.git?ref=v1.2.0"
team = "platform"
env = "dev"
name = "backend"
cost_center = "engineering"
scope = "g"
}
# Use EVERYWHERE
locals {
tags = module.labels.tags
prefix = module.labels.prefix
}
resource "aws_s3_bucket" "data" {
bucket = "${local.prefix}data"
tags = local.tags
}
resource "aws_sqs_queue" "events" {
name = "${local.prefix}events"
tags = local.tags
}
Why: Consistent naming across hundreds of resources is impossible to enforce manually. The labels module makes inconsistency structurally impossible.
Provide two mechanisms for disabling resources: a module-level create toggle and per-feature threshold-based disabling.
variable "create" {
type = bool
default = true
}
resource "aws_cloudwatch_metric_alarm" "cpu" {
count = var.create ? 1 : 0
# ...
}
variable "cpu_utilization_threshold" {
type = number
default = 80
description = "CPU alarm threshold in percent. Set to -1 to disable."
}
locals {
create_cpu_alarm = var.cpu_utilization_threshold >= 0
}
resource "aws_cloudwatch_metric_alarm" "cpu" {
count = local.create_cpu_alarm ? 1 : 0
threshold = var.cpu_utilization_threshold
# ...
}
Why: One variable controls both the threshold and whether the resource exists -- more elegant than a sprawl of boolean variables (enable_cpu_alarm, enable_memory_alarm, enable_disk_alarm).
Use HCL's optional() type constructor to provide clean variable interfaces. Required fields are required. Optional fields have sensible defaults. Consumers only specify what differs from the default.
variable "containers" {
type = list(object({
name = string # Required
image = string # Required
cpu = number # Required
memory = number # Required
port = optional(number, 8080) # Default: 8080
health_check_path = optional(string, "/health")
health_check_matcher = optional(string, "200")
desired_count = optional(number, 2)
environment = optional(list(object({
name = string
value = string
})), [])
}))
}
Minimal consumer usage:
module "service" {
source = "git::https://github.com/myorg/tf-module-container-service.git?ref=v2.0.0"
containers = [{
name = "api"
image = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:abc1234"
cpu = 1024
memory = 2048
# port, health_check_path, health_check_matcher, desired_count all use defaults
}]
}
Why: The fewer fields a consumer must specify, the fewer mistakes they can make. Sensible defaults encode organizational standards silently.
Export outputs as maps keyed by natural identifiers, not array indices. Consumers should never need to know internal ordering.
# Good: Map keyed by container name and port
output "target_groups" {
value = {
for key, tg in aws_lb_target_group.this :
key => {
arn = tg.arn
name = tg.name
port = tg.port
}
}
# Usage: module.service.target_groups["api-8080"].arn
}
# Bad: List output that depends on internal ordering
output "target_group_arns" {
value = aws_lb_target_group.this[*].arn
# Usage: module.service.target_group_arns[0] -- What is index 0?
}
Why: Array indices break when internal ordering changes. Natural keys ("api-8080", "worker-9090") are self-documenting and stable.
Put complex transformations in locals, not inline in resources. This makes logic testable, readable, and reusable across multiple resources.
locals {
# Flatten container ports into target group definitions
target_groups = flatten([
for container in var.containers : [
for port in container.ports : {
key = "${container.name}-${port.container_port}"
name_prefix = substr(replace("${container.name}${port.container_port}", "/[^a-zA-Z0-9]/", ""), 0, 6)
container_name = container.name
container_port = port.container_port
health_check_path = port.health_check_path
}
]
])
# Convert list to map for for_each usage
target_group_map = { for tg in local.target_groups : tg.key => tg }
}
# Resource uses the pre-computed map -- clean and simple
resource "aws_lb_target_group" "this" {
for_each = local.target_group_map
name_prefix = each.value.name_prefix
port = each.value.container_port
# ...
}
Why: Locals separate data transformation from resource declaration -- readable, reusable, debuggable.
Catch errors at terraform plan time, not at apply time or runtime. Use validation blocks with contains() checks and clear error messages.
# Environment validation -- catch invalid values at plan time
variable "env" {
type = string
description = "Deployment environment"
validation {
condition = contains(["dev", "staging", "prod", "security", "log-archive", "sandbox"], var.env)
error_message = "env must be one of: dev, staging, prod, security, log-archive."
}
}
For cost center validation (closed domain lists, enforcement at plan time), see the naming-and-labeling-as-code skill -- it owns the canonical pattern and list.
Why: An invalid cost center caught during plan is a 10-second fix. The same error discovered in a billing report three months later is a multi-day forensic investigation.
When a module covers multiple service types (e.g., alerts for different cloud resources), use submodules that share an identical variable interface.
tf-module-alerts/
+-- main.tf <-- Entry point
+-- ec2/ <-- Standard interface + EC2-specific thresholds
+-- rds/ <-- Standard interface + RDS-specific thresholds
+-- alb/ <-- Standard interface + ALB-specific thresholds
+-- cache/ <-- Standard interface + cache-specific thresholds
Every submodule accepts the same base variables: create, name, tags, notification_emails, evaluation_periods, period. Service-specific thresholds are added on top.
Why: Consumers learn the pattern once. Switching from EC2 alerts to container alerts requires no new interface learning.
Modules that deploy compute resources should automatically inject infrastructure-level environment variables. Consumers should not need to remember which variables their containers need for the platform to function.
locals {
# Module automatically adds infrastructure env vars
container_definitions = [
for container in var.containers : merge(container, {
environment = concat(
coalesce(container.environment, []), # User-provided vars
[
{ name = "SERVICE_NAME", value = container.name },
{ name = "ENVIRONMENT", value = var.env },
{ name = "LOG_LEVEL", value = var.env == "prod" ? "warn" : "debug" },
{ name = "OTEL_ENDPOINT", value = var.observability_endpoint },
]
)
})
]
}
# Consumer only specifies business-level env vars
module "service" {
source = "git::https://github.com/myorg/tf-module-container-service.git?ref=v2.0.0"
containers = [{
name = "api"
image = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/myapp:abc1234"
cpu = 1024
memory = 2048
environment = [
{ name = "DATABASE_URL", value = "postgres://..." },
]
# SERVICE_NAME, ENVIRONMENT, LOG_LEVEL, OTEL_ENDPOINT are injected automatically
}]
}
Why: Infrastructure concerns (observability, environment identification, logging) are injected by the module. Consumers focus on business configuration. Forgetting to set OTEL_ENDPOINT in one of fifty services is no longer possible.
Pin everything. No exceptions. No "latest." No unversioned references.
# Custom modules: exact git ref tags
module "labels" {
source = "git::https://github.com/myorg/tf-module-labels.git?ref=v1.2.0"
}
# Community modules: exact version
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "5.19.0"
}
# Providers: pessimistic constraint
terraform {
required_version = ">= 1.8.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
| Type | Pinning Strategy | Example |
|---|---|---|
| Custom modules | Exact git ref tag | ?ref=v1.3.0 |
| Community modules | Exact version | version = "5.19.0" |
| Providers | Pessimistic constraint | version = "~> 5.0" |
| Terraform itself | Minimum version | required_version = ">= 1.8.0" |
Why: Unpinned modules pull the latest version on every terraform init, introducing silent changes that cause plan diffs, apply failures, or resources being replaced in production.
Every module repository should enforce formatting, linting, and security scanning before code enters the main branch. For the complete .pre-commit-config.yaml configuration, see the tag-based-production-deploys skill.
| Gate | Catches | Example |
|---|---|---|
terraform_fmt | Inconsistent formatting | Tabs vs spaces, trailing whitespace |
terraform_tflint | Deprecated syntax, invalid references, naming violations | Previous-generation instance types, undocumented variables |
terraform_checkov (optional) | Security misconfigurations, compliance violations | Unencrypted S3 buckets, security groups open to 0.0.0.0/0. Can produce false positives -- evaluate whether the signal-to-noise ratio justifies the gate for your codebase. |
| Concept | AWS | GCP | Azure |
|---|---|---|---|
| Community module registry | terraform-aws-modules/* | terraform-google-modules/* | Azure/* on Terraform Registry |
| Module source (git) | git::https://github.com/myorg/module.git?ref=v1.0.0 | Same | Same |
| Module source (registry) | terraform-aws-modules/vpc/aws | terraform-google-modules/network/google | Azure/network/azurerm |
| Variable validation | validation { condition = ... } | Same | Same |
| Pre-commit hooks | pre-commit-terraform | Same | Same |
| Linting | TFLint + AWS ruleset | TFLint + GCP ruleset | TFLint + Azure ruleset |
| Security scanning | Checkov, tfsec | Checkov, tfsec | Checkov, tfsec |
Working implementations in examples/:
examples/wrapper-module.md -- A complete module that wraps a community container service module, adds smart defaults, conditional creation, environment variable injection, and lookup-style outputsexamples/quality-gate-setup.md -- Pre-commit configuration, TFLint rules, and Checkov setup for a Terraform module repositoryWhen designing or reviewing Terraform modules:
count with a create variable or threshold-based disabling (-1)optional() with sensible defaults to minimize required consumer inputlocals, not inline in resource blocksvalidation blocks with clear error messages?ref=v1.3.0)version = "5.19.0")~> 5.0)terraform_fmt and terraform_tflint; terraform_checkov is recommended but optional (evaluate false positive rate for your codebase)