From build-like-amazon
Enforces CDK/CloudFormation best practices for immutable infrastructure, environment parity, least privilege, tagging, and cost optimization. Use when provisioning or modifying AWS infrastructure.
`npx claudepluginhub robisson/build-like-amazon-agent-skills`
Slash command: `/build-like-amazon:infrastructure-as-code`
Infrastructure as Code (IaC) means every piece of infrastructure—compute, storage, networking, permissions, monitoring—is defined in version-controlled code that can be reviewed, tested, and deployed through the same pipeline as application code. There is no "clicking in the console." There are no snowflake environments. The code IS the infrastructure, and the infrastructure IS the code.
Immutable infrastructure means you never patch in place. You replace. A server is never SSH'd into and modified; it is destroyed and rebuilt from the code. This eliminates configuration drift, makes deployments predictable, and ensures every environment is reproducible.
At Amazon, teams own their infrastructure end-to-end. There is no separate "ops team" that provisions resources. The team that builds the service provisions, deploys, monitors, and operates the infrastructure. CDK (Cloud Development Kit) and CloudFormation are the primary tools, with CDK preferred for new projects because it provides type safety, composition, and testing capabilities that raw CloudFormation templates lack.
Infrastructure changes go through the same code review and deployment pipeline as application code. A CDK change that modifies IAM permissions gets the same scrutiny as a code change that handles authentication. In many ways, it deserves MORE scrutiny—a bad IAM policy can expose the entire account.
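You can make that extra scrutiny mechanical so it does not depend on reviewer memory. The sketch below is framework-free TypeScript. It assumes an upstream pipeline step has already reduced a change set to `{ logicalId, resourceType }` entries. The `ResourceChange` shape, the type lists, and `requiresSecurityReview` are all hypothetical, and the type lists are illustrative rather than exhaustive.

```typescript
// Hypothetical shape of one changed resource, extracted upstream from a change set.
interface ResourceChange {
  logicalId: string;
  resourceType: string; // CloudFormation type, e.g. "AWS::IAM::Role"
}

// Resource types that grant or scope permissions (illustrative, not exhaustive).
const SECURITY_SENSITIVE_PREFIXES: readonly string[] = ['AWS::IAM::', 'AWS::KMS::'];
const SECURITY_SENSITIVE_TYPES: readonly string[] = [
  'AWS::S3::BucketPolicy',
  'AWS::Lambda::Permission',
];

// Returns the logical IDs whose changes must get a mandatory security review.
function requiresSecurityReview(changes: readonly ResourceChange[]): string[] {
  return changes
    .filter(
      (c) =>
        SECURITY_SENSITIVE_TYPES.includes(c.resourceType) ||
        SECURITY_SENSITIVE_PREFIXES.some((p) => c.resourceType.startsWith(p)),
    )
    .map((c) => c.logicalId);
}

const flagged = requiresSecurityReview([
  { logicalId: 'ServiceRole', resourceType: 'AWS::IAM::Role' },
  { logicalId: 'ServiceASG', resourceType: 'AWS::AutoScaling::AutoScalingGroup' },
  { logicalId: 'AssetsPolicy', resourceType: 'AWS::S3::BucketPolicy' },
]);
console.log(flagged); // [ 'ServiceRole', 'AssetsPolicy' ]
```

A pipeline could block the deployment until a security-team approval is recorded for any non-empty result.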
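```typescript
// CDK Example: Immutable deployment via Auto Scaling Group
import { Duration } from 'aws-cdk-lib';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';

// Inside a Stack constructor. `vpc` and `props.buildId` come from the caller;
// MachineImage.lookup requires the stack to have an explicit account and region.
const asg = new autoscaling.AutoScalingGroup(this, 'ServiceASG', {
  vpc,
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.LARGE),
  machineImage: ec2.MachineImage.lookup({
    name: `service-ami-${props.buildId}`, // New AMI per build: replace, never patch
  }),
  // Illustrative sizing: MaxSize must be greater than minInstancesInService.
  minCapacity: 3,
  maxCapacity: 6,
  updatePolicy: autoscaling.UpdatePolicy.rollingUpdate({
    maxBatchSize: 1,          // replace one instance at a time
    minInstancesInService: 2, // keep serving traffic during the rollout
    pauseTime: Duration.minutes(5),
  }),
});
```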
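To see why `maxBatchSize` and `minInstancesInService` interact, here is a simplified, framework-free model of a rolling replacement. In this model, old instances are swapped for new-AMI instances in fixed batches, and group capacity never grows during the rollout. This is an assumption for illustration. CloudFormation's actual rolling update also handles health signals, suspended processes, and other cases that this model ignores. `planRollingReplacement` is a hypothetical name.

```typescript
// Simplified model of a rolling replacement (not CloudFormation's exact algorithm).
interface RollingUpdateOptions {
  maxBatchSize: number;
  minInstancesInService: number;
}

// Splits old instances into replacement batches without dropping below the minimum.
function planRollingReplacement(
  oldInstanceIds: readonly string[],
  opts: RollingUpdateOptions,
): string[][] {
  if (oldInstanceIds.length === 0) return [];
  // Capacity is constant in this model, so at most (capacity - minInstancesInService)
  // instances may be out of service at the same time.
  const headroom = oldInstanceIds.length - opts.minInstancesInService;
  const batchSize = Math.min(opts.maxBatchSize, headroom);
  if (batchSize < 1) {
    throw new Error(
      `Cannot replace ${oldInstanceIds.length} instances while keeping ${opts.minInstancesInService} in service`,
    );
  }
  const batches: string[][] = [];
  for (let i = 0; i < oldInstanceIds.length; i += batchSize) {
    batches.push(oldInstanceIds.slice(i, i + batchSize));
  }
  return batches;
}

const plan = planRollingReplacement(['i-a', 'i-b', 'i-c'], {
  maxBatchSize: 1,
  minInstancesInService: 2,
});
console.log(plan.length); // 3: one instance replaced per batch
```

The error case is the useful part. A group sized at exactly `minInstancesInService` has no room to replace anything, which is why the example above sizes the group larger than that minimum.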
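Production, staging, and development environments must be structurally identical. They differ only in scale and tuning parameters, such as instance count, instance type, alarm thresholds, and log retention.
They must NOT differ in architecture: every environment deploys the same stack with the same resources.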
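```typescript
// CDK Example: Environment parity through parameterization
import { App } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
// ServiceStack is this service's own stack class, defined in lib/.

interface EnvironmentConfig {
  readonly envName: string;
  readonly instanceCount: number;
  readonly instanceType: ec2.InstanceType;
  readonly alarmThreshold: number;
  readonly retentionDays: number;
}

const environments: Record<string, EnvironmentConfig> = {
  dev: {
    envName: 'dev',
    instanceCount: 1,
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
    alarmThreshold: 5000,
    retentionDays: 7,
  },
  staging: {
    envName: 'staging',
    instanceCount: 2,
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.LARGE),
    alarmThreshold: 2000,
    retentionDays: 30,
  },
  production: {
    envName: 'production',
    instanceCount: 6,
    instanceType: ec2.InstanceType.of(ec2.InstanceClass.M5, ec2.InstanceSize.XLARGE),
    alarmThreshold: 1000,
    retentionDays: 365,
  },
};

// Select the target environment from CDK context, e.g. `cdk deploy -c env=staging`.
const app = new App();
const envName: string = app.node.tryGetContext('env') ?? 'dev';
const config = environments[envName];
if (!config) {
  throw new Error(`Unknown environment "${envName}"; expected one of: ${Object.keys(environments).join(', ')}`);
}

// Same stack, different config: structural parity guaranteed
new ServiceStack(app, `Service-${config.envName}`, { config });
```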
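You can also check parity instead of only intending it. The sketch below assumes each environment's synthesized template has been loaded as JSON (for example from `cdk.out`). It compares only logical IDs and resource types. Scale parameters are therefore free to differ, while any resource that exists in one environment but not the other is reported. `SynthesizedTemplate` and `structuralDrift` are hypothetical names. Because CDK derives logical IDs from the construct path, the same stack class should produce the same IDs in every environment.

```typescript
// Minimal view of a synthesized CloudFormation template: only what parity needs.
interface SynthesizedTemplate {
  Resources: Record<string, { Type: string }>;
}

// Reduce a template to its structure: "LogicalId:Type" signatures.
function structureOf(template: SynthesizedTemplate): Set<string> {
  return new Set(
    Object.entries(template.Resources).map(([id, r]) => `${id}:${r.Type}`),
  );
}

// Resources present in the reference but not the candidate (missing), and vice versa (extra).
function structuralDrift(
  reference: SynthesizedTemplate,
  candidate: SynthesizedTemplate,
): { missing: string[]; extra: string[] } {
  const ref = structureOf(reference);
  const cand = structureOf(candidate);
  return {
    missing: Array.from(ref).filter((s) => !cand.has(s)),
    extra: Array.from(cand).filter((s) => !ref.has(s)),
  };
}

const prodTemplate: SynthesizedTemplate = {
  Resources: {
    ServiceASG: { Type: 'AWS::AutoScaling::AutoScalingGroup' },
    ErrorAlarm: { Type: 'AWS::CloudWatch::Alarm' },
  },
};
const devTemplate: SynthesizedTemplate = {
  Resources: {
    ServiceASG: { Type: 'AWS::AutoScaling::AutoScalingGroup' },
  },
};
const drift = structuralDrift(prodTemplate, devTemplate);
// drift.missing holds 'ErrorAlarm:AWS::CloudWatch::Alarm'; drift.extra is empty
```

Dev is missing the alarm here, which is exactly the kind of divergence that hides bugs until production.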
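Avoid `Resource: "*"` unless there is no alternative.

```typescript
import * as iam from 'aws-cdk-lib/aws-iam';
// `table` is a dynamodb.Table defined earlier in the stack.

// ❌ WRONG: Overly broad permissions
const badPolicy = new iam.PolicyStatement({
  actions: ['dynamodb:*'],
  resources: ['*'],
});

// ✅ RIGHT: Least privilege
const goodPolicy = new iam.PolicyStatement({
  actions: [
    'dynamodb:GetItem',
    'dynamodb:PutItem',
    'dynamodb:Query',
  ],
  resources: [
    table.tableArn,
    `${table.tableArn}/index/*`,
  ],
  conditions: {
    // Only items whose partition key equals the caller's TenantId principal tag.
    // Single quotes keep ${...} literal: it is an IAM policy variable, not a TS template.
    'ForAllValues:StringEquals': {
      'dynamodb:LeadingKeys': ['${aws:PrincipalTag/TenantId}'],
    },
  },
});
```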
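Statements like `badPolicy` are easy to catch mechanically. The framework-free sketch below works on IAM statements in their JSON form, where `Action` and `Resource` may be a string or a list. It illustrates the kind of rule a linter such as CDK Nag applies. It is not CDK Nag's implementation. The `justifiedSids` allow-list is an assumed team convention in which each entry points to a written justification. `StatementJson` and `findWildcardViolations` are hypothetical names.

```typescript
// IAM policy statement in JSON form: Action/Resource may be a string or a list.
interface StatementJson {
  Sid?: string;
  Effect: 'Allow' | 'Deny';
  Action: string | string[];
  Resource: string | string[];
}

const asList = (v: string | string[]): string[] => (Array.isArray(v) ? v : [v]);

// Flags wildcard actions ("*" or "service:*") and unjustified Resource "*" on Allow statements.
function findWildcardViolations(
  statements: readonly StatementJson[],
  justifiedSids: ReadonlySet<string> = new Set<string>(),
): string[] {
  const findings: string[] = [];
  statements.forEach((s, i) => {
    if (s.Effect !== 'Allow') return; // Deny statements only narrow access
    const label = s.Sid ?? `statement[${i}]`;
    if (asList(s.Action).some((a) => a === '*' || a.endsWith(':*'))) {
      findings.push(`${label}: wildcard action`);
    }
    const justified = s.Sid !== undefined && justifiedSids.has(s.Sid);
    if (asList(s.Resource).includes('*') && !justified) {
      findings.push(`${label}: Resource "*" without documented justification`);
    }
  });
  return findings;
}

const findings = findWildcardViolations([
  { Effect: 'Allow', Action: 'dynamodb:*', Resource: '*' }, // shape of badPolicy
  {
    Sid: 'TableAccess',
    Effect: 'Allow',
    Action: ['dynamodb:GetItem', 'dynamodb:Query'],
    Resource: 'arn:aws:dynamodb:us-east-1:123456789012:table/Orders',
  },
]);
// findings: two entries, both for statement[0]
```

Failing the build on any finding turns the least-privilege rule into a mechanism instead of an intention.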
- `Action: "*"` (wildcard actions)
- `Resource: "*"` without documented justification

Every resource must have these tags (enforced by Service Control Policy):
| Tag Key | Description | Example Values |
|---|---|---|
| service | Service name (matches repo name) | payment-processing |
| environment | Deployment environment | dev, staging, production |
| team | Owning team name | payments-core |
| cost-center | Finance cost allocation code | CC-12345 |
| data-classification | Data sensitivity level | public, internal, confidential, restricted |
| managed-by | IaC tool managing this resource | cdk, cloudformation, terraform |
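```typescript
// CDK Example: Enforce tags at the app level (inherited by every stack and resource)
import { App, Aspects, Tags } from 'aws-cdk-lib';

const app = new App();
// `props` holds per-environment values, e.g. from the selected config.
Tags.of(app).add('service', props.serviceName);
Tags.of(app).add('environment', props.environment);
Tags.of(app).add('team', props.teamName);
Tags.of(app).add('cost-center', props.costCenter);
Tags.of(app).add('managed-by', 'cdk');
// data-classification varies per data store, so set it on each resource,
// e.g. Tags.of(table).add('data-classification', 'confidential');

// Fail synthesis if required tags are missing. TagEnforcementAspect and
// REQUIRED_TAGS are assumed to be team-defined helpers.
Aspects.of(app).add(new TagEnforcementAspect(REQUIRED_TAGS));
```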
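The body of `TagEnforcementAspect` is not shown. Its core check compares a resource's tags against the required keys in the table. The sketch below is a minimal, framework-free version of that check. `REQUIRED_TAGS` mirrors the table. Treating the example values for `environment` and `data-classification` as the complete allowed sets is an assumption. `validateTags` is a hypothetical name. Inside a real aspect, each problem would be reported with `Annotations.of(node).addError(...)`.

```typescript
// Required tag keys, mirroring the tagging table.
const REQUIRED_TAGS = [
  'service',
  'environment',
  'team',
  'cost-center',
  'data-classification',
  'managed-by',
] as const;
type RequiredTag = (typeof REQUIRED_TAGS)[number];

// Assumed allowed values for the enumerated tags, taken from the table's examples.
const ALLOWED_VALUES: Partial<Record<RequiredTag, readonly string[]>> = {
  environment: ['dev', 'staging', 'production'],
  'data-classification': ['public', 'internal', 'confidential', 'restricted'],
};

// Returns human-readable problems; an empty array means the tags are compliant.
function validateTags(tags: Readonly<Record<string, string>>): string[] {
  const problems: string[] = [];
  for (const key of REQUIRED_TAGS) {
    const value = tags[key];
    if (value === undefined || value.trim() === '') {
      problems.push(`missing tag "${key}"`);
      continue;
    }
    const allowed = ALLOWED_VALUES[key];
    if (allowed !== undefined && !allowed.includes(value)) {
      problems.push(`tag "${key}" has unexpected value "${value}"`);
    }
  }
  return problems;
}

const problems = validateTags({
  service: 'payment-processing',
  environment: 'qa',
  team: 'payments-core',
  'cost-center': 'CC-12345',
  'managed-by': 'cdk',
});
// problems: unexpected environment value "qa", and data-classification is missing
```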
- `service` and `team` tags
- `aws:ResourceTag` conditions
- `environment: dev` tag
- `data-classification` tag to verify encryption

| Decision | Cost Impact | Guideline |
|---|---|---|
| Instance type | 40-60% of compute cost | Right-size based on actual metrics, not guesses. Start small, scale up. |
| Storage class | 20-50% of storage cost | Use lifecycle policies: hot → warm → cold → archive |
| Reserved vs. On-Demand | 30-60% savings | Reserve steady-state baseline; On-Demand for burst |
| Region selection | 10-30% variation | Use cheapest region that meets latency requirements |
| Data transfer | Often overlooked | Keep data in same AZ/region where possible; use VPC endpoints |
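The "reserve steady-state baseline; On-Demand for burst" guideline is simple arithmetic. The sketch below takes hourly instance counts and commits only to the minimum observed usage, so committed capacity is never idle. It pays On-Demand for everything above that baseline. The cost unit and the 40% discount are placeholder assumptions, not quoted AWS prices. Savings Plans commit to a dollar amount per hour rather than an instance count, so treat this as an approximation. `compareCommitment` is a hypothetical name.

```typescript
// Placeholder pricing inputs, not real AWS rates.
interface PricingAssumptions {
  onDemandHourly: number;    // cost per instance-hour On-Demand
  committedDiscount: number; // fractional discount on committed usage, e.g. 0.4
}

interface CommitmentComparison {
  baseline: number;    // instances running in every observed hour
  allOnDemand: number; // cost if every hour is paid On-Demand
  mixed: number;       // cost if the baseline is committed and burst is On-Demand
  savingsPct: number;
}

function compareCommitment(
  hourlyInstanceCounts: readonly number[],
  p: PricingAssumptions,
): CommitmentComparison {
  if (hourlyInstanceCounts.length === 0) {
    throw new Error('need at least one hour of usage data');
  }
  // Commit only to the minimum observed usage, so the commitment is always used.
  const baseline = Math.min(...hourlyInstanceCounts);
  const totalInstanceHours = hourlyInstanceCounts.reduce((sum, n) => sum + n, 0);
  const committedHours = baseline * hourlyInstanceCounts.length;
  const burstHours = totalInstanceHours - committedHours;

  const allOnDemand = totalInstanceHours * p.onDemandHourly;
  const mixed =
    committedHours * p.onDemandHourly * (1 - p.committedDiscount) +
    burstHours * p.onDemandHourly;
  const savingsPct = allOnDemand === 0 ? 0 : (1 - mixed / allOnDemand) * 100;
  return { baseline, allOnDemand, mixed, savingsPct };
}

// Six hours with a steady floor of 6 instances and a burst to 10.
const result = compareCommitment([6, 6, 8, 10, 6, 6], {
  onDemandHourly: 1, // normalized cost unit
  committedDiscount: 0.4,
});
// baseline 6; mixed ~27.6 vs allOnDemand 42, roughly 34% lower
```

Committing above the baseline would leave paid capacity idle during quiet hours. Use real metrics over a representative period, not a guess, to pick the floor.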
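```typescript
// CDK Example: Cost guardrails
import { Annotations, IAspect } from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import { IConstruct } from 'constructs';

// Illustrative deny-list; tune to your own cost profile.
const EXPENSIVE_INSTANCE_TYPES = ['m5.8xlarge', 'r5.8xlarge', 'p3.2xlarge'];

class CostGuardrailAspect implements IAspect {
  constructor(private readonly environment: string) {}

  visit(node: IConstruct): void {
    // Prevent expensive instance types in non-prod.
    // Only standalone instances are checked; ASG launch templates need a similar check.
    if (node instanceof ec2.CfnInstance && this.environment !== 'production') {
      const instanceType = node.instanceType;
      if (instanceType && EXPENSIVE_INSTANCE_TYPES.includes(instanceType)) {
        Annotations.of(node).addError(
          `Instance type ${instanceType} not allowed in ${this.environment}. Use t3/t4g family.`
        );
      }
    }

    // DynamoDB: on-demand (PAY_PER_REQUEST) in dev, PROVISIONED with auto-scaling in prod.
    // CloudFormation treats an unset billingMode as PROVISIONED.
    if (node instanceof dynamodb.CfnTable) {
      const onDemand = node.billingMode === 'PAY_PER_REQUEST';
      if (this.environment === 'production' && onDemand) {
        Annotations.of(node).addWarning(
          'Production DynamoDB tables should use PROVISIONED billing with auto-scaling for cost predictability.'
        );
      }
      if (this.environment === 'dev' && !onDemand) {
        Annotations.of(node).addWarning(
          'Dev DynamoDB tables should use PAY_PER_REQUEST (on-demand) billing.'
        );
      }
    }
  }
}

// Usage: Aspects.of(app).add(new CostGuardrailAspect(config.envName));
```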
```
infrastructure/
├── bin/
│   └── app.ts                  # Entry point, environment selection
├── lib/
│   ├── constructs/             # Reusable L3 constructs
│   │   ├── monitored-lambda.ts # Lambda + alarms + dashboard
│   │   ├── secure-bucket.ts    # S3 + encryption + lifecycle
│   │   └── api-gateway.ts      # APIGW + WAF + logging
│   ├── stacks/
│   │   ├── networking-stack.ts # VPC, subnets, security groups
│   │   ├── data-stack.ts       # DynamoDB, S3, ElastiCache
│   │   ├── compute-stack.ts    # Lambda, ECS, ASG
│   │   └── monitoring-stack.ts # Dashboards, alarms, SNS
│   └── config/
│       ├── dev.ts
│       ├── staging.ts
│       └── production.ts
├── test/
│   ├── unit/                   # Snapshot + fine-grained tests
│   └── integration/            # Deployed stack verification
└── cdk.json
```
```typescript
import { App } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
// ServiceStack and testConfig come from this project (lib/ and test fixtures).

// Synthesize a fresh template for each test
function synthesize(): Template {
  const app = new App();
  const stack = new ServiceStack(app, 'TestStack', { config: testConfig });
  return Template.fromStack(stack);
}

// Snapshot test: Detect unintended changes
test('infrastructure matches snapshot', () => {
  expect(synthesize().toJSON()).toMatchSnapshot();
});

// Fine-grained assertion: Verify specific properties
test('DynamoDB table has encryption enabled', () => {
  synthesize().hasResourceProperties('AWS::DynamoDB::Table', {
    SSESpecification: {
      SSEEnabled: true,
      SSEType: 'KMS',
    },
  });
});

// Security assertion: No public S3 buckets.
// allResourcesProperties checks EVERY bucket; hasResourceProperties passes if any one matches.
test('no public S3 buckets', () => {
  synthesize().allResourcesProperties('AWS::S3::Bucket', {
    PublicAccessBlockConfiguration: {
      BlockPublicAcls: true,
      BlockPublicPolicy: true,
      IgnorePublicAcls: true,
      RestrictPublicBuckets: true,
    },
  });
});
```
```typescript
// L3 Construct: Encodes team standards
import { Construct } from 'constructs';

// MonitoredServiceProps (service inputs plus optional overrides) is assumed to be defined alongside.
export class MonitoredService extends Construct {
  constructor(scope: Construct, id: string, props: MonitoredServiceProps) {
    super(scope, id);
    // Creates: ECS service + ALB + auto-scaling + dashboard + alarms + SNS topic
    // All with team-standard configurations baked in
    // Individual overrides available but defaults are production-ready
  }
}
```
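The "production-ready defaults, individual overrides" pattern behind `MonitoredService` amounts to merging caller overrides onto team defaults, plus guardrails on what may be overridden. The framework-free sketch below illustrates that pattern. The option names, default values, and `resolveOptions` are hypothetical and are not a real construct's API.

```typescript
// Hypothetical knobs a MonitoredService-style construct might expose.
interface MonitoredServiceOptions {
  desiredCount: number;
  cpuAlarmThresholdPercent: number;
  logRetentionDays: number;
  enableDashboard: boolean;
}

// Team-standard defaults: production-ready unless a caller opts out.
const TEAM_DEFAULTS: Readonly<MonitoredServiceOptions> = {
  desiredCount: 2,
  cpuAlarmThresholdPercent: 80,
  logRetentionDays: 365,
  enableDashboard: true,
};

// Callers override individual fields; everything else keeps the team standard.
function resolveOptions(
  overrides: Partial<MonitoredServiceOptions> = {},
): MonitoredServiceOptions {
  const resolved: MonitoredServiceOptions = { ...TEAM_DEFAULTS, ...overrides };
  // Guardrail: some overrides are never acceptable, even in dev.
  if (resolved.desiredCount < 1) {
    throw new Error('desiredCount must be at least 1');
  }
  return resolved;
}

const devOptions = resolveOptions({ desiredCount: 1, logRetentionDays: 7 });
// devOptions keeps the dashboard and the 80% CPU alarm; only two fields changed.
```

Teams inherit every improvement to `TEAM_DEFAULTS` automatically, while the guardrail stops an override from quietly removing a safety property.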
| Intention | Mechanism |
|---|---|
| "I'll define everything in code" | Service Control Policies deny resource creation without managed-by tag; console access is read-only in production |
| "I'll follow least privilege" | CDK Nag rules fail the build on overly broad IAM policies |
| "I'll tag everything" | SCP denies resource creation without mandatory tags |
| "I'll keep costs under control" | Budget alarms auto-notify; cost anomaly detection pages on-call |
| "I'll keep environments consistent" | Same CDK stack with different configs; structural drift detection alerts |
| What They Say | Why It's Wrong | What To Do Instead |
|---|---|---|
| "I'll just make this change in the console quickly" | Console changes cause drift. Drift causes incidents. The next deployment will either fail or overwrite your change. | Make the change in code. Deploy through the pipeline. If it's truly urgent, make the console change AND immediately commit the IaC change. |
| "This is just a dev environment, it doesn't need the same rigor" | Dev environments that differ from production hide bugs until production. Every "works in dev, breaks in prod" incident traces to environment divergence. | Use the same stack with different scale parameters. Dev is smaller but architecturally identical. |
| "Least privilege is too hard to get right upfront" | Broad permissions today become the permissions you're stuck with forever. Tightening later breaks things. Starting broad is technical debt with security interest. | Start with zero permissions. Add each permission as you need it. Use IAM Access Analyzer to identify unused permissions. |
| "We'll add tags later" | Later never comes. Untagged resources are unowned resources. Unowned resources are never cleaned up, never optimized, and never secured. | Enforce tags at creation time via SCP. No tag, no resource. Period. |
| "Reserved instances are a commitment we're not ready for" | On-Demand pricing for stable workloads is paying a 40% premium for flexibility you're not using. | Use Savings Plans (more flexible than RIs). Commit to your minimum steady-state usage. Use On-Demand only for burst. |
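Pruning with "start with zero permissions, add as needed, then remove unused ones" is a set difference between granted and observed actions. The sketch below assumes you have already exported the actions a role actually used. IAM Access Analyzer and IAM last-accessed information can supply that data, but this code does not parse any real AWS report format. `unusedActions` is a hypothetical name. Wildcards are rejected because they must be expanded to concrete actions before comparison.

```typescript
// Returns granted actions that never appeared in the observed-usage list.
function unusedActions(
  granted: readonly string[],
  observed: readonly string[],
): string[] {
  const wildcard = granted.find((a) => a.includes('*'));
  if (wildcard !== undefined) {
    throw new Error(`expand wildcard "${wildcard}" into concrete actions before comparing`);
  }
  // IAM action names are case-insensitive, so compare them lower-cased.
  const used = new Set(observed.map((a) => a.toLowerCase()));
  return granted.filter((a) => !used.has(a.toLowerCase()));
}

const toRemove = unusedActions(
  ['dynamodb:GetItem', 'dynamodb:PutItem', 'dynamodb:Query', 'dynamodb:DeleteItem'],
  ['dynamodb:GetItem', 'dynamodb:query'],
);
// toRemove: ['dynamodb:PutItem', 'dynamodb:DeleteItem']
```

Choose an observation window long enough to cover rarely exercised paths, such as month-end jobs or failover. Otherwise the pruning itself becomes the "tightening later breaks things" incident.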

No wildcard (`*`) for actions or resources without documented justification.