Skill

aws-containers

Deploys and operates containerized workloads on AWS ECS, Fargate, and ECR. Covers task definitions, services, debugging with ECS Exec, scaling, load balancers, and image management for AWS container optimization.

AWS

Docker

devops

deployment

Popularity

Parent stars

973

Parent forks

100

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/aws-core:aws-containers

User invocable

Model invocable

Inline context

Default effort

Tool Access

This skill is limited to the following tools:

Read

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

| Developer Need | Recommend | Key CLI / CDK |

Supporting Files

references/app-runner-guide.mdreferences/ecr-repository-management.mdreferences/ecs-exec-debugging.mdreferences/ecs-infrastructure-patterns.mdreferences/ecs-logging-and-firelens.mdreferences/ecs-troubleshooting-guide.mdreferences/fargate-service-deployment.mdreferences/fargate-spot.mdreferences/service-scaling-and-updates.mdreferences/task-definition-authoring.md

SKILL.md

231 lines · ~5.2k tokens(exceeds 5k compaction limit)

Stats

LanguagePython

Parent stars973

Parent forks100

MaintenanceExcellent

Last CommitJun 23, 2026

Actions

View Source View Plugin View on GitHub View README

AWS Containers

Service Overview

Developer Need	Recommend	Key CLI / CDK
Simplest container deploy (HTTP app/API, new customers)	ECS Express Mode	`aws ecs create-express-gateway-service`
Web app, worker, batch, scheduled task	ECS on Fargate	`aws ecs create-service` / CDK `ecsPatterns.ApplicationLoadBalancedFargateService`
GPU workloads or >16 vCPU	ECS on EC2	CDK `ecs.Ec2Service`
Store container images	ECR	`aws ecr create-repository`
Web app behind a load balancer	ECS Fargate + ALB	CDK `ecsPatterns.ApplicationLoadBalancedFargateService`
SQS worker scaling on queue depth	ECS Fargate + SQS	CDK `ecsPatterns.QueueProcessingFargateService`
Cron job / scheduled task	ECS Fargate + EventBridge	CDK `ecsPatterns.ScheduledFargateTask`
Service mesh / service-to-service	ECS Service Connect	Configure on ECS service with Cloud Map namespace
Debug a running container	ECS Exec	`aws ecs execute-command --interactive --command "/bin/sh"`

When a developer says "deploy my container" without naming a service: recommend ECS Express Mode for simple HTTP apps (replaces App Runner for new customers). Recommend ECS Fargate for everything else. Never recommend EKS unless they explicitly ask for Kubernetes.

Overview

Provides expertise for building, deploying, and operating containerized workloads using Amazon ECS, AWS Fargate, Amazon ECR, and AWS App Runner.

Recommended setup: Install the AWS MCP server for sandboxed execution, audit logging, and enterprise controls. See: aws.amazon.com/mcp

Without AWS MCP: This skill works with any agent that has AWS CLI access. All commands use standard AWS CLI syntax.

When NOT to use this skill:

Kubernetes or EKS workloads → use the kubernetes skill
CI/CD pipeline setup for container deployments → use the deploy skill
VPC subnet design and security group architecture → use the networking skill
Running code without containers (Lambda, Step Functions) → use the serverless skill

Before executing any commands:

You MUST verify AWS CLI v2 is installed and configured before running commands
You MUST inform the user if required tools (AWS CLI, Docker, Session Manager plugin) are missing
You MUST respect the user's decision to abort at any point

Gotchas

Apply these every time. Each corrects a mistake agents make without explicit instruction.

Fargate CPU/memory must be valid combinations. Arbitrary values cause Invalid 'cpu' setting for task:
- 256 (0.25 vCPU): 512 MiB, 1 GB, 2 GB
- 512 (0.5 vCPU): 1–4 GB (1 GB increments)
- 1024 (1 vCPU): 2–8 GB (1 GB increments)
- 2048 (2 vCPU): 4–16 GB (1 GB increments)
- 4096 (4 vCPU): 8–30 GB (1 GB increments)
- 8192 (8 vCPU): 16–60 GB (4 GB increments)
- 16384 (16 vCPU): 32–120 GB (8 GB increments)
If the user requests an invalid combination, tell them and recommend the nearest valid option. You MUST NOT silently produce an invalid task definition.
Fargate requires awsvpc networking mode — no exceptions. Agents frequently suggest bridge or host mode for Fargate tasks, which causes immediate registration failure. You MUST set networkMode to awsvpc for all Fargate task definitions. On EC2, awsvpc is recommended; bridge is legacy only.
Execution role vs task role — never confuse them. executionRoleArn: ECS agent uses it to pull images, fetch secrets, write logs. taskRoleArn: application code uses it to call AWS APIs. ECS Exec permissions (ssmmessages:*) go on the task role. ECR pull permissions go on the execution role. ecr:GetAuthorizationToken MUST use Resource: "*" (registry-level action).
Secrets are injected at task launch only — no hot-reload. Changed secrets require aws ecs update-service --force-new-deployment. To reference a specific JSON key in Secrets Manager: arn:aws:secretsmanager:region:account:secret:name-hash:json-key:: — the trailing colons are required (they represent empty version-stage and version-id fields). You can also use SSM Parameter Store with valueFrom pointing to the parameter ARN — the execution role needs ssm:GetParameters permission.
ALB deregistration delay defaults to 300s — reduce to 30–60s. This is the #1 cause of slow deployments. Set it on the target group. It SHOULD exceed your longest request duration.
Set healthCheckGracePeriodSeconds on every ECS service behind an ALB. Without it, the ALB marks tasks unhealthy before they're ready, the circuit breaker counts failures, and the deployment rolls back. JVM/Spring Boot apps need 60–120s.
Always enable deployment circuit breaker with rollback. Without it, bad deployments stay "in progress" for 30+ minutes. In CDK: circuitBreaker: { rollback: true } (specifying the property implicitly enables it; enable defaults to true).
Private subnet Fargate tasks need NAT or all four VPC endpoints. Required endpoints: ecr.dkr (interface), ecr.api (interface), s3 (gateway — ECR stores layers in S3), logs (interface — for CloudWatch). The S3 gateway endpoint is the most commonly missed. For ECS Exec, also add ssmmessages.
ECR lifecycle policies evaluate within 24 hours — not immediately. Multi-architecture images referenced by a manifest list cannot be expired until the manifest list is deleted first. Preview before applying: first aws ecr start-lifecycle-policy-preview --repository-name $REPO, then aws ecr get-lifecycle-policy-preview --repository-name $REPO --output json to see which images would be affected.
ECS Exec requires task role permissions, NOT execution role. The task role needs ssmmessages:CreateControlChannel, CreateDataChannel, OpenControlChannel, OpenDataChannel. Tasks launched before enabling enableExecuteCommand do NOT support ECS Exec — force a new deployment. The container image must include the binary specified in --command (e.g., /bin/sh for interactive sessions). For command logging to S3 or CloudWatch Logs, script and cat must also be installed. Fargate platform version MUST be 1.4.0+.
awslogs log driver mode — check your account's default. Per ECS docs, the ECS service defaults to non-blocking mode, which drops logs when the buffer fills. The defaultLogDriverMode account setting can override this per account. For guaranteed log delivery (audit/compliance), explicitly set "mode": "blocking" in logConfiguration.options. Check your effective default: aws ecs list-account-settings --name defaultLogDriverMode --effective-settings --output json.
App Runner VPC connector routes ALL application-initiated outbound traffic through the VPC. (App Runner is sunset — new customers should use ECS Express Mode instead.) Without a NAT gateway, external API calls and AWS service calls from your application code break. App Runner's own managed traffic (pulling images, pushing logs, retrieving secrets) is NOT routed through the VPC and is unaffected. Implement retry logic with backoff for database connections at startup.
For desiredCount=1 zero-downtime deploys: minimumHealthyPercent=100, maximumPercent=200. This requires capacity for 2 tasks during deployment. You MUST NOT set minimumHealthyPercent=0 if zero downtime is required.
502 Bad Gateway from ALB — check in this order: (a) Container not listening on the port in the target group. (b) Container crashing before responding. (c) Task security group doesn't allow inbound from ALB security group on the container port. (d) Health check path returns non-200. (e) Health check timeout exceeds response time.
Fargate platform version: always use LATEST or 1.4.0. Version 1.3.0 is being retired June 15, 2026 and terminated June 30, 2026.
SQS worker scaling: use a custom backlog-per-task metric. Raw ApproximateNumberOfMessagesVisible with target tracking doesn't work because adding tasks doesn't reduce queue depth proportionally. Use custom metric (ApproximateNumberOfMessagesVisible / RunningTaskCount) with target tracking, or use step scaling. CDK QueueProcessingFargateService handles this automatically via scalingSteps. Workers MUST handle SIGTERM gracefully within stopTimeout (default 30s, max 120s on Fargate).
Blue/green deployments: use native ECS blue/green (July 2025+) for new services. Supports all-at-once, canary, and linear traffic shifting (canary/linear added October 2025), plus Service Connect, headless services, EBS volumes, and lifecycle hooks. CodeDeploy blue/green is now legacy — native ECS blue/green has full feature parity.
Container dependency HEALTHY condition requires a health check on the dependency container. Without a configured health check, the dependent container never starts — ECS does not progress it to its next state. If startTimeout is set (max 120s), the dependency times out and the task fails; if not set, the dependent container blocks indefinitely. For init containers, use SUCCESS condition instead.

Quick-Start: CDK Fargate Web App

import * as cdk from 'aws-cdk-lib';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

const service = new ecsPatterns.ApplicationLoadBalancedFargateService(this, 'WebApp', {
  taskImageOptions: {
    image: ecs.ContainerImage.fromEcrRepository(repo, 'latest'),
    containerPort: 8080,
    secrets: { DB_PASSWORD: ecs.Secret.fromSecretsManager(dbSecret) },
  },
  cpu: 512,
  memoryLimitMiB: 1024,
  desiredCount: 2,
  publicLoadBalancer: true,
  circuitBreaker: { rollback: true },
  minHealthyPercent: 100,
});

service.targetGroup.setAttribute('deregistration_delay.timeout_seconds', '30');

const scaling = service.service.autoScaleTaskCount({ minCapacity: 2, maxCapacity: 10 });
scaling.scaleOnCpuUtilization('CpuScaling', { targetUtilizationPercent: 70 });

CDK L3 patterns auto-create VPC, cluster, ALB, target group, and security groups. For production, create these separately and pass them in. ApplicationLoadBalancedFargateService defaults to assignPublicIp: false — tasks in public subnets need assignPublicIp: true for internet access, or use private subnets with NAT.

Quick-Start: ECS Exec

# 1. Enable on the service (existing tasks won't support it — force new deployment)
aws ecs update-service --cluster $CLUSTER --service $SERVICE \
  --enable-execute-command --force-new-deployment --output json

# 2. Connect (task role must have ssmmessages:* permissions)
aws ecs execute-command --cluster $CLUSTER --task $TASK_ID \
  --container $CONTAINER --interactive --command "/bin/sh"

If TargetNotConnectedException: wait 30–60s for SSM agent startup, check NAT/VPC endpoint for ssmmessages, verify task role (not execution role) has permissions.

Common Workflows

Use the best available tool for AWS operations (MCP server, AWS CLI, or SDK). The commands below show the AWS CLI form.

Read reference files only when the conversation requires deeper detail.

Read references/task-definition-authoring.md if the user needs to author a task definition, configure CPU/memory, set up networking modes, inject secrets, mount volumes, or configure container dependencies.
Read references/fargate-service-deployment.md if the user needs to deploy a Fargate service behind an ALB, configure health checks, tune deregistration delay, set up path-based routing, or handle private subnet networking.
Read references/ecr-repository-management.md if the user needs ECR lifecycle policies, image scanning, cross-account image pulls, or is debugging image pull errors.
Read references/ecs-exec-debugging.md if the user needs to set up ECS Exec, debug TargetNotConnectedException, configure session logging, or validate ECS Exec prerequisites.
Read references/service-scaling-and-updates.md if the user needs auto-scaling, deployment strategies (rolling, blue/green), circuit breaker configuration, or Service Connect setup.
Read references/app-runner-guide.md if the user has an existing App Runner service, needs to troubleshoot App Runner connectivity, or wants to migrate from App Runner to ECS Express Mode.
Read references/ecs-infrastructure-patterns.md if the user needs CDK or CloudFormation examples for Fargate services, SQS workers, scheduled tasks, EFS volumes, ECS Exec, path-based routing, private subnets, or FireLens.
Read references/ecs-logging-and-firelens.md if the user needs awslogs configuration, FireLens/Fluent Bit setup, multiline log handling, or guaranteed log delivery.
Read references/ecs-troubleshooting-guide.md if the user is debugging task placement failures, OOM kills (exit code 137), health check failures, image pull errors, or networking issues in private subnets.
Read references/fargate-spot.md if the user asks about Fargate Spot pricing, capacity provider strategies, or interruption handling.

Decision Guide: ECS Express Mode vs ECS Fargate

App Runner: Sunset April 30, 2026 — no new customers, no new features. Existing customers should migrate to ECS Express Mode. See App Runner Availability Change.

Factor	ECS Express Mode	ECS Fargate
Setup complexity	Minimal (single API call)	Moderate — task def, service, cluster, ALB
Networking control	Managed (ALB in default VPC)	Full — awsvpc, security groups, subnets
Scaling	Auto (CPU-based)	Configurable target/step scaling
Use when	New simple HTTP app/API, zero infra management	Production services needing VPC, ALB, fine-grained IAM
Limitations	New service, evolving feature set	Most setup required

Default recommendation: Use ECS Fargate for production workloads. Use ECS Express Mode for the simplest path (new customers).

Troubleshooting

CannotPullContainerError

Cause: Task cannot reach ECR. In private subnets, tasks need NAT gateway or VPC endpoints (ecr.api, ecr.dkr, s3 gateway, logs). Fix: Verify route table has a route to NAT gateway or create the required VPC endpoints. Verify the execution role has ecr:GetDownloadUrlForLayer, ecr:BatchGetImage, ecr:GetAuthorizationToken (Resource: "*"). Check security group allows outbound HTTPS (443).

Task failed ELB health checks

Cause: Health check path returns non-200, container not listening on the configured port, or health check grace period too short. Fix: Verify the container responds on the health check path and port. Set healthCheckGracePeriodSeconds to at least 60s (longer for JVM apps). Ensure the security group allows traffic from the ALB security group on the container port.

OutOfMemoryError / exit code 137

Cause: Container exceeded its memory hard limit (SIGKILL). On Fargate, task-level memory is the hard limit. Fix: Increase task-level memory. For JVM apps, use -XX:MaxRAMPercentage=75 instead of fixed -Xmx — this automatically adapts to the container's memory allocation. Check container-level memory (hard limit) vs memoryReservation (soft limit).

AccessDeniedException on AWS API calls from container

Cause: Permissions are on the execution role instead of the task role, or the task role is missing. Fix: Verify the task definition has taskRoleArn set (not just executionRoleArn). Add the required permissions to the task role.

Service stuck deploying / tasks keep restarting

Cause: Deployment circuit breaker not enabled, or health check failing on new tasks. Fix: Enable circuit breaker with rollback. Check service events: aws ecs describe-services --cluster $CLUSTER --services $SERVICE --output json. Check stopped task reasons: aws ecs describe-tasks --cluster $CLUSTER --tasks $TASK_ID --output json.

ECS Exec TargetNotConnectedException

Cause: SSM agent not running, missing task role permissions, or missing VPC endpoint. Fix: Verify enableExecuteCommand is true on the service. Check the task role has SSM permissions. For private subnets, create the ssmmessages VPC endpoint. Verify with aws ecs describe-tasks that ExecuteCommandAgent status is RUNNING.

Error retry classification

Retry	Do NOT retry
ThrottlingException	InvalidParameterException
ServiceUnavailableException	ClientException
ServerException	AccessDeniedException

Security Considerations

You MUST use IAM roles (execution role + task role) — never embed credentials in container images or environment variables
You MUST use Secrets Manager or SSM Parameter Store for sensitive configuration, injected via the secrets field in the task definition
You SHOULD enable ECR image scanning on push for vulnerability detection
You SHOULD use private subnets with NAT gateway or VPC endpoints for production workloads
You MUST enable CloudTrail for ECS API audit logging
You SHOULD configure CloudWatch Container Insights for monitoring
You SHOULD use readonlyRootFilesystem: true in container definitions where possible (note: incompatible with ECS Exec)
You MUST scope task role permissions to specific resources — avoid * wildcards and *FullAccess policies
You MUST confirm with the user before executing destructive operations: --force-new-deployment (replaces all running tasks), delete-service, deregister-task-definition. ECS does not support --dry-run — use the plan-validate-execute pattern: explain what will happen, get confirmation, then execute
You SHOULD use ACM certificates with HTTPS listeners on ALBs fronting ECS services — per ECS network security best practices: "provision certificates for the load balancer using AWS Certificate Manager (ACM)"
You SHOULD avoid logging sensitive data (secrets, PII, tokens) in container stdout/stderr — these flow to CloudWatch Logs via the awslogs driver. If sensitive data may appear in logs, enable CloudWatch Logs encryption with a KMS key
You SHOULD attach an AWS WAF WebACL to internet-facing ALBs for defense in depth against common web exploits
You SHOULD include aws:SourceArn and aws:SourceAccount condition keys in ECR repository policies for cross-account access to prevent confused deputy attacks

aws-containers

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

aws-containers

Popularity

Invocation

Tool Access

Context Preview

Supporting Files

SKILL.md

AWS Containers

Service Overview

Overview

Gotchas

Quick-Start: CDK Fargate Web App

Quick-Start: ECS Exec

Common Workflows

Decision Guide: ECS Express Mode vs ECS Fargate

Troubleshooting

CannotPullContainerError

Task failed ELB health checks

OutOfMemoryError / exit code 137

AccessDeniedException on AWS API calls from container

Service stuck deploying / tasks keep restarting

ECS Exec TargetNotConnectedException

Error retry classification

Security Considerations

Additional Resources

Reused across plugins

Similar Skills

AWS Containers

Service Overview

Overview

Gotchas

Quick-Start: CDK Fargate Web App

Quick-Start: ECS Exec

Common Workflows

Decision Guide: ECS Express Mode vs ECS Fargate

Troubleshooting

CannotPullContainerError

Task failed ELB health checks

OutOfMemoryError / exit code 137

AccessDeniedException on AWS API calls from container

Service stuck deploying / tasks keep restarting

ECS Exec TargetNotConnectedException

Error retry classification

Security Considerations

Additional Resources

Reused across plugins

Similar Skills