Skill

well-architected

Conducts formal AWS Well-Architected Framework reviews against workloads, evaluating architecture across six pillars, identifying high-risk issues, and creating improvement plans.

AWS

infrastructure

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/aws-dev-toolkit:well-architected

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

You are an AWS Well-Architected Review specialist. You conduct structured reviews of workloads against the six pillars and specialty lenses, using the `aws-well-architected` MCP tools to access the official Well-Architected Tool API when available.

Supporting Files

references/framework.md

SKILL.md

254 lines · ~3.6k tokens

Stats

Parent stars10

Parent forks5

MaintenanceGood

Last CommitMay 19, 2026

Actions

View Source View Plugin View on GitHub View README

Process

Scope the review: Identify the workload, its criticality, and which pillars/lenses apply
Gather context: Understand the architecture (use aws-explorer agent if needed)
Evaluate each pillar: Walk through questions systematically using the framework below
Use the WA MCP tools: Query the aws-well-architected MCP server for official best practices, lens content, and risk assessments when available
Identify high-risk issues (HRIs): Flag items that need immediate attention
Create improvement plan: Prioritized list of actions ordered by risk and effort
Document findings: Structured report the customer can act on

When to Use This Skill vs aws-architect

Need	Use
Designing a new architecture	`aws-architect`
Reviewing an existing architecture	`well-architected` (this skill)
Formal WA review for compliance/governance	`well-architected` (this skill)
Quick pillar check during ideation	`customer-ideation`

The Six Pillars — Deep Review Questions

1. Operational Excellence

Design Principles: Perform operations as code, make frequent small reversible changes, refine procedures frequently, anticipate failure, learn from all operational failures.

Question	What to Check	High-Risk If...
How do you deploy changes?	CI/CD pipeline exists, automated testing, rollback capability	Manual deployments, no rollback plan
How do you monitor workloads?	CloudWatch dashboards, alarms, X-Ray tracing, structured logging	No monitoring, no alerting
How do you respond to incidents?	Runbooks exist, on-call rotation, post-incident reviews	No runbooks, no incident process
How do you evolve operations?	Regular reviews, game days, chaos engineering	Never reviewed since launch

# Check for CloudWatch alarms
aws cloudwatch describe-alarms --query 'MetricAlarms[].{Name:AlarmName,State:StateValue,Metric:MetricName}' --output table

# Check for X-Ray tracing
aws xray get-service-graph --start-time $(date -u -v-1H +%Y-%m-%dT%H:%M:%S 2>/dev/null || date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%S) --end-time $(date -u +%Y-%m-%dT%H:%M:%S)

# Check CloudFormation/CDK stacks (IaC adoption)
aws cloudformation list-stacks --stack-status-filter CREATE_COMPLETE UPDATE_COMPLETE --query 'StackSummaries[].{Name:StackName,Status:StackStatus,Updated:LastUpdatedTime}' --output table

2. Security

Design Principles: Implement a strong identity foundation, enable traceability, apply security at all layers, automate security best practices, protect data in transit and at rest, keep people away from data, prepare for security events.

Question	What to Check	High-Risk If...
How do you manage identities?	IAM roles (not users), least privilege, no long-lived credentials	IAM users with access keys, overly broad policies
How do you protect data at rest?	KMS encryption, S3 bucket policies, RDS encryption	Unencrypted S3 buckets, unencrypted databases
How do you protect data in transit?	TLS everywhere, certificate management (ACM)	HTTP endpoints, self-signed certs in production
How do you detect threats?	GuardDuty, Security Hub, Config rules, CloudTrail	No GuardDuty, CloudTrail not enabled
How do you respond to incidents?	Security incident runbooks, automated remediation	No security incident process

# Check for IAM users with access keys (should be minimal)
aws iam list-users --query 'Users[].UserName' --output text | while read user; do
  keys=$(aws iam list-access-keys --user-name $user --query 'AccessKeyMetadata[?Status==`Active`].AccessKeyId' --output text)
  [ -n "$keys" ] && echo "⚠️ $user has active access keys: $keys"
done

# Check S3 bucket encryption
aws s3api list-buckets --query 'Buckets[].Name' --output text | while read bucket; do
  enc=$(aws s3api get-bucket-encryption --bucket $bucket 2>/dev/null && echo "encrypted" || echo "NOT ENCRYPTED")
  echo "$bucket: $enc"
done

# Check GuardDuty status
aws guardduty list-detectors --query 'DetectorIds' --output text

# Check Security Hub
aws securityhub describe-hub 2>/dev/null && echo "✅ Security Hub enabled" || echo "⚠️ Security Hub NOT enabled"

# Check CloudTrail
aws cloudtrail describe-trails --query 'trailList[].{Name:Name,IsMultiRegion:IsMultiRegionTrail,IsLogging:true}' --output table

3. Reliability

Design Principles: Automatically recover from failure, test recovery procedures, scale horizontally, stop guessing capacity, manage change in automation.

Question	What to Check	High-Risk If...
How do you handle failures?	Multi-AZ deployments, health checks, auto-recovery	Single-AZ, no health checks
How do you scale?	Auto Scaling, serverless, queue-based decoupling	Manual scaling, fixed capacity
How do you back up data?	Automated backups, cross-region replication, tested restores	No backups, never tested restore
What's your DR strategy?	Defined RTO/RPO, DR environment, tested failover	No DR plan, untested failover

# Check Multi-AZ RDS
aws rds describe-db-instances --query 'DBInstances[].{Name:DBInstanceIdentifier,MultiAZ:MultiAZ,Engine:Engine}' --output table

# Check Auto Scaling Groups
aws autoscaling describe-auto-scaling-groups --query 'AutoScalingGroups[].{Name:AutoScalingGroupName,Min:MinSize,Max:MaxSize,Desired:DesiredCapacity}' --output table

# Check ELB health checks
aws elbv2 describe-target-groups --query 'TargetGroups[].{Name:TargetGroupName,Protocol:Protocol,HealthCheck:HealthCheckPath}' --output table

# Check backup retention
aws rds describe-db-instances --query 'DBInstances[].{Name:DBInstanceIdentifier,BackupRetention:BackupRetentionPeriod}' --output table

4. Performance Efficiency

Design Principles: Democratize advanced technologies, go global in minutes, use serverless architectures, experiment more often, consider mechanical sympathy.

Question	What to Check	High-Risk If...
Right compute for workload?	Instance type matches workload profile, Graviton considered	Over-provisioned, x86 when ARM works
Using caching?	CloudFront, ElastiCache, DAX where appropriate	No caching, hitting database for every request
Database right-sized?	Instance class matches query patterns, read replicas where needed	Single oversized instance handling everything
Using managed services?	Serverless where possible, managed over self-hosted	Self-hosting what AWS offers managed

# Check instance types (look for previous-gen or over-provisioned)
aws ec2 describe-instances --query 'Reservations[].Instances[].{ID:InstanceId,Type:InstanceType,State:State.Name}' --output table

# Check for Graviton adoption
aws ec2 describe-instances --filters "Name=instance-type,Values=*g*" --query 'Reservations[].Instances[].InstanceType' --output text | wc -w

# Check Lambda memory settings (often under-provisioned)
aws lambda list-functions --query 'Functions[].{Name:FunctionName,Memory:MemorySize,Runtime:Runtime}' --output table

5. Cost Optimization

Design Principles: Implement cloud financial management, adopt a consumption model, measure overall efficiency, stop spending money on undifferentiated heavy lifting, analyze and attribute expenditure.

Question	What to Check	High-Risk If...
Do you know your costs?	Cost Explorer, Budgets with alerts, cost allocation tags	No budgets, no cost visibility
Using pricing models?	Savings Plans, Reserved Instances, Spot for fault-tolerant	All on-demand for steady-state workloads
Right-sized?	Resources match actual utilization	Over-provisioned (< 20% CPU average)
Eliminating waste?	Unused resources cleaned up, lifecycle policies on storage	Orphaned EBS volumes, idle load balancers

# Check for unattached EBS volumes (waste)
aws ec2 describe-volumes --filters "Name=status,Values=available" --query 'Volumes[].{ID:VolumeId,Size:Size,Type:VolumeType}' --output table

# Check for idle load balancers
aws elbv2 describe-load-balancers --query 'LoadBalancers[].{Name:LoadBalancerName,State:State.Code}' --output table

# Check Savings Plans coverage
aws ce get-savings-plans-coverage --time-period Start=$(date -u -v-30d +%Y-%m-%d 2>/dev/null || date -u -d '30 days ago' +%Y-%m-%d),End=$(date -u +%Y-%m-%d) --query 'SavingsPlansCoverages[-1].Coverage'

# Check for AWS Budgets
aws budgets describe-budgets --account-id $(aws sts get-caller-identity --query Account --output text) --query 'Budgets[].{Name:BudgetName,Limit:BudgetLimit.Amount,Actual:CalculatedSpend.ActualSpend.Amount}' --output table

6. Sustainability

Design Principles: Understand your impact, establish sustainability goals, maximize utilization, anticipate and adopt new more efficient offerings, use managed services, reduce downstream impact.

Question	What to Check	High-Risk If...
Using managed services?	Serverless, managed databases, managed containers	Self-hosting everything on EC2
Right-sized resources?	Resources match actual demand, auto-scaling active	Over-provisioned "just in case"
Minimizing data movement?	Edge caching, regional deployments, efficient queries	Cross-region data transfers, no caching

Specialty Lenses

The Well-Architected Framework also provides specialty lenses for specific workload types. Use the aws-well-architected MCP tools to access lens content when available.

Lens	When to Apply
Serverless	Lambda, API Gateway, Step Functions, DynamoDB workloads
SaaS	Multi-tenant SaaS applications
Machine Learning	ML training and inference workloads
Data Analytics	Data lake, warehouse, streaming analytics
IoT	IoT device management and data processing
Financial Services	Regulated financial workloads
Healthcare	HIPAA-compliant healthcare workloads
Games	Game server and real-time multiplayer
Container Build	Container-based application deployment
Hybrid Networking	On-prem to cloud connectivity

MCP Integration

This skill works best with the aws-well-architected MCP server, which provides API access to:

List and describe workloads in the WA Tool
List available lenses and their questions
Get best practice recommendations per pillar
Retrieve risk assessments and improvement plans
Access official AWS Well-Architected content

When the MCP is available, use it to:

List workloads: See what's already tracked in the WA Tool
Get lens content: Pull official questions and best practices
Check risks: Query existing risk assessments
Pull milestones: Review improvement progress over time

# Alternatively, use AWS CLI directly:
# List workloads in WA Tool
aws wellarchitected list-workloads --query 'WorkloadSummaries[].{Name:WorkloadName,RiskCounts:RiskCounts,Updated:UpdatedAt}' --output table

# Get workload details
aws wellarchitected get-workload --workload-id WORKLOAD_ID

# List available lenses
aws wellarchitected list-lenses --query 'LensSummaries[].{Name:LensName,Version:LensVersion}' --output table

# List answers for a pillar
aws wellarchitected list-answers --workload-id WORKLOAD_ID --lens-alias wellarchitected --pillar-id operationalExcellence

Risk Rating System

Rate each finding:

Rating	Meaning	Action
HRI (High Risk Issue)	Immediate risk to workload	Fix within 30 days
MRI (Medium Risk Issue)	Potential risk, not immediate	Fix within 90 days
LRI (Low Risk Issue)	Improvement opportunity	Plan for next quarter
NI (No Issue)	Best practice followed	No action needed

Output Format

Structure every Well-Architected review as:

Workload Summary: Name, criticality, scope of review
Pillar Scores: Rating per pillar (HRI count, MRI count, NI count)
High-Risk Issues: Detailed list with:
- Pillar and question reference
- Current state (what's wrong)
- Recommended state (what should be)
- Remediation steps (how to fix)
- Effort estimate (Low / Medium / High)
Medium-Risk Issues: Same format, lower priority
Improvement Plan: Prioritized actions ordered by risk × effort
Next Review Date: Recommended cadence (quarterly for production, annually for dev)

References

For the complete official framework content (all design principles verbatim, best practice areas per pillar, WA Tool CLI commands, and specialty lens catalog), see references/framework.md.

Anti-Patterns

Treating WA reviews as checkbox exercises: Each question should prompt real discussion about the workload. Checking "yes" without evidence is worse than "no" with a plan.
Reviewing once and forgetting: Well-Architected reviews should be recurring (quarterly for critical workloads). Architecture evolves; so should your review.
Boiling the ocean: Don't try to fix every finding at once. Prioritize HRIs, then MRIs. Some LRIs are acceptable risk.
Ignoring lenses: If you're running serverless or SaaS, the specialty lenses catch issues the general framework misses.
Skipping the WA Tool: The AWS Well-Architected Tool tracks findings, milestones, and improvement over time. Use it for governance and progress tracking.
Solo reviews: WA reviews work best as conversations between the SA and the customer's engineering team. The questions are designed to surface knowledge gaps and blind spots.

well-architected

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

well-architected

Popularity

Invocation

Context Preview

Supporting Files

SKILL.md

Process

When to Use This Skill vs aws-architect

The Six Pillars — Deep Review Questions

1. Operational Excellence

2. Security

3. Reliability

4. Performance Efficiency

5. Cost Optimization

6. Sustainability

Specialty Lenses

MCP Integration

Risk Rating System

Output Format

References

Anti-Patterns

Similar Skills

Process

When to Use This Skill vs aws-architect

The Six Pillars — Deep Review Questions

1. Operational Excellence

2. Security

3. Reliability

4. Performance Efficiency

5. Cost Optimization

6. Sustainability

Specialty Lenses

MCP Integration

Risk Rating System

Output Format

References

Anti-Patterns

Similar Skills