Implement monitoring, alerting, and observability with CloudWatch
Creates CloudWatch alarms and dashboards for AWS resources. Use when you need to monitor metrics like CPU, memory, or custom application data and set up alerting based on thresholds.
/plugin marketplace add pluginagentmarketplace/custom-plugin-aws
/plugin install pluginagentmarketplace-aws-cloud-assistant@pluginagentmarketplace/custom-plugin-aws

This skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/alarm-config.yaml
assets/config.yaml
assets/schema.json
references/GUIDE.md
references/PATTERNS.md
scripts/validate.py

Set up comprehensive monitoring and alerting for AWS resources.

| Attribute | Value |
|---|---|
| AWS Service | CloudWatch |
| Complexity | Medium |
| Est. Time | 15-30 min |
| Prerequisites | Resources to monitor |

Required parameters:

| Parameter | Type | Description | Validation |
|---|---|---|---|
| namespace | string | Metric namespace | AWS/* or custom |
| metric_name | string | Metric name | Valid metric |
| resource_id | string | Resource identifier | Valid ARN or ID |

Optional parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| period | int | 300 | Evaluation period (seconds) |
| statistic | string | Average | Average, Sum, Min, Max, p99 |
| threshold | float | varies | Alert threshold |
| evaluation_periods | int | 3 | Consecutive periods |
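
The example alarm definitions below illustrate typical thresholds for EC2 instances, ECS services, and RDS instances; the values are starting points and should be tuned to your workload.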
# EC2 instance alarms
- name: HighCPU
  metric: CPUUtilization
  threshold: 80
  period: 300
  evaluation_periods: 3

- name: StatusCheckFailed
  metric: StatusCheckFailed
  threshold: 1
  period: 60
  evaluation_periods: 2
# ECS service alarms
- name: HighCPU
  metric: CPUUtilization
  threshold: 80

- name: HighMemory
  metric: MemoryUtilization
  threshold: 85

- name: RunningTaskCount
  metric: RunningTaskCount
  threshold: 1
  comparison: LessThan
# RDS instance alarms
- name: HighCPU
  metric: CPUUtilization
  threshold: 80

- name: LowFreeStorage
  metric: FreeStorageSpace
  threshold: 10737418240  # 10 GiB in bytes
  comparison: LessThan

- name: HighConnections
  metric: DatabaseConnections
  threshold: 100
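
To create the EC2 high-CPU alarm with the AWS CLI (the instance ID and SNS topic ARN below are example values):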
aws cloudwatch put-metric-alarm \
--alarm-name prod-ec2-high-cpu \
--alarm-description "EC2 CPU > 80% for 15 minutes" \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 3 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
--ok-actions arn:aws:sns:us-east-1:123456789012:alerts \
--treat-missing-data notBreaching
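
The --alarm-actions and --ok-actions flags reference an SNS topic that delivers the notifications. If one does not exist yet, a minimal boto3 sketch (the topic name and email endpoint are illustrative):

```python
import boto3

sns = boto3.client('sns')

# Create the alerts topic (returns the existing topic if the name is already taken)
topic = sns.create_topic(Name='alerts')

# Add an email subscriber; the recipient must confirm the subscription
sns.subscribe(
    TopicArn=topic['TopicArn'],
    Protocol='email',
    Endpoint='oncall@example.com'
)
```

An example dashboard body (JSON) with an EC2 CPU widget and an ECS memory widget: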
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "EC2 CPU Utilization",
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1"
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "ECS Service Memory",
        "metrics": [
          ["AWS/ECS", "MemoryUtilization", "ServiceName", "my-service"]
        ]
      }
    }
  ]
}
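
The dashboard body can be published with put_dashboard; a minimal boto3 sketch assuming the JSON above is saved locally as dashboard.json (the dashboard name is illustrative):

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

# Read the widget definitions and create or update the dashboard
with open('dashboard.json') as f:
    body = f.read()

cloudwatch.put_dashboard(
    DashboardName='prod-overview',
    DashboardBody=body
)
```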
import boto3

cloudwatch = boto3.client('cloudwatch')

# Publish custom metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'RequestLatency',
            'Dimensions': [
                {'Name': 'Service', 'Value': 'API'},
                {'Name': 'Environment', 'Value': 'prod'}
            ],
            'Value': 150.5,
            'Unit': 'Milliseconds'
        }
    ]
)
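
Published values can be read back for ad-hoc checks with get_metric_data; a minimal sketch reusing the cloudwatch client, namespace, and dimensions above (the time range and period are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Average RequestLatency over the last hour in 5-minute buckets
end = datetime.now(timezone.utc)
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'latency',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'MyApp',
                    'MetricName': 'RequestLatency',
                    'Dimensions': [
                        {'Name': 'Service', 'Value': 'API'},
                        {'Name': 'Environment', 'Value': 'prod'}
                    ]
                },
                'Period': 300,
                'Stat': 'Average'
            }
        }
    ],
    StartTime=end - timedelta(hours=1),
    EndTime=end
)
print(response['MetricDataResults'][0]['Values'])
```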
# Count ERROR log lines in 5-minute buckets
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as error_count by bin(5m)

# Latency average and percentiles per hour
fields @timestamp, latency
| stats avg(latency) as avg_latency,
        pct(latency, 95) as p95_latency,
        pct(latency, 99) as p99_latency
  by bin(1h)

# Top 10 exception types
fields @timestamp, @message
| filter @message like /Exception|Error/
| parse @message /(?<error_type>\w+Exception)/
| stats count() as count by error_type
| sort count desc
| limit 10
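
These queries can also be run programmatically; a sketch using boto3 and the error-count query above (the log group name is a placeholder):

```python
import time
from datetime import datetime, timedelta, timezone

import boto3

logs = boto3.client('logs')

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Start the query, then poll until CloudWatch Logs finishes running it
query = logs.start_query(
    logGroupName='/my-app/prod',
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=(
        'fields @timestamp, @message '
        '| filter @message like /ERROR/ '
        '| stats count() as error_count by bin(5m)'
    )
)

while True:
    result = logs.get_query_results(queryId=query['queryId'])
    if result['status'] in ('Complete', 'Failed', 'Cancelled'):
        break
    time.sleep(1)

print(result['results'])
```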
| Symptom | Cause | Solution |
|---|---|---|
| No data | Metric not emitting | Check CloudWatch Agent |
| Alarm stuck | Insufficient data | Check treat_missing_data |
| Dashboard empty | Wrong namespace | Verify metric source |
| High costs | Too many metrics | Use metric filters |
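
For the high-cost case, a metric filter derives a metric from existing log events instead of publishing a custom data point per request. A sketch (log group, filter name, and pattern are illustrative):

```python
import boto3

logs = boto3.client('logs')

# Emit an ErrorCount metric whenever a log event contains "ERROR"
logs.put_metric_filter(
    logGroupName='/my-app/prod',
    filterName='error-count',
    filterPattern='ERROR',
    metricTransformations=[
        {
            'metricName': 'ErrorCount',
            'metricNamespace': 'MyApp',
            'metricValue': '1'
        }
    ]
)
```

The smoke test below creates a temporary alarm, verifies it can be described, and deletes it; it assumes credentials with CloudWatch permissions.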
import boto3

def test_cloudwatch_alarm():
    # Arrange
    alarm_name = "test-alarm"
    cw = boto3.client('cloudwatch')

    # Act
    cw.put_metric_alarm(
        AlarmName=alarm_name,
        MetricName='CPUUtilization',
        Namespace='AWS/EC2',
        Statistic='Average',
        Period=300,
        EvaluationPeriods=1,
        Threshold=80,
        ComparisonOperator='GreaterThanThreshold'
    )

    # Assert
    response = cw.describe_alarms(AlarmNames=[alarm_name])
    assert len(response['MetricAlarms']) == 1

    # Cleanup
    cw.delete_alarms(AlarmNames=[alarm_name])
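
To avoid touching a live account, the same test can run against an in-memory backend; a sketch assuming moto 5.x is installed and an AWS region is configured in the environment:

```python
from moto import mock_aws

@mock_aws
def test_cloudwatch_alarm_offline():
    # Runs the same assertions against moto's mocked CloudWatch API
    test_cloudwatch_alarm()
```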
assets/alarm-config.yaml - Common alarm configurations