Implement monitoring, alerting, and observability with CloudWatch
Creates CloudWatch alarms and dashboards for AWS resources. Use when you need to monitor metrics like CPU, memory, or custom application data and set up alerting based on thresholds.
/plugin marketplace add pluginagentmarketplace/custom-plugin-aws
/plugin install pluginagentmarketplace-aws-cloud-assistant@pluginagentmarketplace/custom-plugin-aws

This skill inherits all available tools. When active, it can use any tool Claude has access to.
assets/alarm-config.yaml
assets/config.yaml
assets/schema.json
references/GUIDE.md
references/PATTERNS.md
scripts/validate.py

Set up comprehensive monitoring and alerting for AWS resources.

| Attribute | Value |
|---|---|
| AWS Service | CloudWatch |
| Complexity | Medium |
| Est. Time | 15-30 min |
| Prerequisites | Resources to monitor |

Required parameters:

| Parameter | Type | Description | Validation |
|---|---|---|---|
| namespace | string | Metric namespace | AWS/* or custom |
| metric_name | string | Metric name | Valid metric |
| resource_id | string | Resource identifier | Valid ARN or ID |

Optional parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| period | int | 300 | Evaluation period (seconds) |
| statistic | string | Average | Average, Sum, Min, Max, p99 |
| threshold | float | varies | Alert threshold |
| evaluation_periods | int | 3 | Consecutive periods |
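
The example alarm definitions below illustrate typical thresholds for EC2 instances, ECS services, and RDS instances; the values are starting points and should be tuned to your workload.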
# EC2 instance alarms
- name: HighCPU
  metric: CPUUtilization
  threshold: 80
  period: 300
  evaluation_periods: 3

- name: StatusCheckFailed
  metric: StatusCheckFailed
  threshold: 1
  period: 60
  evaluation_periods: 2
# ECS service alarms
- name: HighCPU
  metric: CPUUtilization
  threshold: 80

- name: HighMemory
  metric: MemoryUtilization
  threshold: 85

- name: RunningTaskCount
  metric: RunningTaskCount
  threshold: 1
  comparison: LessThan
# RDS instance alarms
- name: HighCPU
  metric: CPUUtilization
  threshold: 80

- name: LowFreeStorage
  metric: FreeStorageSpace
  threshold: 10737418240  # 10 GiB in bytes
  comparison: LessThan

- name: HighConnections
  metric: DatabaseConnections
  threshold: 100
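
To create the EC2 high-CPU alarm with the AWS CLI (the instance ID and SNS topic ARN below are example values):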
aws cloudwatch put-metric-alarm \
--alarm-name prod-ec2-high-cpu \
--alarm-description "EC2 CPU > 80% for 15 minutes" \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 3 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:alerts \
--ok-actions arn:aws:sns:us-east-1:123456789012:alerts \
--treat-missing-data notBreaching
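
The --alarm-actions and --ok-actions flags reference an SNS topic that delivers the notifications. If one does not exist yet, a minimal boto3 sketch (the topic name and email endpoint are illustrative):

```python
import boto3

sns = boto3.client('sns')

# Create the alerts topic (returns the existing topic if the name is already taken)
topic = sns.create_topic(Name='alerts')

# Add an email subscriber; the recipient must confirm the subscription
sns.subscribe(
    TopicArn=topic['TopicArn'],
    Protocol='email',
    Endpoint='oncall@example.com'
)
```

An example dashboard body (JSON) with an EC2 CPU widget and an ECS memory widget: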
{
  "widgets": [
    {
      "type": "metric",
      "properties": {
        "title": "EC2 CPU Utilization",
        "metrics": [
          ["AWS/EC2", "CPUUtilization", "InstanceId", "i-xxx"]
        ],
        "period": 300,
        "stat": "Average",
        "region": "us-east-1"
      }
    },
    {
      "type": "metric",
      "properties": {
        "title": "ECS Service Memory",
        "metrics": [
          ["AWS/ECS", "MemoryUtilization", "ServiceName", "my-service"]
        ]
      }
    }
  ]
}
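
The dashboard body can be published with put_dashboard; a minimal boto3 sketch assuming the JSON above is saved locally as dashboard.json (the dashboard name is illustrative):

```python
import boto3

cloudwatch = boto3.client('cloudwatch')

# Read the widget definitions and create or update the dashboard
with open('dashboard.json') as f:
    body = f.read()

cloudwatch.put_dashboard(
    DashboardName='prod-overview',
    DashboardBody=body
)
```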
import boto3

cloudwatch = boto3.client('cloudwatch')

# Publish custom metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'RequestLatency',
            'Dimensions': [
                {'Name': 'Service', 'Value': 'API'},
                {'Name': 'Environment', 'Value': 'prod'}
            ],
            'Value': 150.5,
            'Unit': 'Milliseconds'
        }
    ]
)
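
Published values can be read back for ad-hoc checks with get_metric_data; a minimal sketch reusing the cloudwatch client, namespace, and dimensions above (the time range and period are illustrative):

```python
from datetime import datetime, timedelta, timezone

# Average RequestLatency over the last hour in 5-minute buckets
end = datetime.now(timezone.utc)
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            'Id': 'latency',
            'MetricStat': {
                'Metric': {
                    'Namespace': 'MyApp',
                    'MetricName': 'RequestLatency',
                    'Dimensions': [
                        {'Name': 'Service', 'Value': 'API'},
                        {'Name': 'Environment', 'Value': 'prod'}
                    ]
                },
                'Period': 300,
                'Stat': 'Average'
            }
        }
    ],
    StartTime=end - timedelta(hours=1),
    EndTime=end
)
print(response['MetricDataResults'][0]['Values'])
```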
# Count ERROR log lines in 5-minute buckets
fields @timestamp, @message
| filter @message like /ERROR/
| stats count() as error_count by bin(5m)

# Latency average and percentiles per hour
fields @timestamp, latency
| stats avg(latency) as avg_latency,
        pct(latency, 95) as p95_latency,
        pct(latency, 99) as p99_latency
  by bin(1h)

# Top 10 exception types
fields @timestamp, @message
| filter @message like /Exception|Error/
| parse @message /(?<error_type>\w+Exception)/
| stats count() as count by error_type
| sort count desc
| limit 10
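
These queries can also be run programmatically; a sketch using boto3 and the error-count query above (the log group name is a placeholder):

```python
import time
from datetime import datetime, timedelta, timezone

import boto3

logs = boto3.client('logs')

end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

# Start the query, then poll until CloudWatch Logs finishes running it
query = logs.start_query(
    logGroupName='/my-app/prod',
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=(
        'fields @timestamp, @message '
        '| filter @message like /ERROR/ '
        '| stats count() as error_count by bin(5m)'
    )
)

while True:
    result = logs.get_query_results(queryId=query['queryId'])
    if result['status'] in ('Complete', 'Failed', 'Cancelled'):
        break
    time.sleep(1)

print(result['results'])
```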
| Symptom | Cause | Solution |
|---|---|---|
| No data | Metric not emitting | Check CloudWatch Agent |
| Alarm stuck | Insufficient data | Check treat_missing_data |
| Dashboard empty | Wrong namespace | Verify metric source |
| High costs | Too many metrics | Use metric filters |
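
For the high-cost case, a metric filter derives a metric from existing log events instead of publishing a custom data point per request. A sketch (log group, filter name, and pattern are illustrative):

```python
import boto3

logs = boto3.client('logs')

# Emit an ErrorCount metric whenever a log event contains "ERROR"
logs.put_metric_filter(
    logGroupName='/my-app/prod',
    filterName='error-count',
    filterPattern='ERROR',
    metricTransformations=[
        {
            'metricName': 'ErrorCount',
            'metricNamespace': 'MyApp',
            'metricValue': '1'
        }
    ]
)
```

The smoke test below creates a temporary alarm, verifies it can be described, and deletes it; it assumes credentials with CloudWatch permissions.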
import boto3

def test_cloudwatch_alarm():
    # Arrange
    alarm_name = "test-alarm"
    cw = boto3.client('cloudwatch')

    # Act
    cw.put_metric_alarm(
        AlarmName=alarm_name,
        MetricName='CPUUtilization',
        Namespace='AWS/EC2',
        Statistic='Average',
        Period=300,
        EvaluationPeriods=1,
        Threshold=80,
        ComparisonOperator='GreaterThanThreshold'
    )

    # Assert
    response = cw.describe_alarms(AlarmNames=[alarm_name])
    assert len(response['MetricAlarms']) == 1

    # Cleanup
    cw.delete_alarms(AlarmNames=[alarm_name])
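
To avoid touching a live account, the same test can run against an in-memory backend; a sketch assuming moto 5.x is installed and an AWS region is configured in the environment:

```python
from moto import mock_aws

@mock_aws
def test_cloudwatch_alarm_offline():
    # Runs the same assertions against moto's mocked CloudWatch API
    test_cloudwatch_alarm()
```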
assets/alarm-config.yaml - Common alarm configurations