dd-monitors

From pup

Manages Datadog monitors: list, create, update, mute/unmute via pup CLI, plus best practices for thresholds, scoping, recovery, and alert fatigue prevention.

Tags: Bash, Python, monitoring

npx claudepluginhub datadog-labs/pup --plugin pup

Tool Access

This skill uses the workspace's default tool permissions.


Stats

Stars: 740

Forks: 68

Last commit: Apr 27, 2026


Datadog Monitors

Create, manage, and maintain monitors for alerting.

Prerequisites

This skill requires the pup binary on your PATH, or Go so that you can install it.

pup: install with `go install github.com/datadog-labs/pup@latest`, then make sure `~/go/bin` is in your `$PATH`.
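A minimal sketch for checking the prerequisites from Python. It assumes Go's default install location (`~/go/bin`); a custom `GOBIN` or `GOPATH` changes where `go install` puts the binary. `check_pup_prereqs` is a hypothetical helper, not part of pup.

```python
import os
import shutil
from pathlib import Path


def check_pup_prereqs(path_env=None, home=None):
    """Return (go_bin_on_path, pup_path) for the given PATH string.

    path_env defaults to $PATH and home defaults to the current user's home.
    Assumes Go's default install directory, ~/go/bin.
    """
    path_env = os.environ.get("PATH", "") if path_env is None else path_env
    home = Path.home() if home is None else Path(home)
    go_bin = home / "go" / "bin"
    # Normalize each PATH entry so "~/go/bin" and trailing slashes compare equal
    entries = [Path(p).expanduser() for p in path_env.split(os.pathsep) if p]
    go_bin_on_path = go_bin in entries
    # Search the supplied PATH string rather than the process environment
    pup_path = shutil.which("pup", path=path_env)
    return go_bin_on_path, pup_path
```

Calling `check_pup_prereqs()` with no arguments checks the current environment; a `None` second value means pup was not found on the PATH.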

Quick Start

pup auth login

Common Operations

List Monitors

pup monitors list
pup monitors list --tags "team:platform"
pup monitors search --query "status:Alert"
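When scripting these commands from Python, a thin wrapper around `subprocess.run` is enough. This sketch assumes the pup command prints JSON to stdout (the jq pipelines in the audit section rely on the same assumption); `pup_json` is a hypothetical helper, and the `runner` parameter exists only so it can be exercised without pup installed.

```python
import json
import subprocess


def pup_json(*args, runner=subprocess.run):
    """Run a pup subcommand and parse its stdout as JSON.

    Raises subprocess.CalledProcessError if pup exits non-zero.
    """
    result = runner(["pup", *args], capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, `pup_json("monitors", "list", "--tags", "team:platform")` would return the parsed list of monitors owned by that team.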

Get Monitor

pup monitors get <id>

Create Monitor

pup monitors create --file monitor.json
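A sketch of generating `monitor.json`, assuming `pup monitors create --file` accepts a body with Datadog's monitor API fields (`name`, `type`, `query`, `message`, `tags`, `options`). `build_monitor` is a hypothetical helper; it writes the threshold into both the query and `options.thresholds.critical` so the two cannot drift apart.

```python
import json


def build_monitor(name, query, threshold, message, tags):
    """Build a metric monitor body; the query threshold and critical threshold match."""
    return {
        "name": name,
        "type": "metric alert",
        "query": f"{query} > {threshold}",
        "message": message,
        "tags": tags,
        "options": {
            "thresholds": {"critical": threshold},
            "notify_no_data": False,
        },
    }


monitor = build_monitor(
    name="High CPU on prod API hosts",
    query="avg(last_5m):avg:system.cpu.user{env:prod,service:api} by {host}",
    threshold=80,
    message="CPU is {{value}} on {{host.name}}. @slack-ops",
    tags=["team:platform", "env:prod"],
)

# Write the body where `pup monitors create --file monitor.json` expects it
with open("monitor.json", "w") as f:
    json.dump(monitor, f, indent=2)
```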

Mute/Unmute

# Mute with duration
pup monitors update 12345 --file monitor-muted.json

# Or mute with specific end time
pup monitors update 12345 --file monitor-muted-until.json

# Unmute
pup monitors update 12345 --file monitor-unmuted.json

⚠️ Monitor Creation Best Practices

1. Avoid Alert Fatigue

Rule	Why
No flapping alerts	Evaluate over a longer window (e.g. `last_5m`), not `last_1m`
Meaningful thresholds	Based on SLOs, not guesses
Actionable alerts	If no action needed, don't alert
Include runbook	Link the runbook URL in the message

# WRONG - will flap constantly
query = "avg(last_1m):avg:system.cpu.user{*} > 50"  # ❌ Too sensitive

# CORRECT - stable alerting
query = "avg(last_5m):avg:system.cpu.user{env:prod} by {host} > 80"  # ✅ Reasonable window
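The effect of the evaluation window can be illustrated with a simplified model: one sample per minute, with an alert whenever the mean of the last N samples exceeds the threshold. This is not Datadog's evaluator, and `crossings` is a hypothetical helper; it only counts how many separate alerts a spiky series would trigger.

```python
def crossings(samples, window, threshold):
    """Count OK -> ALERT transitions when alerting on the mean of the last `window` samples."""
    alerts, alerting = 0, False
    for i in range(window - 1, len(samples)):
        avg = sum(samples[i - window + 1 : i + 1]) / window
        breach = avg > threshold
        if breach and not alerting:
            alerts += 1  # a new alert fires only on the transition into breach
        alerting = breach
    return alerts


# Per-minute CPU with short, isolated spikes above 80%
cpu = [40, 85, 42, 88, 45, 83, 41, 86, 44, 90]
print(crossings(cpu, window=1, threshold=80))  # → 5  (last_1m-style)
print(crossings(cpu, window=5, threshold=80))  # → 0  (last_5m-style)
```

A sustained breach still alerts with the longer window: `crossings([85] * 6, 5, 80)` returns 1.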

2. Use Proper Scoping

# WRONG - alerts on everything
query = "avg(last_5m):avg:system.cpu.user{*} > 80"  # ❌ No scope

# CORRECT - scoped to what matters
query = "avg(last_5m):avg:system.cpu.user{env:prod,service:api} by {host} > 80"  # ✅
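To make scoping hard to forget, queries can be assembled from parts. This is a hypothetical `metric_query` helper, a sketch assuming the `time_agg(window):space_agg:metric{scope} by {group} comparator threshold` shape used in the examples above; it rejects an empty scope, which would otherwise mean `{*}`.

```python
def metric_query(metric, scope, threshold, *, window="last_5m", time_agg="avg",
                 space_agg="avg", group_by=(), comparator=">"):
    """Assemble a metric monitor query string.

    `scope` is a dict of tag -> value; an empty scope (the {*} case) is rejected.
    """
    if not scope:
        raise ValueError("scope is empty: the query would match {*}")
    scope_str = ",".join(f"{key}:{value}" for key, value in scope.items())
    # {{ and }} are literal braces around the scope in the f-string
    query = f"{time_agg}({window}):{space_agg}:{metric}{{{scope_str}}}"
    if group_by:
        query += " by {" + ",".join(group_by) + "}"
    return f"{query} {comparator} {threshold}"
```

`metric_query("system.cpu.user", {"env": "prod", "service": "api"}, 80, group_by=["host"])` reproduces the scoped query shown above.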

3. Set Recovery Thresholds

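monitor = {
    # The threshold in the query must match options.thresholds.critical
    "query": "avg(last_5m):avg:system.cpu.user{env:prod} > 80",
    "options": {
        "thresholds": {
            "critical": 80,
            "critical_recovery": 70,  # ✅ Recovers only after CPU falls below 70, preventing flapping
            "warning": 60,
            "warning_recovery": 50    # Same hysteresis for the warning state
        }
    }
}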
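Why a recovery threshold prevents flapping can be shown with a simplified state machine. This is a sketch under assumptions, not Datadog's exact evaluation logic: without a recovery threshold the monitor recovers as soon as the alert condition stops holding; with one, it recovers only once the value drops below `critical_recovery`. `simulate` and `count_alerts` are hypothetical helpers.

```python
def simulate(values, critical, critical_recovery=None):
    """Return the monitor state after each evaluation in a simplified hysteresis model."""
    state, states = "OK", []
    for v in values:
        if state == "OK" and v > critical:
            state = "ALERT"
        elif state == "ALERT":
            if critical_recovery is None:
                recovered = v <= critical          # alert condition no longer holds
            else:
                recovered = v < critical_recovery  # must fall below the recovery threshold
            if recovered:
                state = "OK"
        states.append(state)
    return states


def count_alerts(states):
    """Count OK -> ALERT transitions, i.e. notifications sent."""
    prev, n = "OK", 0
    for s in states:
        if s == "ALERT" and prev == "OK":
            n += 1
        prev = s
    return n


cpu = [85, 78, 82, 75, 65]
print(simulate(cpu, 80))      # ['ALERT', 'OK', 'ALERT', 'OK', 'OK']
print(simulate(cpu, 80, 70))  # ['ALERT', 'ALERT', 'ALERT', 'ALERT', 'OK']
```

The same series produces two alerts without a recovery threshold and one with it.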

4. Include Context in Messages

message = """
## High CPU Alert

Host: {{host.name}}
Current Value: {{value}}
Threshold: {{threshold}}

### Runbook
1. Check top processes: `ssh {{host.name}} 'top -bn1 | head -20'`
2. Check recent deploys
3. Scale if needed

@slack-ops @pagerduty-oncall
"""
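Before saving a message, it can help to preview it with sample values and catch misspelled template variables. This is a hypothetical local previewer, not Datadog's renderer: it only substitutes simple `{{name}}` placeholders and leaves conditional blocks such as `{{#is_alert}}` untouched.

```python
import re

# Matches simple placeholders like {{value}} or {{host.name}}; {{#is_alert}} is not matched
_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def preview(message, values):
    """Substitute known placeholders with sample values; unknown ones are left as-is."""
    return _VAR.sub(lambda m: str(values.get(m.group(1), m.group(0))), message)


def unresolved(message, values):
    """List placeholder names that have no sample value (possible typos)."""
    return [name for name in _VAR.findall(message) if name not in values]
```

For example, `preview("Host: {{host.name}}", {"host.name": "web-1"})` returns `"Host: web-1"`.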

⚠️ NEVER Delete Monitors Directly

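Use a safe deletion workflow instead (the same one used for dashboards): rename the monitor with a `[MARKED FOR DELETION]` prefix so it can be reviewed before anyone removes it.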

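def safe_mark_monitor_for_deletion(monitor_id: str, client) -> bool:
    """Mark a monitor for deletion by prefixing its name instead of deleting it.

    `client` is any object exposing get_monitor(id) -> dict and
    update_monitor(id, changes); returns True if the monitor was newly marked.
    """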
    monitor = client.get_monitor(monitor_id)
    name = monitor.get("name", "")
    
    if "[MARKED FOR DELETION]" in name:
        print(f"Already marked: {name}")
        return False
    
    new_name = f"[MARKED FOR DELETION] {name}"
    client.update_monitor(monitor_id, {"name": new_name})
    print(f"✓ Marked: {new_name}")
    return True
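Marking is only safe if it can be undone. A sketch of the reverse step, assuming the same hypothetical `get_monitor`/`update_monitor` client interface; `unmark_monitor` and the in-memory `InMemoryMonitors` stand-in are illustrative, not part of pup or any Datadog client.

```python
PREFIX = "[MARKED FOR DELETION] "


class InMemoryMonitors:
    """Stand-in client that models only get_monitor/update_monitor."""

    def __init__(self, monitors):
        self._monitors = {key: dict(value) for key, value in monitors.items()}

    def get_monitor(self, monitor_id):
        return dict(self._monitors[monitor_id])

    def update_monitor(self, monitor_id, changes):
        self._monitors[monitor_id].update(changes)


def unmark_monitor(monitor_id, client) -> bool:
    """Undo a deletion mark by stripping the prefix; returns False if not marked."""
    name = client.get_monitor(monitor_id).get("name", "")
    if not name.startswith(PREFIX):
        return False
    client.update_monitor(monitor_id, {"name": name[len(PREFIX):]})
    return True
```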

Monitor Types

Type	Use Case
`metric alert`	CPU, memory, custom metrics
`query alert`	Complex metric queries
`service check`	Agent check status
`event alert`	Event stream patterns
`log alert`	Log pattern matching
`composite`	Combine multiple monitors
`trace-analytics alert`	APM trace analytics (APM metric monitors use `query alert` on `trace.*` metrics)

Audit Monitors

# Find monitors without owners
pup monitors list | jq '.[] | select(.tags | contains(["team:"]) | not) | {id, name}'

# Monitors whose state changed most recently (a rough proxy for noisy monitors)
pup monitors list | jq 'sort_by(.overall_state_modified) | reverse | .[:10] | .[] | {id, name, status: .overall_state}'
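The same audits can run in Python on the parsed JSON, which is easier to extend. A sketch assuming each monitor has `id`, `name`, `tags`, `overall_state`, and an ISO 8601 `overall_state_modified` string in a consistent format (so string order matches time order). Note that jq's `contains` matches substrings, while `startswith` below only accepts tags that begin with `team:`. Both helpers are hypothetical.

```python
def without_team_tag(monitors):
    """Monitors with no tag starting with 'team:'."""
    return [
        {"id": m["id"], "name": m["name"]}
        for m in monitors
        if not any(tag.startswith("team:") for tag in m.get("tags") or [])
    ]


def most_recently_changed(monitors, n=10):
    """Top n monitors by most recent state change; missing timestamps sort last."""
    ranked = sorted(
        monitors,
        key=lambda m: m.get("overall_state_modified") or "",
        reverse=True,
    )
    return [
        {"id": m["id"], "name": m["name"], "status": m.get("overall_state")}
        for m in ranked[:n]
    ]
```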

Downtime vs Muting

Use	When
Mute monitor	Quick one-off, < 1 hour
Downtime	Scheduled maintenance, recurring

# Downtime (preferred)
pup downtime create --file downtime.json
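A sketch of generating `downtime.json` for a one-time maintenance window. It assumes pup forwards the file to Datadog's v2 downtime endpoint, whose body nests `scope`, `monitor_identifier`, and `schedule` under `data.attributes`; if pup targets the v1 endpoint instead, the shape differs (a flat body with a `scope` list and POSIX `start`/`end`). `downtime_body` is a hypothetical helper.

```python
import json
from datetime import datetime, timedelta, timezone


def downtime_body(monitor_id, scope, start, duration, message):
    """Build a one-time downtime body in Datadog's v2 shape (assumed)."""
    end = start + duration
    return {
        "data": {
            "type": "downtime",
            "attributes": {
                "scope": scope,
                "monitor_identifier": {"monitor_id": monitor_id},
                # ISO 8601 timestamps with an explicit UTC offset
                "schedule": {"start": start.isoformat(), "end": end.isoformat()},
                "message": message,
            },
        }
    }


start = datetime(2026, 5, 2, 22, 0, tzinfo=timezone.utc)
body = downtime_body(12345, "env:prod", start, timedelta(hours=2),
                     "Planned DB maintenance @slack-ops")
with open("downtime.json", "w") as f:
    json.dump(body, f, indent=2)
```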

Failure Handling

Problem	Fix
Alert not firing	Check that the query returns data and the thresholds are reachable
Too many alerts	Increase window, add recovery threshold
No data alerts	Check agent connectivity and that the metric still exists
Auth error	`pup auth refresh`
