From coreweave-pack
Monitors CoreWeave Kubernetes events, GPU utilization, and inference service health. Tracks pod lifecycles and sends alerts via kubectl and Python scripts.
Install with:

`npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack`

This skill sets up GPU monitoring for CoreWeave Kubernetes clusters using DCGM exporter metrics and Prometheus alerts for utilization, memory usage, temperature, and inference pod health. It is limited to using the following tools:

```bash
# Watch GPU pod events. Field selectors are ANDed, so a comma-separated
# list cannot express OR over reasons; filter with grep instead.
kubectl get events --watch | grep -E 'Scheduled|Pulled|Failed'
# Monitor GPU utilization via exec
kubectl exec -it deployment/inference -- nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv -l 5
# DCGM exporter for GPU metrics (pre-installed on CKS)
# Key metrics:
# DCGM_FI_DEV_GPU_UTIL - GPU utilization %
# DCGM_FI_DEV_FB_USED - GPU memory used
# DCGM_FI_DEV_POWER_USAGE - Power draw
```

A Python health check that posts a Slack alert when an inference deployment has fewer ready replicas than desired:

```python
import json
import subprocess

import requests

def check_inference_health(deployment: str, slack_url: str) -> None:
    # Fetch the deployment's current state as JSON; fail loudly if kubectl errors.
    result = subprocess.run(
        ["kubectl", "get", "deployment", deployment, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    deploy = json.loads(result.stdout)
    ready = deploy["status"].get("readyReplicas", 0)
    desired = deploy["spec"].get("replicas", 1)
    if ready < desired:
        requests.post(slack_url, json={
            "text": f"CoreWeave: {deployment} has {ready}/{desired} replicas ready"
        }, timeout=10)
```
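The DCGM metrics exported to Prometheus can also be polled directly over Prometheus' HTTP query API, for example to flag idle GPUs. A minimal sketch: the Prometheus URL, the `gpu` label, and the 30% idle threshold are assumptions, not CoreWeave defaults.

```python
import requests

# Assumed in-cluster Prometheus endpoint; adjust for your deployment.
PROM_URL = "http://prometheus.monitoring.svc:9090"
IDLE_THRESHOLD = 30.0  # percent; an arbitrary example threshold

def parse_idle_gpus(query_response: dict, threshold: float = IDLE_THRESHOLD) -> list:
    """Return (gpu, utilization) pairs below the threshold from an instant-query result."""
    idle = []
    for sample in query_response["data"]["result"]:
        gpu = sample["metric"].get("gpu", "unknown")
        util = float(sample["value"][1])  # value is [timestamp, value-as-string]
        if util < threshold:
            idle.append((gpu, util))
    return idle

def query_idle_gpus() -> list:
    """Run an instant query for DCGM_FI_DEV_GPU_UTIL (requires cluster network access)."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query",
        params={"query": "DCGM_FI_DEV_GPU_UTIL"},
        timeout=10,
    )
    resp.raise_for_status()
    return parse_idle_gpus(resp.json())
```

Keeping the parsing separate from the HTTP call makes the threshold logic easy to unit-test against a canned Prometheus response.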
For performance optimization, see coreweave-performance-tuning.
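The Prometheus alerts mentioned above might be expressed as alerting rules over the DCGM metrics; the following is a sketch only, where the group name, thresholds, and durations are assumptions rather than CoreWeave defaults.

```yaml
groups:
  - name: gpu-alerts  # assumed group name
    rules:
      - alert: GPUHighTemperature
        expr: DCGM_FI_DEV_GPU_TEMP > 85  # threshold is an assumption
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "GPU {{ $labels.gpu }} above 85C for 5 minutes"
      - alert: GPUIdle
        expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[30m]) < 5  # threshold is an assumption
        for: 30m
        labels:
          severity: info
        annotations:
          summary: "GPU {{ $labels.gpu }} idle for 30 minutes"
```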