From coreweave-pack
Collect CoreWeave cluster diagnostics for support tickets. Use when preparing a support case, collecting GPU node status, or documenting pod failures. Trigger with phrases like "coreweave debug", "coreweave support", "coreweave diagnostics", "collect coreweave logs".
npx claudepluginhub flight505/skill-forge --plugin coreweave-packThis skill is limited to using the following tools:
Collect GPU node health, Kubernetes pod status, event logs, and API connectivity into a single diagnostic archive for CoreWeave support tickets. This bundle captures cluster-level resource allocation, failed pod logs, GPU device plugin state, and network reachability so support engineers can diagnose infrastructure issues without requesting additional information. Useful when GPU pods are stuck...
Guides Next.js Cache Components and Partial Prerendering (PPR): 'use cache' directives, cacheLife(), cacheTag(), revalidateTag() for caching, invalidation, static/dynamic optimization. Auto-activates on cacheComponents: true.
Guides building MCP servers enabling LLMs to interact with external services via tools. Covers best practices, TypeScript/Node (MCP SDK), Python (FastMCP).
Share bugs, ideas, or general feedback.
Collect GPU node health, Kubernetes pod status, event logs, and API connectivity into a single diagnostic archive for CoreWeave support tickets. This bundle captures cluster-level resource allocation, failed pod logs, GPU device plugin state, and network reachability so support engineers can diagnose infrastructure issues without requesting additional information. Useful when GPU pods are stuck pending, inference workloads OOM, or node autoscaling behaves unexpectedly.
#!/bin/bash
set -euo pipefail
BUNDLE="debug-coreweave-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE"
# Environment check
echo "=== CoreWeave Debug Bundle ===" | tee "$BUNDLE/summary.txt"
echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$BUNDLE/summary.txt"
echo "COREWEAVE_API_KEY: ${COREWEAVE_API_KEY:+[SET]}" >> "$BUNDLE/summary.txt"
echo "KUBECONFIG: ${KUBECONFIG:-default}" >> "$BUNDLE/summary.txt"
echo "kubectl: $(kubectl version --client --short 2>/dev/null || echo 'not found')" >> "$BUNDLE/summary.txt"
# API connectivity
HTTP=$(curl -s -o /dev/null -w "%{http_code}" -H "Authorization: Bearer ${COREWEAVE_API_KEY}" \
https://api.coreweave.com/v1/namespaces 2>/dev/null || echo "000")
echo "API Status: HTTP $HTTP" >> "$BUNDLE/summary.txt"
# Cluster state
kubectl get nodes -o wide > "$BUNDLE/nodes.txt" 2>&1 || true
kubectl get pods --all-namespaces -o wide > "$BUNDLE/pods.txt" 2>&1 || true
kubectl get events --sort-by=.lastTimestamp > "$BUNDLE/events.txt" 2>&1 || true
# GPU allocation and device plugin status
kubectl describe nodes | grep -A10 "Allocated resources" > "$BUNDLE/gpu-allocation.txt" 2>&1 || true
kubectl get pods -n kube-system -l k8s-app=nvidia-device-plugin -o wide > "$BUNDLE/gpu-plugin.txt" 2>&1 || true
# Failed pod logs
for pod in $(kubectl get pods --field-selector=status.phase=Failed -o name 2>/dev/null); do
kubectl logs "$pod" --tail=200 > "$BUNDLE/$(basename "$pod")-logs.txt" 2>&1 || true
done
# Rate limit headers
curl -s -D "$BUNDLE/rate-headers.txt" -o /dev/null \
-H "Authorization: Bearer ${COREWEAVE_API_KEY}" \
https://api.coreweave.com/v1/namespaces 2>/dev/null || true
tar -czf "$BUNDLE.tar.gz" "$BUNDLE" && rm -rf "$BUNDLE"
echo "Bundle: $BUNDLE.tar.gz"
tar -xzf debug-coreweave-*.tar.gz
cat debug-coreweave-*/summary.txt # API + env status at a glance
grep -i "error\|fail\|oom" debug-coreweave-*/events.txt # Critical events
cat debug-coreweave-*/gpu-allocation.txt # GPU resource pressure
| Symptom | Check in Bundle | Fix |
|---|---|---|
| GPU pods stuck Pending | gpu-allocation.txt shows 0 allocatable GPUs | Request quota increase or switch to available GPU type |
| OOMKilled on inference pod | events.txt for OOMKilled entries | Increase memory limits in pod spec; check model size vs allocated RAM |
| Node NotReady | nodes.txt status column | Check events.txt for kubelet issues; contact CoreWeave if persistent |
| API returns 401 | summary.txt shows HTTP 401 | Regenerate API key at CoreWeave dashboard; verify COREWEAVE_API_KEY is set |
| NVIDIA device plugin missing | gpu-plugin.txt empty or error | Verify namespace kube-system has device plugin DaemonSet; redeploy if missing |
async function checkCoreWeave(): Promise<void> {
const key = process.env.COREWEAVE_API_KEY;
if (!key) { console.error("[FAIL] COREWEAVE_API_KEY not set"); return; }
const res = await fetch("https://api.coreweave.com/v1/namespaces", {
headers: { Authorization: `Bearer ${key}` },
});
console.log(`[${res.ok ? "OK" : "FAIL"}] API: HTTP ${res.status}`);
const limit = res.headers.get("x-ratelimit-remaining");
if (limit) console.log(`[INFO] Rate limit remaining: ${limit}`);
}
checkCoreWeave();
See coreweave-common-errors for GPU scheduling and Kubernetes troubleshooting patterns.