From coreweave-pack
Deploys GPU workloads on CoreWeave Kubernetes with kubectl: a vLLM inference server or a batch job. Intended for first GPU deploys, inference testing, and cluster access checks.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack
Deploy your first GPU workload on CoreWeave: a simple inference service using vLLM or a batch CUDA job. CoreWeave runs Kubernetes on bare-metal GPU nodes with A100, H100, and L40 GPUs.
Provides Python helpers and Kubernetes patterns for CoreWeave GPU workloads, including affinity configs, resource limits, and inference clients for programmatic management.
Deploys vLLM OpenAI-compatible server to Kubernetes with GPU support, health probes, and services via YAML templates. Checks HF token secret and existing deployments before applying.
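Those pre-apply checks can be approximated in a few lines of Python; a minimal sketch, assuming the official kubernetes client, the default namespace, and the resource names used later in this guide (not the pack's actual implementation):

```python
# Sketch: verify the hf-token secret and any existing vllm-server deployment
# before applying the manifest. Assumes `pip install kubernetes` and a working
# kubeconfig; namespace and resource names are illustrative.
from kubernetes import client, config
from kubernetes.client.rest import ApiException

def preflight(namespace: str = "default") -> None:
    config.load_kube_config()
    core = client.CoreV1Api()
    apps = client.AppsV1Api()

    try:
        core.read_namespaced_secret("hf-token", namespace)
        print("hf-token secret found")
    except ApiException as exc:
        if exc.status == 404:
            print("hf-token secret missing: create it before deploying")
        else:
            raise

    try:
        apps.read_namespaced_deployment("vllm-server", namespace)
        print("vllm-server deployment already exists: apply will update it")
    except ApiException as exc:
        if exc.status == 404:
            print("no existing vllm-server deployment: apply will create it")
        else:
            raise

if __name__ == "__main__":
    preflight()
```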
Complete coreweave-install-auth setup first, then create the deployment manifest:

# vllm-inference.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args:
            - "--model"
            - "meta-llama/Llama-3.1-8B-Instruct"
            - "--port"
            - "8000"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: 1
              memory: 48Gi
              cpu: "8"
            requests:
              nvidia.com/gpu: 1
              memory: 32Gi
              cpu: "4"
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token
                  key: token
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class
                    operator: In
                    values: ["A100_PCIE_80GB"]
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app: vllm-server
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
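The affinity and resource blocks above are the kind of thing the pack's Python helpers generate; a minimal standalone sketch with illustrative helper names (plain dicts, so they can be dumped to YAML or passed to a Kubernetes client):

```python
# Sketch of helpers that build the node-affinity and GPU resource blocks used
# in the manifest above. Function names and defaults are illustrative, not the
# pack's actual API.
def gpu_node_affinity(gpu_class: str = "A100_PCIE_80GB") -> dict:
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": "gpu.nvidia.com/class",
                                "operator": "In",
                                "values": [gpu_class],
                            }
                        ]
                    }
                ]
            }
        }
    }

def gpu_resources(gpus: int = 1,
                  mem_limit: str = "48Gi", mem_request: str = "32Gi",
                  cpu_limit: str = "8", cpu_request: str = "4") -> dict:
    return {
        "limits": {"nvidia.com/gpu": gpus, "memory": mem_limit, "cpu": cpu_limit},
        "requests": {"nvidia.com/gpu": gpus, "memory": mem_request, "cpu": cpu_request},
    }
```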
# Create HuggingFace token secret
kubectl create secret generic hf-token --from-literal=token="${HF_TOKEN}"
# Deploy
kubectl apply -f vllm-inference.yaml
kubectl get pods -w # Wait for Running state
# Port-forward and test
kubectl port-forward svc/vllm-server 8000:8000 &
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'
# gpu-batch-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-benchmark
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: benchmark
          image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime
          command: ["python3", "-c"]
          args:
            - |
              import torch
              print(f"CUDA available: {torch.cuda.is_available()}")
              print(f"GPU: {torch.cuda.get_device_name(0)}")
              x = torch.randn(10000, 10000, device="cuda")
              y = torch.matmul(x, x)
              print(f"Matrix multiply result shape: {y.shape}")
              print("CoreWeave GPU test passed!")
          resources:
            limits:
              nvidia.com/gpu: 1
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class
                    operator: In
                    values: ["A100_PCIE_80GB"]
kubectl apply -f gpu-batch-job.yaml
kubectl logs job/gpu-benchmark --follow
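To drive the batch job from scripts instead of tailing logs, a sketch that waits for completion with the kubernetes Python client (assumed dependency; namespace, timeout, and polling interval are illustrative):

```python
# Sketch: poll the gpu-benchmark Job until it succeeds or fails. Assumes
# `pip install kubernetes` and a working kubeconfig.
import time
from kubernetes import client, config

def wait_for_job(name: str = "gpu-benchmark", namespace: str = "default",
                 timeout_s: int = 1800) -> bool:
    config.load_kube_config()
    batch = client.BatchV1Api()
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = batch.read_namespaced_job_status(name, namespace).status
        if status.succeeded:
            print("Job succeeded")
            return True
        if status.failed:
            print(f"Job failed ({status.failed} pod failures)")
            return False
        time.sleep(10)
    raise TimeoutError(f"Job {name} did not finish within {timeout_s}s")

if __name__ == "__main__":
    wait_for_job()
```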
| Error | Cause | Solution |
|---|---|---|
| Pod stuck Pending | No GPU capacity | Try different GPU type or check quota |
| nvidia-smi not found | Wrong base image | Use NVIDIA CUDA images |
| OOMKilled | Insufficient GPU memory | Use larger GPU (80GB A100) |
| Image pull error | Registry auth | Create imagePullSecret |
Proceed to coreweave-local-dev-loop for development workflow setup.