# coreweave-pack

Deploys AI inference services on CoreWeave Kubernetes using Helm charts and Kustomize for GPU scaling and multi-model setups.

Install the pack:

```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack
```

The pack includes the following skills:
- Deploys KServe InferenceService on CoreWeave Kubernetes for GPU ML model serving with vLLM, autoscaling, scale-to-zero, and A100 affinity.
- Deploys a vLLM OpenAI-compatible server to Kubernetes with GPU support, health probes, and services via YAML templates. Checks for the HF token secret and existing deployments before applying.
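A minimal sketch of the pre-flight checks the vLLM deployment skill describes; the secret and deployment names (`hf-token`, `vllm-server`) are assumptions for illustration, not names defined by this pack:

```shell
# Check that the Hugging Face token secret exists (secret name assumed)
kubectl get secret hf-token >/dev/null 2>&1 \
  || echo "Missing HF token secret; create one with: kubectl create secret generic hf-token --from-literal=token=<HF_TOKEN>"

# Check for an existing deployment so a re-run updates instead of colliding (name assumed)
kubectl get deployment vllm-server >/dev/null 2>&1 \
  && echo "Deployment vllm-server already exists; it will be updated in place"
```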
```yaml
# helm/values.yaml
replicaCount: 2
image:
  repository: vllm/vllm-openai
  tag: latest
gpu:
  type: A100_PCIE_80GB
  count: 1
  memory: 48Gi
model:
  name: meta-llama/Llama-3.1-8B-Instruct
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 5
  targetConcurrency: 2
```
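For illustration, here is how a chart template might consume the `gpu` values above; the template path and the `gpu.nvidia.com/class` node label are assumptions based on common CoreWeave conventions, not taken from this pack's chart:

```yaml
# helm/templates/deployment.yaml (hypothetical excerpt)
resources:
  limits:
    nvidia.com/gpu: {{ .Values.gpu.count }}
    memory: {{ .Values.gpu.memory }}
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: gpu.nvidia.com/class
              operator: In
              values:
                - {{ .Values.gpu.type }}
```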
Install or upgrade the release with your production overrides:

```shell
helm install my-inference ./helm -f values-prod.yaml
helm upgrade my-inference ./helm -f values-prod.yaml
```
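Once the release is up, a quick smoke test against the OpenAI-compatible API might look like the following; the service name and port are assumptions derived from the release name, so adjust them to your chart:

```shell
# Forward the service locally (service name and port assumed)
kubectl port-forward svc/my-inference 8000:8000 &

# List served models via the OpenAI-compatible endpoint
curl -s http://localhost:8000/v1/models
```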
The Kustomize layout separates a shared base from per-environment overlays:

```
k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── gpu-patch.yaml        # L40 GPU for dev
    │   └── kustomization.yaml
    └── prod/
        ├── gpu-patch.yaml        # A100/H100 for prod
        ├── replicas-patch.yaml
        └── kustomization.yaml
```
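As a sketch of what the prod overlay files might contain (the resource name and node label are assumptions, not from this repo):

```yaml
# k8s/overlays/prod/kustomization.yaml (hypothetical)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - path: gpu-patch.yaml
  - path: replicas-patch.yaml
---
# k8s/overlays/prod/gpu-patch.yaml (hypothetical) - pin prod pods to A100 nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  template:
    spec:
      nodeSelector:
        gpu.nvidia.com/class: A100_PCIE_80GB
```

The strategic merge patch identifies its target by `kind` and `metadata.name`, so only the fields it sets are overlaid on the base deployment.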
Apply the production overlay:

```shell
kubectl apply -k k8s/overlays/prod/
```
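You can also inspect what the overlay renders without mutating the cluster:

```shell
# Render the fully resolved prod manifests locally
kubectl kustomize k8s/overlays/prod/

# Server-side dry run to surface schema or admission errors
kubectl apply -k k8s/overlays/prod/ --dry-run=server
```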
For event monitoring, see coreweave-webhooks-events.