# coreweave-pack
Provides a Kubernetes reference architecture for the CoreWeave GPU cloud: ML model serving with vLLM/TGI, shared PVC storage, autoscaling, monitoring, and project structure.
Install via:

```
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack
```
Reference architecture:

```
                ┌─────────────────────┐
                │    Load Balancer    │
                │    (Ingress/LB)     │
                └──────────┬──────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐
│    Model A    │  │    Model B    │  │    Model C    │
│ (vLLM, A100)  │  │  (TGI, H100)  │  │   (SD, L40)   │
│  2 replicas   │  │   1 replica   │  │  3 replicas   │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
┌───────▼──────────────────▼──────────────────▼───────┐
│                 Shared Storage (PVC)                 │
│             Models / Checkpoints / Data              │
└──────────────────────────────────────────────────────┘
```
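The serving tier above can be sketched as a standard Deployment. This is a minimal illustration, not a verified manifest: the model name `llama-8b`, the node-affinity label `gpu.nvidia.com/class` with value `A100_PCIE_80GB`, and the `model-store` PVC name are all assumptions you would adapt to your cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-8b                  # hypothetical model name
spec:
  replicas: 2                     # "Model A: 2 replicas" from the diagram
  selector:
    matchLabels:
      app: llama-8b
  template:
    metadata:
      labels:
        app: llama-8b
    spec:
      affinity:
        nodeAffinity:             # pin pods to a specific GPU class
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class   # assumed CoreWeave node label
                    operator: In
                    values: ["A100_PCIE_80GB"]
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "/models/llama-8b"]
          resources:
            limits:
              nvidia.com/gpu: 1   # one GPU per replica
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: model-store   # shared PVC from the storage tier
```

Each replica mounts the same PVC read path, which is what makes the shared-storage tier in the diagram pay off: model weights load from SSD once per pod instead of being baked into images.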
Suggested project layout:

```
ml-platform/
├── k8s/
│   ├── base/                    # Shared templates
│   ├── models/
│   │   ├── llama-8b/            # Per-model manifests
│   │   ├── llama-70b/
│   │   └── stable-diffusion/
│   └── infra/
│       ├── storage.yaml         # PVCs
│       ├── secrets.yaml         # Model tokens
│       └── monitoring.yaml      # Prometheus rules
├── containers/
│   ├── vllm/Dockerfile
│   └── custom-server/Dockerfile
├── scripts/
│   ├── deploy.sh
│   └── benchmark.sh
└── monitoring/
    ├── grafana-dashboards/
    └── alert-rules.yaml
```
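The per-model manifests under `k8s/models/` repeat the same affinity and resource boilerplate, so generating that fragment programmatically keeps them consistent. A hypothetical Python sketch (the `gpu_pod_spec` helper, the label key `gpu.nvidia.com/class`, and the class name are illustrative assumptions, not a published API):

```python
def gpu_pod_spec(gpu_class: str, gpu_count: int = 1) -> dict:
    """Build the nodeAffinity + GPU resource fragment of a pod spec.

    gpu_class: a CoreWeave-style GPU class label value,
    e.g. "A100_PCIE_80GB" (assumed name for illustration).
    """
    return {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "gpu.nvidia.com/class",  # assumed label key
                            "operator": "In",
                            "values": [gpu_class],
                        }]
                    }]
                }
            }
        },
        "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
    }


# Merge the fragment into each model's pod template before rendering YAML.
spec = gpu_pod_spec("A100_PCIE_80GB", gpu_count=2)
```

A templating tool like Kustomize in `k8s/base/` covers the same ground declaratively; a helper like this is mainly useful when manifests are rendered from Python anyway.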
| Decision | Choice | Rationale |
|---|---|---|
| Serving framework | vLLM | Continuous batching, PagedAttention |
| GPU type (production) | A100 80GB | Best price/performance for inference |
| Storage | Shared PVC (SSD) | Fast model loading across replicas |
| Autoscaling | KServe + Knative | Native scale-to-zero support |
| Container registry | GHCR | GitHub integration, free for public |
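The KServe + Knative row is what enables scale-to-zero: setting `minReplicas: 0` lets Knative deallocate idle GPU replicas entirely. A sketch of an InferenceService under that assumption (the `llama-8b` name and serving image are hypothetical placeholders):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-8b                       # hypothetical service name
spec:
  predictor:
    minReplicas: 0                     # Knative scales to zero when idle
    maxReplicas: 4
    containers:
      - name: kserve-container
        image: vllm/vllm-openai:latest # assumed serving image
        resources:
          limits:
            nvidia.com/gpu: 1
```

The trade-off is cold-start latency: scaling from zero means pulling the image and loading weights before the first request is served, which the shared SSD-backed PVC above helps keep short.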
For a multi-environment setup, see `coreweave-multi-env-setup`.