# coreweave-pack
Provides a Kubernetes reference architecture for the CoreWeave GPU cloud: ML model serving with vLLM/TGI, shared PVC storage, autoscaling, monitoring, and project structure.
Install via:

```
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack
```
Reference architecture:

```
                ┌─────────────────────┐
                │    Load Balancer    │
                │    (Ingress/LB)     │
                └──────────┬──────────┘
                           │
        ┌──────────────────┼──────────────────┐
        │                  │                  │
┌───────▼───────┐  ┌───────▼───────┐  ┌───────▼───────┐
│    Model A    │  │    Model B    │  │    Model C    │
│ (vLLM, A100)  │  │  (TGI, H100)  │  │   (SD, L40)   │
│  2 replicas   │  │   1 replica   │  │  3 replicas   │
└───────┬───────┘  └───────┬───────┘  └───────┬───────┘
        │                  │                  │
┌───────▼──────────────────▼──────────────────▼───────┐
│                 Shared Storage (PVC)                 │
│             Models / Checkpoints / Data              │
└──────────────────────────────────────────────────────┘
```
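The serving tier above can be sketched as a standard Deployment. This is a minimal illustration, not a verified manifest: the model name `llama-8b`, the node-affinity label `gpu.nvidia.com/class` with value `A100_PCIE_80GB`, and the `model-store` PVC name are all assumptions you would adapt to your cluster.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-8b                  # hypothetical model name
spec:
  replicas: 2                     # "Model A: 2 replicas" from the diagram
  selector:
    matchLabels:
      app: llama-8b
  template:
    metadata:
      labels:
        app: llama-8b
    spec:
      affinity:
        nodeAffinity:             # pin pods to a specific GPU class
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: gpu.nvidia.com/class   # assumed CoreWeave node label
                    operator: In
                    values: ["A100_PCIE_80GB"]
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "/models/llama-8b"]
          resources:
            limits:
              nvidia.com/gpu: 1   # one GPU per replica
          volumeMounts:
            - name: models
              mountPath: /models
      volumes:
        - name: models
          persistentVolumeClaim:
            claimName: model-store   # shared PVC from the storage tier
```

Each replica mounts the same PVC read path, which is what makes the shared-storage tier in the diagram pay off: model weights load from SSD once per pod instead of being baked into images.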
Suggested project layout:

```
ml-platform/
├── k8s/
│   ├── base/                    # Shared templates
│   ├── models/
│   │   ├── llama-8b/            # Per-model manifests
│   │   ├── llama-70b/
│   │   └── stable-diffusion/
│   └── infra/
│       ├── storage.yaml         # PVCs
│       ├── secrets.yaml         # Model tokens
│       └── monitoring.yaml      # Prometheus rules
├── containers/
│   ├── vllm/Dockerfile
│   └── custom-server/Dockerfile
├── scripts/
│   ├── deploy.sh
│   └── benchmark.sh
└── monitoring/
    ├── grafana-dashboards/
    └── alert-rules.yaml
```
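The per-model manifests under `k8s/models/` repeat the same affinity and resource boilerplate, so generating that fragment programmatically keeps them consistent. A hypothetical Python sketch (the `gpu_pod_spec` helper, the label key `gpu.nvidia.com/class`, and the class name are illustrative assumptions, not a published API):

```python
def gpu_pod_spec(gpu_class: str, gpu_count: int = 1) -> dict:
    """Build the nodeAffinity + GPU resource fragment of a pod spec.

    gpu_class: a CoreWeave-style GPU class label value,
    e.g. "A100_PCIE_80GB" (assumed name for illustration).
    """
    return {
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "gpu.nvidia.com/class",  # assumed label key
                            "operator": "In",
                            "values": [gpu_class],
                        }]
                    }]
                }
            }
        },
        "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
    }


# Merge the fragment into each model's pod template before rendering YAML.
spec = gpu_pod_spec("A100_PCIE_80GB", gpu_count=2)
```

A templating tool like Kustomize in `k8s/base/` covers the same ground declaratively; a helper like this is mainly useful when manifests are rendered from Python anyway.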
| Decision | Choice | Rationale |
|---|---|---|
| Serving framework | vLLM | Continuous batching, PagedAttention |
| GPU type (production) | A100 80GB | Best price/performance for inference |
| Storage | Shared PVC (SSD) | Fast model loading across replicas |
| Autoscaling | KServe + Knative | Native scale-to-zero support |
| Container registry | GHCR | GitHub integration, free for public |
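The KServe + Knative row is what enables scale-to-zero: setting `minReplicas: 0` lets Knative deallocate idle GPU replicas entirely. A sketch of an InferenceService under that assumption (the `llama-8b` name and serving image are hypothetical placeholders):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llama-8b                       # hypothetical service name
spec:
  predictor:
    minReplicas: 0                     # Knative scales to zero when idle
    maxReplicas: 4
    containers:
      - name: kserve-container
        image: vllm/vllm-openai:latest # assumed serving image
        resources:
          limits:
            nvidia.com/gpu: 1
```

The trade-off is cold-start latency: scaling from zero means pulling the image and loading weights before the first request is served, which the shared SSD-backed PVC above helps keep short.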
For a multi-environment setup, see `coreweave-multi-env-setup`.