From coreweave-pack
Optimizes CoreWeave GPU costs with right-sizing, Knative scale-to-zero, quantization, and instance recommendations for ML inference workloads.
How this skill is triggered — by the user, by Claude, or both
Slash command
/coreweave-pack:coreweave-cost-tuningThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
| GPU | Per GPU/hour | Best For |
| GPU | Per GPU/hour | Best For |
|---|---|---|
| A100 40GB PCIe | ~$1.50 | Development, smaller models |
| A100 80GB PCIe | ~$2.21 | Production inference |
| H100 80GB PCIe | ~$4.76 | High-throughput inference |
| H100 SXM5 (8x) | ~$6.15/GPU | Training, multi-GPU |
| L40 | ~$1.10 | Image generation, light inference |
autoscaling.knative.dev/minScale: "0"
autoscaling.knative.dev/scaleDownDelay: "5m"
def recommend_gpu(model_size_b: float, inference_only: bool = True) -> str:
if model_size_b <= 7:
return "L40" if inference_only else "A100_PCIE_80GB"
elif model_size_b <= 13:
return "A100_PCIE_80GB"
elif model_size_b <= 70:
return "A100_PCIE_80GB (4x tensor parallel)"
else:
return "H100_SXM5 (8x tensor parallel)"
Use AWQ or GPTQ quantization to fit larger models on smaller GPUs:
# 70B model at 4-bit fits on single A100-80GB instead of 4x
vllm serve meta-llama/Llama-3.1-70B-Instruct-AWQ --quantization awq
For architecture patterns, see coreweave-reference-architecture.
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-packOptimizes CoreWeave GPU inference latency and throughput using workload-specific GPU picks, vLLM batching, and Kubernetes HPA autoscaling.
Cost estimation scripts and tools for calculating GPU hours, training costs, and inference pricing across Modal, Lambda Labs, and RunPod platforms. Use when estimating ML training costs, comparing platform pricing, calculating GPU hours, budgeting for ML projects, or when user mentions cost estimation, pricing comparison, GPU budgeting, training cost analysis, or inference cost optimization.
Optimizes Vast.ai GPU rental costs using cost-per-TFLOP selection, spot instance analysis, Python auto-destroy timers, and Bash idle detection.