From coreweave-pack
Manages CoreWeave GPU quotas using kubectl checks and implements Python asyncio queues for inference request throttling.
Install: npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin coreweave-pack
Deploys GPU workloads on CoreWeave Kubernetes with kubectl, as either a vLLM inference server or a batch job. Use it for first GPU deploys, inference testing, and cluster access checks.
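As a rough sketch of what such a deploy involves, the snippet below builds a one-GPU batch Job manifest in Python and applies it with kubectl. The namespace, image, and job name are placeholders, and CoreWeave-specific node-scheduling labels (which vary by cluster) are omitted; only the standard nvidia.com/gpu device-plugin resource request is shown.

import json
import subprocess

# Placeholder values -- replace with your namespace and image.
NAMESPACE = "my-namespace"
IMAGE = "nvcr.io/nvidia/pytorch:24.05-py3"

job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "gpu-smoke-test", "namespace": NAMESPACE},
    "spec": {
        "backoffLimit": 0,
        "template": {
            "spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "main",
                    "image": IMAGE,
                    "command": ["nvidia-smi"],
                    # Request one GPU via the standard device-plugin resource.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }],
            }
        },
    },
}

# kubectl accepts JSON manifests on stdin (YAML is a superset of JSON).
subprocess.run(
    ["kubectl", "apply", "-f", "-"],
    input=json.dumps(job),
    text=True,
    check=True,
)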
CoreWeave limits are primarily GPU quota-based rather than API rate limits: each namespace has an allocated quota per GPU type. Check the current allocation and usage with kubectl:
kubectl describe resourcequota -n my-namespace
kubectl get resourcequota -o json | jq '.items[].status'
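To run the same check programmatically, for example before enqueueing a large batch, a small sketch like this shells out to kubectl and pairs used with hard values. Filtering quota keys on the substring "gpu" is an assumption, since the exact key names depend on how the namespace's ResourceQuota is defined.

import json
import subprocess

def gpu_quota(namespace: str) -> dict:
    """Return {resource: (used, hard)} for GPU-related quota entries."""
    out = subprocess.run(
        ["kubectl", "get", "resourcequota", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    quotas = {}
    for item in json.loads(out)["items"]:
        status = item.get("status", {})
        for resource, hard in status.get("hard", {}).items():
            # Assumption: GPU quota keys contain "gpu",
            # e.g. requests.nvidia.com/gpu.
            if "gpu" in resource:
                used = status.get("used", {}).get(resource, "0")
                quotas[resource] = (used, hard)
    return quotas

print(gpu_quota("my-namespace"))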
import asyncio

class InferenceQueue:
    """Caps concurrent inference calls with a semaphore and tracks queue depth."""

    def __init__(self, max_concurrent: int = 10):
        self.semaphore = asyncio.Semaphore(max_concurrent)
        self.queue_depth = 0  # requests waiting or in flight

    async def inference(self, client, prompt: str) -> str:
        self.queue_depth += 1
        try:
            async with self.semaphore:
                # Run the blocking client call in a worker thread so the
                # event loop stays free while the request is in flight.
                return await asyncio.to_thread(client.generate, prompt)
        finally:
            # Decrement even if the task is cancelled while waiting.
            self.queue_depth -= 1
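Here is a minimal sketch of driving the queue, with a stand-in client whose blocking generate() mimics a real inference call:

import asyncio

class EchoClient:
    """Stand-in for a real inference client with a blocking generate()."""
    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"

async def main():
    queue = InferenceQueue(max_concurrent=10)
    prompts = [f"prompt {i}" for i in range(100)]
    # Fan out all requests at once; the semaphore caps concurrency at 10.
    results = await asyncio.gather(
        *(queue.inference(EchoClient(), p) for p in prompts)
    )
    print(len(results), "completed; queue depth now", queue.queue_depth)

asyncio.run(main())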
For security, see coreweave-security-basics.