# GPU CLI

Run any code on cloud GPUs with a single command. Just prefix your normal commands with `gpu run`.

```sh
python train.py          # runs locally
gpu run python train.py  # runs on a remote GPU
```
## Features

- **Simple** - Prefix commands with `gpu run`, that's it
- **Fast** - Connection pooling, delta sync, real-time output streaming
- **Cost-efficient** - Auto-stops pods when idle (save 60-98% on GPU costs)
- **Multi-cloud** - RunPod, Vast.ai, local Docker
- **Secure** - Zero-trust encryption on supported providers
- **Teams** - Organizations with pooled sessions, sub-accounts, and CI/CD service tokens (Team & Enterprise)
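The idle auto-stop savings above come from paying only for active GPU time rather than a 24/7 pod. A back-of-the-envelope sketch (the hourly rate and usage figures are illustrative, not quoted provider prices):

```python
# Illustrative numbers only: actual rates vary by provider and GPU.
hourly_rate = 0.69   # example $/h for a single GPU pod
hours_used = 2       # actual GPU time per day with auto-stop

always_on_cost = 24 * hourly_rate         # pod billed around the clock
auto_stop_cost = hours_used * hourly_rate # pod billed only while active

savings = 1 - auto_stop_cost / always_on_cost
print(f"{savings:.0%}")  # -> 92%
```

Two hours of daily use lands at ~92%, inside the quoted 60-98% range; lighter or heavier usage shifts the figure accordingly.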
## Quick Start

```sh
# 1. Install GPU CLI
curl -fsSL https://gpu-cli.sh/install.sh | sh

# 2. Run your code on a remote GPU
gpu run python train.py
```
## Claude Code Plugin
This repo includes a Claude Code plugin that supercharges GPU CLI with AI assistance. Describe what you want in plain English, and Claude generates complete, runnable GPU workflows.
### What's Included

#### Skills (Automatic AI Capabilities)
| Skill | Description |
|---|---|
| `gpu-workflow-creator` | Transform natural language into complete GPU projects |
| `gpu-ml-trainer` | LLM fine-tuning, LoRA training, classifier training |
| `gpu-inference-server` | Set up vLLM, TGI, or custom inference APIs |
| `gpu-media-processor` | Whisper transcription, voice cloning, video generation |
| `gpu-cost-optimizer` | GPU selection advice and cost optimization |
| `gpu-debugger` | Debug failed runs, OOM errors, connectivity issues |
#### Slash Commands

| Command | Description |
|---|---|
| `/gpu-cli:gpu-create` | Create a complete project from a description |
| `/gpu-cli:gpu-optimize` | Analyze and optimize your `gpu.jsonc` |
| `/gpu-cli:gpu-debug` | Debug a failed GPU run |
| `/gpu-cli:gpu-quick` | Quick-start common workflows |
### Example Conversations

**Create a LoRA training project:**

> **You:** I want to train a LoRA on photos of my dog so I can generate images of it
>
> **Claude:** [Generates a complete project with `gpu.jsonc`, `train.py`, `requirements.txt`, and `README.md`]

**Set up a private LLM API:**

> **You:** Set up Llama 3.1 70B as a private ChatGPT-like API
>
> **Claude:** [Generates a vLLM server config with OpenAI-compatible endpoints]

**Debug an error:**

> **You:** /gpu-cli:gpu-debug CUDA out of memory when running FLUX
>
> **Claude:** [Analyzes the error, suggests reducing batch size or upgrading to an A100]

**Optimize costs:**

> **You:** /gpu-cli:gpu-optimize
>
> **Claude:** [Reviews `gpu.jsonc`, suggests an RTX 4090 instead of an A100 for your workload, saving 75%]
## Templates

Ready-to-use templates are available for common AI/ML workflows.
## Common Commands

```sh
# Run a command on a remote GPU
gpu run python script.py

# Run a server with port forwarding
gpu run -p 8188:8188 python server.py --listen 0.0.0.0

# Open a shell on the remote pod
gpu shell

# View running pods
gpu pods list

# Stop a pod
gpu stop

# Interactive dashboard
gpu dashboard
```
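With `-p 8188:8188` active, the remote server answers on `http://localhost:8188` exactly as if it were running locally. A self-contained sketch of that access pattern, with a trivial local HTTP server standing in for the forwarded pod (so the snippet runs without a GPU session):

```python
import http.server
import threading
import urllib.request

# Stand-in for the forwarded service: with `gpu run -p 8188:8188 ...`
# the pod's port 8188 appears on localhost; here a local server plays
# that role so the access pattern can be demonstrated anywhere.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 8188), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Clients simply target localhost; the tunnel handles the rest.
with urllib.request.urlopen("http://127.0.0.1:8188/") as resp:
    text = resp.read().decode()

server.shutdown()
print(text)  # -> ok
```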
## Team Management

```sh
# Create an organization
gpu org create "My Team"

# Switch to org context
gpu org switch my-team

# Invite a teammate
gpu org invite alice@example.com --role admin

# Create a CI/CD service account
gpu org service-account create --name "github-actions"
```
## Configuration

Create a `gpu.jsonc` in your project:

```jsonc
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "project_id": "my-project",
  "provider": "runpod",

  // Sync outputs back to local machine
  "outputs": ["output/", "models/"],

  // GPU selection
  "gpu_type": "RTX 4090",
  "min_vram": 24,

  // Optional: Pre-download models
  "download": [
    { "strategy": "hf", "source": "black-forest-labs/FLUX.1-dev", "allow": "*.safetensors" }
  ],

  "environment": {
    "base_image": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"
  }
}
```
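Note that `gpu.jsonc` is JSONC (JSON plus `//` comments), which a strict JSON parser rejects. If you build tooling around the file, comments must be stripped first; a minimal sketch (an illustration only, not the CLI's actual parser — it handles `//` line comments but not `/* */` blocks):

```python
import json
import re

# Strip // line comments while leaving "//" inside string values (such
# as the https:// schema URL) untouched: the regex matches a string
# literal first, so the alternation consumes it intact.
def load_jsonc(text: str) -> dict:
    string_or_comment = r'("(?:\\.|[^"\\])*")|//[^\n]*'
    stripped = re.sub(string_or_comment, lambda m: m.group(1) or "", text)
    return json.loads(stripped)

config = load_jsonc("""
{
  "$schema": "https://gpu-cli.sh/schema/v1/gpu.json",
  "project_id": "my-project",
  // GPU selection
  "gpu_type": "RTX 4090",
  "min_vram": 24
}
""")
print(config["gpu_type"])  # -> RTX 4090
```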
## Network Volumes (Recommended)

For faster startup and persistent model storage, use RunPod Network Volumes. See the Network Volumes Guide for setup instructions.

## GPU Options