Skill

run

Launches, monitors, stops, and manages LLM inference workloads on NVIDIA DGX Spark using sparkrun CLI. Covers status checks, logs, VRAM preview, benchmarks, kernel tuning, proxy management.

Bash

Docker

ai-ml

devops

npx claudepluginhub spark-arena/sparkrun --plugin sparkrun

Popularity

Parent stars

236

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/sparkrun:run

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

SKILL.md

289 lines · ~2.6k tokens

Similar Skills

registry

236

Manages sparkrun recipe registries by adding/removing/updating/disabling them, browses/searches/validates/shows inference recipes, and lists benchmark profiles for NVIDIA DGX Spark workloads.

sparkrun

skypilot

10.0k

Launches GPU/TPU clusters, training jobs, and inference servers across 25+ clouds, Kubernetes, Slurm using SkyPilot; debugs YAML, optimizes costs.

7 files

skypilot

hugging-face-jobs

40.4k

Runs Python workloads on Hugging Face managed infrastructure (CPUs, GPUs, TPUs) with Hub persistence. For batch inference, data processing, experiments, or any job without local GPU setup.

8 files

antigravity-awesome-skills

Stats

LanguagePython

Parent stars236

Parent forks22

MaintenanceGood

Last CommitMar 23, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

Provides complete reference for launching, monitoring, and stopping LLM inference workloads using sparkrun on NVIDIA DGX Spark systems. Covers the full lifecycle: browse recipes, check VRAM fit, launch jobs, view logs, check status, stop workloads, run benchmarks, tune kernels, and manage the inference proxy.

<Use_When>

User wants to run an LLM inference model on DGX Spark
User wants to check status of running workloads
User wants to stop a running inference job
User wants to view logs from a running workload
User wants to preview VRAM requirements before launching
User wants to benchmark an inference workload
User wants to tune MoE kernels for better performance
User wants to manage the inference proxy
User wants to monitor cluster metrics
User asks "how do I run", "start", "launch", "deploy" a model </Use_When>

<Do_Not_Use_When>

User wants to install sparkrun or set up a cluster -- use the setup skill instead
User wants to manage recipe registries or create custom recipes -- use the registry skill instead
User is asking about sparkrun internals or development </Do_Not_Use_When>

Run a Recipe

# Single host
sparkrun run <recipe> --tp 1 --no-follow
sparkrun run <recipe> --hosts <ip> --no-follow

# Multi-node cluster
sparkrun run <recipe> --cluster <name> --no-follow
sparkrun run <recipe> --hosts <ip1>,<ip2>,... --no-follow
sparkrun run <recipe> --tp <N> --no-follow

# With overrides
sparkrun run <recipe> --port 9000 --gpu-mem 0.8 --no-follow
sparkrun run <recipe> -o max_model_len=8192 -o attention_backend=triton --no-follow
sparkrun run <recipe> --served-model-name my-model --no-follow
sparkrun run <recipe> --pp 2 --tp 2 --no-follow   # pipeline + tensor parallelism
sparkrun run <recipe> --max-model-len 32768 --no-follow

# Idempotent launch (exit 0 if already running)
sparkrun run <recipe> --ensure --no-follow

# With Docker options
sparkrun run <recipe> --restart unless-stopped --no-follow
sparkrun run <recipe> --no-rm --no-follow   # keep containers after stop
sparkrun run <recipe> --transfer-mode push --no-follow  # force push-based distribution

# Dry-run (show what would happen)
sparkrun run <recipe> --dry-run

CRITICAL: Always use --no-follow when running from an agent/skill context to avoid blocking on log streaming. Then use sparkrun cluster status or sparkrun logs separately to check on the job.

Check Status

# Show all sparkrun containers across cluster hosts
# NOTE: `sparkrun status` is an alias for `sparkrun cluster status` — only run one.
sparkrun cluster status
sparkrun cluster status --cluster <name>
sparkrun cluster status --hosts <ip1>,<ip2>,...

# Output shows:
#   - Grouped containers by job (with recipe name if cached)
#   - Container role, host, status, and image
#   - Ready-to-use logs and stop commands for each job
#   - Pending operations (downloads/distributions in progress)
#   - Idle hosts

# Check if a specific job is running (scripting-friendly)
sparkrun cluster check-job <recipe> --cluster <name>
sparkrun cluster check-job <cluster_id> --hosts <ip1>,<ip2>
sparkrun cluster check-job <recipe> --check-http-models   # also verify /v1/models endpoint
sparkrun cluster check-job <recipe> --json                 # JSON output
# Exit code: 0 = running, 1 = not running

Monitor Cluster Metrics

# Interactive TUI (requires interactive terminal)
sparkrun cluster monitor --cluster <name>

# Plain-text output (agent-friendly)
sparkrun cluster monitor --cluster <name> --simple

# JSON streaming (automation-friendly)
sparkrun cluster monitor --cluster <name> --json

# Custom interval
sparkrun cluster monitor --cluster <name> --interval 5

View Logs

# By recipe name
sparkrun logs <recipe> --cluster <name>
sparkrun logs <recipe> --hosts <ip1>,<ip2>,...
sparkrun logs <recipe> --tp <N>

# By cluster ID (from sparkrun status output)
sparkrun logs <cluster_id>
sparkrun logs <cluster_id> --cluster <name>

# Control log tail length
sparkrun logs <recipe> --tail 200

Stop a Workload

# By recipe name
sparkrun stop <recipe> --cluster <name>
sparkrun stop <recipe> --hosts <ip1>,<ip2>,...
sparkrun stop <recipe> --tp <N>

# By cluster ID (from sparkrun status output)
sparkrun stop <cluster_id>
sparkrun stop <cluster_id> --cluster <name>

# Stop all sparkrun containers (no recipe needed)
sparkrun stop --all --cluster <name>
sparkrun stop --all --hosts <ip1>,<ip2>,...

# Dry-run
sparkrun stop <recipe> --dry-run

Browse and Inspect Recipes

# List all available recipes (no filter)
sparkrun list

# List with filters
sparkrun list --all                         # include hidden registry recipes
sparkrun list --registry <name>             # filter by registry
sparkrun list --runtime vllm                # filter by runtime
sparkrun list <query>                       # filter by name

# Search for recipes by name, model, runtime, or description (contains-match)
sparkrun recipe search <query>
sparkrun recipe search <query> --registry <name> --runtime sglang

# Inspect a specific known recipe (by exact name or file path)
sparkrun recipe show <recipe>
sparkrun recipe show <recipe> --tp <N>

# Validate or estimate VRAM
sparkrun recipe validate <recipe>
sparkrun recipe vram <recipe> --tp <N> --max-model-len 32768

# Export a recipe
sparkrun recipe export <recipe>
sparkrun recipe export <recipe> --json
sparkrun recipe export <recipe> --save out.yaml

Use sparkrun recipe search as the first attempt when looking for a particular recipe. Use sparkrun recipe show when given a specific recipe name or file -- it may not appear in search results.

Benchmark

# Full flow: launch inference -> benchmark -> stop
sparkrun benchmark <recipe> --tp 1
sparkrun benchmark <recipe> --cluster <name>
sparkrun benchmark <recipe> --tp 2 --profile <profile_name>

# Benchmark an already-running instance (skip launch)
sparkrun benchmark <recipe> --skip-run --tp 1

# Keep inference running after benchmark completes
sparkrun benchmark <recipe> --no-stop --tp 1

# Override benchmark args (use -b for benchmark args, -o for recipe overrides)
sparkrun benchmark <recipe> -b depth=0,2048,4096 -b tg=32,128

# Specify framework and timeout
sparkrun benchmark <recipe> --framework llama-benchy --timeout 3600

# Dry-run
sparkrun benchmark <recipe> --dry-run

Kernel Tuning

# Tune SGLang fused MoE kernels
sparkrun tune sglang <recipe> --hosts <ip>
sparkrun tune sglang <recipe> --cluster <name> --tp 1 --tp 2 --tp 4
sparkrun tune sglang <recipe> -H <ip> --parallel 2

# Tune vLLM fused MoE kernels
sparkrun tune vllm <recipe> --hosts <ip>
sparkrun tune vllm <recipe> --cluster <name> --tp 4

Inference Proxy

# Start the unified OpenAI-compatible proxy
sparkrun proxy start --cluster <name>
sparkrun proxy start --port 4000

# Check proxy status and registered models
sparkrun proxy status
sparkrun proxy models
sparkrun proxy models --refresh

# Load/unload models through the proxy
sparkrun proxy load <recipe> --cluster <name>
sparkrun proxy unload <recipe> --cluster <name>

# Manage model aliases
sparkrun proxy alias add my-model "Qwen/Qwen3-1.7B"
sparkrun proxy alias remove my-model
sparkrun proxy alias list

# Stop the proxy
sparkrun proxy stop

<Tool_Usage> All sparkrun commands are executed via the Bash tool. No MCP tools are required.

When running workloads:

Always use --no-follow flag with sparkrun run
After launching, run sparkrun cluster status to confirm containers are running
Use the logs/stop commands from status output to manage jobs
For monitoring, use --simple or --json mode (TUI requires interactive terminal) </Tool_Usage>

<Key_Options>

sparkrun run options:

Option	Description
`--hosts, -H`	Comma-separated host list
`--hosts-file`	File with hosts (one per line)
`--cluster`	Use a saved cluster
`--tp, --tensor-parallel`	Override tensor parallelism (= node count)
`--pp, --pipeline-parallel`	Override pipeline parallelism
`--port`	Override serve port
`--gpu-mem`	GPU memory utilization (0.0-1.0)
`--max-model-len`	Override maximum model context length
`--served-model-name`	Override the served model name
`--image`	Override container image
`-o KEY=VALUE`	Override any recipe default
`--ensure`	Only launch if not already running; exit 0 if already up
`--no-rm`	Don't auto-remove containers on exit
`--restart POLICY`	Docker restart policy (no, always, unless-stopped, on-failure[:N])
`--transfer-mode`	Resource transfer mode (auto, local, push, delegated)
`--dry-run, -n`	Show what would be done
`--no-follow`	Don't attach to logs after launch
`--foreground`	Run in foreground (blocking)

</Key_Options>

<Important_Notes>

Always use --no-follow when running from an automated/agent context to avoid blocking
Use sparkrun cluster status after launching to confirm containers are running
--tp N must match the number of hosts (DGX Spark = 1 GPU per host)
sparkrun stop and sparkrun logs accept both recipe names and cluster IDs as targets
If --tp, --port, or --served-model-name were used during run, pass the same values to stop and logs
Use sparkrun show <recipe> --tp N to preview VRAM estimates before running
Container names follow the pattern sparkrun_{hash}_{role} where the hash is derived from runtime + model + sorted hosts + overrides
Ctrl+C while following logs detaches safely -- it never kills the inference job
Use sparkrun stop --all to stop all sparkrun containers without specifying a recipe
--solo is deprecated; use --tp 1 instead
Recipe names support @registry/name syntax for explicit registry selection
sparkrun update upgrades sparkrun itself (if installed via uv) and updates all registries </Important_Notes>

Task: {{ARGUMENTS}}

run

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

run

Popularity

Invocation

Context Preview

SKILL.md

Run a Recipe

Check Status

Monitor Cluster Metrics

View Logs

Stop a Workload

Browse and Inspect Recipes

Benchmark

Kernel Tuning

Inference Proxy

Similar Skills

Help us improve

Run a Recipe

Check Status

Monitor Cluster Metrics

View Logs

Stop a Workload

Browse and Inspect Recipes

Benchmark

Kernel Tuning

Inference Proxy