By spark-arena
Run, monitor, benchmark, and manage LLM inference workloads on NVIDIA DGX Spark GPU clusters directly from your IDE. Set up clusters with SSH mesh and Docker configs, launch servers with vLLM/SGLang recipes, track utilization via live TUI, deploy LiteLLM proxies for OpenAI-compatible APIs, and stop jobs cleanly.
npx claudepluginhub spark-arena/sparkrun --plugin sparkrun
Run benchmarks against an inference workload.
Browse and search available inference recipes.
Live-monitor CPU, RAM, and GPU metrics across cluster hosts.
Manage the LiteLLM-based inference proxy gateway.
Run an inference workload on DGX Spark using a sparkrun recipe.
Manage recipe registries and create inference recipes.
ALWAYS invoke this skill before running any sparkrun CLI commands. Never run sparkrun directly via Bash without loading this skill first. Covers launching, monitoring, stopping, and checking status of inference workloads on NVIDIA DGX Spark.
Install sparkrun and configure DGX Spark clusters.
Launch, manage, and stop LLM inference workloads on one or more NVIDIA DGX Spark systems — no Slurm, no Kubernetes, no fuss.
uvx sparkrun setup
One command — installs sparkrun, then launches the guided setup wizard to create a cluster, configure SSH mesh, detect ConnectX-7 NICs, set up sudoers, and enable earlyoom.
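Because the setup command is launched through uvx, a short pre-flight check can give a clearer error when uvx is missing. The following is a minimal sketch, not part of sparkrun: the helper names have_cmd and setup_sparkrun are hypothetical, and it assumes uvx is provided by an existing uv installation.

```shell
# Sketch only: have_cmd and setup_sparkrun are hypothetical helper names,
# not part of sparkrun.

# Return success if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Run the guided setup only when uvx is available.
setup_sparkrun() {
  if have_cmd uvx; then
    uvx sparkrun setup
  else
    echo "uvx not found; install uv first, then re-run." >&2
    return 1
  fi
}
```

Calling setup_sparkrun from an interactive shell then either starts the wizard or returns a non-zero status with a hint.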
# Run an inference workload
sparkrun run qwen3-1.7b-vllm
# Multi-node tensor parallelism (TP maps to node count on DGX Spark)
sparkrun run qwen3-1.7b-vllm --tp 2
# Re-attach to logs, stop a workload, check status
sparkrun logs qwen3-1.7b-vllm
sparkrun stop qwen3-1.7b-vllm
sparkrun status
Ctrl+C detaches from logs — it never kills your inference job. Your model keeps serving.
See the full CLI reference for all commands and options.
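The run, stop, and status commands above can be wrapped in a small script for repeatable launches. This is a sketch under stated assumptions: DRY_RUN, run_or_print, launch, and teardown are hypothetical names introduced here for illustration, and only the sparkrun subcommands and the --tp flag shown above are used.

```shell
# Sketch only: DRY_RUN and the function names below are hypothetical,
# not part of sparkrun. Only commands shown above are invoked.

# Print the command when DRY_RUN=1, otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Launch a recipe; pass --tp only when more than one node is requested.
launch() {
  recipe="$1"
  nodes="${2:-1}"
  if [ "$nodes" -gt 1 ]; then
    run_or_print sparkrun run "$recipe" --tp "$nodes"
  else
    run_or_print sparkrun run "$recipe"
  fi
}

# Stop a recipe, then list what is still running.
teardown() {
  run_or_print sparkrun stop "$1"
  run_or_print sparkrun status
}
```

With DRY_RUN=1 set, launch qwen3-1.7b-vllm 2 only prints the command it would run, which makes it easy to review a multi-node launch before starting it.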
--tp 2 = 2 hosts, automatic InfiniBand/RDMA detection
sparkrun show <recipe>)
Spark Arena is the community hub for DGX Spark recipe benchmarks: browse benchmark results, then run them directly with sparkrun.
Official Recipes are maintained by the Spark Arena team and hosted on GitHub. They are tested and optimized for NVIDIA DGX Spark systems.
Community Recipes are contributed by the community and hosted on GitHub.
Apache License 2.0 — see LICENSE for details.