Skill

lvsa-vllm-omni

Configures and runs the LVSA vllm-omni serving plugin for video diffusion models. Set per-model env vars (LVSA_WAN_HOOK, LVSA_REFERENCE_LATENT_FRAMES), debug silent Wan fallbacks, or override geometry for non-default resolutions.

ai-ml

npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-vllm-omni

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/lvsa-vllm-omni:lvsa-vllm-omni

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

`lvsa-vllm-omni` is a separate pip package that registers itself as a vllm-omni attention backend. Zero changes required in vllm-omni core — everything plugs in via the plugin entry-point system.

SKILL.md

164 lines · ~1.6k tokens

Similar Skills

lvsa-quickstart

Installs LVSA and generates long videos with block-sparse attention. Use when setting up LVSA from scratch, choosing SDPA vs FlashInfer backend, configuring reference latent frames per model, or verifying sparse path engagement.

lvsa-quickstart

lvsa-add-model

Adds LVSA support for a new video diffusion model by implementing the ModelAdapter ABC. Covers geometry, QKV extraction, RoPE, and output projection for single-stream, dual-stream, or joint-attention DiTs.

lvsa-add-model

lvsa-troubleshooting

Diagnoses LVSA failure modes: no speedup vs Dense, silent fallback, OOM at long sequences, missing mp4 in Docker, quality regression, and env var issues.

lvsa-troubleshooting

Stats

LanguagePython

Parent stars12

Parent forks3

MaintenanceGood

Last CommitJun 1, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

LVSA vllm-omni plugin

Overview

lvsa-vllm-omni is a separate pip package that registers itself as a vllm-omni attention backend. Zero changes required in vllm-omni core — everything plugs in via the plugin entry-point system.

Path: lvsa-vllm-omni/

Install

pip install -e .                  # LVSA core
pip install -e lvsa-vllm-omni/    # The plugin (registers entry-point)
pip install "vllm==0.18.0" "vllm-omni==0.18.0"   # validated pair (mismatched minor versions emit warnings)

After install, the plugin auto-loads when vllm-omni starts.

Enable for a model

HunyuanVideo 1.5

DIFFUSION_ATTENTION_BACKEND=LVSA \
LVSA_AUTO_KEYFRAMES=1 \
LVSA_REFERENCE_LATENT_FRAMES=33 \
LVSA_ROTATE_KEYFRAMES=1 \
vllm serve --omni --model HunyuanVideo-1.5-Diffusers-480p_t2v

HunyuanVideo's hook auto-activates when DIFFUSION_ATTENTION_BACKEND=LVSA.

Wan 2.x

DIFFUSION_ATTENTION_BACKEND=LVSA \
LVSA_WAN_HOOK=1 \
LVSA_AUTO_KEYFRAMES=1 \
LVSA_REFERENCE_LATENT_FRAMES=21 \
LVSA_ROTATE_KEYFRAMES=1 \
vllm serve --omni --model Wan2.2-T2V-14B

Wan requires LVSA_WAN_HOOK=1 explicitly. Without it, Wan's _sp_plan pre-shards the sequence and geometry detection fails silently.

Convenience wrapper

A shell wrapper that sets the right env vars per model family:

examples/vllm_omni_serve.sh wan      /path/to/Wan2.1-T2V-1.3B-Diffusers
examples/vllm_omni_serve.sh hunyuan  /path/to/HunyuanVideo-1.5-Diffusers-480p_t2v

# Override port / dtype
PORT=8200 DTYPE=float16 examples/vllm_omni_serve.sh wan ...

Environment variables

Core (always set)

Var	Default	Purpose
`DIFFUSION_ATTENTION_BACKEND`	(unset)	Set to `LVSA` to engage
`LVSA_REFERENCE_LATENT_FRAMES`	`21`	Per-model training horizon. CRITICAL. Wan=21, HV=33, Cog=13.
`LVSA_AUTO_KEYFRAMES`	`1`	Auto-derive keyframe interval from frame count
`LVSA_WAN_HOOK`	`0`	Required for Wan. Off for HunyuanVideo.
`LVSA_ROTATE_KEYFRAMES`	`0`	Shift keyframe grid each denoising step (recommended at extension)

Tuning

Var	Default	Purpose
`LVSA_SPARSITY_SCALE`	`1.0`	Multiplier on attention budget. See `lvsa-tuning` skill.
`LVSA_WINDOW_SIZE`	`12`	Local window half-width (video frames)
`LVSA_N_FIRST_FRAMES`	`4`	Leading global frames
`LVSA_KEY_FRAME_INTERVAL`	`16`	Manual keyframe interval (ignored if AUTO_KEYFRAMES)
`LVSA_EXPAND_WINDOW`	`1`	Extend local window when globals overlap

Geometry (for non-default resolutions)

Var	Default	Purpose
`LVSA_PATCHES_PER_FRAME`	`1560` (480×832)	Tokens per latent frame
`LVSA_VIDEO_HEIGHT`	(unset)	Use with VAE_SPATIAL_FACTOR + PATCH_SIZE for derivation
`LVSA_VIDEO_WIDTH`	(unset)	Same
`LVSA_VAE_SPATIAL_FACTOR`	`8`	VAE spatial compression
`LVSA_PATCH_SIZE`	`2`	Patchify factor
`LVSA_VAE_TEMPORAL_FACTOR`	`4`	VAE temporal compression

For 720p or larger, set LVSA_VIDEO_HEIGHT/WIDTH and the plugin will derive patches_per_frame automatically.

Diagnostics

Var	Default	Purpose
`LVSA_BACKEND`	`sdpa`	`sdpa` or `flashinfer`
`LVSA_TOTAL_LATENT_FRAMES`	(unset)	Override auto-detected T_lat (rarely needed)
`LVSA_MEM_LOG`	`0`	Per-step memory log (`[LVSA-MEM] step=N alloc=… reserved=… peak=…`)

Verifying engagement

After startup + first request, look for:

[LVSA] Step counter: n_blocks=N (from env)
[LVSA-hook] Installed LVSA hook on HunyuanVideo15Attention ...
[LVSA] Geometry detected: T_lat=33 P=1560 text_tokens=512
[LVSA] reference_latent_frames=33 target_latent_frames=33 extension_ratio=1.00x

If you see [LVSA-FALLBACK] origin=forward_cuda reason=geometry_detect ..., the plugin couldn't infer the geometry from the seq_len it received. Walk through:

Is seq_len = T_lat * P + enc_tokens for any P in candidate_patches_per_frame()?
Is LVSA_PATCHES_PER_FRAME set if you're at a non-default resolution?
For Wan: is LVSA_WAN_HOOK=1?

Multi-GPU

Standard Ulysses context parallel. Set --ulysses-degree N in the vllm-omni CLI; the plugin handles the global K/V gather automatically.

DIFFUSION_ATTENTION_BACKEND=LVSA \
LVSA_REFERENCE_LATENT_FRAMES=33 \
LVSA_AUTO_KEYFRAMES=1 \
vllm serve --omni --model HunyuanVideo-1.5 --ulysses-degree 2

Constraint: seq_len = T_lat × patches_per_frame must be divisible by N.

Plugin structure

lvsa-vllm-omni/
├── lvsa_vllm_omni/
│   ├── __init__.py              # Entry-point: register_lvsa_backend()
│   ├── backend.py               # LVSABackend (vllm attention backend)
│   ├── attention_impl.py        # LVSAAttentionImpl → sparse_windowed_attention()
│   ├── wan_hook.py              # Wan-specific monkey patch
│   ├── hunyuan_hook.py          # HunyuanVideo-specific monkey patch
│   ├── register.py              # Hook auto-install based on env vars
│   ├── config.py                # LVSAConfig dataclass (env var parsing)
│   ├── global_kv.py             # Global K/V gather helpers
│   ├── step_tracker.py          # Per-step state
│   └── _fallback.py             # Silent-fallback warnings
└── tests/                       # CPU-only integration tests (~50 tests)

Common issues

Symptom	Fix
No `[LVSA]` log lines	Check `DIFFUSION_ATTENTION_BACKEND=LVSA`; for Wan also `LVSA_WAN_HOOK=1`
`[LVSA-FALLBACK] reason=geometry_detect`	Set `LVSA_PATCHES_PER_FRAME` (or HEIGHT/WIDTH) for your resolution
Quality regression at 1×	`LVSA_REFERENCE_LATENT_FRAMES` wrong for the model
No speedup despite engagement	At T_lat ≤ ref, kfi=1 → fully dense. Lower `LVSA_SPARSITY_SCALE` to see real sparsity

See lvsa-troubleshooting for the full failure-mode catalog and docs/VLLM_OMNI_INTEGRATION.md for the architecture details.

lvsa-vllm-omni

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

lvsa-vllm-omni

Popularity

Invocation

Context Preview

SKILL.md

LVSA vllm-omni plugin

Overview

Install

Enable for a model

HunyuanVideo 1.5

Wan 2.x

Convenience wrapper

Environment variables

Core (always set)

Tuning

Geometry (for non-default resolutions)

Diagnostics

Verifying engagement

Multi-GPU

Plugin structure

Common issues

Similar Skills

Help us improve

LVSA vllm-omni plugin

Overview

Install

Enable for a model

HunyuanVideo 1.5

Wan 2.x

Convenience wrapper

Environment variables

Core (always set)

Tuning

Geometry (for non-default resolutions)

Diagnostics

Verifying engagement

Multi-GPU

Plugin structure

Common issues