From lvsa-troubleshooting
Diagnoses LVSA failure modes: no speedup vs Dense, silent fallback, OOM at long sequences, missing mp4 in Docker, quality regression, and env var issues.
Install:
npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-troubleshooting
Slash command:
/lvsa-troubleshooting:lvsa-troubleshooting
LVSA's failure modes are mostly silent: the run completes, an mp4 is produced, and only careful inspection reveals that the sparse path didn't engage or the geometry was wrong. This skill covers the seven most common issues.
When something looks off, dump the LVSA-relevant log lines:
grep -E "\[LVSA" run.log | head -20
If the output is empty or shows only fallback warnings, walk through Symptoms 1–4 below.
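The triage step above can also be scripted. The following is a minimal sketch, assuming log tags look like the `[LVSA]` and `[LVSA-FALLBACK]` lines shown in this document; the function name `triage` and the three status labels are hypothetical and not part of LVSA.

```python
import re

# Matches tags such as "[LVSA]" and "[LVSA-FALLBACK]" anywhere in a line.
TAG_RE = re.compile(r"\[(LVSA(?:-[A-Z]+)?)\]")
# Extracts the reason=<token> field from a fallback line.
REASON_RE = re.compile(r"\breason=(\S+)")


def triage(log_text):
    """Return (status, reasons) for a run log.

    status is "empty" (no LVSA lines), "fallback_only" (every LVSA line
    is a fallback warning), or "other" (at least one non-fallback line).
    """
    tags = []
    reasons = set()
    for line in log_text.splitlines():
        tag = TAG_RE.search(line)
        if tag is None:
            continue
        tags.append(tag.group(1))
        if tag.group(1) == "LVSA-FALLBACK":
            reason = REASON_RE.search(line)
            if reason:
                reasons.add(reason.group(1))
    if not tags:
        return "empty", []
    if all(t == "LVSA-FALLBACK" for t in tags):
        return "fallback_only", sorted(reasons)
    return "other", sorted(reasons)
```

A call such as `triage(open("run.log").read())` that returns `"empty"` or `"fallback_only"` corresponds to the case where Symptoms 1–4 are worth walking through.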
Symptom 1: no speedup vs Dense
Root cause: the LVSA backend silently fell back to dense attention at every block, most often because geometry detection failed. The log then contains fallback lines such as:
[LVSA-FALLBACK] origin=forward_cuda reason=geometry_detect seq_len=25740 known_ppf=[1560]
[LVSA-FALLBACK] origin=forward_cuda reason=no_t_lat seq_len=…
If you see no [LVSA] lines at all, the backend wasn't selected — check DIFFUSION_ATTENTION_BACKEND=LVSA (vllm-omni) or --lvsa (standalone).
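The `geometry_detect` line above reports `seq_len=25740` against `known_ppf=[1560]`. The sketch below shows arithmetic that would explain such a mismatch. It assumes patches per frame equal (H / VAE spatial factor / patch size) × (W / VAE spatial factor / patch size), with a spatial factor of 8 and a patch size of 2, and that detection needs the sequence length to be a whole number of frames. These values and both function names are assumptions chosen because they reproduce 1560 at 480×832; they are not confirmed details of LVSA's detector.

```python
def patches_per_frame(height, width, vae_spatial_factor=8, patch_size=2):
    """Tokens per latent frame: (H / vae / patch) * (W / vae / patch)."""
    stride = vae_spatial_factor * patch_size
    if height % stride or width % stride:
        raise ValueError("height and width must be multiples of vae_spatial_factor * patch_size")
    return (height // stride) * (width // stride)


def whole_latent_frames(seq_len, ppf):
    """Return the latent frame count if seq_len is a whole number of frames, else None."""
    frames, remainder = divmod(seq_len, ppf)
    return frames if remainder == 0 else None


ppf = patches_per_frame(480, 832)        # 30 * 52 = 1560, matching known_ppf above
print(ppf)
print(whole_latent_frames(25740, ppf))   # 25740 / 1560 = 16.5, so this is None
print(patches_per_frame(720, 1280))      # 45 * 80 = 3600 for a non-default resolution
```

Under these assumptions, a sequence length that is not a multiple of the known patches-per-frame value would need the geometry env vars set explicitly.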
For vllm-omni, set:
- DIFFUSION_ATTENTION_BACKEND=LVSA
- LVSA_AUTO_KEYFRAMES=1
- LVSA_REFERENCE_LATENT_FRAMES=<21|33|13>
- LVSA_WAN_HOOK=1 (without it Wan's _sp_plan pre-shards the sequence)
For non-default resolutions (not 480×832), also set the geometry env vars:
LVSA_PATCHES_PER_FRAME, LVSA_VIDEO_HEIGHT, LVSA_VIDEO_WIDTH, LVSA_VAE_SPATIAL_FACTOR, LVSA_PATCH_SIZE, LVSA_VAE_TEMPORAL_FACTOR
Symptom 2: OOM at long sequences
Root cause: the VAE decode (not attention) blew up memory. LVSA reduces attention to ~50% of dense at 2× horizon, but the VAE then has to decode all those latent frames at once.
The OOM trace points at vae.decode or unsqueeze inside the VAE. The attention forward completes cleanly.
Use --output-latent (standalone) or output_type="latent" (vllm-omni Python API) to skip VAE decode and save the denoised latent tensor to a .pt file. Decode it offline on a higher-memory GPU.
python examples/hunyuan_generate.py \
--model /models/HunyuanVideo-1.5-Diffusers-480p_t2v \
--num-frames 257 --output-dir benchmarks --output-name out_2x \
--lvsa --flashinfer --rotate-keyframes --auto-keyframes \
--output-latent
The mp4 path becomes a .pt path:
import torch
data = torch.load("out_2x.pt") # {"latent": tensor, "num_frames": 257, ...}
Symptom 3: missing mp4 in Docker
Root cause: an absolute host path was passed to --output. The container interprets it relative to its own filesystem (not the bind mount), so the mp4 lands inside the container and disappears with --rm.
The run log says [export] /home/me/.../out.mp4 (1.4s) but ls /home/me/.../out.mp4 on the host returns nothing.
Use paths relative to the container's working directory, and bind-mount that directory from the host:
# WRONG — path lost on --rm
docker run ... lvsa-vllm-omni:latest \
python examples/hunyuan_generate.py --output-name /home/me/out.mp4
# RIGHT — relative path, lands on host
docker run ... -v $(pwd):/workdir/code -w /workdir/code ... \
lvsa-vllm-omni:latest \
python examples/hunyuan_generate.py --output-name benchmarks/results/out.mp4
Always append && chown -R $(id -u):$(id -g) <output_dir> so files written as root are readable on host.
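The path rule can be checked before launching a run. This is a minimal sketch, assuming you know the container working directory (`-w`) and the container-side mount points (the right-hand side of each `-v host:container`). The helper `lands_on_host` is hypothetical, and it does not normalize `..` segments.

```python
from pathlib import PurePosixPath


def lands_on_host(output_path, container_workdir, mounts):
    """Return True if output_path, as seen inside the container, sits under a bind mount."""
    path = PurePosixPath(output_path)
    if not path.is_absolute():
        # Relative paths resolve against the container's working directory.
        path = PurePosixPath(container_workdir) / path
    mount_points = [PurePosixPath(m) for m in mounts]
    return any(path == m or m in path.parents for m in mount_points)


# Absolute host path: not under the mount, so the file stays in the container.
print(lands_on_host("/home/me/out.mp4", "/workdir/code", ["/workdir/code"]))
# Relative path: resolves under the bind-mounted working directory.
print(lands_on_host("benchmarks/results/out.mp4", "/workdir/code", ["/workdir/code"]))
```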
Symptom 4: quality regression at the training horizon
Root cause: LVSA_REFERENCE_LATENT_FRAMES is wrong for your model. The auto-keyframe scheduler computes sparsity using the wrong budget, so even at the training horizon you get only partial attention coverage.
[LVSA] reference_latent_frames=21 target_latent_frames=33 extension_ratio=1.57x
If you're running HunyuanVideo at 129 frames and the log shows reference_latent_frames=21 with extension_ratio > 1.0 (it should be exactly 1.0), the scheduler is using Wan's default. Set the value that matches your model:
LVSA_REFERENCE_LATENT_FRAMES=33 # HunyuanVideo
LVSA_REFERENCE_LATENT_FRAMES=21 # Wan 2.x
LVSA_REFERENCE_LATENT_FRAMES=13 # CogVideoX
This is the single most common LVSA configuration mistake. Always set the variable explicitly.
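The numbers in the log line above can be reproduced by hand. The sketch below assumes a temporal VAE factor of 4 and the usual causal-VAE frame count `(num_frames - 1) // factor + 1`. Both are assumptions that happen to reproduce 129 frames → 33 latent frames; the function names are illustrative.

```python
def latent_frames(num_frames, vae_temporal_factor=4):
    """Latent frame count: the first frame alone, then one per vae_temporal_factor frames."""
    return (num_frames - 1) // vae_temporal_factor + 1


def extension_ratio(num_frames, reference_latent_frames, vae_temporal_factor=4):
    """Target latent frames divided by the reference (training-horizon) latent frames."""
    return latent_frames(num_frames, vae_temporal_factor) / reference_latent_frames


print(latent_frames(129))                   # 33
print(round(extension_ratio(129, 21), 2))   # 1.57: HunyuanVideo run with Wan's reference value
print(extension_ratio(129, 33))             # 1.0: the correct reference for HunyuanVideo
print(round(extension_ratio(257, 33), 2))   # 1.97: roughly the 2x horizon
```

If the ratio you compute for a training-horizon run is not 1.0, the reference value is the first thing to check.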
Symptom 5: dynamic_quality drops 5+ points
Root cause: this is a documented trade-off of any sparse-attention scheme. Long-range pairs that contribute to large-scale motion coherence are skipped, so motion-heavy prompts can lose ~5 points on VQeval dynamic_quality and ~0.02 on VBench motion_smoothness.
Compare per-dimension VQeval between dense and LVSA on the same prompts. If only dynamic_quality drops while loop_quality and text_alignment improve, this is the trade-off.
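The per-dimension comparison can be sketched as follows, assuming you already have VQeval scores as plain dictionaries keyed by dimension name. The function name is hypothetical and the scores in the example are placeholders for illustration, not measured results.

```python
def tradeoff_signature(dense, lvsa):
    """Return per-dimension deltas (lvsa - dense) and whether they match the trade-off pattern.

    The pattern: dynamic_quality is the only dimension that drops, while
    loop_quality and text_alignment both improve.
    """
    deltas = {dim: lvsa[dim] - dense[dim] for dim in dense if dim in lvsa}
    dropped = sorted(dim for dim, d in deltas.items() if d < 0)
    improved = {dim for dim, d in deltas.items() if d > 0}
    matches = dropped == ["dynamic_quality"] and {"loop_quality", "text_alignment"} <= improved
    return deltas, matches


# Placeholder scores for illustration only.
dense_scores = {"dynamic_quality": 70.0, "loop_quality": 35.0, "text_alignment": 60.0}
lvsa_scores = {"dynamic_quality": 64.0, "loop_quality": 68.0, "text_alignment": 62.0}
deltas, matches = tradeoff_signature(dense_scores, lvsa_scores)
print(deltas, matches)
```

Run it on the same prompts for both backends. If other dimensions also drop, the regression is probably not this trade-off alone.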
Mitigations:
- sparsity_scale = 1.0 (default): at T_lat ≤ ref, the LVSA path is fully dense, so you get the speedup without the sparsity cost.
- Set window_size to 16 (W=4 latent): more long-range mixing per query. Costs ~10% wall time.
Symptom 6: loop_quality improves by +30 points (too good?)
Root cause: prompt-specific. LVSA's rotating-keyframe pattern dithers attention each step, preventing dense's looping/static failure mode. When dense already loops (loop_quality < 40), LVSA can lift the score by 30 to 40 points. Prompts where dense wasn't looping see about +5.
Look at dense's baseline loop_quality. If < 40 and LVSA's > 60, the gain is real but largely a function of dense's weakness.
Not a bug — this is the rotating-keyframe mechanism working. When reporting, include the dense baseline's loop_quality so reviewers see dense's behavior on that prompt.
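One way to keep the dense baseline attached to every reported gain is a small record like the one below. It is a sketch with a hypothetical function name that uses the thresholds quoted above (dense < 40, LVSA > 60).

```python
def loop_quality_report(dense_loop, lvsa_loop):
    """Bundle the LVSA loop_quality gain with the dense baseline it was measured against."""
    return {
        "dense_loop_quality": dense_loop,
        "lvsa_loop_quality": lvsa_loop,
        "gain": lvsa_loop - dense_loop,
        # True when the gain mostly reflects dense's looping failure on this prompt.
        "dense_was_looping": dense_loop < 40 and lvsa_loop > 60,
    }


# Placeholder scores for illustration only.
print(loop_quality_report(35, 68))
```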
Symptom 7: env vars (LVSA_SCHEDULE_*) don't do anything
Root cause: LVSA_SCHEDULE_START and LVSA_SCHEDULE_END are soft-deprecated in v1.0. The recommended runtime knob is LVSA_SPARSITY_SCALE. The schedule vars still exist for backwards compatibility (default 0) but have no effect at default settings.
Use sparsity_scale instead. See the lvsa-tuning skill.
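A quick pre-run check for leftover schedule variables might look like this. It is a sketch with a hypothetical helper name. It only inspects an environment mapping and makes no assumption about how LVSA itself reads these variables.

```python
import os


def schedule_var_notes(env):
    """Return one note per LVSA_SCHEDULE_* variable found in an environment mapping."""
    found = {k: v for k, v in env.items() if k.startswith("LVSA_SCHEDULE_")}
    notes = [
        f"{name}={value} is soft-deprecated; use LVSA_SPARSITY_SCALE instead"
        for name, value in sorted(found.items())
    ]
    if found and "LVSA_SPARSITY_SCALE" not in env:
        notes.append("LVSA_SPARSITY_SCALE is not set")
    return notes


# Check the current process environment.
for note in schedule_var_notes(os.environ):
    print(note)
```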
When filing an issue, capture:
- grep -E "\[LVSA" run.log
- nvidia-smi --query-gpu=memory.used,memory.total --format=csv (or npu-smi info on Ascend)
- the docker run / python command + env vars
- git rev-parse HEAD
Open an issue at https://github.com/JiusiServe/LongVideoSparseAttention/issues.