Skill

lvsa-troubleshooting

Diagnoses LVSA failure modes: no speedup vs Dense, silent fallback, OOM at long sequences, missing mp4 in Docker, quality regression, and env var issues.

Python

npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-troubleshooting

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/lvsa-troubleshooting:lvsa-troubleshooting

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

LVSA's failure modes are mostly **silent**: the run completes, an mp4 is produced, and only careful inspection reveals that the sparse path didn't engage or the geometry was wrong. This skill covers the seven most common issues.

SKILL.md

164 lines · ~1.7k tokens

Similar Skills

lvsa-quickstart

Installs LVSA and generates long videos with block-sparse attention. Use when setting up LVSA from scratch, choosing SDPA vs FlashInfer backend, configuring reference latent frames per model, or verifying sparse path engagement.

lvsa-quickstart

lvsa-tuning

Adjusts LVSA sparsity, window geometry, and rotation settings to tune video generation quality vs speed, including handling quality regressions.

lvsa-tuning

lvsa-reproduce-paper

Reproduce LVSA paper headline numbers using bundled benchmarks scripts. Use for SotA comparison, latency scaling, scoring with VQeval and VBench-Long, and regenerating figures.

lvsa-reproduce-paper

Stats

LanguagePython

Parent stars12

Parent forks3

MaintenanceGood

Last CommitJun 1, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

LVSA Troubleshooting

LVSA's failure modes are mostly silent: the run completes, an mp4 is produced, and only careful inspection reveals that the sparse path didn't engage or the geometry was wrong. This skill covers the seven most common issues.

Diagnostics one-liner

When something looks off, dump the LVSA-relevant log lines:

grep -E "\[(LVSA|LVSA)" run.log | head -20

If the output is empty or shows only fallback warnings, walk through Symptoms 1–4 below.

Symptom 1 — No speedup (LVSA matches Dense wall-clock)

Root cause: the LVSA backend silently fell back to dense at every block. Most often because geometry detection failed.

Diagnosis

[LVSA-FALLBACK] origin=forward_cuda reason=geometry_detect seq_len=25740 known_ppf=[1560]
[LVSA-FALLBACK] origin=forward_cuda reason=no_t_lat seq_len=…

If you see no [LVSA] lines at all, the backend wasn't selected — check DIFFUSION_ATTENTION_BACKEND=LVSA (vllm-omni) or --lvsa (standalone).

Fix

For vllm-omni:

DIFFUSION_ATTENTION_BACKEND=LVSA
LVSA_AUTO_KEYFRAMES=1
LVSA_REFERENCE_LATENT_FRAMES=<21|33|13>
For Wan: also LVSA_WAN_HOOK=1 (without it Wan's _sp_plan pre-shards the sequence)

For non-default resolutions (not 480×832), set the geometry env vars:

LVSA_PATCHES_PER_FRAME, LVSA_VIDEO_HEIGHT, LVSA_VIDEO_WIDTH, LVSA_VAE_SPATIAL_FACTOR, LVSA_PATCH_SIZE, LVSA_VAE_TEMPORAL_FACTOR

Symptom 2 — OOM at long sequences (T_lat ≥ 65 on 80 GB GPU)

Root cause: the VAE decode (not attention) blew up memory. LVSA reduces attention to ~50% of dense at 2× horizon, but the VAE then has to decode all those latent frames at once.

Diagnosis

The OOM trace points at vae.decode or unsqueeze inside the VAE. The attention forward completes cleanly.

Fix

Use --output-latent (standalone) or output_type="latent" (vllm-omni Python API) to skip VAE decode and save the denoised latent tensor to a .pt file. Decode it offline on a higher-memory GPU.

python examples/hunyuan_generate.py \
    --model /models/HunyuanVideo-1.5-Diffusers-480p_t2v \
    --num-frames 257 --output-dir benchmarks --output-name out_2x \
    --lvsa --flashinfer --rotate-keyframes --auto-keyframes \
    --output-latent

The mp4 path becomes a .pt path:

import torch
data = torch.load("out_2x.pt")   # {"latent": tensor, "num_frames": 257, ...}

Symptom 3 — Output mp4 missing after Docker run

Root cause: an absolute host path was passed to --output, the container interprets it relative to its own filesystem (not the bind-mount), so the mp4 lands inside the container and disappears with --rm.

Diagnosis

The run log says [export] /home/me/.../out.mp4 (1.4s) but ls /home/me/.../out.mp4 on the host returns nothing.

Fix

Use relative paths against the container's working directory + bind-mount:

# WRONG — path lost on --rm
docker run ... lvsa-vllm-omni:latest \
    python examples/hunyuan_generate.py --output-name /home/me/out.mp4

# RIGHT — relative path, lands on host
docker run ... -v $(pwd):/workdir/code -w /workdir/code ... \
    lvsa-vllm-omni:latest \
    python examples/hunyuan_generate.py --output-name benchmarks/results/out.mp4

Always append && chown -R $(id -u):$(id -g) <output_dir> so files written as root are readable on host.

Symptom 4 — Quality regression at training reference

Root cause: LVSA_REFERENCE_LATENT_FRAMES is wrong for your model. Auto-keyframe scheduler computes sparsity using the wrong budget, so even at training horizon you get partial attention coverage.

Diagnosis

[LVSA] reference_latent_frames=21 target_latent_frames=33 extension_ratio=1.57x

If you're running HunyuanVideo at 129 frames and reference_latent_frames=21 with extension_ratio > 1.0 (should be exactly 1.0), the scheduler is using Wan's default.

Fix

LVSA_REFERENCE_LATENT_FRAMES=33    # HunyuanVideo
LVSA_REFERENCE_LATENT_FRAMES=21    # Wan 2.x
LVSA_REFERENCE_LATENT_FRAMES=13    # CogVideoX

The single most common LVSA configuration mistake. Always set explicitly.

Symptom 5 — `dynamic_quality` drops 5+ points

Root cause: documented trade-off of any sparse-attention scheme. Long-range pairs that contribute to large-scale motion coherence are skipped. Motion-heavy prompts can lose ~5 points on VQeval dynamic_quality and ~0.02 on VBench motion_smoothness.

Diagnosis

Compare per-dimension VQeval between dense and LVSA on the same prompts. If only dynamic_quality drops while loop_quality and text_alignment improve, this is the trade-off.

Fix

sparsity_scale = 1.0 (default) — at T_lat ≤ ref, fully-dense via LVSA path; speedup without sparsity cost.
Increase window_size to 16 (W=4 latent) — more long-range mixing per query. Costs ~10% wall time.
Switch to dense for motion-heavy prompts.

Symptom 6 — `loop_quality` improves by +30 points (too good?)

Root cause: prompt-specific. LVSA's rotating-keyframe pattern dithers attention each step, preventing dense's looping/static failure mode. When dense already loops (loop_quality < 40), LVSA can lift by +30 to +40. Prompts where dense wasn't looping see ~+5.

Diagnosis

Look at dense's baseline loop_quality. If < 40 and LVSA's > 60, the gain is real but largely a function of dense's weakness.

Fix

Not a bug — this is the rotating-keyframe mechanism working. When reporting, include the dense baseline's loop_quality so reviewers see dense's behavior on that prompt.

Symptom 7 — Graduated schedule env vars (`LVSA_SCHEDULE_*`) don't do anything

Root cause: LVSA_SCHEDULE_START and LVSA_SCHEDULE_END are soft-deprecated in v1.0. The recommended runtime knob is LVSA_SPARSITY_SCALE. The schedule vars still exist for backwards compatibility (default 0) but produce no behavior at default settings.

Fix

Use sparsity_scale instead. See the lvsa-tuning skill.

Filing a bug

Capture:

Full stdout/stderr of the run
Output of grep -E "\[(LVSA|LVSA)" run.log
Output of nvidia-smi --query-gpu=memory.used,memory.total --format=csv (or npu-smi info on Ascend)
Generated mp4 / .pt file size
Exact docker run / python command + env vars
LVSA version: git rev-parse HEAD

Open at https://github.com/JiusiServe/LongVideoSparseAttention/issues.

lvsa-troubleshooting

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

lvsa-troubleshooting

Popularity

Invocation

Context Preview

SKILL.md

LVSA Troubleshooting

Diagnostics one-liner

Symptom 1 — No speedup (LVSA matches Dense wall-clock)

Diagnosis

Fix

Symptom 2 — OOM at long sequences (T_lat ≥ 65 on 80 GB GPU)

Diagnosis

Fix

Symptom 3 — Output mp4 missing after Docker run

Diagnosis

Fix

Symptom 4 — Quality regression at training reference

Diagnosis

Fix

Symptom 5 — dynamic_quality drops 5+ points

Diagnosis

Fix

Symptom 6 — loop_quality improves by +30 points (too good?)

Diagnosis

Fix

Symptom 7 — Graduated schedule env vars (LVSA_SCHEDULE_*) don't do anything

Fix

Filing a bug

Similar Skills

Help us improve

LVSA Troubleshooting

Diagnostics one-liner

Symptom 1 — No speedup (LVSA matches Dense wall-clock)

Diagnosis

Fix

Symptom 2 — OOM at long sequences (T_lat ≥ 65 on 80 GB GPU)

Diagnosis

Fix

Symptom 3 — Output mp4 missing after Docker run

Diagnosis

Fix

Symptom 4 — Quality regression at training reference

Diagnosis

Fix

Symptom 5 — dynamic_quality drops 5+ points

Diagnosis

Fix

Symptom 6 — loop_quality improves by +30 points (too good?)

Diagnosis

Fix

Symptom 7 — Graduated schedule env vars (LVSA_SCHEDULE_*) don't do anything

Fix

Filing a bug

Symptom 5 — `dynamic_quality` drops 5+ points

Symptom 6 — `loop_quality` improves by +30 points (too good?)

Symptom 7 — Graduated schedule env vars (`LVSA_SCHEDULE_*`) don't do anything

Symptom 5 — `dynamic_quality` drops 5+ points

Symptom 6 — `loop_quality` improves by +30 points (too good?)

Symptom 7 — Graduated schedule env vars (`LVSA_SCHEDULE_*`) don't do anything