From lvsa-tuning
Adjusts LVSA sparsity, window geometry, and rotation settings to tune video generation quality vs speed, including handling quality regressions.
npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-tuning
How this skill is triggered: by the user, by Claude, or both
Slash command
/lvsa-tuning:lvsa-tuning
Installs LVSA and generates long videos with block-sparse attention. Use when setting up LVSA from scratch, choosing SDPA vs FlashInfer backend, configuring reference latent frames per model, or verifying sparse path engagement.
Diagnoses LVSA failure modes: no speedup vs Dense, silent fallback, OOM at long sequences, missing mp4 in Docker, quality regression, and env var issues.
Reproduce LVSA paper headline numbers using the bundled benchmark scripts. Use for SotA comparison, latency scaling, scoring with VQeval and VBench-Long, and regenerating figures.
| Knob | What it controls | When to touch it |
|---|---|---|
| reference_latent_frames | Per-query attention budget anchor | Set once per model (Wan=21, HV=33, Cog=13). Don't change at runtime. |
| sparsity_scale | Multiplier on the budget | The runtime quality/speed dial. Default 1.0; lower = sparser. |
| window_size, n_first_frames | Local-window geometry | Usually leave at defaults (12 frames / 4 frames). Only touch if you want a tighter floor. |
Set sparsity_scale with LVSA_SPARSITY_SCALE (env var) or --sparsity-scale (CLI). It scales the auto-keyframe scheduler's per-query budget:
scaled_ref = max(n_first + 1, int(reference_frames × sparsity_scale))
target_attended = min(scaled_ref, T_lat)
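The two formulas above can be sketched in Python as follows. This is a minimal illustration of the arithmetic only, not the scheduler's actual code; the function name and argument names are hypothetical, and `int()` truncation is assumed to match the `int(...)` in the formula.

```python
def target_attended(reference_frames: int, sparsity_scale: float,
                    n_first: int, t_lat: int) -> int:
    """Per-query attended-frame budget, following the two formulas above."""
    # Scale the reference budget, but never drop below n_first + 1 frames.
    scaled_ref = max(n_first + 1, int(reference_frames * sparsity_scale))
    # A query cannot attend to more frames than the sequence contains.
    return min(scaled_ref, t_lat)


# HunyuanVideo values quoted in this document: reference 33, n_first 1.
for scale in (0.5, 1.0, 2.0):
    print(scale, target_attended(33, scale, n_first=1, t_lat=33))
```

With T_lat equal to the reference, scales of 1.0 and 2.0 give the same budget because the `min` clamps both to T_lat, while 0.5 halves it.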
Empirical results on HunyuanVideo at 129 frames (training reference, single-prompt "dog"):
| sparsity_scale | Per-step | Speedup vs dense | VQeval composite | VQeval loop |
|---|---|---|---|---|
| (dense baseline) | 44.0 s | — | 57.6 | 32.6 |
| 0.5 (aggressive) | 18.3 s | 2.40× | 65.2 (+7.6) | 73.6 (+41.0) |
| 1.0 (default) | 22.5 s | 1.96× | 61.3 (+3.7) | 63.0 (+30.4) |
| Goal | sparsity_scale | Why |
|---|---|---|
| Match dense quality at training reference, take implementation speedup | 1.0 | At T_lat ≤ ref this collapses to kfi=1 (fully dense). Speedup comes from bypassing native attention overhead. |
| Maximum speedup at training reference | 0.5 | Engages pattern-driven sparsity even at T_lat=ref. Big loop-quality gains; ~5pt drop on dynamic_quality. |
| Aggressive extrapolation, OOM-prevention at 3×+ horizon | 0.5 | Shrinks compact-K buffer, helps fit on 80 GB. |
| Conservative quality at extrapolation | 0.75 | Reduces sparsity gradient; less speedup but keeps motion intact. |
- At T_lat ≤ reference, sparsity_scale ≥ 1.0 collapses to kfi=1 (fully dense). The visible speedup is implementation efficiency only.
- sparsity_scale = 2.0 is equivalent to 1.0 at T_lat ≤ reference (both give kfi=1). The conservative knob is meaningful only at extrapolation lengths.
- sparsity_scale = 0.5 activates real pattern sparsity even at training reference: HV's budget shrinks from 33 to 16 latents at 1×, giving ~52% coverage.
- The loop gain at s=0.5 comes from --rotate-keyframes dithering the attention pattern each step. Disable rotation and the loop gain disappears.

Defaults:

- window_size = 12 video frames = 3 latent frames (W=3)
- n_first_frames = 4 video frames = 1 latent frame (n_first=1)

Floor of attended frames per query: 2W+1 + n_first = 8 latent frames.
When to reduce W: never, unless your reference_latent_frames is below the floor. The defaults are tuned for current models.
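As a quick check of the floor arithmetic, the sketch below converts the video-frame defaults to latent frames and compares a model's reference_latent_frames against the floor. The function names are hypothetical, and the temporal stride of 4 is an assumption inferred from the defaults (12 video frames = 3 latent frames, 4 video frames = 1 latent frame).

```python
def attended_floor(window_size: int = 12, n_first_frames: int = 4,
                   temporal_stride: int = 4) -> int:
    """Minimum latent frames each query attends to: 2W + 1 + n_first."""
    # ASSUMPTION: 4 video frames per latent frame, inferred from the defaults.
    w = window_size // temporal_stride            # 12 video frames -> W = 3
    n_first = n_first_frames // temporal_stride   # 4 video frames -> n_first = 1
    return 2 * w + 1 + n_first


def window_needs_shrinking(reference_latent_frames: int, **geometry: int) -> bool:
    """True only when the per-model budget anchor sits below the floor."""
    return reference_latent_frames < attended_floor(**geometry)


print(attended_floor())            # default geometry
print(window_needs_shrinking(13))  # smallest per-model reference listed (Cog=13)
```

Under these assumptions none of the listed per-model references (21, 33, 13) falls below the default floor, which is consistent with the advice to leave W alone.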
When to increase W (e.g. W=4):
- dynamic_quality at extension: a bigger window means more long-range mixing inside each query's attended set.
- W += 1.

| At length | Without rotation | With rotation |
|---|---|---|
| T_lat ≤ reference | No effect (kfi=1 means every frame is a global anyway) | No effect |
| Slight extension (T ≈ 1.5×) | Static keyframes can introduce period artifacts | Smoother |
| Heavy extension (T ≥ 3×) | Output starts to loop / freeze | Strongly preferred — this is the mechanism that prevents the "frozen video" failure mode |
Default: turn --rotate-keyframes on whenever you're extending. At the training horizon with kfi=1 it has no effect, so leaving it off there costs nothing.
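To decide which row of the rotation table applies, it helps to express a run's length as a multiple of the training reference. The sketch below does that under a loudly stated assumption: the video-to-latent conversion `(num_frames - 1) // 4 + 1` is not given in this document; it is inferred from the 129 frames / 33 latents pairing quoted for HunyuanVideo. All function names are hypothetical.

```python
def latent_frames(num_frames: int, temporal_stride: int = 4) -> int:
    # ASSUMPTION: first frame maps to one latent, then one latent per 4 frames.
    return (num_frames - 1) // temporal_stride + 1


def extension_ratio(num_frames: int, reference_latent_frames: int) -> float:
    """Sequence length as a multiple of the model's training reference."""
    return latent_frames(num_frames) / reference_latent_frames


def should_rotate(num_frames: int, reference_latent_frames: int) -> bool:
    """Mirror the guidance above: rotate whenever T_lat exceeds the reference."""
    return latent_frames(num_frames) > reference_latent_frames


# Wan (reference 21) at 321 video frames, HunyuanVideo (reference 33) at 129.
print(round(extension_ratio(321, 21), 2), should_rotate(321, 21))
print(round(extension_ratio(129, 33), 2), should_rotate(129, 33))
```

This covers only the length-based rule; it does not model the sparsity_scale = 0.5 case, where rotation also matters at the training reference.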
RIFLEx rescales the RoPE frequencies to extrapolate beyond the training horizon. It's orthogonal to LVSA (RoPE-only, no attention compute change) and stacks cleanly:
python examples/wan_generate.py \
--model /path/to/Wan2.1-T2V-1.3B-Diffusers \
--prompt "..." \
--num-frames 321 \
--lvsa --flashinfer --rotate-keyframes --auto-keyframes \
--riflex --riflex-s 4.0
At extension lengths RIFLEx + LVSA-FI is the recommended recipe. On the SotA grid (Wan 1.3B, 5 prompts):
| Horizon | LVSA-FI alone | LVSA-FI + RIFLEx |
|---|---|---|
| 2× | 1.43× faster than Dense | ~same speed, slight quality bump |
| 4× | 2.41× faster than Dense | ~same speed, +1 VQeval |
RIFLEx adds zero measurable wall-time overhead (verified: 0.99–1.00× Dense).
After every run, the [LVSA] log line tells you exactly what the scheduler did:
[LVSA] kfi=6 global_count=14 attended_per_frame=21/81
- kfi=6: every 6th frame is a periodic global anchor (auto-derived)
- global_count=14: total global frames in the pattern (n_first + periodic)
- attended_per_frame=21/81: each query attends to 21 frames out of 81 (74% sparsity)

For non-default geometry, use the inline helper in docs/tuning.md to compute the budget yourself.
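When comparing many runs, the log line can be parsed rather than read by eye. The sketch below assumes the line has exactly the shape of the sample above; the real log format may carry extra fields, and the helper name `parse_lvsa_line` is hypothetical rather than part of LVSA.

```python
import re

# Pattern modeled on the sample line shown above (an assumption about the format).
LVSA_LINE = re.compile(
    r"\[LVSA\]\s+kfi=(?P<kfi>\d+)\s+global_count=(?P<global_count>\d+)"
    r"\s+attended_per_frame=(?P<attended>\d+)/(?P<total>\d+)"
)


def parse_lvsa_line(line: str) -> dict:
    """Extract scheduler fields and derive the sparsity fraction."""
    match = LVSA_LINE.search(line)
    if match is None:
        raise ValueError(f"not an [LVSA] scheduler line: {line!r}")
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Sparsity is the share of frames a query does NOT attend to.
    fields["sparsity"] = 1.0 - fields["attended"] / fields["total"]
    # attended == total means the run fell back to a fully dense pattern.
    fields["fully_dense"] = fields["attended"] == fields["total"]
    return fields


info = parse_lvsa_line("[LVSA] kfi=6 global_count=14 attended_per_frame=21/81")
print(info["kfi"], info["global_count"], f"{info['sparsity']:.0%}")
```

The `fully_dense` flag corresponds to the N==T symptom in the troubleshooting table: if it is true at extension lengths, the sparse path is not engaging.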
| Symptom | Likely root cause | Fix |
|---|---|---|
| No quality improvement vs Dense | sparsity_scale too high at training horizon | Drop to 0.5 |
| Motion quality regressed | Window too small for fast-motion prompt | Try --window-size 16 (W=4) |
| Video loops at extension | --rotate-keyframes not set | Add the flag |
| attended_per_frame=N/T shows N==T at extension | reference_latent_frames too high | Verify per-model value |
See lvsa-troubleshooting for the full failure-mode catalog.