Skill

lvsa-tuning

Adjusts LVSA sparsity, window geometry, and rotation settings to tune video generation quality vs speed, including handling quality regressions.

performance

npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-tuning

Popularity

Parent stars

Parent forks

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/lvsa-tuning:lvsa-tuning

User invocable

Model invocable

Inline context

Default effort

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

| Knob | What it controls | When to touch it |

SKILL.md

119 lines · ~1.5k tokens

Similar Skills

lvsa-quickstart

Installs LVSA and generates long videos with block-sparse attention. Use when setting up LVSA from scratch, choosing SDPA vs FlashInfer backend, configuring reference latent frames per model, or verifying sparse path engagement.

lvsa-quickstart

lvsa-troubleshooting

Diagnoses LVSA failure modes: no speedup vs Dense, silent fallback, OOM at long sequences, missing mp4 in Docker, quality regression, and env var issues.

lvsa-troubleshooting

lvsa-reproduce-paper

Reproduce LVSA paper headline numbers using bundled benchmarks scripts. Use for SotA comparison, latency scaling, scoring with VQeval and VBench-Long, and regenerating figures.

lvsa-reproduce-paper

Stats

LanguagePython

Parent stars12

Parent forks3

MaintenanceGood

Last CommitJun 1, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Stats

Actions

Help us improve

Share bugs, ideas, or general feedback.

LVSA Tuning

The three primary knobs

Knob	What it controls	When to touch it
`reference_latent_frames`	Per-query attention budget anchor	Set once per model (Wan=21, HV=33, Cog=13). Don't change at runtime.
`sparsity_scale`	Multiplier on the budget	The runtime quality/speed dial. Default `1.0`; lower = sparser.
`window_size`, `n_first_frames`	Local-window geometry	Usually leave at defaults (12 frames / 4 frames). Only touch if you want a tighter floor.

sparsity_scale — the headline dial

LVSA_SPARSITY_SCALE (env var) or --sparsity-scale (CLI). Scales the auto-keyframe scheduler's per-query budget.

scaled_ref     = max(n_first + 1, int(reference_frames × sparsity_scale))
target_attended = min(scaled_ref, T_lat)

Empirical results on HunyuanVideo at 129 frames (training reference, single-prompt "dog"):

`sparsity_scale`	Per-step	Speedup vs dense	VQeval composite	VQeval loop
(dense baseline)	44.0 s	—	57.6	32.6
`0.5` (aggressive)	18.3 s	2.40×	65.2 (+7.6)	73.6 (+41.0)
`1.0` (default)	22.5 s	1.96×	61.3 (+3.7)	63.0 (+30.4)

Rule of thumb

Goal	`sparsity_scale`	Why
Match dense quality at training reference, take implementation speedup	`1.0`	At T_lat ≤ ref this collapses to kfi=1 (fully dense). Speedup comes from bypassing native attention overhead.
Maximum speedup at training reference	`0.5`	Engages pattern-driven sparsity even at T_lat=ref. Big loop-quality gains; ~5pt drop on `dynamic_quality`.
Aggressive extrapolation, OOM-prevention at 3×+ horizon	`0.5`	Shrinks compact-K buffer, helps fit on 80 GB.
Conservative quality at extrapolation	`0.75`	Reduces sparsity gradient; less speedup but keeps motion intact.

Important nuances

At T_lat ≤ reference, any sparsity_scale ≥ 1.0 collapses to kfi=1 (fully dense). The visible speedup is implementation efficiency only.
sparsity_scale = 2.0 is equivalent to 1.0 at T_lat ≤ reference (both give kfi=1). The conservative knob is meaningful only at extrapolation lengths.
sparsity_scale = 0.5 activates real pattern sparsity even at training reference: HV's budget shrinks from 33 to 16 latents at 1×, giving ~52% coverage.
The large loop_quality gain at s=0.5 comes from --rotate-keyframes dithering the attention pattern each step. Disable rotation and the loop gain disappears.

window_size + n_first_frames

Defaults:

window_size = 12 video frames = 3 latent frames (W=3)
n_first_frames = 4 video frames = 1 latent frame (n_first=1)

Floor of attended frames per query: 2W+1 + n_first = 8 latent frames.

When to reduce W: never, unless your reference_latent_frames is below the floor. The defaults are tuned for current models.

When to increase W (e.g. W=4):

Motion-heavy prompts losing dynamic_quality at extension — bigger window = more long-range mixing inside each query's attended set.
Costs ~10% wall time per W += 1.

Should I use --rotate-keyframes?

At length	Without rotation	With rotation
T_lat ≤ reference	No effect (kfi=1 means every frame is a global anyway)	No effect
Slight extension (T ≈ 1.5×)	Static keyframes can introduce period artifacts	Smoother
Heavy extension (T ≥ 3×)	Output starts to loop / freeze	Strongly preferred — this is the mechanism that prevents the "frozen video" failure mode

Default --rotate-keyframes on whenever you're extending. Off at training horizon adds nothing.

Composing with RIFLEx

RIFLEx rescales the RoPE frequencies to extrapolate beyond the training horizon. It's orthogonal to LVSA (RoPE-only, no attention compute change) and stacks cleanly:

python examples/wan_generate.py \
    --model /path/to/Wan2.1-T2V-1.3B-Diffusers \
    --prompt "..." \
    --num-frames 321 \
    --lvsa --flashinfer --rotate-keyframes --auto-keyframes \
    --riflex --riflex-s 4.0

At extension lengths RIFLEx + LVSA-FI is the recommended recipe. On the SotA grid (Wan 1.3B, 5 prompts):

Horizon	LVSA-FI alone	LVSA-FI + RIFLEx
2×	1.43× faster than Dense	~same speed, slight quality bump
4×	2.41× faster than Dense	~same speed, +1 VQeval

RIFLEx adds zero measurable wall-time overhead (verified: 0.99–1.00× Dense).

Verifying engagement

After every run, the [LVSA] log line tells you exactly what the scheduler did:

[LVSA] kfi=6 global_count=14 attended_per_frame=21/81

kfi=6 — every 6th frame is a periodic global anchor (auto-derived)
global_count=14 — total global frames in the pattern (n_first + periodic)
attended_per_frame=21/81 — each query attends to 21 frames out of 81 → 74% sparsity

For non-default geometry, use the inline helper in docs/tuning.md to compute the budget yourself.

Diagnostics

Symptom	Likely root cause	Fix
No quality improvement vs Dense	`sparsity_scale` too high at training horizon	Drop to `0.5`
Motion quality regressed	Window too small for fast-motion prompt	Try `--window-size 16` (W=4)
Video loops at extension	`--rotate-keyframes` not set	Add the flag
`attended_per_frame=N/T` shows N==T at extension	`reference_latent_frames` too high	Verify per-model value

See lvsa-troubleshooting for the full failure-mode catalog.

lvsa-tuning

Popularity

Invocation

Context Preview

SKILL.md

Similar Skills

Help us improve

Help us improve

Find plugins for your project

lvsa-tuning

Popularity

Invocation

Context Preview

SKILL.md

LVSA Tuning

The three primary knobs

sparsity_scale — the headline dial

Rule of thumb

Important nuances

window_size + n_first_frames

Should I use --rotate-keyframes?

Composing with RIFLEx

Verifying engagement

Diagnostics

Similar Skills

Help us improve

LVSA Tuning

The three primary knobs

sparsity_scale — the headline dial

Rule of thumb

Important nuances

window_size + n_first_frames

Should I use --rotate-keyframes?

Composing with RIFLEx

Verifying engagement

Diagnostics