Reproduce the LVSA paper's headline numbers using the bundled benchmark scripts. Use for SotA comparison, latency scaling, scoring with VQeval and VBench-Long, and regenerating figures.
npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-reproduce-paper
Slash command: /lvsa-reproduce-paper:lvsa-reproduce-paper
```bash
git clone https://github.com/JiusiServe/LongVideoSparseAttention
# git clone creates a directory named after the repository
cd LongVideoSparseAttention
uv venv --python 3.12
source .venv/bin/activate

# Install LVSA + scoring deps
uv pip install -e ".[diffusers,hunyuan,flashinfer,dev]"
uv pip install -e vqeval/

# For VBench-Long, you need a separate venv (it pins old diffusers/transformers)
git clone https://github.com/Vchitect/VBench /path/to/VBench
python3 -m venv /path/to/vbench-venv
source /path/to/vbench-venv/bin/activate
pip install -e /path/to/VBench
deactivate
source .venv/bin/activate  # back to LVSA venv

# Model weights (downloaded separately)
# Wan 2.1 1.3B: huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B-Diffusers
export MODEL_PATH=/path/to/Wan2.1-T2V-1.3B-Diffusers
```
Generate 60 videos: 4 methods (Dense, RIFLEx, LVSA-SDPA, LVSA-FI) × 3 horizons (165/249/333 frames) × 5 prompts.
```bash
export OUTDIR=out/sota_comparison
bash benchmarks/sota_comparison.sh
```
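The 60-video grid is the Cartesian product of methods, frame counts, and prompts. A minimal sketch of that enumeration, written without knowledge of how benchmarks/sota_comparison.sh is actually structured; the method labels and prompt indices are placeholders chosen for illustration:

```bash
# Hypothetical enumeration of the comparison grid (not taken from sota_comparison.sh).
methods="dense riflex lvsa-sdpa lvsa-fi"   # placeholder labels for the four methods
frames="165 249 333"                       # 2x / 3x / 4x horizons
prompts="0 1 2 3 4"                        # placeholder prompt indices

total=0
for m in $methods; do
  for f in $frames; do
    for p in $prompts; do
      echo "method=$m frames=$f prompt=$p"
      total=$((total + 1))
    done
  done
done
echo "cells to generate: $total"
```

The final line reports 60, which matches the video count quoted above.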
Expected wall time per cell (single A100, 50 steps, seed 16):
| Method | 2× (165f) | 3× (249f) | 4× (333f) |
|---|---|---|---|
| Dense | 566 s | 1145 s | 1930 s |
| RIFLEx | 564 s | 1149 s | 1931 s |
| LVSA (SDPA) | 502 s | 796 s | 1021 s |
| LVSA-FI | 395 s | 621 s | 802 s |
Total: ~9 hours on a single A100, ~70 min on 8×A100 via GNU parallel.
UltraViCo is excluded — it lives in a separate repo (thu-ml/DiT-Extrapolation, branch ultra-wan) with a different CLI. See the paper appendix for the UltraViCo recipe.
For the headline 3.14× claim and the latency figure on the README:
```bash
export OUTDIR=out/latency_scaling
bash benchmarks/latency_scaling.sh
```
Frame counts swept: 81 (1×) / 161 (2×) / 321 (4×) / 481 (6×). Methods: Dense + LVSA-FI.
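Frame counts can also be read in latent frames, which is what attention cost depends on. A small sketch, assuming the Wan 2.1 VAE compresses time by 4× so that F pixel frames map to (F - 1) / 4 + 1 latent frames; this formula is an assumption about the model and is not stated in the benchmark scripts:

```bash
# Assumption: latent_frames = (pixel_frames - 1) / 4 + 1 (4x temporal VAE compression).
latent_frames() {
  echo $(( ($1 - 1) / 4 + 1 ))
}

for f in 81 161 321 481; do
  echo "$f pixel frames -> $(latent_frames "$f") latent frames"
done
```

Under the same assumption, the 165/249/333-frame horizons of the SotA comparison map to 42/63/84 latent frames, which are exactly 2×, 3×, and 4× the 21 latent frames of an 81-frame clip.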
```bash
bash benchmarks/score_vqeval.sh out/sota_comparison
# Writes <stem>.vqeval.json next to each mp4
```
VQeval scores 6 dimensions plus a composite. It needs a single A100 and the bundled vqeval/ subpackage; expect ~15 min for 60 videos.
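Because each score file sits next to its video as <stem>.vqeval.json, one way to confirm that scoring finished is to look for videos without a matching JSON. A sketch of such a check; the helper name `count_unscored` is invented for this example and is not part of the repo:

```bash
# Hypothetical helper: count .mp4 files in a directory that lack <stem>.vqeval.json.
count_unscored() {
  dir=$1
  missing=0
  for mp4 in "$dir"/*.mp4; do
    [ -e "$mp4" ] || continue          # the glob matched nothing
    stem=${mp4%.mp4}
    if [ ! -f "$stem.vqeval.json" ]; then
      echo "unscored: $mp4" >&2        # report the file on stderr
      missing=$((missing + 1))
    fi
  done
  echo "$missing"                      # the count goes to stdout
}
```

Calling `count_unscored out/sota_comparison` should print 0 once every video has been scored.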
```bash
VBENCH_REPO=/path/to/VBench \
VBENCH_PYTHON=/path/to/vbench-venv/bin/python \
bash benchmarks/score_vbench.sh out/sota_comparison
# Writes <stem>.vbench.json next to each mp4
```
VBench-Long scores 5 dimensions: subject_consistency, temporal_flickering, motion_smoothness, background_consistency, imaging_quality.
```bash
python benchmarks/aggregate.py --outdir out/sota_comparison
# Writes _summary.csv (60 rows) and _summary_means.csv (12 cells)
```
The aggregator walks the output directory, parses tags (<model>__<backend>__<horizon>__<prompt>), loads the per-video JSONs, and emits tidy + means CSVs.
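The tag layout can be split with plain parameter expansion. A minimal sketch in shell; the real aggregator is a Python script and may parse tags differently, and the example tag values are invented for illustration:

```bash
# Split a tag of the form <model>__<backend>__<horizon>__<prompt> into its four fields.
# The tag below is a made-up example, not a file name produced by the benchmarks.
tag="wan__flashinfer__4x__p03"

model=${tag%%__*}       # everything before the first "__"
rest=${tag#*__}         # drop the first field and its separator
backend=${rest%%__*}
rest=${rest#*__}
horizon=${rest%%__*}
prompt=${rest#*__}      # whatever remains after the third "__"

echo "model=$model backend=$backend horizon=$horizon prompt=$prompt"
```

Splitting on the double underscore leaves single underscores inside a field intact, which is presumably why the tag format uses it as the separator.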
```bash
python benchmarks/generate_figures.py \
  --sota-csv out/sota_comparison/_summary_means.csv \
  --scaling-csv out/latency_scaling/_summary_means.csv \
  --outdir docs/figures/
```
Produces 4 PNGs at 300 DPI:
- latency_scaling.png: Wan 1.3B Dense vs LVSA wall-time scaling
- crossmodel_speedup.png: speedup-vs-Dense bar chart
- hv_latency_scaling.png: HunyuanVideo wall-time scaling
- sparsity_vs_frames.png: per-query attended fraction by model

| Horizon | LVSA (SDPA) | LVSA-FI | LVSA-FI vs Dense |
|---|---|---|---|
| 2× | 502 s | 395 s | 1.43× |
| 3× | 796 s | 621 s | 1.84× |
| 4× | 1021 s | 802 s | 2.41× |
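
The last column is the Dense wall time divided by the LVSA-FI wall time at the same horizon. A quick check of that arithmetic with awk, using the per-cell times from the wall-time table earlier in this document:

```bash
# speedup = dense_seconds / lvsa_fi_seconds, one row per horizon
speedups=$(printf '%s\n' "2x 566 395" "3x 1145 621" "4x 1930 802" |
  awk '{ printf "%s %.2fx\n", $1, $2 / $3 }')
echo "$speedups"
```

This prints 1.43x, 1.84x, and 2.41x for the three horizons, in agreement with the table.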

| Horizon | LVSA-FI vs UltraViCo |
|---|---|
| 2× | 1.88× |
| 3× | 2.49× |
| 4× | 3.27× |

| Horizon | LVSA-FI Δ |
|---|---|
| 2× | +6.5 |
| 3× | +11.2 |
| 4× | +9.9 |

| Horizon | LVSA-FI Δ |
|---|---|
| 2× | +0.09 |
| 3× | +0.04 |
| 4× | +0.10 |

- Scripts skip any cell whose .mp4 (or .vqeval.json, .vbench.json) already exists, so crash-and-resume works.
- Set SEED=<n> to get a different RNG roll.
- LVSA_PATCHES_PER_FRAME set or VIDEO_HEIGHT/WIDTH for the vllm-omni plugin.
- scripts/paper_results/sota_job_runner.sh ships a flock-queued GNU-parallel orchestrator. The pruned recipe in benchmarks/ is single-GPU sequential.

To reproduce on HunyuanVideo, change the example invocation in benchmarks/sota_comparison.sh from examples/wan_generate.py to examples/hunyuan_generate.py (and the HORIZONS arrays to HunyuanVideo's range: 65/129/193/257). The aggregator and figure scripts handle any model tag.
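
The crash-and-resume behaviour mentioned in the notes comes down to a skip-if-exists guard around each output. A sketch of that pattern under stated assumptions: `generate_one` is a hypothetical stand-in for the real generation command, and the cell names are placeholders.

```bash
# Skip-if-exists guard: rerunning the loop only produces outputs that are missing.
# generate_one is a hypothetical stand-in for the real generation command.
generate_one() {
  printf 'fake video for %s\n' "$1" > "$2"
}

outdir=$(mktemp -d)
generated=0
for attempt in 1 2; do                 # the second pass simulates a resumed run
  for tag in cell_a cell_b cell_c; do
    out="$outdir/$tag.mp4"
    if [ -f "$out" ]; then
      continue                         # already done, nothing to redo
    fi
    generate_one "$tag" "$out"
    generated=$((generated + 1))
  done
done
echo "generated $generated files"
rm -rf "$outdir"
```

Only three files are generated across both passes. Writing to a temporary name and renaming on success would additionally keep a half-written file from being mistaken for a finished one.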