Skill

lvsa-add-model

Adds LVSA support for a new video diffusion model by implementing the ModelAdapter ABC. Covers geometry, QKV extraction, RoPE, and output projection for single-stream, dual-stream, or joint-attention DiTs.

Python

PyTorch

ai-ml

npx claudepluginhub jiusiserve/longvideosparseattention --plugin lvsa-add-model



SKILL.md

229 lines · ~2k tokens


Stats

Language: Python
Parent stars: 12
Parent forks: 3
Maintenance: Good
Last commit: Jun 1, 2026


Adding a new model to LVSA

Overview

LVSA's engine is model-agnostic; per-model behavior lives in lvsa/adapters/<model>.py as a subclass of ModelAdapter (defined in lvsa/adapters/base.py). Adding a new model takes one adapter file (~190-330 lines, judging by the reference adapters below) plus a tiny example wrapper.

Reference adapters:

| Model | File | Style | LoC | Use as template if |
| --- | --- | --- | --- | --- |
| Wan 2.1 / 2.2 | `lvsa/adapters/wan.py` | Single-stream | ~220 | Your model concatenates text+video before projection |
| HunyuanVideo 1.5 | `lvsa/adapters/hunyuan_video.py` | Dual-stream | ~330 | Text and video have separate Q/K/V projections |
| CogVideoX 5B | `lvsa/adapters/cogvideox.py` | Joint-attention | ~190 | Shared Q/K/V projection over joint text+video |

The 11 methods to implement

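The ModelAdapter ABC groups its methods into the families below. Families A-D account for the 11 methods in the heading (3 + 3 + 2 + 3); family E adds 3 pipeline-integration methods, so the listing covers 14 methods in total.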

A. Geometry (3 methods)

def patches_per_frame(self, height: int, width: int, pipe: Any) -> int:
    """Spatial tokens per latent frame after VAE + patchify."""
    # For 480x832, VAE factor 8, patch size 2:
    # (480 / 8 / 2) * (832 / 8 / 2) = 1560
    ...

def latent_frames(self, num_frames: int, pipe: Any) -> int:
    """(num_frames - 1) // vae_temporal + 1. Wan/HV use 4."""
    ...

def reference_latent_frames(self, pipe: Any) -> int:
    """Training horizon in latent frames. CRITICAL — anchors the auto scheduler.
    Wan 1.3B/14B = 21, HunyuanVideo 1.5 = 33, CogVideoX 5B = 13."""
    ...
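
As a worked check of the arithmetic in the docstrings above, the following standalone sketch reproduces the 480x832 example. The free functions and their explicit `vae_spatial`, `patch`, and `vae_temporal` arguments are illustrative assumptions; the real adapter methods read these values from `pipe`. The 81-frame input is chosen only because it lands on the 21 latent frames quoted for Wan.

```python
def patches_per_frame(height: int, width: int, vae_spatial: int, patch: int) -> int:
    """Spatial tokens per latent frame after VAE downsampling and patchify."""
    return (height // vae_spatial // patch) * (width // vae_spatial // patch)


def latent_frames(num_frames: int, vae_temporal: int) -> int:
    """Latent frames for a clip, using the (num_frames - 1) // t + 1 rule."""
    return (num_frames - 1) // vae_temporal + 1


tokens = patches_per_frame(480, 832, vae_spatial=8, patch=2)  # 30 * 52 = 1560
frames = latent_frames(81, vae_temporal=4)                     # 80 // 4 + 1 = 21
seq_len = tokens * frames                                      # video tokens only
print(tokens, frames, seq_len)
```

The product `tokens * frames` is the video-token sequence length the attention layers see, which is why an error in either factor shows up later as a geometry mismatch.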

B. QKV extraction (3 methods)

def extract_qkv(self, attn, hidden, encoder) -> tuple[Tensor, Tensor, Tensor]:
    """Return Q, K, V for the joint sequence. Single-stream: concat text+video
    before projection. Dual-stream: each modality has its own linear."""
    ...

def extract_cross_attn_kv(self, attn, encoder) -> Optional[tuple[Tensor, Tensor]]:
    """Return (K, V) for an I2V-style image-conditioned cross-attention, or None."""
    ...

def split_encoder_for_cross_attn(self, attn, encoder) -> tuple[Tensor, Optional[Tensor]]:
    """For dual-modal encoders that split text vs image tokens."""
    ...

C. Position encoding + output (2 methods)

def apply_rotary(self, q, k, rotary_emb, local_seq, rank, world) -> tuple[Tensor, Tensor]:
    """Apply RoPE. Wan uses 1D RoPE; HunyuanVideo uses 3D (cos, sin) tuples."""
    ...

def output_projection(self, attn, hidden, query_dtype) -> Tensor:
    """Final linear (and any layernorm) after attention."""
    ...

D. Dual-stream + integration (3 methods)

def extract_encoder_qkv(self, attn, encoder) -> Optional[tuple[Tensor, Tensor, Tensor]]:
    """Dual-stream models project encoder Q/K/V through a separate linear.
    Return None for single-stream."""
    ...

def format_output(self, attn, hidden, encoder_output, encoder_seq_len, dtype):
    """Single-stream: return a tensor. Dual-stream: return (hidden, encoder) tuple."""
    ...

def cross_attention(self, attn, query, encoder_image, backend) -> Optional[Tensor]:
    """I2V cross-attention call, or None."""
    ...

E. Pipeline integration (3 methods)

def install_processor(self, pipe, processor) -> int:
    """Walk the transformer, replace each self-attention's processor with
    `DistributedLVSAProcessor`. Return number of blocks installed on."""
    ...

def setup_context_parallel(self, transformer, world) -> None:
    """Attach a CP plan (which dim to split, gather strategy)."""
    ...

def patch_rotary_for_cp(self, rank: int, world: int) -> None:
    """Monkey-patch the model's RoPE to be rank-aware for context-parallel runs."""
    ...

Step-by-step

1. Copy the closest existing adapter

# Single-stream model
cp lvsa/adapters/wan.py lvsa/adapters/<your_model>.py

# Dual-stream model
cp lvsa/adapters/hunyuan_video.py lvsa/adapters/<your_model>.py

# Joint-attention model
cp lvsa/adapters/cogvideox.py lvsa/adapters/<your_model>.py

2. Set the geometry

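def patches_per_frame(self, height: int, width: int, pipe: Any) -> int:
    # Spatial compression of the VAE (8 in the 480x832 example). Do not use
    # vae.config.scaling_factor here: in diffusers that is the latent scaling
    # constant, not a downsampling ratio. The attribute below is an assumption
    # based on recent diffusers video pipelines; adapt it to your pipeline.
    vae_sf = pipe.vae_scale_factor_spatial
    patch = pipe.transformer.config.patch_size   # patchify factor
    if isinstance(patch, (tuple, list)):         # e.g. (t, h, w): take the spatial part
        patch = patch[-1]
    h = height // vae_sf // patch
    w = width // vae_sf // patch
    return h * w

def latent_frames(self, num_frames: int, pipe: Any) -> int:
    # Temporal compression of the VAE (4 for Wan/HV); the config field name varies by VAE.
    vae_tf = pipe.vae.config.temporal_compression_ratio
    return (num_frames - 1) // vae_tf + 1

def reference_latent_frames(self, pipe: Any) -> int:
    return 21   # CHANGE: read from model card / paper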

3. Wire QKV + RoPE

Identify the model's attention class (usually <ModelName>Attention or <ModelName>SelfAttn). Adapt extract_qkv and apply_rotary to match its forward signature.

For Wan-style 1D RoPE:

def apply_rotary(self, q, k, rotary_emb, local_seq, rank, world):
    from .rope import apply_rotary_1d
    return apply_rotary_1d(q, k, rotary_emb, local_seq, rank, world)

For HunyuanVideo-style 3D (cos, sin) RoPE:

def apply_rotary(self, q, k, rotary_emb, local_seq, rank, world):
    from .rope import apply_rotary_3d_cos_sin
    return apply_rotary_3d_cos_sin(q, k, rotary_emb, local_seq, rank, world)
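
To illustrate why apply_rotary receives local_seq, rank, and world, here is a minimal pure-Python sketch of rank-aware 1D RoPE. It is not LVSA's `apply_rotary_1d`; it assumes the sequence is split into contiguous, equal-sized shards and uses the interleaved-pair rotation convention. The helper names `rope_rotate` and `rope_shard` are hypothetical.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive (even, odd) pairs of `vec` by position-dependent angles."""
    dim = len(vec)
    out = []
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)   # frequency for this pair
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_shard(tokens, local_seq, rank):
    """Apply RoPE to one rank's shard using global, not local, positions."""
    start = rank * local_seq                     # assumes contiguous sharding
    return [rope_rotate(t, start + j) for j, t in enumerate(tokens)]


# Reference: rotate the full 8-token sequence on a single device.
full = [[float(p + 1), 0.5, -1.0, 2.0] for p in range(8)]
reference = [rope_rotate(t, p) for p, t in enumerate(full)]

# Context-parallel: each rank rotates its own shard, then results are gathered.
world, local_seq = 2, 4
gathered = []
for rank in range(world):
    shard = full[rank * local_seq:(rank + 1) * local_seq]
    gathered.extend(rope_shard(shard, local_seq, rank))

print(gathered == reference)
```

If each rank used positions starting at 0 instead of `rank * local_seq`, the gathered result would differ from the single-device reference for every rank except rank 0. A rank-aware RoPE exists to prevent exactly that mismatch.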

4. Install the processor

def install_processor(self, pipe, processor) -> int:
    transformer = pipe.transformer
    n = 0
    for block in transformer.blocks:                  # adapt path
        block.attn1.processor = processor             # adapt attribute
        n += 1
    return n
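
The walk above can be exercised without a real checkpoint. The sketch below builds a mock pipeline from `SimpleNamespace` objects and adds a guard that fails loudly when no attention layer is found. The `attn1` attribute and `transformer.blocks` path mirror the snippet above; the mock factory and the error message are assumptions for illustration.

```python
from types import SimpleNamespace


def install_processor(pipe, processor) -> int:
    """Set `processor` on every block's self-attention and return the count."""
    n = 0
    for block in pipe.transformer.blocks:        # adapt path per model
        attn = getattr(block, "attn1", None)     # adapt attribute per model
        if attn is None:
            continue
        attn.processor = processor
        n += 1
    if n == 0:
        raise RuntimeError("no self-attention layers found; check the walk path")
    return n


def make_mock_pipe(num_blocks: int):
    """Hypothetical stand-in for a diffusers pipeline with `num_blocks` blocks."""
    blocks = [
        SimpleNamespace(attn1=SimpleNamespace(processor=None))
        for _ in range(num_blocks)
    ]
    return SimpleNamespace(transformer=SimpleNamespace(blocks=blocks))


pipe = make_mock_pipe(3)
sentinel = object()                              # stands in for the LVSA processor
installed = install_processor(pipe, sentinel)
print(installed)
```

Raising when the count is zero turns a wrong walk path into an immediate error instead of a silent dense run.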

5. Register the adapter

In lvsa/adapters/__init__.py:

def get_adapter(name: str) -> ModelAdapter:
    if name == "<your_model>":
        from .<your_model> import YourModelAdapter
        return YourModelAdapter()
    ...
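
One possible variation on the lookup, shown here only as a sketch: a dictionary of factories that reports the known names when an unknown adapter is requested. This is not how lvsa/adapters/__init__.py is confirmed to work; `_FACTORIES` and the stand-in classes are hypothetical so that the example runs on its own.

```python
from typing import Callable, Dict


class ModelAdapter:
    """Stand-in for lvsa.adapters.base.ModelAdapter so the sketch runs alone."""


class YourModelAdapter(ModelAdapter):
    """Placeholder for the adapter you are adding."""


# Hypothetical registry: adapter name -> zero-argument factory.
_FACTORIES: Dict[str, Callable[[], ModelAdapter]] = {
    "your_model": YourModelAdapter,
}


def get_adapter(name: str) -> ModelAdapter:
    """Return a fresh adapter, or fail with the list of known names."""
    try:
        factory = _FACTORIES[name]
    except KeyError:
        known = ", ".join(sorted(_FACTORIES))
        raise ValueError(f"unknown adapter {name!r}; known adapters: {known}") from None
    return factory()


adapter = get_adapter("your_model")
print(type(adapter).__name__)
```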

6. Write an example script

cp examples/wan_generate.py examples/<your_model>_generate.py

Edit:

Import your model's pipeline class
Replace WanAdapter references with your adapter
Adjust default --num-frames to the model's training horizon
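
Because --num-frames counts pixel frames while the training horizon is given in latent frames, the default has to be converted. A minimal sketch, inverting the latent_frames formula from step 2; the function name is hypothetical, and the temporal factor of 4 is the value quoted above for Wan and HunyuanVideo.

```python
def num_frames_for_horizon(reference_latent_frames: int, vae_temporal: int) -> int:
    """Smallest pixel-frame count that maps to the given number of latent frames."""
    return (reference_latent_frames - 1) * vae_temporal + 1


def latent_frames(num_frames: int, vae_temporal: int) -> int:
    """Forward mapping, used here only to check the round trip."""
    return (num_frames - 1) // vae_temporal + 1


wan_default = num_frames_for_horizon(21, vae_temporal=4)      # (21 - 1) * 4 + 1 = 81
hunyuan_default = num_frames_for_horizon(33, vae_temporal=4)  # (33 - 1) * 4 + 1 = 129
print(wan_default, hunyuan_default)
```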

7. Add a CPU smoke test

In tests/test_adapters.py, add:

class TestYourModelAdapter:
    def test_geometry(self):
        adapter = YourModelAdapter()
        assert adapter.reference_latent_frames(MockPipe()) == 21
    # ... etc

Run:

pytest tests/test_adapters.py::TestYourModelAdapter -v
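
A slightly fuller geometry test might also pin down the rounding behavior at frame-count boundaries. The sketch below is self-contained: `MockPipe` and `ToyAdapter` are hypothetical stand-ins with explicit fields, not the fixtures in tests/test_adapters.py, and should be swapped for your adapter and a mock that matches the attributes it reads.

```python
from dataclasses import dataclass


@dataclass
class MockPipe:
    """Hypothetical mock exposing only the values the geometry methods need."""
    vae_spatial: int = 8
    vae_temporal: int = 4
    patch: int = 2


class ToyAdapter:
    """Stand-in with the three geometry methods; replace with YourModelAdapter."""

    def patches_per_frame(self, height, width, pipe):
        return (height // pipe.vae_spatial // pipe.patch) * (
            width // pipe.vae_spatial // pipe.patch
        )

    def latent_frames(self, num_frames, pipe):
        return (num_frames - 1) // pipe.vae_temporal + 1

    def reference_latent_frames(self, pipe):
        return 21


def test_geometry_boundaries():
    adapter, pipe = ToyAdapter(), MockPipe()
    assert adapter.patches_per_frame(480, 832, pipe) == 1560
    assert adapter.latent_frames(1, pipe) == 1     # a single image is one latent frame
    assert adapter.latent_frames(84, pipe) == 21   # 82..84 stay in the 81-frame bucket
    assert adapter.latent_frames(85, pipe) == 22   # 85 starts the next latent frame
    assert adapter.reference_latent_frames(pipe) == 21


test_geometry_boundaries()
```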

8. (Optional) Wire vllm-omni hook

If your model has special pre-attention behavior in vllm-omni (sequence sharding, custom RoPE), add a hook at lvsa-vllm-omni/lvsa_vllm_omni/<your_model>_hook.py modeled on wan_hook.py or hunyuan_hook.py. Register it in register.py and document the env var (LVSA_<YOUR_MODEL>_HOOK=1) in lvsa-vllm-omni/README.md.

Validation

After your adapter imports cleanly and its tests pass on CPU, run a small GPU smoke test:

python examples/<your_model>_generate.py \
    --model /path/to/your_model_checkpoint \
    --prompt "test" \
    --num-frames <training_horizon> \
    --lvsa --auto-keyframes \
    --output-name smoke.mp4

Look for [LVSA] installed on N blocks in the log — that confirms the adapter found your attention layers.

Common pitfalls

Wrong reference_latent_frames: the auto scheduler is anchored to the wrong horizon, so sparsity can be applied silently even at the model's training horizon. Verify the value against the model card.
patches_per_frame doesn't divide seq_len: geometry detection fails. Check the VAE spatial factor and patch size against the model config.
Custom attention layer not found in install_processor: the walk path (pipe.transformer.blocks[i].attn1) differs per model. Print the model structure first.
Multi-GPU breaks: CP requires seq_len % world_size == 0. If seq_len is not divisible, add a constraint check at adapter init.
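
The two divisibility pitfalls can be caught early with a small check. The following is a sketch under the assumption that `seq_len` counts video tokens only (joint-attention models would need to subtract text tokens first); `check_geometry` is a hypothetical helper, not part of LVSA.

```python
def check_geometry(seq_len: int, patches_per_frame: int, world_size: int) -> int:
    """Validate sequence geometry and return the implied number of latent frames."""
    if patches_per_frame <= 0 or seq_len % patches_per_frame != 0:
        raise ValueError(
            f"seq_len={seq_len} is not a multiple of patches_per_frame="
            f"{patches_per_frame}; check VAE spatial factor and patch size"
        )
    if world_size <= 0 or seq_len % world_size != 0:
        raise ValueError(
            f"seq_len={seq_len} is not divisible by world_size={world_size}; "
            "context parallelism needs equal shards"
        )
    return seq_len // patches_per_frame


frames = check_geometry(32760, 1560, world_size=4)   # 21 frames, 8190 tokens per rank

try:
    check_geometry(32760, 1560, world_size=16)       # 32760 % 16 == 8, so this fails
    cp16_ok = True
except ValueError:
    cp16_ok = False

print(frames, cp16_ok)
```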
