Skill

ltxv2-video

Builds LTX-V2 19B video workflows in ComfyUI for text-to-video and image-to-video, using distilled models, camera control LoRAs, and two-stage upscaling.

ai-ml

npx claudepluginhub artokun/comfyui-mcp --plugin comfy


LTX-V2 (LTX-2) is a 19-billion parameter DiT-based video foundation model from Lightricks. It uses a Gemma 3 12B text encoder and supports both text-to-video (T2V) and image-to-video (I2V). Key features:


Stats

Language: TypeScript

Parent stars: 92

Parent forks: 20

Maintenance: Good

Last commit: Feb 17, 2026


LTX-V2 19B Video Workflows

Overview

Distilled model for fast 8-step generation
Two-stage pipeline: Generate at low res, then 2x spatial upscale in latent space
Camera control LoRAs for cinematic movements
Audio-video generation in a single pass (optional)

Models

Checkpoint (Installed)

Component	Node	Model	Notes
Checkpoint	`CheckpointLoaderSimple`	`ltx-2-19b-distilled.safetensors`	41GB bf16, distilled variant

Text Encoder (Installed)

Component	Node	Model	Notes
Gemma 3	`CLIPLoader` (type=`ltxv`)	`gemma_3_12B_it_fp4_mixed.safetensors`	9GB FP4, in text_encoders/

Loading note: The checkpoint bundles the VAE internally. The Gemma 3 text encoder loads separately. Use CLIPLoader with type: "ltxv" pointing at the text_encoders/ directory.

LoRAs (Installed)

LoRA	File	Purpose
Camera Dolly Left (`ltx2/` path)	`ltx2/ltx-2-19b-lora-camera-control-dolly-left.safetensors`	Camera dolly left
Distilled LoRA (384)	`ltx2/ltx-2-19b-distilled-lora-384.safetensors`	Apply to base model for distilled behavior
Camera Dolly Left	`ltx-2-19b-lora-camera-control-dolly-left.safetensors`	Camera movement

Concept/Style LoRAs (Installed)

Located in loras/LTXV2/:

style/PLORAV7_LTX_000010500.safetensors
concept/head_swap_v1_13500_first_frame.safetensors
concept/LTX-2 - Better Female Nudity.safetensors
action/LTX2-i2v-OralSuite.safetensors
action/LTX2-i2v-SexThrust.safetensors
And more in concept/ and action/ subfolders

Key Nodes

LTXVConditioning

Binds text conditioning with frame rate information:

{
  "class_type": "LTXVConditioning",
  "inputs": {
    "positive": ["<clip_text_encode>", 0],
    "negative": ["<clip_text_encode_neg>", 0],
    "frame_rate": 25
  }
}

EmptyLTXVLatentVideo

Creates the initial video latent (for T2V):

{
  "class_type": "EmptyLTXVLatentVideo",
  "inputs": {
    "width": 768,
    "height": 512,
    "length": 97,
    "batch_size": 1
  }
}

Frame count constraint: length must be 8n + 1 (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121, and so on; longer clips are listed under Frame Count).
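The 8n + 1 rule is easy to check before building a workflow. The following is a minimal sketch, not part of the skill itself; the helper names `is_valid_frame_count` and `snap_frame_count` are hypothetical.

```python
def is_valid_frame_count(length: int) -> bool:
    """Return True when length has the form 8n + 1 with n >= 1."""
    return length >= 9 and (length - 1) % 8 == 0


def snap_frame_count(length: int) -> int:
    """Round a requested frame count to a nearby valid 8n + 1 value (minimum 9)."""
    n = max(1, round((length - 1) / 8))
    return 8 * n + 1


print(is_valid_frame_count(97))   # True
print(is_valid_frame_count(96))   # False
print(snap_frame_count(100))      # 97
print(snap_frame_count(120))      # 121
```

Snapping a user-supplied value this way avoids passing an invalid `length` to EmptyLTXVLatentVideo.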

LTXVScheduler

Dedicated sigma schedule for LTX-V2 latent space:

{
  "class_type": "LTXVScheduler",
  "inputs": {
    "steps": 8,
    "max_shift": 2.05,
    "base_shift": 0.95,
    "stretch": true,
    "terminal": 0.1
  }
}

Connect the optional latent input for latent-aware shift scaling.

LTXVImgToVideo (For I2V)

All-in-one node that encodes image, creates latent, and wraps conditioning:

{
  "class_type": "LTXVImgToVideo",
  "inputs": {
    "positive": ["<positive_conditioning>", 0],
    "negative": ["<negative_conditioning>", 0],
    "vae": ["<checkpoint>", 2],
    "image": ["<load_image>", 0],
    "width": 768,
    "height": 512,
    "length": 97,
    "batch_size": 1,
    "strength": 0.6
  }
}
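One way to reuse the T2V graph for I2V is to swap the empty-latent node for LTXVImgToVideo and rewire its consumers. The sketch below assumes the node IDs of the Complete Workflow: T2V Distilled section ("5" conditioning, "6" latent, "11" guider, checkpoint VAE at `["1", 2]`). It also assumes that LTXVImgToVideo outputs positive, negative, and latent at indices 0, 1, and 2, and that it may sit after LTXVConditioning; verify both against your ComfyUI install. The function name and the `load_id` default are hypothetical.

```python
import copy


def to_image_to_video(workflow, image_name, latent_id="6", cond_id="5",
                      vae_ref=("1", 2), guider_id="11", load_id="30",
                      strength=0.6):
    """Return a copy of a T2V workflow with the empty latent replaced by LTXVImgToVideo."""
    wf = copy.deepcopy(workflow)
    size = wf[latent_id]["inputs"]  # width, height, length, batch_size

    # Consumers of the old latent (output 0) now read the latent at output 2 (assumed index).
    for node in wf.values():
        for key, value in node["inputs"].items():
            if value == [latent_id, 0]:
                node["inputs"][key] = [latent_id, 2]

    wf[load_id] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    wf[latent_id] = {"class_type": "LTXVImgToVideo", "inputs": {
        "positive": [cond_id, 0], "negative": [cond_id, 1],
        "vae": list(vae_ref), "image": [load_id, 0],
        **size, "strength": strength,
    }}

    # The guider takes the conditioning re-emitted by LTXVImgToVideo.
    wf[guider_id]["inputs"]["positive"] = [latent_id, 0]
    wf[guider_id]["inputs"]["negative"] = [latent_id, 1]
    return wf
```

The input workflow is left untouched, so the same T2V template can produce both variants.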

LTXVLatentUpsampler (For Two-Stage Upscale)

{
  "class_type": "LTXVLatentUpsampler",
  "inputs": {
    "latent": ["<sampler_output>", 0],
    "upscale_model": ["<upscale_loader>", 0]
  }
}

Requires LatentUpscaleModelLoader with ltx-2-spatial-upscaler-x2-1.0.safetensors.

Sampler Settings

Distilled Model (Installed)

Uses SamplerCustomAdvanced with explicit sigma schedules (LTXVScheduler for Stage 1, manual sigmas for Stage 2), NOT the standard KSampler:

Parameter	Stage 1 (Generate)	Stage 2 (Upscale)
sampler	euler	euler
steps	8	3 (four sigma values)
cfg	1.0	1.0
scheduler	LTXVScheduler	Manual sigmas

Stage 1 sigmas (via LTXVScheduler): max_shift=2.05, base_shift=0.95, stretch=true, terminal=0.1

Stage 2 sigmas (manual, for upscale refinement): 0.909375, 0.725, 0.421875, 0.0
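With a sigma-driven sampler, each step moves between two adjacent sigma values, so a list of N sigmas corresponds to N - 1 steps. The short sketch below checks the Stage 2 list on that basis; the helper names are hypothetical.

```python
# Stage 2 sigma values quoted in this section.
STAGE2_SIGMAS = [0.909375, 0.725, 0.421875, 0.0]


def step_count(sigmas: list[float]) -> int:
    """Number of sampler steps implied by a sigma list (one per adjacent pair)."""
    return max(0, len(sigmas) - 1)


def check_schedule(sigmas: list[float]) -> None:
    """Raise ValueError unless the schedule has two or more strictly decreasing values."""
    if len(sigmas) < 2:
        raise ValueError("need at least two sigma values")
    if any(a <= b for a, b in zip(sigmas, sigmas[1:])):
        raise ValueError("sigmas must be strictly decreasing")


check_schedule(STAGE2_SIGMAS)
print(step_count(STAGE2_SIGMAS))  # 3
```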

Base Model (If Using Distilled LoRA on Base)

Parameter	Value
sampler	res_2s
steps	20
cfg	4.0
scheduler	LTXVScheduler
distilled_lora_strength	0.6

Resolution and Frame Count

Resolutions (Must be multiples of 32)

Aspect	Stage 1	After 2x Upscale	Notes
3:2 landscape	768x512	1536x1024	Default
16:9 landscape	960x544	1920x1088	Official example
1:1 square	640x640	1280x1280
4:3 landscape	704x512	1408x1024

Start at lower resolution for Stage 1 to manage VRAM, then upscale.
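To derive a Stage 1 size from an arbitrary target, the dimensions can be snapped to multiples of 32 and then doubled for the upscaled output. This is an illustrative sketch; `snap_to_multiple` and `stage_sizes` are hypothetical helper names.

```python
def snap_to_multiple(value: int, multiple: int = 32) -> int:
    """Round value to the nearest multiple, never below one multiple."""
    return max(multiple, round(value / multiple) * multiple)


def stage_sizes(width: int, height: int, scale: int = 2):
    """Return ((stage1_w, stage1_h), (upscaled_w, upscaled_h))."""
    w = snap_to_multiple(width)
    h = snap_to_multiple(height)
    return (w, h), (w * scale, h * scale)


print(stage_sizes(960, 540))  # ((960, 544), (1920, 1088))
print(stage_sizes(768, 512))  # ((768, 512), (1536, 1024))
```

The first call reproduces the 16:9 row of the table: 540 is not a multiple of 32, so it snaps to 544.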

Frame Count (`8n + 1`)

Frames	Duration @25fps	Duration @24fps	Notes
49	1.96s	2.04s	Quick test
81	3.24s	3.38s	Short clip
97	3.88s	4.04s	Default
121	4.84s	5.04s	Official example, recommended
161	6.44s	6.71s	Longer clip
257	10.28s	10.71s	Maximum
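The durations in the table are simply frames divided by frame rate. Going the other way, a target duration can be converted to the next valid 8n + 1 frame count. A minimal sketch with hypothetical helper names:

```python
def clip_duration(frames: int, fps: float = 25.0) -> float:
    """Clip length in seconds for a given frame count and frame rate."""
    return frames / fps


def frames_for_duration(seconds: float, fps: float = 25.0) -> int:
    """Smallest 8n + 1 frame count covering the requested duration."""
    target = round(seconds * fps)
    n = max(1, -(-(target - 1) // 8))  # ceiling division
    return 8 * n + 1


print(f"{clip_duration(121, 25):.2f}")  # 4.84
print(frames_for_duration(5.0, 24))     # 121
print(frames_for_duration(4.0, 25))     # 105
```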

Frame Rate

Standard: 25 fps (conditioned via LTXVConditioning). 24 and 30 fps also supported.

Pipeline Flow: T2V Distilled

CheckpointLoaderSimple → MODEL + VAE
CLIPLoader (ltxv, gemma_3_12B_it_fp4_mixed) → CLIP
  ├─ CLIPTextEncode (positive) → CONDITIONING
  └─ CLIPTextEncode (negative) → CONDITIONING

LTXVConditioning (positive, negative, frame_rate=25) → pos/neg CONDITIONING
EmptyLTXVLatentVideo (768x512, 121 frames) → LATENT
LTXVScheduler (steps=8, max_shift=2.05, base_shift=0.95) → SIGMAS

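RandomNoise → NOISE, KSamplerSelect (euler) → SAMPLER
CFGGuider (model, positive, negative, cfg=1.0) → GUIDER
SamplerCustomAdvanced (noise, guider, sampler, sigmas, latent_image)
  → Stage 1 LATENT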

[Optional: LTXVLatentUpsampler → 2x LATENT → SamplerCustomAdvanced Stage 2]

VAEDecode (or LTXVSpatioTemporalTiledVAEDecode for VRAM savings) → IMAGE
VHS_VideoCombine (or CreateVideo + SaveVideo) → MP4

Complete Workflow: T2V Distilled (8-Step)

{
  "1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "ltx-2-19b-distilled.safetensors" }},
  "2": { "class_type": "CLIPLoader", "inputs": { "clip_name": "gemma_3_12B_it_fp4_mixed.safetensors", "type": "ltxv" }},
  "3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "<positive prompt>" }},
  "4": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "" }},
  "5": { "class_type": "LTXVConditioning", "inputs": {
    "positive": ["3", 0], "negative": ["4", 0], "frame_rate": 25
  }},
  "6": { "class_type": "EmptyLTXVLatentVideo", "inputs": {
    "width": 768, "height": 512, "length": 121, "batch_size": 1
  }},
  "7": { "class_type": "LTXVScheduler", "inputs": {
    "steps": 8, "max_shift": 2.05, "base_shift": 0.95,
    "stretch": true, "terminal": 0.1, "latent": ["6", 0]
  }},
  "8": { "class_type": "KSamplerSelect", "inputs": { "sampler_name": "euler" }},
  "9": { "class_type": "SamplerCustomAdvanced", "inputs": {
    "noise": ["10", 0],
    "guider": ["11", 0],
    "sampler": ["8", 0],
    "sigmas": ["7", 0],
    "latent_image": ["6", 0]
  }},
  "10": { "class_type": "RandomNoise", "inputs": { "noise_seed": 42 }},
  "11": { "class_type": "CFGGuider", "inputs": {
    "model": ["1", 0],
    "positive": ["5", 0],
    "negative": ["5", 1],
    "cfg": 1.0
  }},
  "12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }},
  "13": { "class_type": "VHS_VideoCombine", "inputs": {
    "images": ["12", 0], "frame_rate": 25, "loop_count": 0,
    "filename_prefix": "ltxv2", "format": "video/h264-mp4",
    "pingpong": false, "save_output": true,
    "pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
  }}
}

Alternative simple output (built-in nodes instead of VHS):

{
  "12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }},
  "13": { "class_type": "CreateVideo", "inputs": { "images": ["12", 0], "fps": 25 }},
  "14": { "class_type": "SaveVideo", "inputs": { "video": ["13", 0], "filename_prefix": "video/ltxv2", "format": "auto", "codec": "auto" }}
}
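A workflow in this API format can be queued over ComfyUI's HTTP endpoint by POSTing it under a `prompt` key to `/prompt`. The sketch below uses only the standard library. The server address is an assumption (ComfyUI's usual local default), the helper names are hypothetical, and the skill's own MCP tooling may queue workflows differently.

```python
import json
import urllib.request


def build_prompt_request(workflow: dict,
                         server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Wrap an API-format workflow in a POST request for ComfyUI's /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> str:
    """Send the workflow to a running ComfyUI server and return its prompt_id."""
    with urllib.request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.load(resp)["prompt_id"]
```

With a server running, `queue_workflow(json.load(open("ltxv2_t2v.json")))` would queue the graph above; the file name is hypothetical.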

Camera Control LoRAs

Seven official camera control LoRAs from Lightricks:

Movement	LoRA File
Dolly Left	`ltx-2-19b-lora-camera-control-dolly-left.safetensors`
Dolly Right	`ltx-2-19b-lora-camera-control-dolly-right.safetensors`
Dolly In	`ltx-2-19b-lora-camera-control-dolly-in.safetensors`
Dolly Out	`ltx-2-19b-lora-camera-control-dolly-out.safetensors`
Jib Up	`ltx-2-19b-lora-camera-control-jib-up.safetensors`
Jib Down	`ltx-2-19b-lora-camera-control-jib-down.safetensors`
Static	`ltx-2-19b-lora-camera-control-static.safetensors`

Usage: Apply with LoraLoaderModelOnly at strength 1.0. Do NOT describe camera movement in your prompt — the LoRA handles it.

{
  "class_type": "LoraLoaderModelOnly",
  "inputs": {
    "model": ["<checkpoint>", 0],
    "lora_name": "ltx-2-19b-lora-camera-control-dolly-left.safetensors",
    "strength_model": 1.0
  }
}

A camera control LoRA cannot be combined with an IC-LoRA (canny/depth/pose) in the same generation.

Concept/Style LoRAs

Apply with LoraLoaderModelOnly. Typical strength: 0.5–1.0.

{
  "class_type": "LoraLoaderModelOnly",
  "inputs": {
    "model": ["<checkpoint_or_camera_lora>", 0],
    "lora_name": "LTXV2\\concept\\LTX-2 - Better Female Nudity.safetensors",
    "strength_model": 0.8
  }
}

Concept/style LoRAs CAN be stacked with camera control LoRAs.
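Stacking means chaining LoraLoaderModelOnly nodes so each one takes the previous node's MODEL output. A small sketch of that chaining on an API-format workflow dict follows; the function name, the starting node ID, and the chosen strengths are illustrative assumptions.

```python
def chain_loras(workflow: dict, model_ref: list, loras: list, start_id: int) -> list:
    """Append one LoraLoaderModelOnly node per (lora_name, strength) pair.

    Returns the [node_id, 0] reference of the last LoRA node, which should
    replace the checkpoint's model reference in downstream nodes.
    """
    ref = list(model_ref)
    for offset, (name, strength) in enumerate(loras):
        node_id = str(start_id + offset)
        workflow[node_id] = {
            "class_type": "LoraLoaderModelOnly",
            "inputs": {"model": ref, "lora_name": name, "strength_model": strength},
        }
        ref = [node_id, 0]
    return ref


workflow = {"1": {"class_type": "CheckpointLoaderSimple",
                  "inputs": {"ckpt_name": "ltx-2-19b-distilled.safetensors"}}}
final_model = chain_loras(
    workflow,
    ["1", 0],
    [("ltx-2-19b-lora-camera-control-dolly-left.safetensors", 1.0),
     ("LTXV2\\style\\PLORAV7_LTX_000010500.safetensors", 0.8)],
    start_id=20,
)
print(final_model)  # ['21', 0]
```

Point the `model` input of the guider at `final_model` so sampling uses the stacked weights.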

VRAM Considerations

Config	VRAM	Notes
bf16 checkpoint + FP4 Gemma	~24GB+	Tight on RTX 4090, may OOM
FP8 checkpoint + FP4 Gemma	~16-20GB	Recommended for 24GB GPUs
bf16 + tiled VAE decode	~22GB	Use `LTXVSpatioTemporalTiledVAEDecode`

VRAM warnings from MEMORY.md: "LTXV2 can OOM on 24GB — suggest FP8 quantized models or --lowvram"

Tips for 24GB GPUs

Use VAEDecodeTiled or LTXVSpatioTemporalTiledVAEDecode instead of standard VAEDecode
Start at 768x512 resolution, upscale in Stage 2
Use FP4 Gemma text encoder (installed)
Consider GGUF quantized models for tighter VRAM budgets
Always clear_vram before switching to LTX-V2 from another model family
Reduce frame count to 81 or 49 if OOM persists

Prompt Style

Natural language descriptions. Be specific about motion, camera angles, and temporal progression:

Good: "A woman with flowing auburn hair walks through a sun-dappled forest, leaves falling gently around her, soft golden hour lighting, cinematic depth of field"
Bad: "woman, forest, walking"

Describe the entire scene progression, not just a single moment. Include lighting, mood, and motion cues.

Two-Stage Upscale Pattern

For production quality, generate at low resolution then upscale:

Stage 1: Generate at 768x512, 121 frames, 8 steps (distilled)
Upscale: LTXVLatentUpsampler (2x spatial) → 1536x1024
Stage 2: Resample the upscaled latent with the manual Stage 2 sigmas (3 steps from 4 sigma values) at CFG 1.0
Decode: Use tiled VAE decode for the larger resolution

This requires the spatial upscaler model: ltx-2-spatial-upscaler-x2-1.0.safetensors (place in models/latent_upscale_models/).
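The pattern above can be sketched as a patch on the T2V workflow dict. Several details here are assumptions to verify against your ComfyUI install: the `model_name` input on LatentUpscaleModelLoader, the existence of a `ManualSigmas` node taking a comma-separated `sigmas` string, and the new node IDs. The LTXVLatentUpsampler input names follow the Key Nodes section, and noise, sampler, and guider are reused from Stage 1 (nodes "10", "8", "11").

```python
import copy

STAGE2_SIGMAS = "0.909375, 0.725, 0.421875, 0.0"


def add_stage2(workflow: dict, stage1_sampler: str = "9", decode_node: str = "12") -> dict:
    """Return a copy of the workflow with a latent upscale and Stage 2 resample added."""
    wf = copy.deepcopy(workflow)
    wf["20"] = {"class_type": "LatentUpscaleModelLoader", "inputs": {
        "model_name": "ltx-2-spatial-upscaler-x2-1.0.safetensors"}}  # input name assumed
    wf["21"] = {"class_type": "LTXVLatentUpsampler", "inputs": {
        "latent": [stage1_sampler, 0], "upscale_model": ["20", 0]}}
    wf["22"] = {"class_type": "ManualSigmas", "inputs": {
        "sigmas": STAGE2_SIGMAS}}  # node and input name assumed
    wf["23"] = {"class_type": "SamplerCustomAdvanced", "inputs": {
        "noise": ["10", 0], "guider": ["11", 0], "sampler": ["8", 0],
        "sigmas": ["22", 0], "latent_image": ["21", 0]}}
    # Decode the Stage 2 result instead of the Stage 1 latent.
    wf[decode_node]["inputs"]["samples"] = ["23", 0]
    return wf
```

For the larger output, the decode node would typically also be swapped for a tiled VAE decode, as noted in the steps above.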
