Help us improve
Share bugs, ideas, or general feedback.
How this skill is triggered — by the user, by Claude, or both
Slash command
/comfy:ltxv2-videoThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
LTX-V2 (LTX-2) is a 19-billion parameter DiT-based video foundation model from Lightricks. It uses a Gemma 3 12B text encoder and supports both text-to-video (T2V) and image-to-video (I2V). Key features:
Generates videos from text prompts via fal.ai models like Kling 2.6 Pro, Sora 2, LTX-2 Pro, Runway Gen-3 Turbo, Luma Dream Machine; supplies endpoints, durations, aspect ratios, prompt structures, TypeScript/Python code.
Generates ComfyUI workflow JSON files from natural language descriptions for txt2img, img2img, txt2vid, img2vid, upscale, inpaint, audio, and 3D tasks. Outputs valid, importable JSON with model download links and custom node requirements.
Builds WAN 2.2 text-to-video workflows using dual high/low-noise models, lightning LoRAs, VACE modules, and KSamplerAdvanced two-pass sampling.
Share bugs, ideas, or general feedback.
LTX-V2 (LTX-2) is a 19-billion parameter DiT-based video foundation model from Lightricks. It uses a Gemma 3 12B text encoder and supports both text-to-video (T2V) and image-to-video (I2V). Key features:
| Component | Node | Model | Notes |
|---|---|---|---|
| Checkpoint | CheckpointLoaderSimple | ltx-2-19b-distilled.safetensors | 41GB bf16, distilled variant |
| Component | Node | Model | Notes |
|---|---|---|---|
| Gemma 3 | CLIPLoader (type=ltxv) | gemma_3_12B_it_fp4_mixed.safetensors | 9GB FP4, in text_encoders/ |
Loading note: The checkpoint bundles the VAE internally. The Gemma 3 text encoder loads separately. Use CLIPLoader with type: "ltxv" pointing at the text_encoders/ directory.
| LoRA | File | Purpose |
|---|---|---|
| Distilled LoRA | ltx2/ltx-2-19b-lora-camera-control-dolly-left.safetensors | Camera dolly left |
| Distilled LoRA (384) | ltx2/ltx-2-19b-distilled-lora-384.safetensors | Apply to base model for distilled behavior |
| Camera Dolly Left | ltx-2-19b-lora-camera-control-dolly-left.safetensors | Camera movement |
Located in loras/LTXV2/:
style/PLORAV7_LTX_000010500.safetensorsconcept/head_swap_v1_13500_first_frame.safetensorsconcept/LTX-2 - Better Female Nudity.safetensorsaction/LTX2-i2v-OralSuite.safetensorsaction/LTX2-i2v-SexThrust.safetensorsconcept/ and action/ subfoldersBinds text conditioning with frame rate information:
{
"class_type": "LTXVConditioning",
"inputs": {
"positive": ["<clip_text_encode>", 0],
"negative": ["<clip_text_encode_neg>", 0],
"frame_rate": 25
}
}
Creates the initial video latent (for T2V):
{
"class_type": "EmptyLTXVLatentVideo",
"inputs": {
"width": 768,
"height": 512,
"length": 97,
"batch_size": 1
}
}
Frame count constraint: Must be 8n + 1 (9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121).
Dedicated sigma schedule for LTX-V2 latent space:
{
"class_type": "LTXVScheduler",
"inputs": {
"steps": 8,
"max_shift": 2.05,
"base_shift": 0.95,
"stretch": true,
"terminal": 0.1
}
}
Connect the optional latent input for latent-aware shift scaling.
All-in-one node that encodes image, creates latent, and wraps conditioning:
{
"class_type": "LTXVImgToVideo",
"inputs": {
"positive": ["<conditioning>", 0],
"negative": ["<conditioning>", 0],
"vae": ["<checkpoint>", 2],
"image": ["<load_image>", 0],
"width": 768,
"height": 512,
"length": 97,
"batch_size": 1,
"strength": 0.6
}
}
{
"class_type": "LTXVLatentUpsampler",
"inputs": {
"latent": ["<sampler_output>", 0],
"upscale_model": ["<upscale_loader>", 0]
}
}
Requires LatentUpscaleModelLoader with ltx-2-spatial-upscaler-x2-1.0.safetensors.
Uses SamplerCustomAdvanced with manual sigmas, NOT standard KSampler:
| Parameter | Stage 1 (Generate) | Stage 2 (Upscale) |
|---|---|---|
| sampler | euler | euler |
| steps | 8 | 4 |
| cfg | 1.0 | 1.0 |
| scheduler | LTXVScheduler | Manual sigmas |
Stage 1 sigmas (via LTXVScheduler): max_shift=2.05, base_shift=0.95, stretch=true, terminal=0.1
Stage 2 sigmas (manual, for upscale refinement): 0.909375, 0.725, 0.421875, 0.0
| Parameter | Value |
|---|---|
| sampler | res_2s |
| steps | 20 |
| cfg | 4.0 |
| scheduler | LTXVScheduler |
| distilled_lora_strength | 0.6 |
| Aspect | Stage 1 | After 2x Upscale | Notes |
|---|---|---|---|
| 3:2 landscape | 768x512 | 1536x1024 | Default |
| 16:9 landscape | 960x544 | 1920x1088 | Official example |
| 1:1 square | 640x640 | 1280x1280 | |
| 4:3 landscape | 704x512 | 1408x1024 |
Start at lower resolution for Stage 1 to manage VRAM, then upscale.
8n + 1)| Frames | Duration @25fps | Duration @24fps | Notes |
|---|---|---|---|
| 49 | 1.96s | 2.04s | Quick test |
| 81 | 3.24s | 3.38s | Short clip |
| 97 | 3.88s | 4.04s | Default |
| 121 | 4.84s | 5.04s | Official example, recommended |
| 161 | 6.44s | 6.71s | Longer clip |
| 257 | 10.28s | 10.71s | Maximum |
Standard: 25 fps (conditioned via LTXVConditioning). 24 and 30 fps also supported.
CheckpointLoaderSimple → MODEL + VAE
CLIPLoader (ltxv, gemma_3_12B_it_fp4_mixed) → CLIP
├─ CLIPTextEncode (positive) → CONDITIONING
└─ CLIPTextEncode (negative) → CONDITIONING
LTXVConditioning (positive, negative, frame_rate=25) → pos/neg CONDITIONING
EmptyLTXVLatentVideo (768x512, 121 frames) → LATENT
LTXVScheduler (steps=8, max_shift=2.05, base_shift=0.95) → SIGMAS
SamplerCustomAdvanced (model, sigmas, positive, negative, latent)
→ Stage 1 LATENT
[Optional: LTXVLatentUpsampler → 2x LATENT → SamplerCustomAdvanced Stage 2]
VAEDecode (or LTXVSpatioTemporalTiledVAEDecode for VRAM savings) → IMAGE
VHS_VideoCombine (or CreateVideo + SaveVideo) → MP4
{
"1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "ltx-2-19b-distilled.safetensors" }},
"2": { "class_type": "CLIPLoader", "inputs": { "clip_name": "gemma_3_12B_it_fp4_mixed.safetensors", "type": "ltxv" }},
"3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "<positive prompt>" }},
"4": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["2", 0], "text": "" }},
"5": { "class_type": "LTXVConditioning", "inputs": {
"positive": ["3", 0], "negative": ["4", 0], "frame_rate": 25
}},
"6": { "class_type": "EmptyLTXVLatentVideo", "inputs": {
"width": 768, "height": 512, "length": 121, "batch_size": 1
}},
"7": { "class_type": "LTXVScheduler", "inputs": {
"steps": 8, "max_shift": 2.05, "base_shift": 0.95,
"stretch": true, "terminal": 0.1, "latent": ["6", 0]
}},
"8": { "class_type": "KSamplerSelect", "inputs": { "sampler_name": "euler" }},
"9": { "class_type": "SamplerCustomAdvanced", "inputs": {
"model": ["1", 0],
"positive": ["5", 0],
"negative": ["5", 1],
"sigmas": ["7", 0],
"latent_image": ["6", 0],
"noise": ["10", 0],
"sampler": ["8", 0],
"guider": ["11", 0]
}},
"10": { "class_type": "RandomNoise", "inputs": { "noise_seed": 42 }},
"11": { "class_type": "CFGGuider", "inputs": {
"model": ["1", 0],
"positive": ["5", 0],
"negative": ["5", 1],
"cfg": 1.0
}},
"12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }},
"13": { "class_type": "VHS_VideoCombine", "inputs": {
"images": ["12", 0], "frame_rate": 25, "loop_count": 0,
"filename_prefix": "ltxv2", "format": "video/h264-mp4",
"pingpong": false, "save_output": true,
"pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
}}
}
Alternative simple output (built-in nodes instead of VHS):
{
"12": { "class_type": "VAEDecode", "inputs": { "samples": ["9", 0], "vae": ["1", 2] }},
"13": { "class_type": "CreateVideo", "inputs": { "images": ["12", 0], "fps": 25 }},
"14": { "class_type": "SaveVideo", "inputs": { "video": ["13", 0], "filename_prefix": "video/ltxv2", "format": "auto", "codec": "auto" }}
}
Seven official camera control LoRAs from Lightricks:
| Movement | LoRA File |
|---|---|
| Dolly Left | ltx-2-19b-lora-camera-control-dolly-left.safetensors |
| Dolly Right | ltx-2-19b-lora-camera-control-dolly-right.safetensors |
| Dolly In | ltx-2-19b-lora-camera-control-dolly-in.safetensors |
| Dolly Out | ltx-2-19b-lora-camera-control-dolly-out.safetensors |
| Jib Up | ltx-2-19b-lora-camera-control-jib-up.safetensors |
| Jib Down | ltx-2-19b-lora-camera-control-jib-down.safetensors |
| Static | ltx-2-19b-lora-camera-control-static.safetensors |
Usage: Apply with LoraLoaderModelOnly at strength 1.0. Do NOT describe camera movement in your prompt — the LoRA handles it.
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<checkpoint>", 0],
"lora_name": "ltx-2-19b-lora-camera-control-dolly-left.safetensors",
"strength_model": 1.0
}
}
Cannot combine camera control LoRA with IC-LoRA (canny/depth/pose) in the same generation.
Apply with LoraLoaderModelOnly. Typical strength: 0.5–1.0.
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<checkpoint_or_camera_lora>", 0],
"lora_name": "LTXV2\\concept\\LTX-2 - Better Female Nudity.safetensors",
"strength_model": 0.8
}
}
Concept/style LoRAs CAN be stacked with camera control LoRAs.
| Config | VRAM | Notes |
|---|---|---|
| bf16 checkpoint + FP4 Gemma | ~24GB+ | Tight on RTX 4090, may OOM |
| FP8 checkpoint + FP4 Gemma | ~16-20GB | Recommended for 24GB GPUs |
| bf16 + tiled VAE decode | ~22GB | Use LTXVSpatioTemporalTiledVAEDecode |
VRAM warnings from MEMORY.md: "LTXV2 can OOM on 24GB — suggest FP8 quantized models or --lowvram"
VAEDecodeTiled or LTXVSpatioTemporalTiledVAEDecode instead of standard VAEDecodeclear_vram before switching to LTX-V2 from another model familyNatural language descriptions. Be specific about motion, camera angles, and temporal progression:
Good: "A woman with flowing auburn hair walks through a sun-dappled forest, leaves falling gently around her, soft golden hour lighting, cinematic depth of field"
Bad: "woman, forest, walking"
Describe the entire scene progression, not just a single moment. Include lighting, mood, and motion cues.
For production quality, generate at low resolution then upscale:
LTXVLatentUpsampler (2x spatial) → 1536x1024This requires the spatial upscaler model: ltx-2-spatial-upscaler-x2-1.0.safetensors (place in models/latent_upscale_models/).