Guides ComfyUI workflows for WAN 2.2 I2V first-last-frame video generation using required dual hi-lo KSamplerAdvanced passes or WanVideoWrapper VACE.
First-Last-Frame (FLF) video generation takes a start image and an end image and generates a smooth video transition between them. The WAN 2.2 I2V (Image-to-Video) 14B model excels at this.
WAN 2.2 I2V uses a split-noise architecture. Unlike WAN 2.1, the 2.2 model was trained with separate HighNoise and LowNoise components that handle different denoising ranges. You MUST use both models in a two-pass KSamplerAdvanced setup. Using a single model produces low-quality, broken output.
NEVER use a single KSampler with only one model for WAN 2.2 I2V.
Two approaches are available:
- Native: WanFirstLastFrameToVideo + dual KSamplerAdvanced two-pass
- WanVideoWrapper: WanVideoVACEStartToEndFrame + WanVideoVACEEncode + WanVideoSampler (VACE, caching, context windows)

Remix NSFW (Recommended — built-in lightning, fp16):
| Model | Loader | Notes |
|---|---|---|
| Wan2.2_Remix_NSFW_i2v_14b_high_lighting_fp16_v2.1.safetensors | UNETLoader | HighNoise, built-in lightning acceleration |
| Wan2.2_Remix_NSFW_i2v_14b_low_lighting_fp16_v2.1.safetensors | UNETLoader | LowNoise, built-in lightning acceleration |
GGUF Q8 (Alternative — needs external lightning LoRAs):
| Model | Loader | Notes |
|---|---|---|
| Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf | UnetLoaderGGUF | HighNoise, quantized |
| Wan2.2-I2V-A14B-LowNoise-Q8_0.gguf | UnetLoaderGGUF | LowNoise, quantized |
Official fp8:
| Model | Loader | Notes |
|---|---|---|
| wan2.2_i2v_high_noise_14B_fp8_scaled.safetensors | UNETLoader | HighNoise, needs lightning LoRA |
| wan2.2_i2v_low_noise_14B_fp8_scaled.safetensors | UNETLoader | LowNoise, needs lightning LoRA |
Text encoder:
| Model | Node | Notes |
|---|---|---|
| nsfw_wan_umt5-xxl_bf16_fixed.safetensors | CLIPLoaderGGUF (type=wan) | NSFW-tuned, pair with Remix models |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | CLIPLoader (type=wan) | Standard UMT5-XXL fp8 |
| Component | Node | Model |
|---|---|---|
| CLIP Vision | CLIPVisionLoader | clip_vision_h.safetensors |
| VAE | VAELoader | wan_2.1_vae.safetensors |
WAN 2.2 uses flow matching and requires ModelSamplingSD3 applied to each UNET:
{"class_type": "ModelSamplingSD3", "inputs": {"model": ["<unet>", 0], "shift": 5}}
shift=5 for lightning/Remix models. shift=8 for standard (non-lightning) models.
Remix NSFW models have lightning baked in — no external LoRA needed.
For GGUF/fp8 models, use paired hi/lo lightning LoRAs:
- wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors → HighNoise UNET
- wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors → LowNoise UNET

Each model path has two stacked loaders (Common + Specific), each supporting 4 LoRA slots:
Hi path: UNETLoader(HN) → ModelSamplingSD3(shift=5) → Hi Common Stack → Hi Lora Stack → MODEL_HI
Lo path: UNETLoader(LN) → ModelSamplingSD3(shift=5) → Lo Common Stack → Lo Lora Stack → MODEL_LO
Common stacks hold shared LoRAs (quality/style). Specific stacks hold model-variant LoRAs. Set slots to "None" when unused. Even with no LoRAs, include the stacks — they pass CLIP through for text encoding.
Input frames MUST be resized to the target video resolution before FLF and CLIPVisionEncode. The end frame inherits width/height from the start frame's resize to ensure matching dimensions.
{"class_type": "ImageResizeKJv2", "inputs": {
"image": ["<load_image>", 0], "width": 480, "height": 720,
"upscale_method": "nearest-exact", "keep_proportion": "crop",
"pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2
}}
| Parameter | Pass 1 (Hi) | Pass 2 (Lo) |
|---|---|---|
| model | Hi LoRA stack output | Lo LoRA stack output |
| add_noise | enable | disable |
| steps | 4 | 4 |
| cfg | 1 | 1 |
| sampler_name | uni_pc | uni_pc |
| scheduler | beta | beta |
| start_at_step | 0 | 2 |
| end_at_step | 2 | 4 |
| return_with_leftover_noise | enable | disable |
| latent_image | WanFLF output[2] | Pass 1 output[0] |
Both passes share the same positive/negative conditioning from WanFirstLastFrameToVideo outputs [0] and [1].
For standard (non-lightning) models: steps=20, split at step 10, cfg=4, sampler=euler, scheduler=simple, shift=8.
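For the standard configuration, here is a minimal sketch of the same two passes; node references in angle brackets are placeholders for the Hi/Lo model paths, the WanFirstLastFrameToVideo node, and the Hi pass output, and each ModelSamplingSD3 should use shift=8 instead of 5:

{"class_type": "KSamplerAdvanced", "inputs": {
  "model": ["<model_hi>", 0], "positive": ["<wan_flf>", 0], "negative": ["<wan_flf>", 1], "latent_image": ["<wan_flf>", 2],
  "add_noise": "enable", "noise_seed": 0, "steps": 20, "cfg": 4,
  "sampler_name": "euler", "scheduler": "simple",
  "start_at_step": 0, "end_at_step": 10, "return_with_leftover_noise": "enable"
}}
{"class_type": "KSamplerAdvanced", "inputs": {
  "model": ["<model_lo>", 0], "positive": ["<wan_flf>", 0], "negative": ["<wan_flf>", 1], "latent_image": ["<hi_pass>", 0],
  "add_noise": "disable", "noise_seed": 0, "steps": 20, "cfg": 4,
  "sampler_name": "euler", "scheduler": "simple",
  "start_at_step": 10, "end_at_step": 20, "return_with_leftover_noise": "disable"
}}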
Always include a quality negative prompt:
The tones are vibrant, overexposed, static, details are unclear, subtitles, style, work, painting, image, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, distorted limbs, merged fingers, motionless image, cluttered background, three legs, many people in the background, walking backwards
WanFirstLastFrameToVideo inputs and outputs:

Required Inputs:
- positive: CONDITIONING (from CLIPTextEncode)
- negative: CONDITIONING (from CLIPTextEncode with negative prompt)
- vae: VAE
- width: INT (from ImageResizeKJv2 end frame output[1])
- height: INT (from ImageResizeKJv2 end frame output[2])
- length: INT (default 81, step 4) — number of frames
- batch_size: INT (default 1)
Optional Inputs:
- clip_vision_start_image: CLIP_VISION_OUTPUT (from CLIPVisionEncode)
- clip_vision_end_image: CLIP_VISION_OUTPUT (from CLIPVisionEncode)
- start_image: IMAGE (resized start frame)
- end_image: IMAGE (resized end frame)
Outputs:
- [0] positive: CONDITIONING → feed to BOTH Hi and Lo KSamplerAdvanced
- [1] negative: CONDITIONING → feed to BOTH Hi and Lo KSamplerAdvanced
- [2] latent: LATENT → feed to Hi Pass only (Lo Pass gets Hi Pass output)
UNETLoader (HighNoise) → ModelSamplingSD3 (shift=5) → Hi Common Stack → Hi Lora Stack → MODEL_HI
UNETLoader (LowNoise) → ModelSamplingSD3 (shift=5) → Lo Common Stack → Lo Lora Stack → MODEL_LO
CLIPLoaderGGUF (wan) → CLIP
├─ CLIPTextEncode (positive) → CONDITIONING
└─ CLIPTextEncode (negative) → CONDITIONING
CLIPVisionLoader → CLIPVisionEncode (start) + CLIPVisionEncode (end)
VAELoader → VAE
LoadImage (start) → ImageResizeKJv2 (480x720) → resized start
LoadImage (end) → ImageResizeKJv2 (match dims) → resized end
WanFirstLastFrameToVideo (positive, negative, vae, clip_vision_start, clip_vision_end,
start_image, end_image, width/height from resize)
→ modified positive [0], modified negative [1], latent [2]
KSamplerAdvanced (Hi: MODEL_HI, steps 0→2, add_noise=enable, return_leftover=enable)
→ noisy LATENT
KSamplerAdvanced (Lo: MODEL_LO, steps 2→4, add_noise=disable, return_leftover=disable)
→ final LATENT
VAEDecode → IMAGE → VHS_VideoCombine (raw output)
→ VRAM_Debug → SeedVR2VideoUpscaler (1080p) → VHS_VideoCombine (upscaled)
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "Wan2.2_Remix_NSFW_i2v_14b_high_lighting_fp16_v2.1.safetensors", "weight_dtype": "default" }, "_meta": { "title": "UNET HighNoise" }},
"2": { "class_type": "UNETLoader", "inputs": { "unet_name": "Wan2.2_Remix_NSFW_i2v_14b_low_lighting_fp16_v2.1.safetensors", "weight_dtype": "default" }, "_meta": { "title": "UNET LowNoise" }},
"3": { "class_type": "CLIPLoaderGGUF", "inputs": { "clip_name": "nsfw_wan_umt5-xxl_bf16_fixed.safetensors", "type": "wan" }},
"4": { "class_type": "CLIPVisionLoader", "inputs": { "clip_name": "clip_vision_h.safetensors" }},
"5": { "class_type": "VAELoader", "inputs": { "vae_name": "wan_2.1_vae.safetensors" }},
"6": { "class_type": "LoadImage", "inputs": { "image": "<start_image.png>" }, "_meta": { "title": "Start Frame" }},
"7": { "class_type": "LoadImage", "inputs": { "image": "<end_image.png>" }, "_meta": { "title": "End Frame" }},
"8": { "class_type": "ModelSamplingSD3", "inputs": { "model": ["1", 0], "shift": 5 }, "_meta": { "title": "Hi Shift" }},
"9": { "class_type": "ModelSamplingSD3", "inputs": { "model": ["2", 0], "shift": 5 }, "_meta": { "title": "Lo Shift" }},
"10": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["8", 0], "clip": ["3", 0],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Hi Common" }},
"11": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["10", 0], "clip": ["10", 1],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Hi Lora" }},
"12": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["9", 0], "clip": ["3", 0],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Low Common" }},
"13": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
"model": ["12", 0], "clip": ["12", 1],
"lora_01": "None", "strength_01": 1, "lora_02": "None", "strength_02": 1,
"lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Low Lora" }},
"14": { "class_type": "ImageResizeKJv2", "inputs": {
"image": ["6", 0], "width": 480, "height": 720,
"upscale_method": "nearest-exact", "keep_proportion": "crop",
"pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2
}, "_meta": { "title": "Resize Start" }},
"15": { "class_type": "ImageResizeKJv2", "inputs": {
"image": ["7", 0], "width": ["14", 1], "height": ["14", 2],
"upscale_method": "nearest-exact", "keep_proportion": "crop",
"pad_color": "0, 0, 0", "crop_position": "center", "divisible_by": 2
}, "_meta": { "title": "Resize End" }},
"16": { "class_type": "CLIPVisionEncode", "inputs": { "clip_vision": ["4", 0], "image": ["14", 0], "crop": "center" }},
"17": { "class_type": "CLIPVisionEncode", "inputs": { "clip_vision": ["4", 0], "image": ["15", 0], "crop": "center" }},
"18": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["11", 1], "text": "<positive prompt>" }, "_meta": { "title": "Positive" }},
"19": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["11", 1], "text": "The tones are vibrant, overexposed, static, details are unclear, subtitles, style, work, painting, image, still, overall grayish, worst quality, low quality, JPEG compression artifacts, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, distorted limbs, merged fingers, motionless image, cluttered background, three legs, many people in the background, walking backwards" }, "_meta": { "title": "Negative" }},
"20": { "class_type": "WanFirstLastFrameToVideo", "inputs": {
"positive": ["18", 0], "negative": ["19", 0], "vae": ["5", 0],
"clip_vision_start_image": ["16", 0], "clip_vision_end_image": ["17", 0],
"start_image": ["14", 0], "end_image": ["15", 0],
"width": ["15", 1], "height": ["15", 2], "length": 81, "batch_size": 1
}},
"21": { "class_type": "KSamplerAdvanced", "inputs": {
"model": ["11", 0], "positive": ["20", 0], "negative": ["20", 1], "latent_image": ["20", 2],
"add_noise": "enable", "noise_seed": 0, "steps": 4, "cfg": 1,
"sampler_name": "uni_pc", "scheduler": "beta",
"start_at_step": 0, "end_at_step": 2, "return_with_leftover_noise": "enable"
}, "_meta": { "title": "Hi Pass" }},
"22": { "class_type": "KSamplerAdvanced", "inputs": {
"model": ["13", 0], "positive": ["20", 0], "negative": ["20", 1], "latent_image": ["21", 0],
"add_noise": "disable", "noise_seed": 0, "steps": 4, "cfg": 1,
"sampler_name": "uni_pc", "scheduler": "beta",
"start_at_step": 2, "end_at_step": 4, "return_with_leftover_noise": "disable"
}, "_meta": { "title": "Lo Pass" }},
"23": { "class_type": "VAEDecode", "inputs": { "samples": ["22", 0], "vae": ["5", 0] }},
"24": { "class_type": "VHS_VideoCombine", "inputs": {
"images": ["23", 0], "frame_rate": 16, "loop_count": 0,
"filename_prefix": "wan_flf", "format": "video/h264-mp4",
"pingpong": false, "save_output": true,
"pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
}}
}
Add after VAEDecode for AI-powered video upscaling to 1080p. Use VRAM_Debug to free VRAM between generation and upscaling:
{
"25": { "class_type": "VRAM_Debug", "inputs": {
"image_pass": ["23", 0], "empty_cache": true, "gc_collect": true, "unload_all_models": true
}},
"26": { "class_type": "SeedVR2LoadDiTModel", "inputs": {
"model": "seedvr2_ema_3b_fp8_e4m3fn.safetensors", "device": "cuda:0",
"blocks_to_swap": 0, "swap_io_components": false, "cache_model": false, "attention_mode": "sdpa"
}},
"27": { "class_type": "SeedVR2LoadVAEModel", "inputs": {
"model": "ema_vae_fp16.safetensors", "device": "cuda:0",
"encode_tiled": false, "decode_tiled": false, "cache_model": false
}},
"28": { "class_type": "SeedVR2VideoUpscaler", "inputs": {
"image": ["25", 1], "dit": ["26", 0], "vae": ["27", 0],
"seed": 0, "resolution": 1080, "max_resolution": 0,
"batch_size": 5, "uniform_batch_size": false, "color_correction": "lab"
}},
"29": { "class_type": "VHS_VideoCombine", "inputs": {
"images": ["28", 0], "frame_rate": 16, "loop_count": 0,
"filename_prefix": "wan_flf_upscaled", "format": "video/h264-mp4",
"pingpong": false, "save_output": true,
"pix_fmt": "yuv420p", "crf": 19, "save_metadata": true, "trim_to_audio": false
}}
}
When using GGUF Q8 models instead of Remix, add paired lightning LoRAs:
Hi path: UnetLoaderGGUF(HN Q8) → ModelSamplingSD3(shift=5) → LoraLoaderModelOnly(hi_noise_lightning) → Hi Common Stack → Hi Lora Stack
Lo path: UnetLoaderGGUF(LN Q8) → ModelSamplingSD3(shift=5) → LoraLoaderModelOnly(lo_noise_lightning) → Lo Common Stack → Lo Lora Stack
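A minimal sketch of the Hi path wiring, assuming UnetLoaderGGUF from ComfyUI-GGUF and the core LoraLoaderModelOnly node; node IDs and strength are illustrative, and the Lo path mirrors it with the LowNoise GGUF and low-noise LoRA:

"h1": { "class_type": "UnetLoaderGGUF", "inputs": { "unet_name": "Wan2.2-I2V-A14B-HighNoise-Q8_0.gguf" }},
"h2": { "class_type": "ModelSamplingSD3", "inputs": { "model": ["h1", 0], "shift": 5 }},
"h3": { "class_type": "LoraLoaderModelOnly", "inputs": {
  "model": ["h2", 0],
  "lora_name": "wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors",
  "strength_model": 1.0
}}

The output of "h3" then replaces the ModelSamplingSD3 output as the model input of the Hi Common stack.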
LoRA files:
- wan2.2_i2v_lightx2v_4steps_lora_v1_high_noise.safetensors
- wan2.2_i2v_lightx2v_4steps_lora_v1_low_noise.safetensors

The alternative approach uses the WanVideoWrapper custom node pack for more control over conditioning, caching, context windows, and advanced features.
Key differences from the native approach:
- WANVIDEOMODEL type instead of generic MODEL
- WANVIDIMAGE_EMBEDS for conditioning instead of CONDITIONING
- Dedicated sampler (WanVideoSampler) with shift parameter and scheduler options

WanVideoModelLoader → WANVIDEOMODEL
WanVideoVAELoader → WANVAE
WanVideoTextEncode → WANVIDEOTEXTEMBEDS
WanVideoClipVisionEncode (start + end images) → WANVIDIMAGE_CLIPEMBEDS
WanVideoVACEStartToEndFrame (start_image, end_image, num_frames=81)
→ images batch, masks
WanVideoVACEEncode (vae, input_frames, input_masks, width, height, num_frames)
→ WANVIDIMAGE_EMBEDS (vace_embeds)
WanVideoSampler (model, image_embeds, text_embeds, steps, cfg, shift, scheduler)
→ LATENT
WanVideoDecode (vae, samples) → IMAGE → VHS_VideoCombine → MP4
| Parameter | Standard | Lightning | Notes |
|---|---|---|---|
| steps | 30 | 4 | |
| cfg | 6.0 | 1.0 | |
| shift | 5.0 | 5.0 | Flow matching shift |
| scheduler | unipc | euler | WanVideoWrapper has its own schedulers |
| force_offload | true | true | Move model to CPU after sampling |
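A minimal sketch of the sampler node with the standard settings from the table above; the input names follow the pipeline sketch, but exact parameter names and defaults vary between WanVideoWrapper versions, so verify against the installed node pack (the seed input here is an assumption):

{"class_type": "WanVideoSampler", "inputs": {
  "model": ["<wanvideo_model_loader>", 0],
  "image_embeds": ["<wanvideo_vace_encode>", 0],
  "text_embeds": ["<wanvideo_text_encode>", 0],
  "steps": 30, "cfg": 6.0, "shift": 5.0, "seed": 0,
  "scheduler": "unipc", "force_offload": true
}}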
| Feature | Native | WanVideoWrapper |
|---|---|---|
| Simplicity | Simpler | More complex |
| Dual Hi-Lo | Manual two-pass | May handle internally |
| LoRA loading | Lora Loader Stack (rgthree) | WanVideoLoraSelect / WanVideoSetLoRAs |
| Caching (TeaCache) | Not available | Built-in |
| Context windows | Not available | WanVideoContextOptions |
| Block swap (VRAM) | Not available | WanVideoBlockSwap |
| VACE conditioning | Not available | Full VACE support |
| Long video (>81 frames) | Limited | InfiniteTalk / context windows |
Recommendation: Use Native dual hi-lo for standard FLF transitions. Use WanVideoWrapper when you need caching, context windows, VRAM management, or advanced conditioning.
| Aspect | Resolution | Megapixels |
|---|---|---|
| Portrait 2:3 | 480x720 | 0.35MP (recommended default) |
| Landscape 16:9 | 832x480 | 0.4MP |
| Portrait 9:16 | 480x832 | 0.4MP |
| Square | 640x640 | 0.4MP |
Width and height must be divisible by 16. Use ImageResizeKJv2 with divisible_by: 2 and keep_proportion: crop.
Frame count (length) must be 4n + 1 (1, 5, 9, ..., 49, 81, 121). Standard frame rate is 16 fps for WAN 2.2 output, so 81 frames is roughly 5 seconds.
{
"class_type": "VHS_VideoCombine",
"inputs": {
"images": ["<vae_decode>", 0],
"frame_rate": 16,
"loop_count": 0,
"filename_prefix": "wan_flf",
"format": "video/h264-mp4",
"pingpong": false,
"save_output": true,
"pix_fmt": "yuv420p",
"crf": 19,
"save_metadata": true,
"trim_to_audio": false
}
}
ComfyUI manages VRAM by offloading models between passes. The Hi UNET is offloaded before the Lo UNET loads.
- Call clear_vram before switching to WAN from another model family
- Use a VRAM_Debug node between generation and SeedVR2 upscaling to free all VRAM

By default, FLF produces a transition/dissolve between frames. For true morphing (one shape seamlessly reshaping into another), use a morph LoRA on both Hi and Lo paths.
| Variant | File | Strength | Notes |
|---|---|---|---|
| HighNoise | wan2.2_i2v_magical_morph_highnoise.safetensors | 0.7-1.0 | Apply to Hi Common stack |
| LowNoise | wan2.2_i2v_magical_morph_lownoise.safetensors | 0.7-1.0 | Apply to Lo Common stack |
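A sketch of the morph LoRAs slotted into the Common stacks of the main workflow above (nodes 10 and 12); strength 0.85 is simply an example value within the recommended 0.7-1.0 range:

"10": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
  "model": ["8", 0], "clip": ["3", 0],
  "lora_01": "wan2.2_i2v_magical_morph_highnoise.safetensors", "strength_01": 0.85,
  "lora_02": "None", "strength_02": 1, "lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Hi Common" }},
"12": { "class_type": "Lora Loader Stack (rgthree)", "inputs": {
  "model": ["9", 0], "clip": ["3", 0],
  "lora_01": "wan2.2_i2v_magical_morph_lownoise.safetensors", "strength_01": 0.85,
  "lora_02": "None", "strength_02": 1, "lora_03": "None", "strength_03": 1, "lora_04": "None", "strength_04": 1
}, "_meta": { "title": "Low Common" }}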
For person-to-person morphs (identity, gender transforms):
- Skin morph

Describe the transition motion, not just the start/end states:
Good: "A small cat sitting on the ground smoothly transforms and grows into a woman standing tall, seamless transformation, cinematic"
Bad: "A cat and a girl"
IMPORTANT — Prompt language affects visuals.

Lightning vs standard settings at a glance:
| Config | Lightning (Remix) | Standard |
|---|---|---|
| Models | Remix NSFW Hi+Lo fp16 | Official Hi+Lo fp8 |
| CLIP | nsfw_wan_umt5-xxl_bf16_fixed | umt5_xxl_fp8_e4m3fn_scaled |
| ModelSamplingSD3 shift | 5 | 8 |
| Total steps | 4 | 20 |
| Hi pass end_at_step | 2 | 10 |
| CFG | 1 | 4 |
| Sampler | uni_pc | euler |
| Scheduler | beta | simple |
| External LoRA needed | No (built-in) | Yes (paired hi/lo) |
When the start and end frames have different subject sizes (e.g., small cat → tall person), generate the "anchor" frame first — the one with the most complex composition — then use Qwen Edit to create the other frame from it. This keeps scale and composition consistent between the two frames.
Example — Cat-to-Girl Morph: generate the anchor frame, derive the matching frame from it with Qwen Edit, send both to ComfyUI with upload_image, then run the FLF workflow.
Anti-pattern: Generating cat and girl independently produces mismatched scale.
Proven timing on RTX 4090: Z-Image (35s) → Qwen Edit (78s) → WAN FLF 81 frames (139s) = ~4 minutes total.
Use analyze_workflow to understand any saved WAN FLF workflow before modifying or executing it. It returns a structured summary with sections, node IDs, key settings, and virtual wire connections — no raw JSON needed.
analyze_workflow("Wan FirstLastFrame Advanced.json") # summary view (default)
analyze_workflow("Wan FirstLastFrame Advanced.json", view="flat") # mermaid diagram
Only use get_workflow when you need the raw JSON for enqueue_workflow or modify_workflow.
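For example (assuming get_workflow takes the same filename argument as analyze_workflow):

get_workflow("Wan FirstLastFrame Advanced.json")  # raw JSON, only when needed for enqueue_workflow / modify_workflow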