Guides ComfyUI workflows for Qwen Image 2512 text-to-image using integrated KSampler, separate loaders, lightning LoRAs, fine-tuned models, and sampler presets.
Qwen Image 2512 is the latest (December 2025) text-to-image model from the Qwen family. It uses a vision-language model (Qwen2.5-VL) as the text encoder and generates high-quality images from natural language prompts. Two workflow approaches are covered below: the all-in-one QwenImageIntegratedKSampler custom node, and separate loader nodes feeding a standard KSampler.
| Component | Node | Model | Notes |
|---|---|---|---|
| UNET | UNETLoader | qwen_image_2512_fp8_e4m3fn.safetensors | FP8, not currently installed — download if needed |
| CLIP | CLIPLoader (type=qwen_image) | qwen_2.5_vl_7b_fp8_scaled.safetensors | Shared across all Qwen models, in clip/ |
| VAE | VAELoader | qwen_image_vae.safetensors | Qwen-specific VAE (242MB) |
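A minimal sketch wiring the three loaders with these files (node IDs are arbitrary; the UNET can be swapped for any of the fine-tuned models below):
{
  "1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwen_image_2512_fp8_e4m3fn.safetensors", "weight_dtype": "default" }},
  "2": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
  "3": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }}
}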
| Model | Path | Focus |
|---|---|---|
| qwenImageEditRemix_v10 | diffusion_models/qwenImageEditRemix_v10.safetensors | General-purpose remix |
| qwenUltimateRealism_v11 | UNETLoader path | Product photography, hyper-realistic |
| copaxTimeless | UNETLoader path | Ultra-realistic portraits |
| qwnImageEdit_v16Bf16 | UNETLoader path | Abliterated (uncensored) |
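To use a fine-tune, only unet_name in UNETLoader changes (shown here with the remix model; the other files follow the same pattern under diffusion_models/):
{ "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }}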
4-step lightning LoRA (fastest):
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<unet_node>", 0],
"lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors",
"strength_model": 1.0
}
}
Settings: steps=4, cfg=1.0, sampler=euler, scheduler=simple, denoise=1.0
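Mapped onto a standard KSampler (node references are placeholders for whatever IDs the LoRA, conditioning, and latent nodes have in your graph):
{
  "class_type": "KSampler",
  "inputs": {
    "model": ["<lightning_lora_node>", 0],
    "positive": ["<positive_cond>", 0],
    "negative": ["<negative_cond>", 0],
    "latent_image": ["<latent>", 0],
    "seed": 42, "steps": 4, "cfg": 1.0, "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0
  }
}
The 8-step variant below differs only in the LoRA file, step count, and optionally cfg.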
8-step lightning LoRA (more detail):
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<unet_node>", 0],
"lora_name": "Qwen-Image-Lightning-8steps-V1.0.safetensors",
"strength_model": 1.0
}
}
Settings: steps=8, cfg=1.0 (or 2.5 for character detail), sampler=euler, scheduler=simple
| Preset | Steps | CFG | Sampler | Scheduler | Denoise | LoRA | Notes |
|---|---|---|---|---|---|---|---|
| Lightning 4-step | 4 | 1.0 | euler | simple | 1.0 | Lightning-4steps | Fastest, good quality |
| Lightning 8-step | 8 | 1.0 | euler | simple | 1.0 | Lightning-8steps | Better detail |
| Lightning character | 8 | 2.5 | euler | simple | 1.0 | Lightning-8steps | Best for portraits |
| Standard | 50 | 4.0 | euler | simple | 1.0 | none | Official ComfyUI |
| Golden quality | 50 | 4.5 | euler | simple | 1.0 | none | Community best |
| Character composition | 30 | 4.0 | euler_ancestral | beta | 1.0 | none | Multi-character scenes |
| CopaxTimeless | 30 | 4.0 | res_multistep | sgm_uniform | 1.0 | none | Ultra-realistic |
| UltimateRealism | 30 | 7.5 | euler | simple | 1.0 | none | Product photography |
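As a sketch, the CopaxTimeless preset maps onto KSampler like this (placeholders as before; pair it with the copaxTimeless UNET and the flow-matching shift described next):
{
  "class_type": "KSampler",
  "inputs": {
    "model": ["<shifted_model>", 0],
    "positive": ["<positive_cond>", 0],
    "negative": ["<negative_cond>", 0],
    "latent_image": ["<latent>", 0],
    "seed": 42, "steps": 30, "cfg": 4.0, "sampler_name": "res_multistep", "scheduler": "sgm_uniform", "denoise": 1.0
  }
}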
For standard (non-lightning) presets, apply flow matching shift:
{
"class_type": "ModelSamplingAuraFlow",
"inputs": { "model": ["<unet_or_lora>", 0], "shift": 3.1 }
}
Shift=3.1 is the standard value for Qwen Image. It is not needed with the lightning LoRAs (the shift is baked into the distillation).
Qwen operates at ~1.6 megapixels natively:
| Aspect | Resolution | Use Case |
|---|---|---|
| Square | 1328x1328 | General |
| Portrait 3:4 | 1104x1472 | Portraits |
| Portrait 2:3 | 1056x1584 | |
| Portrait 9:16 | 928x1664 | Phone format |
| Landscape 4:3 | 1472x1104 | Landscape scenes |
| Landscape 3:2 | 1584x1056 | |
| Landscape 16:9 | 1664x928 | Widescreen |
| Ultra portrait | 1536x2048 | Tall format |
| Video-ready | 832x480 | For WAN 2.2 FLF pipeline |
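These values drop straight into EmptyLatentImage (or the width/height inputs of the integrated sampler); for example, the 3:4 portrait size:
{ "class_type": "EmptyLatentImage", "inputs": { "width": 1104, "height": 1472, "batch_size": 1 }}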
The QwenImageIntegratedKSampler custom node handles model patching, conditioning, sampling, and output in a single node. Simplest workflow — just 4 nodes for model loading + 1 integrated sampler + 1 save.
Required:
- model: MODEL (from UNETLoader)
- clip: CLIP (from CLIPLoader, type=qwen_image)
- vae: VAE
- positive_prompt: STRING
- negative_prompt: STRING
- generation_mode: "文生图 text-to-image" or "图生图 image-to-image"
- batch_size: INT (default 1)
- width: INT (default 0, step 8)
- height: INT (default 0, step 8)
- seed: INT
- steps: INT (default 4)
- cfg: FLOAT (default 1)
- sampler_name: euler, dpmpp_2m, etc.
- scheduler: simple, sgm_uniform, beta, etc.
- denoise: FLOAT (default 1)
Optional:
- image1-5: IMAGE (reference images for i2i or multi-ref)
- latent: LATENT
- controlnet_data: CONTROL_NET_DATA
- auraflow_shift: FLOAT (default 3)
- cfg_norm_strength: FLOAT (default 1)
Outputs:
[0] IMAGE — generated image
[1] LATENT — output latent (optional)
[2] IMAGE — scaled input image (for i2i)
Example (lightning 4-step, integrated sampler):
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors", "strength_model": 1.0 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "QwenImageIntegratedKSampler", "inputs": {
"model": ["2", 0],
"clip": ["3", 0],
"vae": ["4", 0],
"positive_prompt": "<detailed natural language prompt>",
"negative_prompt": "",
"generation_mode": "文生图 text-to-image",
"batch_size": 1,
"width": 1024,
"height": 1344,
"seed": 42,
"steps": 4,
"cfg": 1,
"sampler_name": "euler",
"scheduler": "simple",
"denoise": 1,
"auraflow_shift": 3,
"cfg_norm_strength": 1
}},
"6": { "class_type": "SaveImage", "inputs": { "images": ["5", 0], "filename_prefix": "qwen_t2i" }}
}
More flexible — allows inserting additional processing nodes between stages.
UNETLoader → [LoraLoaderModelOnly] → [ModelSamplingAuraFlow (shift=3.1)] → MODEL
CLIPLoader (qwen_image) → CLIP
VAELoader → VAE
CLIPTextEncode (positive) → CONDITIONING
ConditioningZeroOut → negative CONDITIONING
EmptyLatentImage (1024x1344) → LATENT
KSampler → VAEDecode → SaveImage
Example (lightning 4-step, separate loaders):
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors", "strength_model": 1.0 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["3", 0], "text": "<detailed natural language prompt>" }},
"6": { "class_type": "ConditioningZeroOut", "inputs": { "conditioning": ["5", 0] }},
"7": { "class_type": "EmptyLatentImage", "inputs": { "width": 1024, "height": 1344, "batch_size": 1 }},
"8": { "class_type": "KSampler", "inputs": {
"model": ["2", 0],
"positive": ["5", 0],
"negative": ["6", 0],
"latent_image": ["7", 0],
"seed": 42, "steps": 4, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"9": { "class_type": "VAEDecode", "inputs": { "samples": ["8", 0], "vae": ["4", 0] }},
"10": { "class_type": "SaveImage", "inputs": { "images": ["9", 0], "filename_prefix": "qwen_t2i" }}
}
Example (standard 50-step quality, no lightning LoRA, shift=3.1):
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "ModelSamplingAuraFlow", "inputs": { "model": ["1", 0], "shift": 3.1 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["3", 0], "text": "<detailed natural language prompt>" }},
"6": { "class_type": "ConditioningZeroOut", "inputs": { "conditioning": ["5", 0] }},
"7": { "class_type": "EmptyLatentImage", "inputs": { "width": 1328, "height": 1328, "batch_size": 1 }},
"8": { "class_type": "KSampler", "inputs": {
"model": ["2", 0],
"positive": ["5", 0],
"negative": ["6", 0],
"latent_image": ["7", 0],
"seed": 42, "steps": 50, "cfg": 4, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"9": { "class_type": "VAEDecode", "inputs": { "samples": ["8", 0], "vae": ["4", 0] }},
"10": { "class_type": "SaveImage", "inputs": { "images": ["9", 0], "filename_prefix": "qwen_t2i_hq" }}
}
Always use ConditioningZeroOut for Qwen txt2img:
{
"class_type": "ConditioningZeroOut",
"inputs": { "conditioning": ["<positive_cond>", 0] }
}
Or use an empty string in CLIPTextEncode — but ZeroOut is more explicit and reliable.
For ControlNet support with Qwen models. Patches the model with a DiffSynth control signal:
Required Inputs:
- model: MODEL
- model_patch: MODEL_PATCH (from DiffSynth ControlNet loader)
- vae: VAE
- image: IMAGE (control image)
- strength: FLOAT (default 1.0)
Optional:
- mask: MASK
Outputs:
[0] MODEL (patched)
DiffSynth ControlNets support canny, depth, and inpaint only (NOT pose).
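A sketch of the wiring, assuming ComfyUI's QwenImageDiffsynthControlnet node and a ModelPatchLoader for the DiffSynth patch (these class and input names are assumptions; check what your install exposes). Node IDs 2 and 4 refer to the model and VAE loaders from the separate-loaders workflow above, and the patched model output replaces the plain UNET/LoRA output at the sampler's model input:
{
  "20": { "class_type": "ModelPatchLoader", "inputs": { "name": "<diffsynth_canny_controlnet.safetensors>" }},
  "21": { "class_type": "LoadImage", "inputs": { "image": "<control_image.png>" }},
  "22": { "class_type": "QwenImageDiffsynthControlnet", "inputs": {
    "model": ["2", 0],
    "model_patch": ["20", 0],
    "vae": ["4", 0],
    "image": ["21", 0],
    "strength": 1.0
  }}
}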
Located in loras/Qwen/:
- style/ — Figure makers, reality transform, panel painter
- concept/ — Various concept LoRAs
- poses/ — Pose-specific LoRAs
- character/ — Character enhancement
- anime/ — Anime style LoRAs
- tool/ — Utility LoRAs (anything2real, gaussian splash)
- equirectangular projection/ — 360 panorama LoRA
Apply with LoraLoaderModelOnly:
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<unet_or_lightning_lora>", 0],
"lora_name": "Qwen\\concept\\hinaQwenImageAsianMixLora_v2.safetensors",
"strength_model": 0.8
}
}
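LoRA loaders chain; a sketch stacking this concept LoRA on top of the lightning LoRA from the workflows above (node 1 is the UNETLoader; connect node 21's output to the sampler's model input):
{
  "20": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors", "strength_model": 1.0 }},
  "21": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["20", 0], "lora_name": "Qwen\\concept\\hinaQwenImageAsianMixLora_v2.safetensors", "strength_model": 0.8 }}
}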
Natural language, 1–3 sentences. Be descriptive:
Good: "Professional portrait of an Asian woman in her late 20s, wearing a cream linen blazer at a Tokyo rooftop café during golden hour, holding a matcha latte, editorial fashion photography, shot on Sony A7III 85mm f/1.4"
Bad: "1girl, cafe, blazer, matcha"
VRAM and performance tips:
| Config | VRAM | Notes |
|---|---|---|
| FP8 UNET + fp8 CLIP + VAE | ~17-18GB | Fits comfortably on RTX 4090 |
| bf16 UNET (edit model) | ~10GB UNET + 7GB CLIP | Also fits well |
- clear_vram before switching to Qwen from another model family
- auraflow_shift defaults to 3 (close to the recommended 3.1) — adjust only if needed