Guides ComfyUI workflows for Qwen Image 2512 text-to-image using integrated KSampler, separate loaders, lightning LoRAs, fine-tuned models, and sampler presets.
Qwen Image 2512 is the latest (December 2025) text-to-image model from the Qwen family. It uses a vision-language model (Qwen2.5-VL) as the text encoder and generates high-quality images from natural language prompts. Two workflow approaches are covered below: the all-in-one QwenImageIntegratedKSampler custom node, and separate loader nodes feeding a standard KSampler.
| Component | Node | Model | Notes |
|---|---|---|---|
| UNET | UNETLoader | qwen_image_2512_fp8_e4m3fn.safetensors | FP8, not currently installed — download if needed |
| CLIP | CLIPLoader (type=qwen_image) | qwen_2.5_vl_7b_fp8_scaled.safetensors | Shared across all Qwen models, in clip/ |
| VAE | VAELoader | qwen_image_vae.safetensors | Qwen-specific VAE (242MB) |
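A minimal sketch wiring the three loaders with these files (node IDs are arbitrary; the UNET can be swapped for any of the fine-tuned models below):
{
  "1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwen_image_2512_fp8_e4m3fn.safetensors", "weight_dtype": "default" }},
  "2": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
  "3": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }}
}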
| Model | Path | Focus |
|---|---|---|
| qwenImageEditRemix_v10 | diffusion_models/qwenImageEditRemix_v10.safetensors | General-purpose remix |
| qwenUltimateRealism_v11 | UNETLoader path | Product photography, hyper-realistic |
| copaxTimeless | UNETLoader path | Ultra-realistic portraits |
| qwnImageEdit_v16Bf16 | UNETLoader path | Abliterated (uncensored) |
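To use a fine-tune, only unet_name in UNETLoader changes (shown here with the remix model; the other files follow the same pattern under diffusion_models/):
{ "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }}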
4-step lightning LoRA (fastest):
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<unet_node>", 0],
"lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors",
"strength_model": 1.0
}
}
Settings: steps=4, cfg=1.0, sampler=euler, scheduler=simple, denoise=1.0
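Mapped onto a standard KSampler (node references are placeholders for whatever IDs the LoRA, conditioning, and latent nodes have in your graph):
{
  "class_type": "KSampler",
  "inputs": {
    "model": ["<lightning_lora_node>", 0],
    "positive": ["<positive_cond>", 0],
    "negative": ["<negative_cond>", 0],
    "latent_image": ["<latent>", 0],
    "seed": 42, "steps": 4, "cfg": 1.0, "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0
  }
}
The 8-step variant below differs only in the LoRA file, step count, and optionally cfg.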
8-step lightning LoRA (more detail):
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<unet_node>", 0],
"lora_name": "Qwen-Image-Lightning-8steps-V1.0.safetensors",
"strength_model": 1.0
}
}
Settings: steps=8, cfg=1.0 (or 2.5 for character detail), sampler=euler, scheduler=simple
| Preset | Steps | CFG | Sampler | Scheduler | Denoise | LoRA | Notes |
|---|---|---|---|---|---|---|---|
| Lightning 4-step | 4 | 1.0 | euler | simple | 1.0 | Lightning-4steps | Fastest, good quality |
| Lightning 8-step | 8 | 1.0 | euler | simple | 1.0 | Lightning-8steps | Better detail |
| Lightning character | 8 | 2.5 | euler | simple | 1.0 | Lightning-8steps | Best for portraits |
| Standard | 50 | 4.0 | euler | simple | 1.0 | none | Official ComfyUI |
| Golden quality | 50 | 4.5 | euler | simple | 1.0 | none | Community best |
| Character composition | 30 | 4.0 | euler_ancestral | beta | 1.0 | none | Multi-character scenes |
| CopaxTimeless | 30 | 4.0 | res_multistep | sgm_uniform | 1.0 | none | Ultra-realistic |
| UltimateRealism | 30 | 7.5 | euler | simple | 1.0 | none | Product photography |
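As a sketch, the CopaxTimeless preset maps onto KSampler like this (placeholders as before; pair it with the copaxTimeless UNET and the flow-matching shift described next):
{
  "class_type": "KSampler",
  "inputs": {
    "model": ["<shifted_model>", 0],
    "positive": ["<positive_cond>", 0],
    "negative": ["<negative_cond>", 0],
    "latent_image": ["<latent>", 0],
    "seed": 42, "steps": 30, "cfg": 4.0, "sampler_name": "res_multistep", "scheduler": "sgm_uniform", "denoise": 1.0
  }
}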
For standard (non-lightning) presets, apply flow matching shift:
{
"class_type": "ModelSamplingAuraFlow",
"inputs": { "model": ["<unet_or_lora>", 0], "shift": 3.1 }
}
Shift=3.1 is the standard value for Qwen Image. It is not needed with the lightning LoRAs (the shift is baked into the distillation).
Qwen operates at ~1.6 megapixels natively:
| Aspect | Resolution | Use Case |
|---|---|---|
| Square | 1328x1328 | General |
| Portrait 3:4 | 1104x1472 | Portraits |
| Portrait 2:3 | 1056x1584 | |
| Portrait 9:16 | 928x1664 | Phone format |
| Landscape 4:3 | 1472x1104 | Landscape scenes |
| Landscape 3:2 | 1584x1056 | |
| Landscape 16:9 | 1664x928 | Widescreen |
| Ultra portrait | 1536x2048 | Tall format |
| Video-ready | 832x480 | For WAN 2.2 FLF pipeline |
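These values drop straight into EmptyLatentImage (or the width/height inputs of the integrated sampler); for example, the 3:4 portrait size:
{ "class_type": "EmptyLatentImage", "inputs": { "width": 1104, "height": 1472, "batch_size": 1 }}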
The QwenImageIntegratedKSampler custom node handles model patching, conditioning, sampling, and output in a single node. Simplest workflow — just 4 nodes for model loading + 1 integrated sampler + 1 save.
Required:
- model: MODEL (from UNETLoader)
- clip: CLIP (from CLIPLoader, type=qwen_image)
- vae: VAE
- positive_prompt: STRING
- negative_prompt: STRING
- generation_mode: "文生图 text-to-image" or "图生图 image-to-image"
- batch_size: INT (default 1)
- width: INT (default 0, step 8)
- height: INT (default 0, step 8)
- seed: INT
- steps: INT (default 4)
- cfg: FLOAT (default 1)
- sampler_name: euler, dpmpp_2m, etc.
- scheduler: simple, sgm_uniform, beta, etc.
- denoise: FLOAT (default 1)
Optional:
- image1-5: IMAGE (reference images for i2i or multi-ref)
- latent: LATENT
- controlnet_data: CONTROL_NET_DATA
- auraflow_shift: FLOAT (default 3)
- cfg_norm_strength: FLOAT (default 1)
Outputs:
[0] IMAGE — generated image
[1] LATENT — output latent (optional)
[2] IMAGE — scaled input image (for i2i)
Example (lightning 4-step, integrated sampler):
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors", "strength_model": 1.0 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "QwenImageIntegratedKSampler", "inputs": {
"model": ["2", 0],
"clip": ["3", 0],
"vae": ["4", 0],
"positive_prompt": "<detailed natural language prompt>",
"negative_prompt": "",
"generation_mode": "文生图 text-to-image",
"batch_size": 1,
"width": 1024,
"height": 1344,
"seed": 42,
"steps": 4,
"cfg": 1,
"sampler_name": "euler",
"scheduler": "simple",
"denoise": 1,
"auraflow_shift": 3,
"cfg_norm_strength": 1
}},
"6": { "class_type": "SaveImage", "inputs": { "images": ["5", 0], "filename_prefix": "qwen_t2i" }}
}
More flexible — allows inserting additional processing nodes between stages.
UNETLoader → [LoraLoaderModelOnly] → [ModelSamplingAuraFlow (shift=3.1)] → MODEL
CLIPLoader (qwen_image) → CLIP
VAELoader → VAE
CLIPTextEncode (positive) → CONDITIONING
ConditioningZeroOut → negative CONDITIONING
EmptyLatentImage (1024x1344) → LATENT
KSampler → VAEDecode → SaveImage
Example (lightning 4-step, separate loaders):
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors", "strength_model": 1.0 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["3", 0], "text": "<detailed natural language prompt>" }},
"6": { "class_type": "ConditioningZeroOut", "inputs": { "conditioning": ["5", 0] }},
"7": { "class_type": "EmptyLatentImage", "inputs": { "width": 1024, "height": 1344, "batch_size": 1 }},
"8": { "class_type": "KSampler", "inputs": {
"model": ["2", 0],
"positive": ["5", 0],
"negative": ["6", 0],
"latent_image": ["7", 0],
"seed": 42, "steps": 4, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"9": { "class_type": "VAEDecode", "inputs": { "samples": ["8", 0], "vae": ["4", 0] }},
"10": { "class_type": "SaveImage", "inputs": { "images": ["9", 0], "filename_prefix": "qwen_t2i" }}
}
Example (standard 50-step quality, no lightning LoRA, shift=3.1):
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwenImageEditRemix_v10.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "ModelSamplingAuraFlow", "inputs": { "model": ["1", 0], "shift": 3.1 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["3", 0], "text": "<detailed natural language prompt>" }},
"6": { "class_type": "ConditioningZeroOut", "inputs": { "conditioning": ["5", 0] }},
"7": { "class_type": "EmptyLatentImage", "inputs": { "width": 1328, "height": 1328, "batch_size": 1 }},
"8": { "class_type": "KSampler", "inputs": {
"model": ["2", 0],
"positive": ["5", 0],
"negative": ["6", 0],
"latent_image": ["7", 0],
"seed": 42, "steps": 50, "cfg": 4, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"9": { "class_type": "VAEDecode", "inputs": { "samples": ["8", 0], "vae": ["4", 0] }},
"10": { "class_type": "SaveImage", "inputs": { "images": ["9", 0], "filename_prefix": "qwen_t2i_hq" }}
}
Always use ConditioningZeroOut for Qwen txt2img:
{
"class_type": "ConditioningZeroOut",
"inputs": { "conditioning": ["<positive_cond>", 0] }
}
Or use an empty string in CLIPTextEncode — but ZeroOut is more explicit and reliable.
For ControlNet support with Qwen models. Patches the model with a DiffSynth control signal:
Required Inputs:
- model: MODEL
- model_patch: MODEL_PATCH (from DiffSynth ControlNet loader)
- vae: VAE
- image: IMAGE (control image)
- strength: FLOAT (default 1.0)
Optional:
- mask: MASK
Outputs:
[0] MODEL (patched)
DiffSynth ControlNets support canny, depth, and inpaint only (NOT pose).
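A sketch of the wiring, assuming ComfyUI's QwenImageDiffsynthControlnet node and a ModelPatchLoader for the DiffSynth patch (these class and input names are assumptions; check what your install exposes). Node IDs 2 and 4 refer to the model and VAE loaders from the separate-loaders workflow above, and the patched model output replaces the plain UNET/LoRA output at the sampler's model input:
{
  "20": { "class_type": "ModelPatchLoader", "inputs": { "name": "<diffsynth_canny_controlnet.safetensors>" }},
  "21": { "class_type": "LoadImage", "inputs": { "image": "<control_image.png>" }},
  "22": { "class_type": "QwenImageDiffsynthControlnet", "inputs": {
    "model": ["2", 0],
    "model_patch": ["20", 0],
    "vae": ["4", 0],
    "image": ["21", 0],
    "strength": 1.0
  }}
}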
Located in loras/Qwen/:
- style/ — Figure makers, reality transform, panel painter
- concept/ — Various concept LoRAs
- poses/ — Pose-specific LoRAs
- character/ — Character enhancement
- anime/ — Anime style LoRAs
- tool/ — Utility LoRAs (anything2real, gaussian splash)
- equirectangular projection/ — 360 panorama LoRA
Apply with LoraLoaderModelOnly:
{
"class_type": "LoraLoaderModelOnly",
"inputs": {
"model": ["<unet_or_lightning_lora>", 0],
"lora_name": "Qwen\\concept\\hinaQwenImageAsianMixLora_v2.safetensors",
"strength_model": 0.8
}
}
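LoRA loaders chain; a sketch stacking this concept LoRA on top of the lightning LoRA from the workflows above (node 1 is the UNETLoader; connect node 21's output to the sampler's model input):
{
  "20": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Lightning-4steps-V1.0.safetensors", "strength_model": 1.0 }},
  "21": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["20", 0], "lora_name": "Qwen\\concept\\hinaQwenImageAsianMixLora_v2.safetensors", "strength_model": 0.8 }}
}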
Natural language, 1–3 sentences. Be descriptive:
Good: "Professional portrait of an Asian woman in her late 20s, wearing a cream linen blazer at a Tokyo rooftop café during golden hour, holding a matcha latte, editorial fashion photography, shot on Sony A7III 85mm f/1.4"
Bad: "1girl, cafe, blazer, matcha"
VRAM and performance tips:
| Config | VRAM | Notes |
|---|---|---|
| FP8 UNET + fp8 CLIP + VAE | ~17-18GB | Fits comfortably on RTX 4090 |
| bf16 UNET (edit model) | ~10GB UNET + 7GB CLIP | Also fits well |
- clear_vram before switching to Qwen from another model family
- auraflow_shift defaults to 3 (close to the recommended 3.1) — adjust only if needed