Builds ComfyUI workflows for Z-Image text-to-image using RedCraft checkpoint, Turbo/Base LoRAs, ControlNet, conditioning, and sampler presets.
From the comfy plugin. Install with: npx claudepluginhub artokun/comfyui-mcp --plugin comfy
This skill uses the workspace's default tool permissions.
Z-Image is a 6B-parameter image generation model from Alibaba's Tongyi Lab using a Scalable Single-Stream DiT (S3-DiT) architecture. It uses a Qwen text encoder (not CLIP-L/T5) and the same ae.safetensors VAE as Flux. Two variants are covered below: Z-Image Turbo and Z-Image Base.
| Component | Node | Model | Notes |
|---|---|---|---|
| Checkpoint | CheckpointLoaderSimple | redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors | 17GB, bundles UNET+CLIP+VAE |
RedCraft is a Z-Image Base finetune by the RedCraft team, designed for faster inference than stock Z-Image Base. Because it is a combined checkpoint, load it with CheckpointLoaderSimple; no separate loaders are needed.
| Component | Node | Model | Notes |
|---|---|---|---|
| UNET | UNETLoader | z_image_turbo_bf16.safetensors | Not currently installed |
| CLIP | CLIPLoader (type=qwen_image) | qwen_3_4b.safetensors | Not currently installed |
| VAE | VAELoader | ae.safetensors | Same as Flux VAE (320MB) |
| Component | Node | Model | Notes |
|---|---|---|---|
| UNET | UNETLoader | z_image_base_bf16.safetensors | Not currently installed |
| CLIP | CLIPLoader (type=qwen_image) | qwen_3_4b.safetensors | Not currently installed |
| VAE | VAELoader | ae.safetensors | Same as Flux VAE |
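The separate-component tables above can be turned into loader nodes with a small helper. This is a minimal sketch, not a verified workflow: the function name and node IDs are hypothetical, the `type` value is copied from the tables, and `weight_dtype: "default"` is an assumption about the UNETLoader input.

```python
def separate_loader_nodes(unet_name: str,
                          clip_name: str = "qwen_3_4b.safetensors",
                          vae_name: str = "ae.safetensors") -> dict:
    """Return loader nodes "1".."3" for a Z-Image UNET, text encoder, and VAE.

    Node IDs are arbitrary; file names default to the ones in the tables above.
    """
    return {
        "1": {"class_type": "UNETLoader",
              # "default" is assumed here; adjust if your install expects another dtype.
              "inputs": {"unet_name": unet_name, "weight_dtype": "default"}},
        "2": {"class_type": "CLIPLoader",
              # type value taken from the tables above
              "inputs": {"clip_name": clip_name, "type": "qwen_image"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": vae_name}},
    }


nodes = separate_loader_nodes("z_image_base_bf16.safetensors")
```

With separate loaders, each node exposes its single output at index 0, so downstream nodes would reference `["1", 0]` for the model, `["2", 0]` for the text encoder, and `["3", 0]` for the VAE, unlike the combined checkpoint's 0/1/2 outputs.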
Used when loading Z-Image components separately. Supports reference images via CLIP Vision:
Required Inputs:
- clip: CLIP
- prompt: STRING (multiline)
- auto_resize_images: BOOLEAN (default true)
Optional Inputs:
- image_encoder: CLIP_VISION (for reference images)
- vae: VAE
- image1-3: IMAGE (up to 3 reference images)
Outputs:
[0] CONDITIONING
When using CheckpointLoaderSimple, standard CLIPTextEncode works since the checkpoint bundles the correct tokenizer:
{
"class_type": "CLIPTextEncode",
"inputs": { "clip": ["<checkpoint>", 1], "text": "<prompt>" }
}
| Preset | Steps | CFG | Sampler | Scheduler | Notes |
|---|---|---|---|---|---|
| Distilled Fast | 10 | 1.0 | euler | simple | Quick iteration |
| Standard | 30 | 4.0 | euler | simple | Full quality |
| Preset | Steps | CFG | Sampler | Scheduler | Notes |
|---|---|---|---|---|---|
| Author recommended | 14 | 1.0 | res_2s | simple | CopaxTimeless author pick |
| Beauty/fashion | 10 | 1.0 | euler_ancestral | beta | Smooth skin, fashion photography |
| Sharpest | 10 | 1.0 | dpmpp_sde | beta | Sharpest, most natural (560-image test) |
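The preset rows above map directly onto KSampler inputs, so they can be kept in one lookup table. The sketch below assumes the KSampler input names used elsewhere in this document; `SAMPLER_PRESETS` and `ksampler_inputs` are hypothetical names introduced for illustration.

```python
# Values copied from the preset tables above, keyed by the "Preset" column.
SAMPLER_PRESETS = {
    "Distilled Fast":     {"steps": 10, "cfg": 1.0, "sampler_name": "euler",           "scheduler": "simple"},
    "Standard":           {"steps": 30, "cfg": 4.0, "sampler_name": "euler",           "scheduler": "simple"},
    "Author recommended": {"steps": 14, "cfg": 1.0, "sampler_name": "res_2s",          "scheduler": "simple"},
    "Beauty/fashion":     {"steps": 10, "cfg": 1.0, "sampler_name": "euler_ancestral", "scheduler": "beta"},
    "Sharpest":           {"steps": 10, "cfg": 1.0, "sampler_name": "dpmpp_sde",       "scheduler": "beta"},
}


def ksampler_inputs(preset: str, seed: int, model: list, positive: list,
                    negative: list, latent_image: list, denoise: float = 1.0) -> dict:
    """Build the "inputs" dict of a KSampler node from a named preset."""
    settings = SAMPLER_PRESETS[preset]  # raises KeyError for an unknown preset
    return {
        "model": model,
        "positive": positive,
        "negative": negative,
        "latent_image": latent_image,
        "seed": seed,
        "denoise": denoise,
        **settings,
    }


inputs = ksampler_inputs("Sharpest", 42, ["1", 0], ["2", 0], ["3", 0], ["4", 0])
```

Keeping presets in data rather than in each workflow makes it easier to compare samplers on a fixed seed.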
Stage 1 — Primary generation:
| Parameter | Value |
|---|---|
| Steps | 22 |
| CFG | 4.0 (range 4–7) |
| Sampler | res_2s |
| Scheduler | beta |
| Denoise | 1.0 |
Stage 2 — Detail refinement (optional img2img pass):
| Parameter | Value |
|---|---|
| Steps | 3 |
| CFG | 4.0 |
| Sampler | res_2s |
| Scheduler | normal |
| Denoise | 0.15 |
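One way to wire the two stages is to feed the latent from the first KSampler straight into a second KSampler at low denoise. The sketch below is one reading of "img2img pass", not a confirmed layout; the function name and node IDs are hypothetical, and `res_2s` is assumed to be available in your install (it may come from a custom sampler pack rather than stock ComfyUI).

```python
def two_stage_ksamplers(model: list, positive: list, negative: list,
                        latent_image: list, seed: int,
                        stage1_id: str = "10", stage2_id: str = "11") -> dict:
    """Return two chained KSampler nodes using the Stage 1 / Stage 2 values above."""
    shared = {
        "model": model,
        "positive": positive,
        "negative": negative,
        "seed": seed,
        "cfg": 4.0,
        "sampler_name": "res_2s",
    }
    # Stage 1: full generation from the empty latent.
    stage1 = {**shared, "latent_image": latent_image,
              "steps": 22, "scheduler": "beta", "denoise": 1.0}
    # Stage 2: light refinement that starts from the stage 1 output latent.
    stage2 = {**shared, "latent_image": [stage1_id, 0],
              "steps": 3, "scheduler": "normal", "denoise": 0.15}
    return {
        stage1_id: {"class_type": "KSampler", "inputs": stage1},
        stage2_id: {"class_type": "KSampler", "inputs": stage2},
    }


nodes = two_stage_ksamplers(["1", 0], ["2", 0], ["3", 0], ["4", 0], seed=42)
```

A VAEDecode node would then read its samples from the stage 2 node (`["11", 0]` with the default IDs) instead of stage 1.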
Non-distilled models such as Z-Image Base support negative prompts at CFG > 1.0:
3D, ai generated, semi realistic, illustrated, drawing, comic, digital painting, 3D model, blender, video game screenshot, screenshot, render, high-fidelity, smooth textures, CGI, masterpiece, text, writing, subtitle, watermark, logo, blurry, low quality, jpeg, artifacts, grainy
On distilled models such as Z-Image Turbo, negative prompts are not effective because CFG is baked in via distillation. Use the positive prompt to guide away from unwanted elements instead.
Recommended positive-side avoidance template:
over-smooth skin, plastic skin, doll face, anime, CGI, waxy texture, blurry face, fake pores, exaggerated makeup, over-sharpening, unrealistic symmetry, flat lighting, low detail skin, extra fingers, distorted anatomy
| Aspect | Resolution | Notes |
|---|---|---|
| Square | 1024x1024 | Standard |
| Square (native) | 1328x1328 | Higher quality at native resolution |
| Portrait 3:4 | 896x1152 | |
| Portrait 5:8 | 832x1216 | |
| Portrait 9:16 | 768x1344 | |
| Landscape 16:9 | 1280x720 | |
Dimensions must be divisible by 16.
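When a requested size is not in the table, it can be snapped to the nearest multiple of 16 before it goes into EmptyLatentImage. A minimal sketch; the helper names are hypothetical.

```python
def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round value to the nearest multiple (ties round up), with a floor of one multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def snap_resolution(width: int, height: int) -> tuple:
    """Return (width, height) with both sides divisible by 16."""
    return snap_to_multiple(width), snap_to_multiple(height)


size = snap_resolution(1920, 1080)  # (1920, 1088)
```

Snapping changes the aspect ratio slightly (1080 becomes 1088 here), so crop or resize the result afterwards if exact dimensions matter.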
Located in loras/ZImageTurbo/ with subfolders:
- style/: Style LoRAs (e.g., TurboPussyZ_v2.safetensors)
- concept/: Concept LoRAs (e.g., body from below.safetensors, ZITnsfwLoRA.safetensors)
- character/: Character LoRAs (e.g., NSFW_master_ZIT_000008766.safetensors)
- action/: Action LoRAs

Use with Z-Image Turbo base model. Typical LoRA strength: 0.6–1.0.
Located in loras/ZImageBase/ with subfolders:
- style/: Style LoRAs (e.g., NSGIRL-Z-Image-LoRA-By-MM744.safetensors)
- concept/: Concept LoRAs

Use with Z-Image Base or RedCraft. Typical LoRA strength: 0.6–1.0.
General aesthetic improvement LoRA:
- Z-Image-Aesthetic-Base v1.safetensors (352MB)

{
"class_type": "LoraLoader",
"inputs": {
"model": ["<checkpoint_or_unet>", 0],
"clip": ["<checkpoint_or_clip>", 1],
"lora_name": "ZImageTurbo\\style\\TurboPussyZ_v2.safetensors",
"strength_model": 0.8,
"strength_clip": 0.8
}
}
Note: When using CheckpointLoaderSimple for RedCraft, model output is index 0 and CLIP output is index 1. When stacking multiple LoRAs, chain them sequentially.
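Sequential chaining means each LoraLoader takes its model and clip from the previous one, and downstream nodes read from the last loader in the chain. A minimal sketch of that wiring follows; `chain_loras` is a hypothetical helper, and it applies the same strength to model and clip as the example above does.

```python
def chain_loras(workflow: dict, source_model: list, source_clip: list,
                loras: list, start_id: int) -> tuple:
    """Append one LoraLoader per (lora_name, strength) pair to workflow.

    Returns the (model, clip) links of the last loader, or the unchanged
    source links when loras is empty.
    """
    model, clip = source_model, source_clip
    for offset, (lora_name, strength) in enumerate(loras):
        node_id = str(start_id + offset)
        workflow[node_id] = {
            "class_type": "LoraLoader",
            "inputs": {
                "model": model,
                "clip": clip,
                "lora_name": lora_name,
                "strength_model": strength,
                "strength_clip": strength,
            },
        }
        # The next loader (or the sampler/encoder) reads from this node.
        model, clip = [node_id, 0], [node_id, 1]
    return model, clip


workflow = {}
model, clip = chain_loras(
    workflow, ["1", 0], ["1", 1],
    [("first.safetensors", 0.8), ("second.safetensors", 0.7)],  # placeholder file names
    start_id=2,
)
```

The returned `model` link goes to the KSampler and the returned `clip` link goes to both CLIPTextEncode nodes.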
Experimental built-in node for Z-Image ControlNet. Patches the model with a control signal:
Required Inputs:
- model: MODEL
- model_patch: MODEL_PATCH (from ControlNet loader)
- vae: VAE
- strength: FLOAT (default 1.0, range -10 to 10)
Optional Inputs:
- image: IMAGE (reference/control image)
- inpaint_image: IMAGE
- mask: MASK
Outputs:
[0] MODEL (patched)
A unified ControlNet supporting multiple condition types:
- res_2s, res_5s, or res_2m samplers + beta57 scheduler

{
"1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors" }},
"2": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["1", 1], "text": "<positive prompt>" }, "_meta": { "title": "Positive" }},
"3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["1", 1], "text": "" }, "_meta": { "title": "Negative" }},
"4": { "class_type": "EmptyLatentImage", "inputs": { "width": 1024, "height": 1024, "batch_size": 1 }},
"5": { "class_type": "KSampler", "inputs": {
"model": ["1", 0],
"positive": ["2", 0],
"negative": ["3", 0],
"latent_image": ["4", 0],
"seed": 42, "steps": 10, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"6": { "class_type": "VAEDecode", "inputs": { "samples": ["5", 0], "vae": ["1", 2] }},
"7": { "class_type": "SaveImage", "inputs": { "images": ["6", 0], "filename_prefix": "redcraft" }}
}
{
"1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors" }},
"2": { "class_type": "LoraLoader", "inputs": {
"model": ["1", 0], "clip": ["1", 1],
"lora_name": "Z-Image-Aesthetic-Base v1.safetensors",
"strength_model": 0.8, "strength_clip": 0.8
}},
"3": { "class_type": "LoraLoader", "inputs": {
"model": ["2", 0], "clip": ["2", 1],
"lora_name": "ZImageBase\\style\\NSGIRL-Z-Image-LoRA-By-MM744.safetensors",
"strength_model": 0.7, "strength_clip": 0.7
}},
"4": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["3", 1], "text": "<positive prompt>" }},
"5": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["3", 1], "text": "<negative prompt>" }},
"6": { "class_type": "EmptyLatentImage", "inputs": { "width": 896, "height": 1152, "batch_size": 1 }},
"7": { "class_type": "KSampler", "inputs": {
"model": ["3", 0],
"positive": ["4", 0],
"negative": ["5", 0],
"latent_image": ["6", 0],
"seed": 42, "steps": 30, "cfg": 4, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"8": { "class_type": "VAEDecode", "inputs": { "samples": ["7", 0], "vae": ["1", 2] }},
"9": { "class_type": "SaveImage", "inputs": { "images": ["8", 0], "filename_prefix": "redcraft_lora" }}
}
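Hand-edited workflows like the two above break easily when a node is renumbered and a link still points at the old ID. The sketch below checks that every `["<node id>", <output index>]` link refers to a node that exists. It is a hypothetical helper, and it assumes any two-element `[str, int]` list in `inputs` is a link.

```python
def dangling_links(workflow: dict) -> list:
    """Return (node_id, input_name, missing_id) for links to nonexistent nodes."""
    problems = []
    for node_id, node in workflow.items():
        for input_name, value in node.get("inputs", {}).items():
            is_link = (
                isinstance(value, list)
                and len(value) == 2
                and isinstance(value[0], str)
                and isinstance(value[1], int)
            )
            if is_link and value[0] not in workflow:
                problems.append((node_id, input_name, value[0]))
    return problems


workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["1", 1], "text": "<positive prompt>"}},
    "3": {"class_type": "CLIPTextEncode",
          "inputs": {"clip": ["9", 1], "text": ""}},  # "9" does not exist
}
problems = dangling_links(workflow)  # [("3", "clip", "9")]
```

This only checks node IDs; it does not verify that the output index or type matches what the target node produces.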
Natural language descriptions work best (uses Qwen LLM tokenizer, not CLIP):
Good: "Professional headshot of a confident businesswoman in her 30s, natural makeup, soft studio lighting, neutral gray background, sharp focus on eyes, Canon EOS R5"
Bad: "masterpiece, best quality, 1girl, businesswoman, studio"
| Config | VRAM | Notes |
|---|---|---|
| RedCraft DX1 checkpoint | ~17GB | Fits comfortably on RTX 4090 |
| Z-Image Turbo separate | ~8GB UNET + CLIP | Very lightweight |
| Z-Image Base separate | ~12GB | |
- clear_vram before switching to Z-Image from another model family
- dpmpp_sde + beta scheduler
- Z-Image-Aesthetic-Base v1 LoRA at 0.6–0.8 strength noticeably improves output quality across all Z-Image Base variants