From comfy
Orchestrates story-to-video pipeline: breaks text into scenes, generates consistent Z-Image hero/refs + Qwen Edit frames, WAN FLF clips, ffmpeg concatenation.
npx claudepluginhub artokun/comfyui-mcp --plugin comfy
Slash command: /comfy:director
The Director skill orchestrates a complete short film production from a text story. It breaks the story into scenes, generates start/end frames for each, creates video clips from frame pairs, and concatenates everything into a final video.
Pipeline: Story Planning → Z-Image Hero + Character Refs → Qwen Edit Chain (all frames) → WAN 2.2 FLF Video Clips → ffmpeg Concatenation
Key architectural decisions:
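- clear_vram between every model family switch
Independent Z-Image generations per scene produce different-looking characters. This was the #1 problem discovered during testing. The solution is to generate one hero frame and a set of reference images with Z-Image, then derive every other frame from the hero with Qwen Edit: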
Phase 1: Story Planning → Break story into scenes (Claude reasoning, no ComfyUI)
Phase 2: Hero + Refs → Z-Image: 1 hero frame + character ref portraits + background ref
Phase 3: Hero Review → Visual verify hero and refs, user approves
Phase 4: Edit Chain → Qwen Edit: chain ALL scene frames from hero (with char refs in slots 2-3)
Phase 5: Frame Review → Visual verify all frames, approve/reject/retry
Phase 6: Video Clips → WAN 2.2 FLF dual Hi-Lo (one clip per scene)
Phase 7: Video Review → Preview each clip
Phase 8: Final Assembly → ffmpeg concat all clips into one MP4
Project state is saved at ~/code/comfyui-mcp/workflows/director_state_{project_id}.json and updated after every edit or phase completion.
{
  "project_id": "story_20260216_143022",
  "created": "2026-02-16T14:30:22Z",
  "story": "Original user story text",
  "current_phase": 4,
  "orientation": "portrait",
  "hero_frame": { "file": "director_hero_00001_.png", "seed": 428571, "approved": true },
  "character_refs": {
    "man": "director_ref_man.png",
    "cat": "director_ref_cat.png",
    "woman": "director_ref_woman.png",
    "background": "director_ref_bedroom.png"
  },
  "scenes": [
    {
      "id": 1,
      "description": "Brief scene description",
      "edit_prompt_start": "Qwen Edit instruction to create start frame from source",
      "edit_prompt_end": "Qwen Edit instruction to create end frame from source",
      "edit_source_start": "hero",
      "edit_source_end": "hero",
      "video_prompt": "WAN motion description",
      "start_frame": { "file": "director_s1_start_00001_.png", "seed": 12345, "approved": true },
      "end_frame": { "file": "director_hero_00001_.png", "seed": null, "approved": true },
      "video_clip": { "file": "director_s1_00001.mp4", "seed": 11111, "approved": false },
      "status": "video_pending"
    }
  ],
  "final_video": null,
  "settings": {
    "start_frame_resolution": [832, 1472],
    "video_resolution": [480, 720],
    "video_frames": 81,
    "video_fps": 16
  }
}
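Because the state file is rewritten after every edit, a crash in the middle of a write could leave it unreadable. The following is a minimal sketch of one way to persist it, assuming the JSON layout shown above; the helper names `save_state`, `load_state`, and `mark_frame` are hypothetical and not part of the skill.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(state: dict, path: Path) -> None:
    """Write the state to a temp file, then swap it into place in one step."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
    os.replace(tmp_name, path)  # replaces the old file atomically


def load_state(path: Path) -> dict:
    """Read the state back, e.g. when resuming a session."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def mark_frame(state: dict, scene_id: int, which: str, approved: bool) -> dict:
    """Set the approval flag on a scene's 'start_frame' or 'end_frame'."""
    for scene in state["scenes"]:
        if scene["id"] == scene_id:
            scene[which]["approved"] = approved
            return scene
    raise KeyError(f"no scene with id {scene_id}")
```

Calling `save_state` right after each `mark_frame` keeps the file in step with the review phases.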
| Phase | Model Family | Key Models | VRAM |
|---|---|---|---|
| 2: Hero + Refs | Z-Image | redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors | ~17GB |
| 4: Edit Chain | Qwen Edit | qwen_image_edit_2511_bf16.safetensors + Lightning LoRA | ~17-18GB |
| 6: Video Clips | WAN 2.2 I2V | Remix NSFW Hi+Lo (built-in lightning) | ~22-24GB |
CRITICAL: clear_vram between every model family switch.
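As a rough illustration of the rule, the sketch below works out which phases need a clear_vram call first. The phase-to-family mapping comes from the table above; the function names are hypothetical, and treating an unknown starting state as "something else is loaded" is an assumption (it happens to give three clears, matching the 3x figure in the timing table).

```python
# Phase number -> model family, copied from the table above.
# Planning, review, and ffmpeg phases load no model and are absent here.
PHASE_FAMILY = {2: "Z-Image", 4: "Qwen Edit", 6: "WAN 2.2 I2V"}


def needs_clear_vram(loaded_family, next_phase):
    """True when entering next_phase means loading a different model family."""
    family = PHASE_FAMILY.get(next_phase)
    if family is None:
        return False  # this phase does not load a model
    # loaded_family=None means "unknown", which is treated as a mismatch.
    return loaded_family != family


def plan_clears(phases):
    """List the phases that must be preceded by clear_vram."""
    loaded = None
    clears = []
    for phase in phases:
        if needs_clear_vram(loaded, phase):
            clears.append(phase)
        loaded = PHASE_FAMILY.get(phase, loaded)
    return clears


print(plan_clears(range(1, 9)))  # [2, 4, 6]
```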
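Break the story into 2-6 scenes. For each scene, identify the fields the state file records: a brief description, the Qwen Edit prompts and their source frames, and the WAN motion prompt.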
Identify a hero frame — the single most representative scene image that establishes the main character and setting. This hero will anchor all other frames via Qwen Edit.
Also identify which character reference images are needed (portraits of each character, key props, background).
The end frame of Scene N must be the EXACT same image file as the start frame of Scene N+1. Do NOT create separate Qwen-edited start frames for subsequent scenes — this causes visible jumps at scene boundaries when the videos are concatenated.
The frame chain for video generation:
Scene 1: S1_start (unique) → hero (end)
Scene 2: hero (= S1 end) → S2_end
Scene 3: S2_end (= S2 end) → S3_end
Scene 4: S3_end (= S3 end) → S4_end
Scene 5: S4_end (= S4 end) → S5_end
Only Scene 1 needs a unique start frame. All other scenes inherit their start from the previous scene's end.
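The chaining rule can be sketched in a few lines. This is an illustration only: `frame_chain` and `chain_is_continuous` are hypothetical helpers, and the file names in the usage line are placeholders.

```python
def frame_chain(s1_start, end_frames):
    """Return one (start, end) pair per scene.

    Scene 1 uses its own start frame; every later scene starts on the
    exact file that ended the previous scene.
    """
    pairs = []
    start = s1_start
    for end in end_frames:
        pairs.append((start, end))
        start = end  # reuse the same file, never a re-edited copy
    return pairs


def chain_is_continuous(pairs):
    """Check that each scene's start file equals the previous scene's end file."""
    return all(prev[1] == cur[0] for prev, cur in zip(pairs, pairs[1:]))


pairs = frame_chain("s1_start.png", ["hero.png", "s2_end.png", "s3_end.png"])
print(pairs[1])  # ('hero.png', 's2_end.png')
```

Running `chain_is_continuous` on the pairs before queuing video clips is a cheap guard against the scene-boundary jumps described above.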
The edit chain produces only end frames (plus Scene 1's unique start frame). Map which end frame derives from which source:
Example chain:
Hero (man+cat on bed)
├─ S1 Start: edit hero → remove cat, man alone
├─ S2 End: edit hero → replace cat with woman
│  └─ S3 End: edit S2End → both sit up, man startled
│     └─ S4 End: edit S3End → sitting close, warm smiles
│        └─ S5 End: edit S4End → warm embrace
Generate the hero frame, the character ref portraits, and the background ref with Z-Image RedCraft DX1 (10 steps, CFG 1, euler/simple).
For character refs, add terms to the negative prompt that exclude the wrong subjects (e.g., "woman, female" when generating the man's portrait).
{
"1": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "redcraftRedzimageUpdatedJAN30_redzibDX1.safetensors" }},
"2": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["1", 1], "text": "<hero_prompt>" }, "_meta": { "title": "Positive" }},
"3": { "class_type": "CLIPTextEncode", "inputs": { "clip": ["1", 1], "text": "3D, ai generated, semi realistic, illustrated, drawing, comic, digital painting, 3D model, blender, video game screenshot, render, smooth textures, CGI, text, writing, subtitle, watermark, logo, blurry, low quality, jpeg artifacts, grainy" }, "_meta": { "title": "Negative" }},
"4": { "class_type": "EmptyLatentImage", "inputs": { "width": 832, "height": 1472, "batch_size": 1 }},
"5": { "class_type": "KSampler", "inputs": {
"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0], "latent_image": ["4", 0],
"seed": 42, "steps": 10, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"6": { "class_type": "VAEDecode", "inputs": { "samples": ["5", 0], "vae": ["1", 2] }},
"7": { "class_type": "SaveImage", "inputs": { "images": ["6", 0], "filename_prefix": "director_hero" }}
}
Queue hero + all refs while Z-Image checkpoint is loaded (same checkpoint, different prompts).
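One way to produce the hero and ref jobs from a single template is to copy the workflow and swap only the fields that differ. The sketch below assumes the node ids of the workflow above ("2" positive prompt, "3" negative prompt, "5" sampler, "7" save node); `make_variant` is a hypothetical helper, the prompts are placeholders, and the template here is cut down to the fields being changed.

```python
import copy


def make_variant(template, prompt, prefix, seed, extra_negative=""):
    """Return a copy of the workflow with prompt, seed, and prefix swapped."""
    wf = copy.deepcopy(template)  # leave the template untouched for the next job
    wf["2"]["inputs"]["text"] = prompt
    if extra_negative:
        # Prepend subject exclusions, e.g. "woman, female" for the man's portrait.
        wf["3"]["inputs"]["text"] = extra_negative + ", " + wf["3"]["inputs"]["text"]
    wf["5"]["inputs"]["seed"] = seed
    wf["7"]["inputs"]["filename_prefix"] = prefix
    return wf


# Cut-down stand-in for the full workflow JSON shown above.
template = {
    "2": {"inputs": {"text": "<hero_prompt>"}},
    "3": {"inputs": {"text": "blurry, low quality"}},
    "5": {"inputs": {"seed": 42}},
    "7": {"inputs": {"filename_prefix": "director_hero"}},
}

jobs = [
    make_variant(template, "<hero_prompt>", "director_hero", 428571),
    make_variant(template, "<man_portrait_prompt>", "director_ref_man", 1001,
                 extra_negative="woman, female"),
]
```

Each entry in `jobs` can then be queued in turn while the checkpoint stays loaded.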
Show hero frame and all character refs. User approves or requests regeneration with new seed.
{
"1": { "class_type": "UNETLoader", "inputs": { "unet_name": "qwen_image_edit_2511_bf16.safetensors", "weight_dtype": "default" }},
"2": { "class_type": "LoraLoaderModelOnly", "inputs": { "model": ["1", 0], "lora_name": "Qwen-Image-Edit-2511-Lightning-4steps-V1.0-bf16.safetensors", "strength_model": 1 }},
"3": { "class_type": "CLIPLoader", "inputs": { "clip_name": "qwen_2.5_vl_7b_fp8_scaled.safetensors", "type": "qwen_image" }},
"4": { "class_type": "VAELoader", "inputs": { "vae_name": "qwen_image_vae.safetensors" }},
"5": { "class_type": "LoadImage", "inputs": { "image": "<source_scene.png>" }, "_meta": { "title": "Source Scene" }},
"5b": { "class_type": "LoadImage", "inputs": { "image": "<character_ref.png>" }, "_meta": { "title": "Character Ref" }},
"5c": { "class_type": "LoadImage", "inputs": { "image": "<background_ref.png>" }, "_meta": { "title": "Background Ref" }},
"6": { "class_type": "TextEncodeQwenImageEditPlusAdvance_lrzjason", "inputs": {
"clip": ["3", 0], "prompt": "<edit_prompt>", "vae": ["4", 0],
"vl_resize_image1": ["5", 0],
"vl_resize_image2": ["5b", 0],
"vl_resize_image3": ["5c", 0],
"target_size": 1024, "target_vl_size": 384,
"upscale_method": "lanczos", "crop_method": "pad"
}},
"7": { "class_type": "ConditioningZeroOut", "inputs": { "conditioning": ["6", 0] }},
"8": { "class_type": "KSampler", "inputs": {
"model": ["2", 0], "positive": ["6", 0], "negative": ["7", 0], "latent_image": ["6", 1],
"seed": 42, "steps": 4, "cfg": 1, "sampler_name": "euler", "scheduler": "simple", "denoise": 1
}},
"9": { "class_type": "VAEDecode", "inputs": { "samples": ["8", 0], "vae": ["4", 0] }},
"10": { "class_type": "SaveImage", "inputs": { "images": ["9", 0], "filename_prefix": "director_s1_start" }}
}
Key: slots 5b and 5c — feed character reference and background reference into vl_resize_image2 and vl_resize_image3. This helps the vision encoder maintain character appearance across edits.
Edits are sequential — each depends on the previous output:
- upload_image the output
Independent edits (both from hero) can run in parallel.
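The dependency rule can be made concrete by grouping edits into batches, where everything in a batch has its source image available already. This is a sketch under stated assumptions: `edit_batches` is a hypothetical helper, and the source map mirrors the example chain given earlier.

```python
def edit_batches(sources, ready=("hero",)):
    """Group edits so each batch only depends on frames that already exist.

    sources maps each target frame to the frame it is edited from.
    """
    done = set(ready)
    pending = dict(sources)
    batches = []
    while pending:
        batch = sorted(t for t, src in pending.items() if src in done)
        if not batch:
            raise ValueError(f"unresolvable sources: {sorted(pending)}")
        batches.append(batch)
        done.update(batch)
        for target in batch:
            del pending[target]
    return batches


chain = {
    "S1_start": "hero",
    "S2_end": "hero",
    "S3_end": "S2_end",
    "S4_end": "S3_end",
    "S5_end": "S4_end",
}
print(edit_batches(chain))
# [['S1_start', 'S2_end'], ['S3_end'], ['S4_end'], ['S5_end']]
```

Only the first batch holds more than one edit here, which matches the note that the two edits from the hero are the independent ones.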
For each frame, show via Read for visual inspection. User approves or provides feedback. Re-run individual edits without redoing the whole chain.
(Same as wan-flf-video skill — Remix NSFW Hi+Lo, 4-stack LoRA, ImageResizeKJv2, dual KSamplerAdvanced)
Key settings, as recorded in the state file: video resolution 480x720, 81 frames, 16 fps.
For transformation scenes (e.g., cat→woman), add morph LoRA to Hi/Lo Common stacks:
- wan2.2_i2v_magical_morph_highnoise.safetensors → Hi Common slot 1 (strength 1.0)
- wan2.2_i2v_magical_morph_lownoise.safetensors → Lo Common slot 1 (strength 1.0)
Use 1.0 strength; it was tested without sparkle issues. Lower values (0.7-0.85) produce weaker morph effects that may look like a dissolve rather than a true morph.
Swap per scene: start/end image filenames, positive prompt text, noise_seed, filename_prefix.
All 5 clips can be queued at once — they run sequentially in ComfyUI, sharing loaded models.
Report each clip's filename. User previews externally.
cd "<ComfyUI_output_dir>"
printf "file 'director_s1_00001.mp4'\nfile 'director_s2_00001.mp4'\n..." > concat_list.txt
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy director_final_{project_id}.mp4
All clips share resolution/codec/framerate — copy-concat works without re-encoding.
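The same assembly step can be scripted so the clip list comes from data instead of a hand-written printf. A minimal sketch, assuming clip file names contain no single quotes; `write_concat_list` and `concat_command` are hypothetical helpers, and the ffmpeg arguments are the ones from the command above.

```python
import tempfile
from pathlib import Path


def write_concat_list(clips, list_path):
    """Write one "file '<name>'" line per clip, in playback order."""
    lines = [f"file '{name}'" for name in clips]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


def concat_command(list_path, output):
    """Build the ffmpeg argument list for a stream-copy concat."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", str(list_path), "-c", "copy", str(output)]


work_dir = Path(tempfile.mkdtemp())  # stand-in for the ComfyUI output directory
clips = [f"director_s{n}_00001.mp4" for n in range(1, 6)]
list_path = work_dir / "concat_list.txt"
write_concat_list(clips, list_path)
cmd = concat_command(list_path, work_dir / "director_final.mp4")
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the concat; the sketch stops short of that so it has no dependency on ffmpeg being installed.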
After context compaction:
- Read director_session_notes.md if it exists
- Read the state file and check current_phase and per-scene status
- Run clear_vram before loading the model family for the current phase

| Phase | Per Scene | 5 Scenes |
|---|---|---|
| Hero + Refs (Z-Image) | ~10s each | ~50s (one-time) |
| Edit Chain (Qwen 4-step) | ~35s each | ~280s (8 edits) |
| Video Clip (WAN FLF 81 frames) | ~140s | ~700s |
| VRAM swaps (3x clear_vram) | ~30s each | ~90s |
| Total generation | | ~19 min |
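The total follows directly from the per-item figures. A small sketch of the arithmetic, using the approximate timings from the table; the function name and parameter names are illustrative only.

```python
def estimate_seconds(images=5, edits=8, clips=5, vram_swaps=3):
    """Sum the table's rough per-item timings for a given project size."""
    return (
        images * 10        # Z-Image hero + refs, ~10s each
        + edits * 35       # Qwen Edit, ~35s each
        + clips * 140      # WAN FLF clip, ~140s each
        + vram_swaps * 30  # clear_vram, ~30s each
    )


total = estimate_seconds()
print(total, round(total / 60))  # 1120 19
```

Changing the arguments gives a quick estimate for shorter or longer stories, for example `estimate_seconds(edits=4, clips=3)` for a three-scene project.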
Use distinctive visual elements that transfer between characters/forms to create narrative connections: