Troubleshoots ComfyUI errors (OOM, device mismatches, missing nodes, dtype issues, black images) using diagnosis tools (get_history, get_logs) and fixes (FP8 models, lowvram, tiled VAE).
When a workflow fails, follow this systematic approach:

1. Call `get_history` to retrieve the execution result with the full traceback.
2. Call `get_logs` with keyword filters like "error", "warning", or "traceback".
3. Identify the `node_id` and `node_type` that failed.
4. Call `get_node_info` to verify the failing node's expected input schema.
5. Call `list_local_models` to verify all referenced model files exist.

## Out of Memory (OOM)

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate X MiB.
GPU 0 has a total capacity of 24.00 GiB of which X MiB is free.
```
Or:
```
RuntimeError: CUDA error: out of memory
```
The GPU does not have enough VRAM to hold the model weights, intermediate tensors, and latent images simultaneously. Common triggers: high resolutions, large batch sizes, and full-precision weights for large models.

**Fixes:**

- Switch to an FP8 checkpoint: `search_models("flux fp8")` or `search_models("sdxl fp8")`
- `--lowvram` flag: ComfyUI CLI flag that offloads model parts to CPU during inference
- Use `VAEDecodeTiled` instead of `VAEDecode`
- Reduce the resolution or batch size in `EmptyLatentImage`

Approximate VRAM for model weights:

| Model | FP32 | FP16 | FP8 |
|---|---|---|---|
| SD 1.5 | ~4GB | ~2GB | ~1GB |
| SDXL | ~12GB | ~6GB | ~3GB |
| Flux Dev | ~48GB | ~24GB | ~12GB |
| Flux Schnell | ~48GB | ~24GB | ~12GB |
| LTXV | ~20GB+ | ~10GB+ | ~6GB |
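The table above can be turned into a quick feasibility check. A minimal sketch, assuming rough public parameter counts (the figures below are approximations I've plugged in, not values from this document) and ignoring activations beyond a fixed headroom factor:

```python
# Rough VRAM estimate for model weights alone; activations, latents,
# and CLIP/VAE overhead are folded into a headroom multiplier.
BYTES_PER_DTYPE = {"fp32": 4, "fp16": 2, "fp8": 1}

# Approximate total parameter counts in billions (assumed, not exact).
PARAMS_BILLIONS = {
    "sd15": 1.0,
    "sdxl": 3.0,
    "flux-dev": 12.0,
}

def weight_vram_gb(model: str, dtype: str) -> float:
    """Approximate GiB needed just to hold the weights."""
    params = PARAMS_BILLIONS[model] * 1e9
    return params * BYTES_PER_DTYPE[dtype] / 1024**3

def fits(model: str, dtype: str, free_vram_gb: float, headroom: float = 1.5) -> bool:
    """Weights must fit with headroom left for activations and latents."""
    return weight_vram_gb(model, dtype) * headroom <= free_vram_gb
```

For example, `fits("flux-dev", "fp16", 8.0)` is false on an 8 GB card, which is why the FP8 variants exist.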
## Device Mismatches

```
RuntimeError: Expected all tensors to be on the same device, but found at least
two devices, cuda:0 and cpu!
```
A tensor on the CPU is being combined with a tensor on the GPU. This usually happens when:
- Running with `--lowvram` or `--cpu`, since some nodes may not support CPU offloading
- A custom node that does not move its tensors to the model's device (update the node pack or restart ComfyUI)

## Missing Nodes

```
Cannot find node class 'NodeClassName'
```
Or in the execution response:
```
"error": {"type": "node_not_found", "message": "Cannot find node class 'X'"}
```
The workflow references a node type that is not installed, or one whose node pack failed to import. To diagnose and fix:

- `search_custom_nodes("NodeClassName")` to find the pack that provides the node
- `get_logs(keyword="import")` and `get_logs(keyword="error")` to check for import failures
- Import errors often reveal missing Python dependencies; install them with `pip install missing-package`
- Restart ComfyUI after installing
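Before submitting a workflow, the referenced node classes can be checked against the installed set. A sketch assuming ComfyUI's API-format workflow JSON (node IDs mapping to objects with a `class_type` field); the `installed` set stands in for what the node info endpoint would report, and `UltimateSDUpscale` is just an example custom-node name:

```python
def missing_node_classes(workflow: dict, installed: set[str]) -> set[str]:
    """Return node class_types referenced by an API-format workflow
    that are absent from the installed node-class list."""
    referenced = {node["class_type"] for node in workflow.values()}
    return referenced - installed

# Example workflow referencing one built-in and one custom node.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {}},
    "2": {"class_type": "UltimateSDUpscale", "inputs": {}},
}
installed = {"CheckpointLoaderSimple", "KSampler", "VAEDecode"}
```

Running the check here returns the custom node as missing, pointing straight at the pack to install.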
## NaN Errors

```
RuntimeError: Input contains NaN
```
Or images come out as solid gray/noise with NaN warnings in logs.
Numerical instability during the diffusion process. Common triggers: CFG set too high, an incompatible or broken LoRA, or an FP16 VAE overflowing.

**Fixes:**

- Lower the CFG value
- Remove recently added LoRAs
- Use an FP32 VAE such as `vae-ft-mse-840000-ema-pruned.safetensors`

## Dtype Mismatches

```
RuntimeError: expected scalar type Float but found Half
```
Or:

```
RuntimeError: expected scalar type Half but found Float
```

Or:

```
RuntimeError: Input type (float) and bias type (c10::Half) should be the same
```
A model component expects one precision (FP32/FP16) but receives another. Most common with:

- `VAELoader` loading `vae-ft-mse-840000-ema-pruned.safetensors` (an FP32 VAE) into an FP16 pipeline
- `VAEDecode` when the model and VAE precisions differ

**Fixes:**

- Use FP32 VAE nodes, or add the `--force-fp32` flag, which forces everything to FP32 (uses more VRAM)

## Prompt Truncation

No explicit error: the prompt is silently truncated at 77 tokens, and details mentioned late in the prompt are ignored.
Put the most important details early, or use `BREAK` to start a new token chunk:

```
subject description, pose, clothing, setting
BREAK
lighting, style, quality, camera angle
```
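A rough sketch of the truncation behavior, approximating CLIP's subword tokens with whitespace-separated words (real tokenization produces more tokens than words, so treat this as illustrative only):

```python
def clip_chunks(prompt: str, limit: int = 77) -> list[list[str]]:
    """Split a prompt into the chunks the text encoder sees.
    BREAK forces a new chunk; within a chunk, tokens past the
    limit are silently dropped."""
    chunks = []
    for part in prompt.split("BREAK"):
        tokens = part.split()
        if len(tokens) > limit:
            tokens = tokens[:limit]   # late details are lost here
        chunks.append(tokens)
    return chunks
```

A 100-word first section loses everything after word 77, while text after `BREAK` starts fresh in its own chunk.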
## Black Images

No error in the execution: the workflow "succeeds" but produces completely black or near-black images.
| Cause | Diagnosis | Fix |
|---|---|---|
| `denoise = 0` | Check KSampler inputs | Set denoise to 1.0 for txt2img, 0.5-0.8 for img2img |
| `cfg = 0` | Check KSampler inputs | Set CFG to 7.0 (SD 1.5), 1.0 (Flux) |
| `steps = 0` | Check KSampler inputs | Set steps to 20+ (standard) or 4+ (turbo) |
| Wrong VAE | VAE doesn't match model | Use the correct VAE for the model family |
| Empty prompt | CLIPTextEncode has empty text | Add a text prompt |
| Wrong scheduler | Incompatible scheduler/sampler combo | Try "normal" scheduler with "euler" sampler |
| Seed collision | Extremely rare | Change the seed value |
| FP16 VAE overflow | VAE decode produces black | Use FP32 VAE or VAEDecodeTiled |
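The zero-valued causes in the table can be checked programmatically. A minimal sketch assuming ComfyUI's API-format workflow JSON (node IDs mapping to `class_type` and `inputs`):

```python
def black_image_suspects(workflow: dict) -> list[str]:
    """Scan KSampler nodes for the zero-valued inputs that
    silently produce black output."""
    problems = []
    for node_id, node in workflow.items():
        if node["class_type"] != "KSampler":
            continue
        inputs = node["inputs"]
        for key in ("denoise", "cfg", "steps"):
            if inputs.get(key, 1) == 0:
                problems.append(f"node {node_id}: {key} is 0")
    return problems

# Example: cfg accidentally set to 0 on the sampler.
workflow = {
    "3": {"class_type": "KSampler",
          "inputs": {"steps": 20, "cfg": 0.0, "denoise": 1.0}},
}
```

The same pattern extends to the other table rows (empty prompt text, mismatched VAE) once those nodes are inspected too.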
Quick checklist: `denoise > 0` (should be 1.0 for txt2img), `cfg > 0` (7.0 for SD 1.5, 1.0 for Flux), `steps > 0` (20 for standard, 4 for turbo), and a known-good sampler/scheduler pair such as `euler` + `normal`.

## Connection Type Mismatches

```
Output type 'IMAGE' doesn't match input type 'LATENT'
```
Or:

```
Required input 'model' of type 'MODEL' but got connection of type 'CLIP'
```
Connecting the wrong output slot of a node to an incompatible input. Often caused by using the wrong output index.
Use `get_node_info` to verify the exact output order:

- `CheckpointLoaderSimple` outputs: 0=MODEL, 1=CLIP, 2=VAE
- So `["1", 0]` gives MODEL and `["1", 1]` gives CLIP
- Connections are written `["nodeId", outputIndex]`: the node ID is a string, the output index is an integer

Typical data flow:

```
MODEL → KSampler
CLIP → CLIPTextEncode → CONDITIONING → KSampler
LATENT → KSampler → LATENT → VAEDecode → IMAGE
VAE → VAEDecode, VAEEncode
```
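A link can be validated before submission by comparing the source node's declared output type against the destination input's declared type. A sketch with a hand-rolled type table standing in for what `get_node_info` would report (only a few built-ins shown, and only two example input slots):

```python
# Output types per node class, in slot order, as get_node_info reports them.
OUTPUT_TYPES = {
    "CheckpointLoaderSimple": ["MODEL", "CLIP", "VAE"],
    "CLIPTextEncode": ["CONDITIONING"],
    "KSampler": ["LATENT"],
}

# Expected type per (node class, input name); illustrative subset.
INPUT_TYPES = {
    ("KSampler", "model"): "MODEL",
    ("KSampler", "positive"): "CONDITIONING",
}

def check_link(src_class: str, link: list, dst_class: str, dst_input: str) -> bool:
    """A link is ["nodeId", outputIndex]; validate the produced type
    against the destination input's declared type."""
    _, out_index = link
    produced = OUTPUT_TYPES[src_class][out_index]
    return produced == INPUT_TYPES[(dst_class, dst_input)]
```

So `["1", 0]` into a KSampler `model` input validates, while `["1", 1]` (the CLIP slot) does not.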
## Missing or Corrupt Model Files

```
FileNotFoundError: [Errno 2] No such file or directory: 'models/checkpoints/model.safetensors'
```

Or:

```
SafetensorError: Error reading file: invalid header
```

Or:

```
RuntimeError: PytorchStreamReader failed reading zip archive
```
**Diagnosis and fix:**

- `list_local_models(model_type="checkpoints")` to confirm the exact filename
- `download_model(url="...", target_subfolder="checkpoints")` to re-download a missing or corrupted file
- Make sure each file is in the correct subfolder (`checkpoints/`, `loras/`, `vae/`, etc.)

## PyTorch / CUDA Version Mismatch

```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```
Or:

```
ImportError: cannot import name 'xxx' from 'torch'
```

Or:

```
AssertionError: Torch not compiled with CUDA enabled
```
PyTorch and CUDA version incompatibility, usually after an environment change such as a `pip install` that replaced PyTorch with a build for a different CUDA version.

**Diagnosis:**

```
get_system_stats()  # Shows PyTorch version and CUDA version
```

- Check that `torch.cuda.is_available()` returns `True`
- Avoid `pip install` commands that might change PyTorch
- If broken, reinstall PyTorch with a matching CUDA version

## ComfyUI Desktop vs CLI

| Aspect | ComfyUI Desktop | ComfyUI CLI |
|---|---|---|
| Default port | 8000 | 8188 |
| Python | Embedded (bundled) | System/venv Python |
| Install location | AppData/Local/Programs/ComfyUI/ | Wherever you cloned it |
| Custom nodes | Documents/ComfyUI/custom_nodes/ | ./custom_nodes/ in repo |
| Models | Documents/ComfyUI/models/ | ./models/ in repo |
| Config | extra_model_paths.yaml for shared paths | Same |
| Updates | Auto-updater in the app | git pull |
## Diagnostic Commands

```
get_history()                     # Most recent execution
get_history(prompt_id="abc-123")  # Specific execution
```
The response includes:

- `status.status_str`: "success" or "error"
- `status.messages`: Timestamped execution messages
- `outputs`: Node outputs (images, etc.)

```
get_system_stats()                       # GPU info, VRAM, Python/PyTorch versions
get_queue()                              # Running and pending jobs
get_logs(max_lines=50, keyword="error")  # Recent error logs
```
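A small sketch of triaging one history entry using the response fields above; the nested `outputs` shape (per-node dicts with an `images` list) is an assumption about the response layout, so adjust the keys to what your server actually returns:

```python
def summarize_history(entry: dict) -> str:
    """Reduce one history entry to a one-line verdict."""
    status = entry["status"]
    if status["status_str"] == "error":
        # The last message usually carries the tail of the traceback.
        return f"error: {status['messages'][-1]}"
    images = sum(len(out.get("images", [])) for out in entry["outputs"].values())
    return f"success: {images} image(s)"

# Example of a successful run with one saved image.
entry = {
    "status": {"status_str": "success", "messages": []},
    "outputs": {"9": {"images": [{"filename": "ComfyUI_0001.png"}]}},
}
```

On an error entry the function surfaces the final status message, which is the first thing to feed into the quick-reference table below.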
```
get_node_info(node_type="KSampler")          # Check specific node
get_node_info(node_type="ControlNetApply")   # Verify custom nodes loaded
list_local_models(model_type="checkpoints")  # Installed checkpoints
list_local_models(model_type="loras")        # Installed LoRAs
list_local_models(model_type="controlnet")   # Installed ControlNets
```
## Quick Reference

| Error Message (partial) | Most Likely Fix |
|---|---|
| `CUDA out of memory` | Reduce resolution, use FP8 model, `--lowvram` |
| `Expected all tensors on same device` | Update custom node, restart ComfyUI |
| `Cannot find node class` | Install the node pack, restart ComfyUI |
| `Input contains NaN` | Lower CFG, use FP32 VAE, remove LoRAs |
| `expected scalar type Float but found Half` | Use FP32 VAE, or `--force-fp32` |
| `No such file or directory` (model) | Check filename, re-download model |
| `invalid header` (safetensors) | Re-download; the file is corrupted |
| `CUDA error: no kernel image` | Reinstall PyTorch with matching CUDA version |
| Black images, no error | Check denoise > 0, cfg > 0, steps > 0, prompt not empty |
| Image looks garbled/noisy | Wrong model+VAE combo, wrong sampler settings |
| Connection refused on port 8188 | ComfyUI not running, or using Desktop (port 8000) |
| `Prompt outputs failed validation` | Node inputs don't match schema; check `get_node_info` |
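The quick-reference table lends itself to a first-match lookup for automated triage. A sketch encoding the rows above (substring matching is deliberately loose; an unmatched message falls through to the full diagnostic flow):

```python
# First matching substring wins; ordered roughly by specificity.
TRIAGE = [
    ("CUDA out of memory", "Reduce resolution, use FP8 model, --lowvram"),
    ("Expected all tensors", "Update custom node, restart ComfyUI"),
    ("Cannot find node class", "Install the node pack, restart ComfyUI"),
    ("Input contains NaN", "Lower CFG, use FP32 VAE, remove LoRAs"),
    ("expected scalar type", "Use FP32 VAE, or --force-fp32"),
    ("No such file or directory", "Check filename, re-download model"),
    ("invalid header", "Re-download; the file is corrupted"),
    ("no kernel image", "Reinstall PyTorch with matching CUDA version"),
]

def suggest_fix(error_message: str) -> str:
    """Map a raw error message to the most likely fix from the table."""
    for needle, fix in TRIAGE:
        if needle in error_message:
            return fix
    return "Unknown; run get_logs and get_history for the full traceback"
```

This is only a shortcut for the common cases; silent failures like black images carry no error message and still need the checklist above.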