SKILL.md
ComfyUI Workflow Builder
Translates natural language requests into executable ComfyUI workflow JSON. Always validates against inventory before generating.
Workflow Generation Process
Step 1: Understand the Request
Parse the user's intent into:
- Output type: Image, video, or audio
- Source material: Text-only, reference image(s), existing video
- Identity method: None, zero-shot (InstantID/PuLID), LoRA, Kontext
- Quality level: Draft (fast iteration) vs production (maximum quality)
- Special requirements: ControlNet, inpainting, upscaling, lip-sync
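The five fields above can be held in a small structure before any inventory lookup. The sketch below is illustrative only: the `WorkflowRequest` class, its field names, and the keyword heuristics in `parse_request` are assumptions made for this example, not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowRequest:
    """Hypothetical container for the fields parsed in Step 1."""
    output_type: str = "image"       # "image", "video", or "audio"
    source: str = "text"             # "text", "image", or "video"
    identity_method: str = "none"    # "none", "zero-shot", "lora", "kontext"
    quality: str = "draft"           # "draft" or "production"
    special: list = field(default_factory=list)


def parse_request(text: str) -> WorkflowRequest:
    """Rough keyword matching; a real parser would need to be far more careful."""
    lowered = text.lower()
    req = WorkflowRequest()
    if "video" in lowered:
        req.output_type = "video"
    elif "audio" in lowered or "voice" in lowered:
        req.output_type = "audio"
    if "reference image" in lowered:
        req.source = "image"
    if "lora" in lowered:
        req.identity_method = "lora"
    elif "instantid" in lowered or "pulid" in lowered:
        req.identity_method = "zero-shot"
    if "production" in lowered or "final" in lowered:
        req.quality = "production"
    for keyword in ("controlnet", "inpaint", "upscale", "lip-sync"):
        if keyword in lowered:
            req.special.append(keyword)
    return req
```

Keeping the parsed intent in one object makes the later steps (inventory check, pattern selection) easier to test in isolation.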
Step 2: Check Inventory
Read state/inventory.json to determine:
- Available checkpoints → select best match for task
- Available identity models → determine which methods are possible
- Available ControlNet models → enable pose/depth control if available
- Custom nodes installed → verify all required nodes exist
- VRAM available → optimize settings accordingly
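A minimal sketch of the inventory lookup follows. The schema of state/inventory.json is not specified here, so the keys `nodes` and `checkpoints` and the helper names are assumptions chosen for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def load_inventory(path: str = "state/inventory.json") -> dict:
    """Read the inventory cache from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def missing_nodes(inventory: dict, required: list) -> list:
    """Return the required node class names that are not installed."""
    installed = set(inventory.get("nodes", []))  # assumed key
    return [name for name in required if name not in installed]


def pick_checkpoint(inventory: dict, preferred: list) -> Optional[str]:
    """Return the first preferred checkpoint that is actually available."""
    available = set(inventory.get("checkpoints", []))  # assumed key
    for name in preferred:
        if name in available:
            return name
    return None
```

If `missing_nodes` returns anything, the chosen pipeline cannot run as-is and a different pattern (or an install step) is needed.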
Step 3: Select Pipeline Pattern
Based on request + inventory, choose from:
| Pattern | When | Key Nodes |
| --- | --- | --- |
| Text-to-Image | Simple generation | Checkpoint → CLIP → KSampler → VAE |
| Identity-Preserved Image | Character consistency | + InstantID/PuLID/IP-Adapter |
| LoRA Character | Trained character | + LoRA Loader |
| Image-to-Video (Wan) | High-quality video | Diffusion Model → Wan I2V → Video Combine |
| Image-to-Video (AnimateDiff) | Fast video, motion control | + AnimateDiff Loader + Motion LoRAs |
| Talking Head | Character speaks | Image → Video → Voice → Lip-Sync |
| Upscale | Enhance resolution | Image → UltimateSDUpscale → Save |
| Inpainting | Edit regions | Image + Mask → Inpaint Model → KSampler |
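The pattern choice can be sketched as a simple decision function. The precedence of the checks and the argument names below are assumptions; the pattern names are the ones listed above.

```python
def select_pattern(output_type: str, identity_method: str = "none",
                   special: tuple = (), fast: bool = False) -> str:
    """Map a parsed request to one pipeline pattern name (illustrative rules)."""
    if identity_method not in ("none", "zero-shot", "lora"):
        # Methods without a matching pattern are rejected rather than guessed at.
        raise ValueError(f"no pattern for identity method: {identity_method}")
    if "lip-sync" in special:
        return "Talking Head"
    if output_type == "video":
        return "Image-to-Video (AnimateDiff)" if fast else "Image-to-Video (Wan)"
    if "inpainting" in special:
        return "Inpainting"
    if "upscaling" in special:
        return "Upscale"
    if identity_method == "lora":
        return "LoRA Character"
    if identity_method == "zero-shot":
        return "Identity-Preserved Image"
    return "Text-to-Image"
```

In practice the result should still be checked against the inventory, since a pattern is only usable when its key nodes and models are installed.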
Step 4: Generate Workflow JSON
ComfyUI workflow format:
{
  "{node_id}": {
    "class_type": "{NodeClassName}",
    "inputs": {
      "{param_name}": "{value}",
      "{connected_param}": ["{source_node_id}", {output_index}]
    }
  }
}
Rules:
- Node IDs are strings (typically "1", "2", "3"...)
- Connected inputs use array format: ["source_node_id", output_index]
- Output index is 0-based integer
- Filenames must match exactly what's in inventory
- Seed values: use a random large integer, or a fixed value for reproducibility
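The rules above can be captured in two small helpers that build nodes and connections in the expected shape. The helper names `link` and `add_node` and the filename `model.safetensors` are placeholders invented for this sketch.

```python
import json
import random


def link(source_node_id: str, output_index: int = 0) -> list:
    """Build a connection in the ["source_node_id", output_index] form."""
    return [str(source_node_id), int(output_index)]


def add_node(workflow: dict, class_type: str, **inputs) -> str:
    """Append a node under the next string ID and return that ID."""
    node_id = str(len(workflow) + 1)
    workflow[node_id] = {"class_type": class_type, "inputs": inputs}
    return node_id


workflow = {}
ckpt = add_node(workflow, "CheckpointLoaderSimple",
                ckpt_name="model.safetensors")  # placeholder filename
prompt = add_node(workflow, "CLIPTextEncode",
                  text="a red fox", clip=link(ckpt, 1))  # CLIP is output 1
seed = random.randint(0, 2**32 - 1)  # random seed; use a fixed value to reproduce
print(json.dumps(workflow, indent=2))
```

Generating IDs from the current node count keeps them as strings and avoids collisions as long as nodes are only ever appended.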
Step 5: Validate
Before presenting to user:
- Every class_type exists in the inventory's node list
- Every model filename exists in inventory's model list
- All required connections are present (no dangling inputs)
- VRAM estimate doesn't exceed available VRAM
- Resolution is compatible with chosen model (512 for SD1.5, 1024 for SDXL/FLUX)
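Some of these checks are mechanical and can be sketched in a few lines. The function below covers only the node, model filename, and dangling-link checks; detecting missing required inputs would need per-node definitions, and the VRAM and resolution checks are left out. The function names and the file-suffix heuristic for spotting model filenames are assumptions.

```python
MODEL_SUFFIXES = (".safetensors", ".ckpt", ".pt", ".pth")  # heuristic, not exhaustive


def is_link(value) -> bool:
    """True for values shaped like ["source_node_id", output_index]."""
    return (isinstance(value, list) and len(value) == 2
            and isinstance(value[0], str) and isinstance(value[1], int))


def validate_workflow(workflow: dict, known_nodes: set, known_models: set) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    errors = []
    for node_id, node in workflow.items():
        if node["class_type"] not in known_nodes:
            errors.append(f"node {node_id}: unknown class_type {node['class_type']}")
        for name, value in node.get("inputs", {}).items():
            if is_link(value):
                if value[0] not in workflow:
                    errors.append(f"node {node_id}: input {name} points at missing node {value[0]}")
            elif isinstance(value, str) and value.endswith(MODEL_SUFFIXES):
                if value not in known_models:
                    errors.append(f"node {node_id}: model file {value} not in inventory")
    return errors
```

Returning a list rather than raising on the first problem lets all issues be reported to the user at once.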
Step 6: Output
- Online mode: queue the workflow via the comfyui-api skill.
- Offline mode: save the JSON to projects/{project}/workflows/ under a descriptive name.
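For the offline case, saving the workflow is a short file operation. This is a minimal sketch; the function name `save_workflow` and the `root` parameter are assumptions, while the directory layout follows the path given above.

```python
import json
from pathlib import Path


def save_workflow(workflow: dict, project: str, name: str,
                  root: str = "projects") -> Path:
    """Write the workflow to <root>/<project>/workflows/<name>.json."""
    out_dir = Path(root) / project / "workflows"
    out_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    out_path = out_dir / f"{name}.json"
    out_path.write_text(json.dumps(workflow, indent=2), encoding="utf-8")
    return out_path
```

A call such as `save_workflow(workflow, "demo", "fox_text_to_image")` would write to projects/demo/workflows/fox_text_to_image.json.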
Workflow Templates
Basic Text-to-Image (FLUX)
{
  "1": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "flux1-dev.safetensors"}
  },
  "2": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "{positive_prompt}", "clip": ["1", 1]}
  },
  "3": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "{negative_prompt}", "clip": ["1", 1]}
  },
  "4": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 1024, "height": 1024, "batch_size": 1}
  },
  "5": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42,
      "steps": 25,
      "cfg": 3.5,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["1", 0],
      "positive": ["2", 0],
      "negative": ["3", 0],
      "latent_image": ["4", 0]
    }
  },
  "6": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["5", 0], "vae": ["1", 2]}
  },
  "7": {
    "class_type": "SaveImage",
    "inputs": {"filename_prefix": "output", "images": ["6", 0]}
  }
}
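The template contains `{positive_prompt}` and `{negative_prompt}` placeholders that must be filled before queuing. Calling `str.format` on the serialized JSON would trip over the literal braces, so one option is to substitute inside string values only. The sketch below assumes that approach; `fill_placeholders` is a hypothetical helper name.

```python
def fill_placeholders(value, values: dict):
    """Return a copy of a workflow with {name} placeholders replaced in string values."""
    if isinstance(value, dict):
        return {key: fill_placeholders(item, values) for key, item in value.items()}
    if isinstance(value, list):
        return [fill_placeholders(item, values) for item in value]
    if isinstance(value, str):
        for name, replacement in values.items():
            value = value.replace("{" + name + "}", replacement)
        return value
    return value  # numbers and booleans pass through unchanged


template = {
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "{positive_prompt}", "clip": ["1", 1]}},
}
filled = fill_placeholders(template, {"positive_prompt": "a red fox in snow"})
```

Because the function builds new dicts and lists, the original template is left untouched and can be reused for the next request.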
With Identity Preservation (InstantID + IP-Adapter)
Extends basic template by adding:
- Load reference image node
- InstantID Model Loader + Apply InstantID
- IPAdapter Unified Loader + Apply IPAdapter
- FaceDetailer post-processing
See references/workflows.md for complete node settings.
Video Generation (Wan I2V)
Uses different loader chain:
- Load Diffusion Model (class_type UNETLoader), not Load Checkpoint (class_type CheckpointLoaderSimple)
- Wan I2V Conditioning
- EmptySD3LatentImage (with frame count)
- Video Combine (VHS)
See references/workflows.md Workflow 4 for complete settings.
VRAM Estimation
| Component | Approximate VRAM |
| --- | --- |
| FLUX FP16 | 16GB |
| FLUX FP8 | 8GB |
| SDXL | 6GB |
| SD1.5 | 4GB |
| InstantID | +4GB |
| IP-Adapter | +2GB |
| ControlNet (each) | +1.5GB |
| Wan 14B | 20GB |
| Wan 1.3B | 5GB |
| AnimateDiff | +3GB |
| FaceDetailer | +2GB |
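A rough estimate can be computed by adding the "+" entries to the base model figure. The numbers below are copied from the table; treating them as strictly additive is a simplifying assumption, and the function names are invented for this sketch.

```python
BASE_VRAM_GB = {
    "FLUX FP16": 16, "FLUX FP8": 8, "SDXL": 6, "SD1.5": 4,
    "Wan 14B": 20, "Wan 1.3B": 5,
}
ADDON_VRAM_GB = {
    "InstantID": 4, "IP-Adapter": 2, "ControlNet": 1.5,
    "AnimateDiff": 3, "FaceDetailer": 2,
}


def estimate_vram_gb(base: str, addons=()) -> float:
    """Sum the base model figure and every add-on (list ControlNet once per instance)."""
    return BASE_VRAM_GB[base] + sum(ADDON_VRAM_GB[name] for name in addons)


def fits(base: str, addons, available_gb: float) -> bool:
    """True when the estimate does not exceed the available VRAM."""
    return estimate_vram_gb(base, addons) <= available_gb
```

For example, SDXL with InstantID and two ControlNets comes to 6 + 4 + 1.5 + 1.5 = 13GB, which would not fit on a 12GB card under this estimate.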
Common Mistakes to Avoid
- Wrong output index: CheckpointLoaderSimple outputs [model, clip, vae] at indices [0, 1, 2]
- CFG too high for InstantID: Use 4-5, not default 7-8
- Wrong resolution for model: FLUX/SDXL=1024, SD1.5=512
- Missing VAE: FLUX needs an explicit VAE (ae.safetensors) when it is loaded as a standalone diffusion model
- Wrong model in wrong loader: standalone diffusion models need the Load Diffusion Model node (class_type UNETLoader), not Load Checkpoint (class_type CheckpointLoaderSimple)
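Two of these mistakes, resolution and CFG, lend themselves to a quick automated check. The sketch below is one possible heuristic, not a rule from this skill: it flags a resolution whose pixel count is less than half or more than double the model's native square size, and a CFG above 5 when InstantID is in use. The function name and thresholds are assumptions.

```python
NATIVE_SIZE = {"SD1.5": 512, "SDXL": 1024, "FLUX": 1024}


def lint_settings(model_family: str, width: int, height: int,
                  cfg: float, uses_instantid: bool = False) -> list:
    """Return warnings for settings that look like the common mistakes listed above."""
    warnings = []
    native = NATIVE_SIZE.get(model_family)
    if native is not None:
        ratio = (width * height) / (native * native)
        if ratio < 0.5 or ratio > 2.0:  # assumed tolerance, allows non-square sizes
            warnings.append(
                f"{width}x{height} is far from the {native}px native size of {model_family}")
    if uses_instantid and cfg > 5:
        warnings.append(f"cfg {cfg} is high for InstantID; 4-5 is suggested")
    return warnings
```

Comparing pixel counts rather than exact dimensions keeps common non-square sizes such as 832x1216 on SDXL from being flagged.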
Reference Files
- references/workflows.md: Detailed node-by-node templates
- references/models.md: Model files and paths
- references/prompt-templates.md: Model-specific prompts
- state/inventory.json: Current inventory cache