SKILL.md

ComfyUI Workflow Builder

Translates natural language requests into executable ComfyUI workflow JSON. Always validates against inventory before generating.

Workflow Generation Process

Step 1: Understand the Request

Parse the user's intent into:

Output type: Image, video, or audio

Source material: Text-only, reference image(s), existing video

Identity method: None, zero-shot (InstantID/PuLID), LoRA, Kontext

Quality level: Draft (fast iteration) vs production (maximum quality)

Special requirements: ControlNet, inpainting, upscaling, lip-sync
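The five fields above can be captured in a small record before any workflow is built. This is a minimal sketch only: the class name `ParsedRequest`, the enum names, and the string values are illustrative assumptions, not part of the skill.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OutputType(Enum):
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"


class Quality(Enum):
    DRAFT = "draft"            # fast iteration
    PRODUCTION = "production"  # maximum quality


@dataclass
class ParsedRequest:
    """Hypothetical container for the parsed intent (names are assumptions)."""
    output_type: OutputType
    source_material: str = "text"   # "text", "image", or "video"
    identity_method: str = "none"   # "none", "instantid", "pulid", "lora", "kontext"
    quality: Quality = Quality.DRAFT
    special: List[str] = field(default_factory=list)  # e.g. ["controlnet", "upscaling"]


# Example: "a portrait of this person, upscaled", given one reference image
req = ParsedRequest(
    output_type=OutputType.IMAGE,
    source_material="image",
    identity_method="instantid",
    special=["upscaling"],
)
```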

Step 2: Check Inventory

Read state/inventory.json to determine:

Available checkpoints → select best match for task

Available identity models → determine which methods are possible

Available ControlNet models → enable pose/depth control if available

Custom nodes installed → verify all required nodes exist

VRAM available → optimize settings accordingly
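The inventory checks above might look like the following sketch. The schema of state/inventory.json is not shown in this document, so the keys `"checkpoints"` and `"nodes"` and all three function names are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_inventory(path="state/inventory.json"):
    """Read the inventory cache from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def pick_checkpoint(inventory, preferred):
    """Return the first preferred filename that is installed, else None.

    Assumes inventory["checkpoints"] is a list of filenames (hypothetical key).
    """
    installed = set(inventory.get("checkpoints", []))
    for name in preferred:
        if name in installed:
            return name
    return None


def missing_nodes(inventory, required):
    """Return required node class names that are not installed, sorted.

    Assumes inventory["nodes"] is a list of class names (hypothetical key).
    """
    installed = set(inventory.get("nodes", []))
    return sorted(set(required) - installed)
```

If `pick_checkpoint` returns `None` or `missing_nodes` returns a non-empty list, the request cannot be served with the current installation and a different pipeline should be chosen.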

Step 3: Select Pipeline Pattern

Based on request + inventory, choose from:

| Pattern | When | Key Nodes |
|---|---|---|
| Text-to-Image | Simple generation | Checkpoint → CLIP → KSampler → VAE |
| Identity-Preserved Image | Character consistency | + InstantID/PuLID/IP-Adapter |
| LoRA Character | Trained character | + LoRA Loader |
| Image-to-Video (Wan) | High-quality video | Diffusion Model → Wan I2V → Video Combine |
| Image-to-Video (AnimateDiff) | Fast video, motion control | + AnimateDiff Loader + Motion LoRAs |
| Talking Head | Character speaks | Image → Video → Voice → Lip-Sync |
| Upscale | Enhance resolution | Image → UltimateSDUpscale → Save |
| Inpainting | Edit regions | Image + Mask → Inpaint Model → KSampler |
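One way to turn the pattern table into a decision is a plain precedence chain. The sketch below is illustrative: the function name, the argument values, and in particular the order of the checks and the "draft means AnimateDiff" rule are assumptions, since the document does not specify how ties are resolved.

```python
def select_pattern(output_type, source="text", identity="none",
                   special=(), quality="production"):
    """Map a parsed request to a pattern name from the table (assumed precedence)."""
    if "lip-sync" in special:
        return "Talking Head"
    if output_type == "video":
        # Assumption: draft requests favour the faster AnimateDiff route.
        if quality == "draft":
            return "Image-to-Video (AnimateDiff)"
        return "Image-to-Video (Wan)"
    if "inpainting" in special:
        return "Inpainting"
    if source == "image" and "upscaling" in special and identity == "none":
        return "Upscale"
    if identity == "lora":
        return "LoRA Character"
    if identity in ("instantid", "pulid", "ip-adapter"):
        return "Identity-Preserved Image"
    return "Text-to-Image"
```

The result still has to be checked against the inventory: a pattern is only usable if every node and model it needs is installed.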

Step 4: Generate Workflow JSON

ComfyUI workflow format:

{
  "{node_id}": {
    "class_type": "{NodeClassName}",
    "inputs": {
      "{param_name}": "{value}",
      "{connected_param}": ["{source_node_id}", {output_index}]
    }
  }
}

Rules:

Node IDs are strings (typically "1", "2", "3"...)

Connected inputs use array format: ["source_node_id", output_index]

Output index is 0-based integer

Filenames must match exactly what's in inventory

Seed values: use a large random integer, or a fixed value when the result must be reproducible
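The rules above can be enforced by construction with a small helper that hands out string IDs and builds link arrays. This is a sketch; `WorkflowBuilder`, `add`, and `link` are hypothetical names, and `example.safetensors` is a placeholder filename. `CheckpointLoaderSimple` and `CLIPTextEncode` are stock ComfyUI class names.

```python
import json
import random


class WorkflowBuilder:
    """Hypothetical helper that accumulates nodes in API format."""

    def __init__(self):
        self.nodes = {}

    def add(self, class_type, **inputs):
        """Add a node and return its ID as a string ("1", "2", ...)."""
        node_id = str(len(self.nodes) + 1)
        self.nodes[node_id] = {"class_type": class_type, "inputs": inputs}
        return node_id

    @staticmethod
    def link(node_id, output_index=0):
        """Build a connected input: [source_node_id, 0-based output index]."""
        return [node_id, output_index]


wf = WorkflowBuilder()
ckpt = wf.add("CheckpointLoaderSimple", ckpt_name="example.safetensors")
pos = wf.add("CLIPTextEncode", text="a lighthouse at dusk", clip=wf.link(ckpt, 1))

# Random seed for exploration; replace with a fixed integer to reproduce a result.
seed = random.randint(0, 2**32 - 1)

print(json.dumps(wf.nodes, indent=2))
```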

Step 5: Validate

Before presenting to user:

Every class_type exists in inventory's node list

Every model filename exists in inventory's model list

All required connections are present (no dangling inputs)

VRAM estimate doesn't exceed available VRAM

Resolution is compatible with chosen model (512 for SD1.5, 1024 for SDXL/FLUX)
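The first three checks can be sketched as a single pass over the workflow. The function name and the way node and model lists are passed in are assumptions. The sketch only verifies that links point at existing nodes; detecting missing required inputs would additionally need each node's input schema, which is not covered here.

```python
MODEL_SUFFIXES = (".safetensors", ".ckpt", ".pt", ".pth")


def validate_workflow(workflow, known_nodes, known_models):
    """Return a list of human-readable problems (empty list means no problems found)."""
    errors = []
    for node_id, node in workflow.items():
        class_type = node["class_type"]
        if class_type not in known_nodes:
            errors.append(f"node {node_id}: unknown class_type {class_type!r}")
        for name, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2:
                # Connected input: [source_node_id, output_index]
                src, idx = value
                if src not in workflow:
                    errors.append(f"node {node_id}: input {name!r} links to missing node {src!r}")
                elif not isinstance(idx, int) or idx < 0:
                    errors.append(f"node {node_id}: input {name!r} has invalid output index {idx!r}")
            elif isinstance(value, str) and value.endswith(MODEL_SUFFIXES):
                # Heuristic (assumption): strings with a model file suffix are model filenames.
                if value not in known_models:
                    errors.append(f"node {node_id}: model file {value!r} not in inventory")
    return errors
```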

Step 6: Output

If online mode: Queue via comfyui-api skill

If offline mode: Save JSON to projects/{project}/workflows/ with descriptive name
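For the offline branch, a minimal sketch of saving under projects/{project}/workflows/ follows. The function name, the `root` parameter, and the slug rule for the "descriptive name" are assumptions. The online branch is handled by the comfyui-api skill and is not sketched here.

```python
import json
import re
from pathlib import Path


def save_workflow(workflow, project, description, root="projects"):
    """Write the workflow as JSON and return the path of the file written."""
    # Assumed naming rule: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    out_dir = Path(root) / project / "workflows"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slug}.json"
    path.write_text(json.dumps(workflow, indent=2), encoding="utf-8")
    return path
```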

Workflow Templates

Basic Text-to-Image (FLUX)

{
  "1": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "flux1-dev.safetensors"}
  },
  "2": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "{positive_prompt}", "clip": ["1", 1]}
  },
  "3": {
    "class_type": "CLIPTextEncode",
    "inputs": {"text": "{negative_prompt}", "clip": ["1", 1]}
  },
  "4": {
    "class_type": "EmptyLatentImage",
    "inputs": {"width": 1024, "height": 1024, "batch_size": 1}
  },
  "5": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42,
      "steps": 25,
      "cfg": 3.5,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["1", 0],
      "positive": ["2", 0],
      "negative": ["3", 0],
      "latent_image": ["4", 0]
    }
  },
  "6": {
    "class_type": "VAEDecode",
    "inputs": {"samples": ["5", 0], "vae": ["1", 2]}
  },
  "7": {
    "class_type": "SaveImage",
    "inputs": {"filename_prefix": "output", "images": ["6", 0]}
  }
}
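The `{positive_prompt}` and `{negative_prompt}` placeholders in the template need to be filled before the workflow is queued or saved. A minimal sketch, assuming the template has already been loaded into a Python dict; the function name `fill_template` is hypothetical.

```python
import copy
import random


def fill_template(template, positive, negative="", seed=None):
    """Return a copy of the template with prompts and seed filled in."""
    wf = copy.deepcopy(template)  # leave the original template untouched
    for node in wf.values():
        inputs = node["inputs"]
        if inputs.get("text") == "{positive_prompt}":
            inputs["text"] = positive
        elif inputs.get("text") == "{negative_prompt}":
            inputs["text"] = negative
        if "seed" in inputs:
            # Fixed seed if given, otherwise a fresh random one.
            inputs["seed"] = seed if seed is not None else random.randint(0, 2**32 - 1)
    return wf
```

Working on a deep copy means the same template dict can be reused for several requests without earlier prompts leaking into later ones.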

With Identity Preservation (InstantID + IP-Adapter)

Extends basic template by adding:

Load reference image node

InstantID Model Loader + Apply InstantID

IPAdapter Unified Loader + Apply IPAdapter

FaceDetailer post-processing

See references/workflows.md for complete node settings.

Video Generation (Wan I2V)

Uses different loader chain:

Load Diffusion Model (class_type UNETLoader), not Load Checkpoint

Wan I2V Conditioning

EmptySD3LatentImage (with frame count)

Video Combine (VHS)

See references/workflows.md Workflow 4 for complete settings.

VRAM Estimation

| Component | Approximate VRAM |
|---|---|
| FLUX FP16 | 16 GB |
| FLUX FP8 | 8 GB |
| SDXL | 6 GB |
| SD1.5 | 4 GB |
| InstantID | +4 GB |
| IP-Adapter | +2 GB |
| ControlNet (each) | +1.5 GB |
| Wan 14B | 20 GB |
| Wan 1.3B | 5 GB |
| AnimateDiff | +3 GB |
| FaceDetailer | +2 GB |
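The table can be used as a simple additive estimate: one base model plus every add-on in the pipeline. The sketch below assumes the figures simply sum, which is a rough simplification because actual usage depends on resolution, batch size, and offloading. The function and dictionary names are hypothetical.

```python
# Figures copied from the table above, in GB.
BASE_GB = {
    "FLUX FP16": 16, "FLUX FP8": 8, "SDXL": 6, "SD1.5": 4,
    "Wan 14B": 20, "Wan 1.3B": 5,
}
ADDON_GB = {
    "InstantID": 4, "IP-Adapter": 2, "ControlNet": 1.5,
    "AnimateDiff": 3, "FaceDetailer": 2,
}


def estimate_vram(base, addons=()):
    """Sum the base model and add-ons; list "ControlNet" once per ControlNet used."""
    return BASE_GB[base] + sum(ADDON_GB[a] for a in addons)


def fits(base, addons, available_gb):
    """True if the additive estimate does not exceed the available VRAM."""
    return estimate_vram(base, addons) <= available_gb
```

For example, SDXL with InstantID and two ControlNets comes to 6 + 4 + 1.5 + 1.5 = 13 GB under this estimate.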

Common Mistakes to Avoid

Wrong output index: CheckpointLoaderSimple outputs [model, clip, vae] at indices [0, 1, 2]

CFG too high for InstantID: Use 4-5, not default 7-8

Wrong resolution for model: FLUX/SDXL=1024, SD1.5=512

Missing VAE: FLUX needs explicit VAE (ae.safetensors)

Wrong model in wrong loader: Diffusion-only model files need Load Diffusion Model (class_type UNETLoader), not Load Checkpoint (class_type CheckpointLoaderSimple)
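Two of the mistakes above, the output index and the InstantID CFG, are mechanical enough to check automatically. A sketch under stated assumptions: the function name `lint_workflow` and the `uses_instantid` flag are hypothetical, and the check relies on inputs named `model`, `clip`, and `vae` being wired directly to a `CheckpointLoaderSimple` node.

```python
# Expected output index on CheckpointLoaderSimple for each input name.
CHECKPOINT_OUTPUTS = {"model": 0, "clip": 1, "vae": 2}


def lint_workflow(workflow, uses_instantid=False):
    """Return warnings for wrong checkpoint output indices and high CFG with InstantID."""
    warnings = []
    for node_id, node in workflow.items():
        for name, value in node["inputs"].items():
            if not (isinstance(value, list) and len(value) == 2):
                continue  # plain value, not a connection
            src, idx = value
            src_type = workflow.get(src, {}).get("class_type")
            expected = CHECKPOINT_OUTPUTS.get(name)
            if src_type == "CheckpointLoaderSimple" and expected is not None and idx != expected:
                warnings.append(
                    f"node {node_id}: input {name!r} uses output {idx}, expected {expected}"
                )
        cfg = node["inputs"].get("cfg")
        if uses_instantid and node["class_type"] == "KSampler" and cfg is not None and cfg > 5:
            warnings.append(f"node {node_id}: cfg {cfg} is high for InstantID, use 4-5")
    return warnings
```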

Reference Files

references/workflows.md - Detailed node-by-node templates

references/models.md - Model files and paths

references/prompt-templates.md - Model-specific prompts

state/inventory.json - Current inventory cache
