SKILL.md

Stable Diffusion Image Generation

Comprehensive guide to generating images with Stable Diffusion using the HuggingFace Diffusers library.

When to use Stable Diffusion

Use Stable Diffusion when:

Generating images from text descriptions

Performing image-to-image translation (style transfer, enhancement)

Inpainting (filling in masked regions)

Outpainting (extending images beyond boundaries)

Creating variations of existing images

Building custom image generation workflows

Key features:

Text-to-Image: Generate images from natural language prompts

Image-to-Image: Transform existing images with text guidance

Inpainting: Fill masked regions with context-aware content

ControlNet: Add spatial conditioning (edges, poses, depth)

LoRA Support: Efficient fine-tuning and style adaptation

Multiple Models: SD 1.5, SDXL, SD 3.0, Flux support

Use alternatives instead:

DALL-E 3: For API-based generation without GPU

Midjourney: For artistic, stylized outputs

Imagen: For Google Cloud integration

Leonardo.ai: For web-based creative workflows

Quick start

Installation

pip install diffusers transformers accelerate torch

pip install xformers  # Optional: memory-efficient attention

Basic text-to-image

from diffusers import DiffusionPipeline

import torch

# Load pipeline (auto-detects model type)

pipe = DiffusionPipeline.from_pretrained(

    "stable-diffusion-v1-5/stable-diffusion-v1-5",

    torch_dtype=torch.float16

)

pipe.to("cuda")

# Generate image

image = pipe(

    "A serene mountain landscape at sunset, highly detailed",

    num_inference_steps=50,

    guidance_scale=7.5

).images[0]

image.save("output.png")

Using SDXL (higher quality)

from diffusers import AutoPipelineForText2Image

import torch

pipe = AutoPipelineForText2Image.from_pretrained(

    "stabilityai/stable-diffusion-xl-base-1.0",

    torch_dtype=torch.float16,

    variant="fp16"

)

pipe.to("cuda")

# Enable memory optimization

pipe.enable_model_cpu_offload()

image = pipe(

    prompt="A futuristic city with flying cars, cinematic lighting",

    height=1024,

    width=1024,

    num_inference_steps=30

).images[0]

Architecture overview

Three-pillar design

Diffusers is built around three core components:

Pipeline (orchestration)

├── Model (neural networks)

│   ├── UNet / Transformer (noise prediction)

│   ├── VAE (latent encoding/decoding)

│   └── Text Encoder (CLIP/T5)

└── Scheduler (denoising algorithm)

Pipeline inference flow

Text Prompt → Text Encoder → Text Embeddings

                                    ↓

Random Noise → [Denoising Loop] ← Scheduler

                      ↓

               Predicted Noise

                      ↓

              VAE Decoder → Final Image

Core concepts

Pipelines

Pipelines orchestrate complete workflows:

Pipeline

Purpose

StableDiffusionPipeline

Text-to-image (SD 1.x/2.x)

StableDiffusionXLPipeline

Text-to-image (SDXL)

StableDiffusion3Pipeline

Text-to-image (SD 3.0)

FluxPipeline

Text-to-image (Flux models)

StableDiffusionImg2ImgPipeline

Image-to-image

StableDiffusionInpaintPipeline

Inpainting

Schedulers

Schedulers control the denoising process:

Scheduler

Steps

Quality

Use Case

EulerDiscreteScheduler

20-50

Good

Default choice

EulerAncestralDiscreteScheduler

20-50

Good

More variation

DPMSolverMultistepScheduler

15-25

Excellent

Fast, high quality

DDIMScheduler

50-100

Good

Deterministic

LCMScheduler

4-8

Good

Very fast

UniPCMultistepScheduler

15-25

Excellent

Fast convergence

Swapping schedulers

from diffusers import DPMSolverMultistepScheduler

# Swap for faster generation

pipe.scheduler = DPMSolverMultistepScheduler.from_config(

    pipe.scheduler.config

)

# Now generate with fewer steps

image = pipe(prompt, num_inference_steps=20).images[0]

Generation parameters

Key parameters

Parameter

Default

Description

prompt

Required

Text description of desired image

negative_prompt

None

What to avoid in the image

num_inference_steps

Denoising steps (more = better quality)

guidance_scale

7.5

Prompt adherence (7-12 typical)

height, width

512/1024

Output dimensions (multiples of 8)

generator

None

Torch generator for reproducibility

num_images_per_prompt

Batch size

Reproducible generation

import torch

generator = torch.Generator(device="cuda").manual_seed(42)

image = pipe(

    prompt="A cat wearing a top hat",

    generator=generator,

    num_inference_steps=50

).images[0]

Negative prompts

image = pipe(

    prompt="Professional photo of a dog in a garden",

    negative_prompt="blurry, low quality, distorted, ugly, bad anatomy",

    guidance_scale=7.5

).images[0]

Image-to-image

Transform existing images with text guidance:

from diffusers import AutoPipelineForImage2Image

from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(

    "stable-diffusion-v1-5/stable-diffusion-v1-5",

    torch_dtype=torch.float16

).to("cuda")

init_image = Image.open("input.jpg").resize((512, 512))

image = pipe(

    prompt="A watercolor painting of the scene",

    image=init_image,

    strength=0.75,  # How much to transform (0-1)

    num_inference_steps=50

).images[0]

Inpainting

Fill masked regions:

from diffusers import AutoPipelineForInpainting

from PIL import Image

pipe = AutoPipelineForInpainting.from_pretrained(

    "runwayml/stable-diffusion-inpainting",

    torch_dtype=torch.float16

).to("cuda")

image = Image.open("photo.jpg")

mask = Image.open("mask.png")  # White = inpaint region

result = pipe(

    prompt="A red car parked on the street",

    image=image,

    mask_image=mask,

    num_inference_steps=50

).images[0]

ControlNet

Add spatial conditioning for precise control:

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

import torch

# Load ControlNet for edge conditioning

controlnet = ControlNetModel.from_pretrained(

    "lllyasviel/control_v11p_sd15_canny",

    torch_dtype=torch.float16

)

pipe = StableDiffusionControlNetPipeline.from_pretrained(

    "stable-diffusion-v1-5/stable-diffusion-v1-5",

    controlnet=controlnet,

    torch_dtype=torch.float16

).to("cuda")

# Use Canny edge image as control

control_image = get_canny_image(input_image)

image = pipe(

    prompt="A beautiful house in the style of Van Gogh",

    image=control_image,

    num_inference_steps=30

).images[0]

Available ControlNets

ControlNet

Input Type

Use Case

canny

Edge maps

Preserve structure

openpose

Pose skeletons

Human poses

depth

Depth maps

3D-aware generation

normal

Normal maps

Surface details

mlsd

Line segments

Architectural lines

scribble

Rough sketches

Sketch-to-image

LoRA adapters

Load fine-tuned style adapters:

from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(

    "stable-diffusion-v1-5/stable-diffusion-v1-5",

    torch_dtype=torch.float16

).to("cuda")

# Load LoRA weights

pipe.load_lora_weights("path/to/lora", weight_name="style.safetensors")

# Generate with LoRA style

image = pipe("A portrait in the trained style").images[0]

# Adjust LoRA strength

pipe.fuse_lora(lora_scale=0.8)

# Unload LoRA

pipe.unload_lora_weights()

Multiple LoRAs

# Load multiple LoRAs

pipe.load_lora_weights("lora1", adapter_name="style")

pipe.load_lora_weights("lora2", adapter_name="character")

# Set weights for each

pipe.set_adapters(["style", "character"], adapter_weights=[0.7, 0.5])

image = pipe("A portrait").images[0]

Memory optimization

Enable CPU offloading

# Model CPU offload - moves models to CPU when not in use

pipe.enable_model_cpu_offload()

# Sequential CPU offload - more aggressive, slower

pipe.enable_sequential_cpu_offload()

Attention slicing

# Reduce memory by computing attention in chunks

pipe.enable_attention_slicing()

# Or specific chunk size

pipe.enable_attention_slicing("max")

xFormers memory-efficient attention

# Requires xformers package

pipe.enable_xformers_memory_efficient_attention()

VAE slicing for large images

# Decode latents in tiles for large images

pipe.enable_vae_slicing()

pipe.enable_vae_tiling()

Model variants

Loading different precisions

# FP16 (recommended for GPU)

pipe = DiffusionPipeline.from_pretrained(

    "model-id",

    torch_dtype=torch.float16,

    variant="fp16"

)

# BF16 (better precision, requires Ampere+ GPU)

pipe = DiffusionPipeline.from_pretrained(

    "model-id",

    torch_dtype=torch.bfloat16

)

Loading specific components

from diffusers import UNet2DConditionModel, AutoencoderKL

# Load custom VAE

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# Use with pipeline

pipe = DiffusionPipeline.from_pretrained(

    "stable-diffusion-v1-5/stable-diffusion-v1-5",

    vae=vae,

    torch_dtype=torch.float16

)

Batch generation

Generate multiple images efficiently:

# Multiple prompts

prompts = [

    "A cat playing piano",

    "A dog reading a book",

    "A bird painting a picture"

]

images = pipe(prompts, num_inference_steps=30).images

# Multiple images per prompt

images = pipe(

    "A beautiful sunset",

    num_images_per_prompt=4,

    num_inference_steps=30

).images

Common workflows

Workflow 1: High-quality generation

from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

import torch

# 1. Load SDXL with optimizations

pipe = StableDiffusionXLPipeline.from_pretrained(

    "stabilityai/stable-diffusion-xl-base-1.0",

    torch_dtype=torch.float16,

    variant="fp16"

)

pipe.to("cuda")

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_model_cpu_offload()

# 2. Generate with quality settings

image = pipe(

    prompt="A majestic lion in the savanna, golden hour lighting, 8k, detailed fur",

    negative_prompt="blurry, low quality, cartoon, anime, sketch",

    num_inference_steps=30,

    guidance_scale=7.5,

    height=1024,

    width=1024

).images[0]

Workflow 2: Fast prototyping

from diffusers import AutoPipelineForText2Image, LCMScheduler

import torch

# Use LCM for 4-8 step generation

pipe = AutoPipelineForText2Image.from_pretrained(

    "stabilityai/stable-diffusion-xl-base-1.0",

    torch_dtype=torch.float16

).to("cuda")

# Load LCM LoRA for fast generation

pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

pipe.fuse_lora()

# Generate in ~1 second

image = pipe(

    "A beautiful landscape",

    num_inference_steps=4,

    guidance_scale=1.0

).images[0]

Common issues

CUDA out of memory:

# Enable memory optimizations

pipe.enable_model_cpu_offload()

pipe.enable_attention_slicing()

pipe.enable_vae_slicing()

# Or use lower precision

pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)

Black/noise images:

# Check VAE configuration

# Use safety checker bypass if needed

pipe.safety_checker = None

# Ensure proper dtype consistency

pipe = pipe.to(dtype=torch.float16)

Slow generation:

# Use faster scheduler

from diffusers import DPMSolverMultistepScheduler

pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Reduce steps

image = pipe(prompt, num_inference_steps=20).images[0]

References

Advanced Usage - Custom pipelines, fine-tuning, deployment

Troubleshooting - Common issues and solutions

Resources

Documentation: https://huggingface.co/docs/diffusers

Repository: https://github.com/huggingface/diffusers

Model Hub: https://huggingface.co/models?library=diffusers

Discord: https://discord.gg/diffusers

stable-diffusion-image-generation

SKILL.md

Stable Diffusion Image Generation

When to use Stable Diffusion

Quick start

Installation

Basic text-to-image

Using SDXL (higher quality)

Architecture overview

Three-pillar design

Pipeline inference flow

Core concepts

Pipelines

Schedulers

Swapping schedulers

Generation parameters

Key parameters

Reproducible generation

Negative prompts

Image-to-image

Inpainting

ControlNet

Available ControlNets

LoRA adapters

Multiple LoRAs

Memory optimization

Enable CPU offloading

Attention slicing

xFormers memory-efficient attention

VAE slicing for large images

Model variants

Loading different precisions

Loading specific components

Batch generation

Common workflows

Workflow 1: High-quality generation

Workflow 2: Fast prototyping

Common issues

References

Resources

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers