# Guidance: Constrained LLM Generation

## When to Use This Skill

Use Guidance when you need to:

- Control LLM output syntax with regex or grammars
- Guarantee valid JSON/XML/code generation
- Reduce latency compared with traditional prompting approaches
- Enforce structured formats (dates, emails, IDs, etc.)
- Build multi-step workflows with Pythonic control flow
- Prevent invalid outputs through grammatical constraints

GitHub Stars: 18,000+ | From: Microsoft Research

## Installation

```bash
# Base installation
pip install guidance

# With specific backends (quoted so the brackets survive shells such as zsh)
pip install "guidance[transformers]"  # Hugging Face models
pip install "guidance[llama_cpp]"     # llama.cpp models
```

## Quick Start

### Basic Example: Structured Generation

```python
from guidance import models, gen

# Load model (supports OpenAI, Transformers, llama.cpp)
lm = models.OpenAI("gpt-4")

# Generate with constraints
result = lm + "The capital of France is " + gen("capital", max_tokens=5)

print(result["capital"])  # "Paris"
```


### With Anthropic Claude

```python
from guidance import models, gen, system, user, assistant

# Configure Claude
lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Use context managers for chat format
with system():
    lm += "You are a helpful assistant."

with user():
    lm += "What is the capital of France?"

with assistant():
    lm += gen(max_tokens=20)
```


## Core Concepts

### 1. Context Managers

Guidance uses Pythonic context managers for chat-style interactions.

```python
from guidance import models, system, user, assistant, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# System message
with system():
    lm += "You are a JSON generation expert."

# User message
with user():
    lm += "Generate a person object with name and age."

# Assistant response
with assistant():
    lm += gen("response", max_tokens=100)

print(lm["response"])
```


**Benefits:**

- Natural chat flow

- Clear role separation

- Easy to read and maintain

### 2. Constrained Generation

Guidance ensures outputs match specified patterns using regex or grammars.

#### Regex Constraints

```python
from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Constrain to valid email format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Constrain to date format (YYYY-MM-DD)
lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")

# Constrain to phone number
lm += "Phone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")

print(lm["email"])  # Guaranteed valid email
print(lm["date"])   # Guaranteed YYYY-MM-DD format
```


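**How it works:**

- The regex is converted to a grammar that is enforced at the token level
- Tokens that would violate the pattern are filtered out during generation
- The model can therefore only produce matching outputs
- Filtering needs control over token sampling, so it works most directly with local backends (Transformers, llama.cpp)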
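
The filtering step can be illustrated with a toy example. This is a minimal sketch, not Guidance's implementation: the vocabulary, the `is_valid_prefix` and `allowed_tokens` helpers, and the hard-coded `YYYY-MM-DD` shape are all assumptions made for illustration. At each step, only tokens that keep the output a valid prefix of the target pattern remain eligible.

```python
# Toy illustration of token-level constraint filtering (not Guidance's real code).
# Target shape: YYYY-MM-DD, i.e. the regex \d{4}-\d{2}-\d{2}.
DATE_SHAPE = "dddd-dd-dd"  # 'd' = any ASCII digit, '-' = literal hyphen
DIGITS = "0123456789"

def is_valid_prefix(text: str) -> bool:
    """Return True if `text` could still be extended into a full match."""
    if len(text) > len(DATE_SHAPE):
        return False
    for ch, kind in zip(text, DATE_SHAPE):
        if kind == "d" and ch not in DIGITS:
            return False
        if kind == "-" and ch != "-":
            return False
    return True

def allowed_tokens(generated: str, vocab: list[str]) -> list[str]:
    """Keep only the tokens that leave the output a valid prefix."""
    return [tok for tok in vocab if is_valid_prefix(generated + tok)]

# Hypothetical vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ["20", "24", "-", "-0", "9", "15", "Sept", " ", "2024-"]

step_1 = allowed_tokens("", VOCAB)      # nothing generated yet
step_2 = allowed_tokens("2024", VOCAB)  # the year is complete
print(step_1)
print(step_2)
```

Once the year is complete, only the tokens that begin with a hyphen survive, which is the sense in which the model "can only produce matching outputs".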

#### Selection Constraints

```python
from guidance import models, gen, select

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Constrain to specific choices
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")

# Multiple-choice selection
lm += "Best answer: " + select(
    ["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
    name="answer"
)

print(lm["sentiment"])  # One of: positive, negative, neutral
print(lm["answer"])     # One of the four full option strings, e.g. "A) Paris"
```


### 3. Token Healing

Guidance automatically "heals" token boundaries between prompt and generation.

**Problem:** Tokenization creates unnatural boundaries.

```python
# Without token healing
prompt = "The capital of France is "
# Last token: " is "
# First generated token might be " Par" (with leading space)
# Result: "The capital of France is  Paris" (double space!)
```


**Solution:** Guidance backs up one token and regenerates.

```python
from guidance import models, gen

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# Token healing enabled by default
lm += "The capital of France is " + gen("capital", max_tokens=5)
# Result: "The capital of France is Paris" (correct spacing)
```


**Benefits:**

- Natural text boundaries

- No awkward spacing issues

- Better model performance (sees natural token sequences)
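
The back-up-and-regenerate idea can be sketched with a toy tokenizer. This is an illustrative assumption, not Guidance's implementation: the vocabulary and the `tokenize` and `heal` helpers are hypothetical. The last prompt token is dropped, and the first generated token is restricted to tokens that start with the dropped text, so a single natural token such as `" Paris"` becomes available again.

```python
# Toy illustration of token healing (not Guidance's real implementation).
VOCAB = ["The", " capital", " of", " France", " is", " ", " Paris", "Paris"]

def tokenize(text: str) -> list[str]:
    """Greedy longest-match tokenizer over the toy vocabulary."""
    tokens = []
    while text:
        match = max((t for t in VOCAB if text.startswith(t)), key=len)
        tokens.append(match)
        text = text[len(match):]
    return tokens

def heal(prompt: str) -> tuple[str, list[str]]:
    """Drop the last prompt token and list the tokens that may replace it."""
    tokens = tokenize(prompt)
    removed = tokens[-1]
    shortened = "".join(tokens[:-1])
    # The first generated token must start with the text that was removed.
    candidates = [t for t in VOCAB if t.startswith(removed)]
    return shortened, candidates

shortened, candidates = heal("The capital of France is ")
print(repr(shortened))
print(candidates)
```

Here the trailing space is removed from the prompt, and `" Paris"` (with its leading space) is among the candidates while bare `"Paris"` is not, so the spacing comes out right.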

### 4. Grammar-Based Generation

Define complex structures using context-free grammars.

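```python
from guidance import capture, gen, guidance, models

lm = models.Anthropic("claude-sonnet-4-5-20250929")

# JSON grammar (simplified): literal text interleaved with constrained gen() calls.
# Sketch only; assumes the grammar is composed in Python via a stateless function.
@guidance(stateless=True)
def person_json(lm):
    lm += '{\n  "name": "' + gen("name", regex=r"[A-Za-z ]+", max_tokens=20) + '",\n'
    lm += '  "age": ' + gen("age", regex=r"[0-9]+", max_tokens=3) + ",\n"
    lm += '  "email": "' + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", max_tokens=50) + '"\n}'
    return lm

# Generate valid JSON
lm += capture(person_json(), "person")

print(lm["person"])  # JSON text following the structure above
```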


**Use cases:**

- Complex structured outputs

- Nested data structures

- Programming language syntax

- Domain-specific languages

### 5. Guidance Functions

Create reusable generation patterns with the `@guidance` decorator.

```python
from guidance import guidance, gen, models

@guidance
def generate_person(lm):
    """Generate a person with name and age."""
    lm += "Name: " + gen("name", max_tokens=20, stop="\n")
    lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
    return lm

# Use the function: the decorator supplies `lm` when the call is appended
lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += generate_person()

print(lm["name"])
print(lm["age"])
```


**Stateful Functions:**

```python
from guidance import guidance, gen, select

@guidance(stateless=False)
def react_agent(lm, question, tools, max_rounds=5):
    """ReAct agent with tool use."""
    lm += f"Question: {question}\n\n"

    for i in range(max_rounds):
        # Thought
        lm += f"Thought {i+1}: " + gen("thought", stop="\n")

        # Action
        lm += "\nAction: " + select(list(tools.keys()), name="action")

        # Execute tool
        tool_result = tools[lm["action"]]()
        lm += f"\nObservation: {tool_result}\n\n"

        # Check if done
        lm += "Done? " + select(["Yes", "No"], name="done")

        if lm["done"] == "Yes":
            break
        lm += "\n"  # Start the next round on a fresh line

    # Final answer
    lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
    return lm
```


## Backend Configuration

### Anthropic Claude

```python
from guidance import models

lm = models.Anthropic(
    model="claude-sonnet-4-5-20250929",
    api_key="your-api-key"  # Or set ANTHROPIC_API_KEY env var
)
```


### OpenAI

```python
from guidance import models

lm = models.OpenAI(
    model="gpt-4o-mini",
    api_key="your-api-key"  # Or set OPENAI_API_KEY env var
)
```


### Local Models (Transformers)

```python
from guidance.models import Transformers

lm = Transformers(
    "microsoft/Phi-4-mini-instruct",
    device="cuda"  # Or "cpu"
)
```


### Local Models (llama.cpp)

```python
from guidance.models import LlamaCpp

lm = LlamaCpp(
    "/path/to/model.gguf",  # Model path, passed positionally
    n_ctx=4096,
    n_gpu_layers=35
)
```


## Common Patterns

### Pattern 1: JSON Generation

```python
from guidance import models, gen, system, user, assistant

lm = models.Anthropic("claude-sonnet-4-5-20250929")

with system():
    lm += "You generate valid JSON."

with user():
    lm += "Generate a user profile with name, age, and email."

with assistant():
    lm += """{
    "name": """ + gen("name", regex=r'"[A-Za-z ]+"', max_tokens=30) + """,
    "age": """ + gen("age", regex=r"[0-9]+", max_tokens=3) + """,
    "email": """ + gen("email", regex=r'"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"', max_tokens=50) + """
}"""

print(lm)  # Valid JSON guaranteed
```
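
Because the template is assembled by hand, it can be worth parsing the result before using it downstream. The sketch below uses only the standard `json` module; the `is_valid_json` and `profile_from_captures` helpers and the sample captures are hypothetical, chosen to resemble what the regexes above would produce. One case to watch: `[0-9]+` admits leading zeros such as `007`, which JSON does not accept as a number.

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

def profile_from_captures(captures: dict[str, str]) -> dict:
    """Parse each captured JSON fragment and assemble a Python dict."""
    return {key: json.loads(fragment) for key, fragment in captures.items()}

# Hypothetical captures, shaped like the output of the regexes above
# (name and email include their surrounding quotes).
captures = {"name": '"Ada Lovelace"', "age": "36", "email": '"ada@example.com"'}
profile = profile_from_captures(captures)

print(profile)
print(is_valid_json('{"age": 36}'))
print(is_valid_json('{"age": 007}'))  # leading zeros are rejected by the parser
```

If leading zeros matter for your use case, a pattern such as `0|[1-9][0-9]*` for the age field would rule them out at generation time.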


### Pattern 2: Classification

```python
from guidance import models, gen, select

lm = models.Anthropic("claude-sonnet-4-5-20250929")

text = "This product is amazing! I love it."

lm += f"Text: {text}\n"
lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
lm += "\nConfidence: " + gen("confidence", regex=r"[0-9]+", max_tokens=3) + "%"

print(f"Sentiment: {lm['sentiment']}")
print(f"Confidence: {lm['confidence']}%")
```
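
Note that `[0-9]+` limits the confidence to digits but not to the range 0-100, so a value such as `999` would still match. A tighter pattern can be checked offline with the standard `re` module before it is used as a constraint. The `PERCENT` pattern and `is_percent` helper below are assumptions for illustration, not part of Guidance.

```python
import re

# Assumed tighter pattern for a percentage: 0-100 with no leading zeros.
PERCENT = r"100|[1-9]?[0-9]"

def is_percent(text: str) -> bool:
    """Return True if the whole string is an integer from 0 to 100."""
    return re.fullmatch(PERCENT, text) is not None

for candidate in ["0", "7", "85", "100", "101", "007", "999"]:
    print(candidate, is_percent(candidate))
```

Passing a pattern like this as `regex=` in place of `[0-9]+` should keep the captured value in range, assuming the backend's regex support covers alternation.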


### Pattern 3: Multi-Step Reasoning

```python
from guidance import models, gen, guidance

@guidance
def chain_of_thought(lm, question):
    """Generate answer with step-by-step reasoning."""
    lm += f"Question: {question}\n\n"

    # Generate multiple reasoning steps
    for i in range(3):
        lm += f"Step {i+1}: " + gen(f"step_{i+1}", stop="\n", max_tokens=100) + "\n"

    # Final answer
    lm += "\nTherefore, the answer is: " + gen("answer", max_tokens=50)
    return lm

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += chain_of_thought("What is 15% of 200?")

print(lm["answer"])
```


### Pattern 4: ReAct Agent

```python
from guidance import models, gen, select, guidance

@guidance(stateless=False)
def react_agent(lm, question):
    """ReAct agent with tool use."""
    tools = {
        # eval() runs arbitrary code; acceptable only in a demo on trusted input
        "calculator": lambda expr: eval(expr),
        "search": lambda query: f"Search results for: {query}",
    }

    lm += f"Question: {question}\n\n"

    for _ in range(5):
        # Thought
        lm += "Thought: " + gen("thought", stop="\n") + "\n"

        # Action selection
        lm += "Action: " + select(["calculator", "search", "answer"], name="action")

        if lm["action"] == "answer":
            lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
            break

        # Action input
        lm += "\nAction Input: " + gen("action_input", stop="\n") + "\n"

        # Execute tool
        if lm["action"] in tools:
            result = tools[lm["action"]](lm["action_input"])
            lm += f"Observation: {result}\n\n"

    return lm

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += react_agent("What is 25 * 4 + 10?")

print(lm["answer"])  # Set only if the agent chose "answer" within 5 rounds
```
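
The calculator tool above passes model-generated text to `eval()`, which will execute arbitrary Python. One possible replacement is a small evaluator built on the standard `ast` module that accepts only numeric literals and basic arithmetic. The `safe_calculate` name and the operator whitelist are assumptions of this sketch, not part of Guidance.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_calculate(expr: str) -> float:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"Unsupported expression: {expr!r}")

    # Malformed input raises SyntaxError from ast.parse.
    return walk(ast.parse(expr, mode="eval").body)

print(safe_calculate("25 * 4 + 10"))
```

Swapping this in as `"calculator": safe_calculate` keeps the tool's one-argument interface while rejecting function calls, attribute access, and names.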


### Pattern 5: Data Extraction

```python
from guidance import models, gen, guidance

@guidance
def extract_entities(lm, text):
    """Extract structured entities from text."""
    lm += f"Text: {text}\n\n"

    # Extract person
    lm += "Person: " + gen("person", stop="\n", max_tokens=30) + "\n"

    # Extract organization
    lm += "Organization: " + gen("organization", stop="\n", max_tokens=30) + "\n"

    # Extract date
    lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}", max_tokens=10) + "\n"

    # Extract location
    lm += "Location: " + gen("location", stop="\n", max_tokens=30) + "\n"

    return lm

text = "Tim Cook announced at Apple Park on 2024-09-15 in Cupertino."

lm = models.Anthropic("claude-sonnet-4-5-20250929")
lm += extract_entities(text)

print(f"Person: {lm['person']}")
print(f"Organization: {lm['organization']}")
print(f"Date: {lm['date']}")
print(f"Location: {lm['location']}")
```


## Best Practices

### 1. Use Regex for Format Validation

```python
# ✅ Good: Regex ensures valid format
lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# ❌ Bad: Free generation may produce invalid emails
lm += "Email: " + gen("email", max_tokens=50)
```
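
A format regex constrains shape, not meaning: the date pattern `\d{4}-\d{2}-\d{2}` used earlier in this document also matches `2024-13-45`. Where the value matters, a semantic check after generation is a reasonable complement. The `parse_iso_date` helper below is a hypothetical sketch using only the standard library.

```python
from datetime import datetime
from typing import Optional

def parse_iso_date(text: str) -> Optional[datetime]:
    """Return a datetime for a real calendar date, or None if impossible."""
    try:
        return datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return None

for candidate in ["2024-09-15", "2024-13-45", "2024-02-30"]:
    parsed = parse_iso_date(candidate)
    print(candidate, "ok" if parsed else "rejected")
```

The same idea applies to other captures: let the regex guarantee the format, then validate the value in ordinary Python.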


### 2. Use select() for Fixed Categories

```python
# ✅ Good: Guaranteed valid category
lm += "Status: " + select(["pending", "approved", "rejected"], name="status")

# ❌ Bad: May generate typos or invalid values
lm += "Status: " + gen("status", max_tokens=20)
```


### 3. Leverage Token Healing

```python
# Token healing is enabled by default
# No special action needed - just concatenate naturally
lm += "The capital is " + gen("capital")  # Automatic healing
```


### 4. Use stop Sequences

```python
# ✅ Good: Stop at newline for single-line outputs
lm += "Name: " + gen("name", stop="\n")

# ❌ Bad: May generate multiple lines
lm += "Name: " + gen("name", max_tokens=50)
```


### 5. Create Reusable Functions

```python
# ✅ Good: Reusable pattern
@guidance
def generate_person(lm):
    lm += "Name: " + gen("name", stop="\n")
    lm += "\nAge: " + gen("age", regex=r"[0-9]+")
    return lm

# Use multiple times
lm += generate_person()
lm += "\n\n"
lm += generate_person()
```


### 6. Balance Constraints

```python
# ✅ Good: Reasonable constraints
lm += gen("name", regex=r"[A-Za-z ]+", max_tokens=30)

# ❌ Too strict: May fail or be very slow
lm += gen("name", regex=r"^(John|Jane)$", max_tokens=10)
```
