guidance

Control LLM output with regex and grammars, guarantee valid JSON/XML/code generation, enforce structured formats, and build multi-step workflows with Guidance…

INSTALLATION
npx skills add https://github.com/davila7/claude-code-templates --skill guidance
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

# Guidance: Constrained LLM Generation

## When to Use This Skill

Use Guidance when you need to:

  • Control LLM output syntax with regex or grammars
  • Guarantee valid JSON/XML/code generation
  • Reduce latency vs traditional prompting approaches
  • Enforce structured formats (dates, emails, IDs, etc.)
  • Build multi-step workflows with Pythonic control flow
  • Prevent invalid outputs through grammatical constraints

GitHub Stars: 18,000+ | From: Microsoft Research

## Installation

    # Base installation
    pip install guidance

    # With specific backends (quoted so the brackets survive shells such as zsh)
    pip install "guidance[transformers]"  # Hugging Face models
    pip install "guidance[llama_cpp]"     # llama.cpp models

## Quick Start

### Basic Example: Structured Generation

    from guidance import models, gen

    # Load model (supports OpenAI, Transformers, llama.cpp)
    lm = models.OpenAI("gpt-4")

    # Generate with constraints (here, a 5-token cap); the text is captured as "capital"
    result = lm + "The capital of France is " + gen("capital", max_tokens=5)

    print(result["capital"])  # e.g. "Paris"


### With Anthropic Claude

    from guidance import models, gen, system, user, assistant

    # Configure Claude
    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Use context managers for chat format
    with system():
        lm += "You are a helpful assistant."

    with user():
        lm += "What is the capital of France?"

    with assistant():
        lm += gen(max_tokens=20)


## Core Concepts

### 1. Context Managers

Guidance uses Pythonic context managers for chat-style interactions.

    from guidance import models, system, user, assistant, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # System message
    with system():
        lm += "You are a JSON generation expert."

    # User message
    with user():
        lm += "Generate a person object with name and age."

    # Assistant response
    with assistant():
        lm += gen("response", max_tokens=100)

    print(lm["response"])


**Benefits:**

- Natural chat flow

- Clear role separation

- Easy to read and maintain
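To make the role-block idea concrete, here is a toy sketch of how a context manager can group appended text by role. `ToyTranscript` and its `role` method are hypothetical names invented for this illustration and are not part of Guidance; Guidance's own model objects work differently internally (for one, they are treated as immutable values rather than mutated in place).

```python
from contextlib import contextmanager


class ToyTranscript:
    """Hypothetical stand-in for a chat transcript; not a Guidance class."""

    def __init__(self):
        self.messages = []   # finished (role, text) pairs
        self._role = None    # role of the block currently open
        self._buffer = []    # text appended inside the open block

    @contextmanager
    def role(self, name):
        # Open a role block, collect appended text, then close the block.
        self._role, self._buffer = name, []
        try:
            yield self
        finally:
            self.messages.append((name, "".join(self._buffer)))
            self._role, self._buffer = None, []

    def __iadd__(self, text):
        if self._role is None:
            raise RuntimeError("text must be appended inside a role block")
        self._buffer.append(text)
        return self


chat = ToyTranscript()
with chat.role("system"):
    chat += "You are a JSON generation expert."
with chat.role("user"):
    chat += "Generate a person object with name and age."

print(chat.messages)
```

The point of the pattern is that the role is declared once per block, so every string appended inside it is attributed to the right speaker without extra bookkeeping.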

### 2. Constrained Generation

Guidance ensures outputs match specified patterns using regex or grammars.

#### Regex Constraints

    from guidance import models, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Constrain to valid email format
    lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

    # Constrain to date format (YYYY-MM-DD)
    lm += "\nDate: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}")

    # Constrain to phone number
    lm += "\nPhone: " + gen("phone", regex=r"\d{3}-\d{3}-\d{4}")

    print(lm["email"])  # Matches the email pattern
    print(lm["date"])   # Matches the YYYY-MM-DD pattern (shape only, not calendar validity)


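**How it works:**

- The regex is converted to a grammar that operates at the token level

- Tokens that would break the pattern are filtered out at each generation step

- The model can therefore only produce matching output; this relies on Guidance controlling decoding, which local backends (Transformers, llama.cpp) allow and remote APIs may support only partially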
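The filtering step can be sketched in plain Python. This is a simplified illustration, not Guidance's implementation: it hard-codes the `YYYY-MM-DD` shape as a template instead of compiling an arbitrary regex, and `DATE_TEMPLATE`, `is_viable_prefix`, `allowed_tokens`, and the tiny `VOCAB` are all hypothetical names made up for the example.

```python
import string

# Template for the YYYY-MM-DD pattern: "d" means any digit, anything else is literal.
DATE_TEMPLATE = "dddd-dd-dd"


def is_viable_prefix(text, template=DATE_TEMPLATE):
    """Return True if `text` could still be extended into a full match."""
    if len(text) > len(template):
        return False
    for ch, slot in zip(text, template):
        if slot == "d":
            if ch not in string.digits:
                return False
        elif ch != slot:
            return False
    return True


def allowed_tokens(prefix, vocab, template=DATE_TEMPLATE):
    """Keep only the tokens that leave the output a viable prefix."""
    return [tok for tok in vocab if is_viable_prefix(prefix + tok, template)]


# Hypothetical vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["20", "24", "-", "-0", "9", "Sep", " ", "15", "2024-"]

print(allowed_tokens("", VOCAB))      # ['20', '24', '9', '15', '2024-']
print(allowed_tokens("2024", VOCAB))  # ['-', '-0']
```

A real decoder would apply this mask to the model's token probabilities before sampling, so disallowed tokens can never be chosen.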

#### Selection Constraints

    from guidance import models, gen, select

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Constrain to specific choices
    lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")

    # Multiple-choice selection
    lm += "\nBest answer: " + select(
        ["A) Paris", "B) London", "C) Berlin", "D) Madrid"],
        name="answer"
    )

    print(lm["sentiment"])  # One of: positive, negative, neutral
    print(lm["answer"])     # One of the four full option strings, e.g. "A) Paris"
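One way to picture a selection constraint is as a walk through the shared prefixes of the options: at every position, only characters that keep the output on track toward some option are permitted. The sketch below works on characters rather than tokens to stay short; `allowed_next_chars` is a hypothetical helper and does not mirror Guidance internals.

```python
def allowed_next_chars(prefix, options):
    """Characters that keep `prefix` on track to become one of `options`."""
    return {
        opt[len(prefix)]
        for opt in options
        if opt.startswith(prefix) and len(opt) > len(prefix)
    }


OPTIONS = ["positive", "negative", "neutral"]

print(sorted(allowed_next_chars("", OPTIONS)))          # ['n', 'p']
print(sorted(allowed_next_chars("ne", OPTIONS)))        # ['g', 'u']
print(sorted(allowed_next_chars("positive", OPTIONS)))  # []
```

An empty result means the output is complete, which is why a selection can never end in a typo or a value outside the list.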


### 3. Token Healing

Guidance automatically "heals" token boundaries between prompt and generation.

**Problem:** Tokenization creates unnatural boundaries.

    # Without token healing
    prompt = "The capital of France is "
    # The prompt ends with a trailing space
    # The first generated token might be " Par" (with its own leading space)
    # Result: "The capital of France is  Paris" (double space!)


**Solution:** Guidance backs up one token and regenerates it, allowing only tokens whose text starts with the removed one.

    from guidance import models, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # Token healing is enabled by default
    lm += "The capital of France is " + gen("capital", max_tokens=5)
    # Result: "The capital of France is Paris" (correct spacing)


**Benefits:**

- Natural text boundaries

- No awkward spacing issues

- Better model performance (sees natural token sequences)
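The back-up-and-constrain step can be illustrated with a toy tokenizer. This is a sketch under simplifying assumptions (a five-entry vocabulary and greedy longest-match tokenization); `VOCAB`, `tokenize`, and `heal` are hypothetical names, not Guidance functions.

```python
# Hypothetical vocabulary; real tokenizers are far larger.
VOCAB = ["http", ":", "://", "//", "s"]


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match tokenizer over the toy vocabulary."""
    tokens, i = [], 0
    while i < len(text):
        match = max((t for t in vocab if text.startswith(t, i)), key=len, default=None)
        if match is None:
            raise ValueError(f"cannot tokenize {text[i:]!r}")
        tokens.append(match)
        i += len(match)
    return tokens


def heal(prompt, vocab=VOCAB):
    """Drop the last prompt token and list the tokens allowed to replace it."""
    tokens = tokenize(prompt, vocab)
    removed = tokens.pop()
    # The replacement must start with the removed text so the prompt is preserved.
    candidates = [t for t in vocab if t.startswith(removed)]
    return tokens, removed, candidates


kept, removed, candidates = heal("http:")
print(kept, removed, candidates)  # ['http'] : [':', '://']
```

With the prompt `http:`, the model would normally be stuck continuing after a lone `:` token; after healing it is free to choose `://`, the token it most likely saw in training.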

### 4. Grammar-Based Generation

Define complex structures using context-free grammars.

    from guidance import models, gen

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    # JSON object template (simplified): literal punctuation is fixed,
    # and each field value is generated under its own regex
    person = (
        '{"name": "' + gen("name", regex=r"[A-Za-z ]+", max_tokens=20)
        + '", "age": ' + gen("age", regex=r"0|[1-9][0-9]*", max_tokens=3)
        + ', "email": "' + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}", max_tokens=50)
        + '"}'
    )

    # Generate the object; the composed expression acts as a single grammar
    lm += person

    print(lm["name"], lm["age"], lm["email"])  # Field values captured by name


**Use cases:**

- Complex structured outputs

- Nested data structures

- Programming language syntax

- Domain-specific languages

### 5. Guidance Functions

Create reusable generation patterns with the `@guidance` decorator.

    from guidance import guidance, gen, models

    @guidance
    def generate_person(lm):
        """Generate a person with name and age."""
        lm += "Name: " + gen("name", max_tokens=20, stop="\n")
        lm += "\nAge: " + gen("age", regex=r"[0-9]+", max_tokens=3)
        return lm

    # Use the function: call it without lm and append the result to the model
    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += generate_person()

    print(lm["name"])
    print(lm["age"])


**Stateful Functions:**

    from guidance import guidance, gen, select

    @guidance(stateless=False)
    def react_agent(lm, question, tools, max_rounds=5):
        """ReAct agent with tool use."""
        lm += f"Question: {question}\n\n"

        for i in range(max_rounds):
            # Thought
            lm += f"Thought {i+1}: " + gen("thought", stop="\n")

            # Action
            lm += "\nAction: " + select(list(tools.keys()), name="action")

            # Execute tool (tools maps a name to a zero-argument callable)
            tool_result = tools[lm["action"]]()
            lm += f"\nObservation: {tool_result}\n\n"

            # Check if done
            lm += "Done? " + select(["Yes", "No"], name="done") + "\n"
            if lm["done"] == "Yes":
                break

        # Final answer
        lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
        return lm


## Backend Configuration

### Anthropic Claude

    from guidance import models

    lm = models.Anthropic(
        model="claude-sonnet-4-5-20250929",
        api_key="your-api-key"  # Or set ANTHROPIC_API_KEY env var
    )


### OpenAI

    from guidance import models

    lm = models.OpenAI(
        model="gpt-4o-mini",
        api_key="your-api-key"  # Or set OPENAI_API_KEY env var
    )


### Local Models (Transformers)

    from guidance.models import Transformers

    lm = Transformers(
        "microsoft/Phi-4-mini-instruct",
        device="cuda"  # Or "cpu"
    )


### Local Models (llama.cpp)

    from guidance.models import LlamaCpp

    lm = LlamaCpp(
        "/path/to/model.gguf",  # Path to the GGUF file, passed as the first argument
        n_ctx=4096,
        n_gpu_layers=35
    )


## Common Patterns

### Pattern 1: JSON Generation

    from guidance import models, gen, system, user, assistant

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    with system():
        lm += "You generate valid JSON."

    with user():
        lm += "Generate a user profile with name, age, and email."

    with assistant():
        # Literal JSON punctuation is fixed; only the field values are generated.
        # The age pattern rejects leading zeros, which JSON numbers do not allow.
        lm += """{
    "name": """ + gen("name", regex=r'"[A-Za-z ]+"', max_tokens=30) + """,
    "age": """ + gen("age", regex=r"0|[1-9][0-9]*", max_tokens=3) + """,
    "email": """ + gen("email", regex=r'"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"', max_tokens=50) + """
    }"""

    print(lm)  # Full transcript; the assistant turn holds the JSON object
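Constraints shape the syntax of the output, but it is still worth parsing the result before using it, because a per-field pattern can be looser than JSON itself (for instance, a pattern like `[0-9]+` also matches `007`, which is not a valid JSON number). The sketch below uses only the standard library; `parse_profile` and the `sample` string are hypothetical and stand in for whatever text the assistant turn produced.

```python
import json


def parse_profile(raw):
    """Parse a generated profile and check the fields the prompt asked for."""
    data = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    name, age, email = data.get("name"), data.get("age"), data.get("email")
    if not isinstance(name, str):
        raise ValueError("name must be a string")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("age must be an integer")
    if not isinstance(email, str) or "@" not in email:
        raise ValueError("email must be a string containing '@'")
    return data


# Hypothetical output in the shape the template above produces.
sample = '{\n"name": "Ada Lovelace",\n"age": 36,\n"email": "ada@example.com"\n}'
profile = parse_profile(sample)
print(profile["name"], profile["age"])  # Ada Lovelace 36
```

Running the parser on every response turns any remaining mismatch into an explicit exception instead of a silent downstream bug.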


### Pattern 2: Classification

    from guidance import models, gen, select

    lm = models.Anthropic("claude-sonnet-4-5-20250929")

    text = "This product is amazing! I love it."

    lm += f"Text: {text}\n"
    lm += "Sentiment: " + select(["positive", "negative", "neutral"], name="sentiment")
    lm += "\nConfidence: " + gen("confidence", regex=r"[0-9]+", max_tokens=3) + "%"

    print(f"Sentiment: {lm['sentiment']}")
    print(f"Confidence: {lm['confidence']}%")
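The open-ended `[0-9]+` pattern above also admits values such as `999` or `007`. A tighter pattern can be checked with the standard `re` module before it is passed to `gen()`; the sketch below is one possible choice, and `PERCENT` and `to_fraction` are hypothetical names. Note that the number is the model's self-reported confidence, not a calibrated probability.

```python
import re

# Matches 0-100 with no leading zeros; an alternative to the open-ended [0-9]+.
PERCENT = re.compile(r"100|[1-9]?[0-9]")


def to_fraction(confidence):
    """Convert a generated percentage string to a float in [0, 1]."""
    if not PERCENT.fullmatch(confidence):
        raise ValueError(f"not a percentage: {confidence!r}")
    return int(confidence) / 100


print(to_fraction("87"))   # 0.87
print(to_fraction("100"))  # 1.0
```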


### Pattern 3: Multi-Step Reasoning

    from guidance import models, gen, guidance

    @guidance
    def chain_of_thought(lm, question):
        """Generate answer with step-by-step reasoning."""
        lm += f"Question: {question}\n\n"

        # Generate multiple reasoning steps
        for i in range(3):
            lm += f"Step {i+1}: " + gen(f"step_{i+1}", stop="\n", max_tokens=100) + "\n"

        # Final answer
        lm += "\nTherefore, the answer is: " + gen("answer", max_tokens=50)
        return lm

    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += chain_of_thought("What is 15% of 200?")

    print(lm["answer"])


### Pattern 4: ReAct Agent

    from guidance import models, gen, select, guidance

    @guidance(stateless=False)
    def react_agent(lm, question):
        """ReAct agent with tool use."""
        tools = {
            # WARNING: eval() on model-written text is unsafe. Builtins are stripped
            # here, but prefer a real expression parser outside of demos.
            "calculator": lambda expr: eval(expr, {"__builtins__": {}}, {}),
            "search": lambda query: f"Search results for: {query}",
        }

        lm += f"Question: {question}\n\n"

        for _ in range(5):
            # Thought
            lm += "Thought: " + gen("thought", stop="\n") + "\n"

            # Action selection
            lm += "Action: " + select(["calculator", "search", "answer"], name="action")

            if lm["action"] == "answer":
                lm += "\nFinal Answer: " + gen("answer", max_tokens=100)
                break

            # Action input
            lm += "\nAction Input: " + gen("action_input", stop="\n") + "\n"

            # Execute tool
            if lm["action"] in tools:
                result = tools[lm["action"]](lm["action_input"])
                lm += f"Observation: {result}\n\n"

        return lm

    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += react_agent("What is 25 * 4 + 10?")

    print(lm["answer"])  # Set only if the agent chose "answer" within 5 rounds
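The calculator tool above hands model-written text to `eval`, which is risky outside a demo. One possible replacement, sketched here with the standard `ast` module, walks the parsed expression and accepts only numeric literals and the four basic operators. `safe_calculate` is a hypothetical helper, not part of Guidance.

```python
import ast
import operator

# Arithmetic operators the calculator tool is willing to evaluate.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expr):
    """Evaluate +, -, *, / over numeric literals; reject everything else."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr.strip(), mode="eval"))


print(safe_calculate("25 * 4 + 10"))  # 110
```

Swapping it in would mean registering `"calculator": safe_calculate` in the `tools` dictionary; function calls, attribute access, and names are all rejected with a `ValueError`.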


### Pattern 5: Data Extraction

    from guidance import models, gen, guidance

    @guidance
    def extract_entities(lm, text):
        """Extract structured entities from text."""
        lm += f"Text: {text}\n\n"

        # Extract person
        lm += "Person: " + gen("person", stop="\n", max_tokens=30) + "\n"

        # Extract organization
        lm += "Organization: " + gen("organization", stop="\n", max_tokens=30) + "\n"

        # Extract date
        lm += "Date: " + gen("date", regex=r"\d{4}-\d{2}-\d{2}", max_tokens=10) + "\n"

        # Extract location
        lm += "Location: " + gen("location", stop="\n", max_tokens=30) + "\n"

        return lm

    text = "Tim Cook announced at Apple Park on 2024-09-15 in Cupertino."

    lm = models.Anthropic("claude-sonnet-4-5-20250929")
    lm += extract_entities(text)

    print(f"Person: {lm['person']}")
    print(f"Organization: {lm['organization']}")
    print(f"Date: {lm['date']}")
    print(f"Location: {lm['location']}")
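The date pattern in this example enforces the `YYYY-MM-DD` shape only, so a string such as `2024-13-45` would still satisfy it. A small post-check with the standard library can confirm that the digits name a real calendar day; `parse_iso_date` is a hypothetical helper added for illustration.

```python
from datetime import datetime


def parse_iso_date(text):
    """Return a date for a real calendar day, or None if the digits are impossible."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        return None


print(parse_iso_date("2024-09-15"))  # 2024-09-15
print(parse_iso_date("2024-13-45"))  # None
```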


## Best Practices

### 1. Use Regex for Format Validation

    # ✅ Good: Regex ensures valid format
    lm += "Email: " + gen("email", regex=r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

    # ❌ Bad: Free generation may produce invalid emails
    lm += "Email: " + gen("email", max_tokens=50)


### 2. Use select() for Fixed Categories

    # ✅ Good: Guaranteed valid category
    lm += "Status: " + select(["pending", "approved", "rejected"], name="status")

    # ❌ Bad: May generate typos or invalid values
    lm += "Status: " + gen("status", max_tokens=20)


### 3. Leverage Token Healing

    # Token healing is enabled by default
    # No special action needed; just concatenate naturally
    lm += "The capital is " + gen("capital")  # Automatic healing


### 4. Use stop Sequences

    # ✅ Good: Stop at newline for single-line outputs
    lm += "Name: " + gen("name", stop="\n")

    # ❌ Bad: May generate multiple lines
    lm += "Name: " + gen("name", max_tokens=50)
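The effect of a stop sequence can be pictured as cutting the raw continuation at the first occurrence of the stop text. The helper below is a plain-Python illustration of that behavior, assuming the stop text itself is excluded from the captured value; `truncate_at_stop` is a hypothetical name and not a Guidance function.

```python
def truncate_at_stop(text, stop):
    """Return `text` up to, but not including, the first stop sequence."""
    stops = [stop] if isinstance(stop, str) else list(stop)
    cut = len(text)
    for s in stops:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


raw = "Ada Lovelace\nAge: 36"
print(repr(truncate_at_stop(raw, "\n")))          # 'Ada Lovelace'
print(repr(truncate_at_stop(raw, [":", "\n"])))   # 'Ada Lovelace'
```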


### 5. Create Reusable Functions

    # ✅ Good: Reusable pattern
    @guidance
    def generate_person(lm):
        lm += "Name: " + gen("name", stop="\n")
        lm += "\nAge: " + gen("age", regex=r"[0-9]+")
        return lm

    # Use multiple times (lm is an existing model object)
    lm += generate_person()
    lm += "\n\n"
    lm += generate_person()


### 6. Balance Constraints

    # ✅ Good: Reasonable constraints
    lm += gen("name", regex=r"[A-Za-z ]+", max_tokens=30)

    # ❌ Too strict: May fail or be very slow (for a fixed set of values, prefer select())
    lm += gen("name", regex=r"^(John|Jane)$", max_tokens=10)
