instructor

Extract structured data from LLM responses with Pydantic validation, retry failed extractions automatically, parse complex JSON with type safety, and stream…

INSTALLATION
npx skills add https://github.com/davila7/claude-code-templates --skill instructor
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Instructor: Structured LLM Outputs

When to Use This Skill

Use Instructor when you need to:

  • Extract structured data from LLM responses reliably
  • Validate outputs against Pydantic schemas automatically
  • Retry failed extractions with automatic error handling
  • Parse complex JSON with type safety and validation
  • Stream partial results for real-time processing
  • Support multiple LLM providers with consistent API

GitHub Stars: 15,000+ | Battle-tested: 100,000+ developers

Installation

# Base installation

pip install instructor

With specific providers

pip install "instructor[anthropic]" # Anthropic Claude

pip install "instructor[openai]" # OpenAI

pip install "instructor[all]" # All providers

## Quick Start

### Basic Example: Extract User Data

import instructor

from pydantic import BaseModel

from anthropic import Anthropic

Define output structure

class User(BaseModel):

name: str

age: int

email: str

Create instructor client

client = instructor.from_anthropic(Anthropic())

Extract structured data

user = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "John Doe is 30 years old. His email is john@example.com"

}],

response_model=User

)

print(user.name) # "John Doe"

print(user.age) # 30

print(user.email) # "john@example.com"


### With OpenAI

from openai import OpenAI

client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(

model="gpt-4o-mini",

response_model=User,

messages=[{"role": "user", "content": "Extract: Alice, 25, alice@email.com"}]

)


## Core Concepts

### 1. Response Models (Pydantic)

Response models define the structure and validation rules for LLM outputs.

#### Basic Model

from pydantic import BaseModel, Field

class Article(BaseModel):

title: str = Field(description="Article title")

author: str = Field(description="Author name")

word_count: int = Field(description="Number of words", gt=0)

tags: list[str] = Field(description="List of relevant tags")

article = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "Analyze this article: [article text]"

}],

response_model=Article

)


**Benefits:**

- Type safety with Python type hints

- Automatic validation (word_count > 0)

- Self-documenting with Field descriptions

- IDE autocomplete support

#### Nested Models

class Address(BaseModel):

street: str

city: str

country: str

class Person(BaseModel):

name: str

age: int

address: Address # Nested model

person = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "John lives at 123 Main St, Boston, USA"

}],

response_model=Person

)

print(person.address.city) # "Boston"


#### Optional Fields

from typing import Optional

class Product(BaseModel):

name: str

price: float

discount: Optional[float] = None # Optional

description: str = Field(default="No description") # Default value

LLM doesn't need to provide discount or description


#### Enums for Constraints

from enum import Enum

class Sentiment(str, Enum):

POSITIVE = "positive"

NEGATIVE = "negative"

NEUTRAL = "neutral"

class Review(BaseModel):

text: str

sentiment: Sentiment # Only these 3 values allowed

review = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "This product is amazing!"

}],

response_model=Review

)

print(review.sentiment) # Sentiment.POSITIVE


### 2. Validation

Pydantic validates LLM outputs automatically. If validation fails, Instructor retries.

#### Built-in Validators

from pydantic import Field, EmailStr, HttpUrl

class Contact(BaseModel):

name: str = Field(min_length=2, max_length=100)

age: int = Field(ge=0, le=120) # 0 <= age <= 120

email: EmailStr # Validates email format

website: HttpUrl # Validates URL format

If LLM provides invalid data, Instructor retries automatically


#### Custom Validators

from pydantic import field_validator

class Event(BaseModel):

name: str

date: str

attendees: int

@field_validator('date')

def validate_date(cls, v):

"""Ensure date is in YYYY-MM-DD format."""

import re

if not re.match(r'\d{4}-\d{2}-\d{2}', v):

raise ValueError('Date must be YYYY-MM-DD format')

return v

@field_validator('attendees')

def validate_attendees(cls, v):

"""Ensure positive attendees."""

if v < 1:

raise ValueError('Must have at least 1 attendee')

return v


#### Model-Level Validation

from pydantic import model_validator

class DateRange(BaseModel):

start_date: str

end_date: str

@model_validator(mode='after')

def check_dates(self):

"""Ensure end_date is after start_date."""

from datetime import datetime

start = datetime.strptime(self.start_date, '%Y-%m-%d')

end = datetime.strptime(self.end_date, '%Y-%m-%d')

if end < start:

raise ValueError('end_date must be after start_date')

return self


### 3. Automatic Retrying

Instructor retries automatically when validation fails, providing error feedback to the LLM.

Retries up to 3 times if validation fails

user = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "Extract user from: John, age unknown"

}],

response_model=User,

max_retries=3 # Default is 3

)

If age can't be extracted, Instructor tells the LLM:

"Validation error: age - field required"

LLM tries again with better extraction


**How it works:**

- LLM generates output

- Pydantic validates

- If invalid: Error message sent back to LLM

- LLM tries again with error feedback

- Repeats up to max_retries

### 4. Streaming

Stream partial results for real-time processing.

#### Streaming Partial Objects

from instructor import Partial

class Story(BaseModel):

title: str

content: str

tags: list[str]

Stream partial updates as LLM generates

for partial_story in client.messages.create_partial(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "Write a short sci-fi story"

}],

response_model=Story

):

print(f"Title: {partial_story.title}")

print(f"Content so far: {partial_story.content[:100]}...")

# Update UI in real-time


#### Streaming Iterables

class Task(BaseModel):

title: str

priority: str

Stream list items as they're generated

tasks = client.messages.create_iterable(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "Generate 10 project tasks"

}],

response_model=Task

)

for task in tasks:

print(f"- {task.title} ({task.priority})")

# Process each task as it arrives


## Provider Configuration

### Anthropic Claude

import instructor

from anthropic import Anthropic

client = instructor.from_anthropic(

Anthropic(api_key="your-api-key")

)

Use with Claude models

response = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[...],

response_model=YourModel

)


### OpenAI

from openai import OpenAI

client = instructor.from_openai(

OpenAI(api_key="your-api-key")

)

response = client.chat.completions.create(

model="gpt-4o-mini",

response_model=YourModel,

messages=[...]

)


### Local Models (Ollama)

from openai import OpenAI

Point to local Ollama server

client = instructor.from_openai(

OpenAI(

base_url="http://localhost:11434/v1",

api_key="ollama" # Required but ignored

),

mode=instructor.Mode.JSON

)

response = client.chat.completions.create(

model="llama3.1",

response_model=YourModel,

messages=[...]

)


## Common Patterns

### Pattern 1: Data Extraction from Text

class CompanyInfo(BaseModel):

name: str

founded_year: int

industry: str

employees: int

headquarters: str

text = """

Tesla, Inc. was founded in 2003. It operates in the automotive and energy

industry with approximately 140,000 employees. The company is headquartered

in Austin, Texas.

"""

company = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": f"Extract company information from: {text}"

}],

response_model=CompanyInfo

)


### Pattern 2: Classification

class Category(str, Enum):

TECHNOLOGY = "technology"

FINANCE = "finance"

HEALTHCARE = "healthcare"

EDUCATION = "education"

OTHER = "other"

class ArticleClassification(BaseModel):

category: Category

confidence: float = Field(ge=0.0, le=1.0)

keywords: list[str]

classification = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": "Classify this article: [article text]"

}],

response_model=ArticleClassification

)


### Pattern 3: Multi-Entity Extraction

class Person(BaseModel):

name: str

role: str

class Organization(BaseModel):

name: str

industry: str

class Entities(BaseModel):

people: list[Person]

organizations: list[Organization]

locations: list[str]

text = "Tim Cook, CEO of Apple, announced at the event in Cupertino..."

entities = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": f"Extract all entities from: {text}"

}],

response_model=Entities

)

for person in entities.people:

print(f"{person.name} - {person.role}")


### Pattern 4: Structured Analysis

class SentimentAnalysis(BaseModel):

overall_sentiment: Sentiment

positive_aspects: list[str]

negative_aspects: list[str]

suggestions: list[str]

score: float = Field(ge=-1.0, le=1.0)

review = "The product works well but setup was confusing..."

analysis = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": f"Analyze this review: {review}"

}],

response_model=SentimentAnalysis

)


### Pattern 5: Batch Processing

def extract_person(text: str) -> Person:

return client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[{

"role": "user",

"content": f"Extract person from: {text}"

}],

response_model=Person

)

texts = [

"John Doe is a 30-year-old engineer",

"Jane Smith, 25, works in marketing",

"Bob Johnson, age 40, software developer"

]

people = [extract_person(text) for text in texts]


## Advanced Features

### Union Types

from typing import Union

class TextContent(BaseModel):

type: str = "text"

content: str

class ImageContent(BaseModel):

type: str = "image"

url: HttpUrl

caption: str

class Post(BaseModel):

title: str

content: Union[TextContent, ImageContent] # Either type

LLM chooses appropriate type based on content


### Dynamic Models

from pydantic import create_model

Create model at runtime

DynamicUser = create_model(

'User',

name=(str, ...),

age=(int, Field(ge=0)),

email=(EmailStr, ...)

)

user = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[...],

response_model=DynamicUser

)


### Custom Modes

For providers without native structured outputs

client = instructor.from_anthropic(

Anthropic(),

mode=instructor.Mode.JSON # JSON mode

)

Available modes:

- Mode.ANTHROPIC_TOOLS (recommended for Claude)

- Mode.JSON (fallback)

- Mode.TOOLS (OpenAI tools)


### Context Management

Single-use client

with instructor.from_anthropic(Anthropic()) as client:

result = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[...],

response_model=YourModel

)

# Client closed automatically


## Error Handling

### Handling Validation Errors

from pydantic import ValidationError

try:

user = client.messages.create(

model="claude-sonnet-4-5-20250929",

max_tokens=1024,

messages=[...],

response_model=User,

max_retries=3

)

except ValidationError as e:

print(f"Failed after retries: {e}")

# Handle gracefully

except Exception as e:

print(f"API error: {e}")


### Custom Error Messages

class ValidatedUser(BaseModel):

name: str = Field(description="Full name, 2-100 characters")

age: int = Field(description="Age between 0 and 120", ge=0, le=120)

email: EmailStr = Field(description="Valid email address")

class Config:

# Custom error messages

json_schema_extra = {

"examples": [

{

"name": "John Doe",

"age": 30,

"email": "john@example.com"

}

]

}


## Best Practices

### 1. Clear Field Descriptions

❌ Bad: Vague

class Product(BaseModel):

name: str

price: float

✅ Good: Descriptive

class Product(BaseModel):

name: str = Field(description="Product name from the text")

price: float = Field(description="Price in USD, without currency symbol")


### 2. Use Appropriate Validation

✅ Good: Constrain values

class Rating(BaseModel):

score: int = Field(ge=1, le=5, description="Rating from 1 to 5 stars")

review: str = Field(min_length=10, description="Review text, at least 10 chars")


### 3. Provide Examples in Prompts

messages = [{

"role": "user",

"content": """Extract person info from: "John, 30, engineer"

Example format:

{

"name": "John Doe",

"age": 30,

"occupation": "engineer"

}"""

}]


### 4. Use Enums for Fixed Categories

✅ Good: Enum ensures valid values

class Status(str, Enum):

PENDING = "pending"

APPROVED = "approved"

REJECTED = "rejected"

class Application(BaseModel):

status: Status # LLM must choose from enum


### 5. Handle Missing Data Gracefully

class PartialData(BaseModel):

required_field: str

optional_field: Optional[str] = None

default_field: str = "default_value"

LLM only needs to provide required_field

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card