arize-trace

Downloads, exports, and inspects existing Arize traces and spans to understand what an LLM app is doing or debug runtime issues. Covers exporting traces by ID,…

INSTALLATION
npx skills add https://github.com/arize-ai/arize-skills --skill arize-trace
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$27

Recency warning: ax traces export and ax spans export return results in arbitrary order, not by recency. Running without --start-time will not give you the most recent traces. To fetch recent data (e.g., "last day's conversations"), always pass --start-time scoped to the relevant window.

Default output directory: Always use --output-dir .arize-tmp-traces on every ax spans export call. The CLI automatically creates the directory and adds it to .gitignore.

Prerequisites

Proceed directly with the task — run the ax command you need. Do NOT check versions, env vars, or profiles upfront.

If an ax command fails, troubleshoot based on the error:

  • command not found or version error → see references/ax-setup.md
  • 401 Unauthorized / missing API key → run ax profiles show to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys
  • Space unknown → run ax spaces list to pick by name, or ask the user
  • Security: Never read .env files or search the filesystem for credentials. Use ax profiles for Arize credentials and ax ai-integrations for LLM provider keys. If credentials are not available through these channels, ask the user.
  • Project unclear → run ax projects list -l 100 -o json (add --space SPACE if known), present the names, and ask the user to pick one

IMPORTANT: For ax traces export, --space is required when using a project name. For ax spans export, --space is only required when using --all (Arrow Flight). If you hit 401 Unauthorized or limit errors, resolve the project name to a base64 ID first (see "Resolving project for export" in Concepts).

Deterministic verification rule: If you already know a specific trace_id and can resolve a base64 project ID, prefer ax spans export PROJECT --trace-id TRACE_ID for verification. Use ax traces export mainly for exploration or when you need the trace lookup phase.

Export Spans: ax spans export

The primary command for downloading trace data to a file.

By trace ID

ax spans export PROJECT --trace-id TRACE_ID --output-dir .arize-tmp-traces

By span ID

ax spans export PROJECT --span-id SPAN_ID --output-dir .arize-tmp-traces

By session ID

ax spans export PROJECT --session-id SESSION_ID --output-dir .arize-tmp-traces

Flags

Flag

Default

Description

PROJECT (positional)

$ARIZE_DEFAULT_PROJECT

Project name or base64 ID

--trace-id

Filter by context.trace_id (mutex with other ID flags)

--span-id

Filter by context.span_id (mutex with other ID flags)

--session-id

Filter by attributes.session.id (mutex with other ID flags)

--filter

SQL-like filter; combinable with any ID flag

--limit, -l

100

Max spans (REST); ignored with --all

--space

Required when using --all (Arrow Flight); not needed for project name in spans export

--days

30

Lookback window; ignored if --start-time/--end-time set

--start-time / --end-time

ISO 8601 time range override

--output-dir

.arize-tmp-traces

Output directory

--stdout

false

Print JSON to stdout instead of file

--all

false

Unlimited bulk export via Arrow Flight (see below)

Output is a JSON array of span objects. File naming: {type}_{id}_{timestamp}/spans.json.

When you have both a project ID and trace ID, this is the most reliable verification path:

ax spans export PROJECT --trace-id TRACE_ID --output-dir .arize-tmp-traces

Bulk export with --all

By default, ax spans export is capped at 500 spans by -l. Pass --all for unlimited bulk export.

ax spans export PROJECT --space SPACE --filter "status_code = 'ERROR'" --all --output-dir .arize-tmp-traces

**When to use --all:**

  • Exporting more than 500 spans
  • Downloading full traces with many child spans
  • Large time-range exports

Agent auto-escalation rule: If an export returns exactly the number of spans requested by -l (or 500 if no limit was set), the result is likely truncated. Increase -l or re-run with --all to get the full dataset — but only when the user asks or the task requires more data.

Decision tree:

Do you have a --trace-id, --span-id, or --session-id?

├─ YES: count is bounded → omit --all. If result is exactly 500, re-run with --all.

└─ NO (exploratory export):

    ├─ Just browsing a sample? → use -l 50

    └─ Need all matching spans?

        ├─ Expected < 500 → -l is fine

        └─ Expected ≥ 500 or unknown → use --all

            └─ Times out? → batch by --days (e.g., --days 7) and loop

Check span count first: Before a large exploratory export, check how many spans match your filter:

# Count matching spans without downloading them

ax spans export PROJECT --filter "status_code = 'ERROR'" -l 1 --stdout | jq 'length'

# If returns 1 (hit limit), run with --all

# If returns 0, no data matches -- check filter or expand --days

**Requirements for --all:**

  • --space is required (Flight uses space + project name)
  • --limit is ignored when --all is set

**Networking notes for --all:**

Arrow Flight connects to flight.arize.com:443 via gRPC+TLS -- this is a different host from the REST API (api.arize.com). On internal or private networks, the Flight endpoint may use a different host/port. Configure via:

  • ax profile: flight_host, flight_port, flight_scheme
  • Environment variables: ARIZE_FLIGHT_HOST, ARIZE_FLIGHT_PORT, ARIZE_FLIGHT_SCHEME

Internal/private deployment note: On internal Arize deployments, Arrow Flight may fail with auth errors even with a valid API key (the Flight endpoint may have additional network or auth restrictions). If --all fails, fall back to REST with batched time windows: loop over --start-time/--end-time ranges (e.g., day by day) using -l 500 per batch.

The --all flag is also available on ax traces export, ax datasets export, and ax experiments export with the same behavior (REST by default, Flight with --all).

Export Traces: ax traces export

Export full traces -- all spans belonging to traces that match a filter. Uses a two-phase approach:

  • Phase 1: Find spans matching --filter (up to --limit via REST, or all via Flight with --all)
  • Phase 2: Extract unique trace IDs, then fetch every span for those traces
# Explore recent traces — always pass --start-time; results are not ordered by recency without it

ax traces export PROJECT --space SPACE \

  --start-time "2026-04-05T00:00:00" \

  -l 50 --output-dir .arize-tmp-traces

# Export traces with error spans (REST, up to 500 spans in phase 1)

ax traces export PROJECT --filter "status_code = 'ERROR'" --stdout

# Export all traces matching a filter via Flight (no limit)

ax traces export PROJECT --space SPACE --filter "status_code = 'ERROR'" --all --output-dir .arize-tmp-traces

Flags

Flag

Type

Default

Description

PROJECT

string

required

Project name or base64 ID (positional arg)

--filter

string

none

Filter expression for phase-1 span lookup

--space

string

none

Space name or ID; required when PROJECT is a name or when using --all (Arrow Flight)

--limit, -l

int

50

Max number of traces to export

--days

int

30

Lookback window in days

--start-time

string

none

Override start (ISO 8601)

--end-time

string

none

Override end (ISO 8601)

--output-dir

string

.

Output directory

--stdout

bool

false

Print JSON to stdout instead of file

--all

bool

false

Use Arrow Flight for both phases (see spans --all docs above)

-p, --profile

string

default

Configuration profile

How it differs from ax spans export

  • ax spans export exports individual spans matching a filter
  • ax traces export exports complete traces -- it finds spans matching the filter, then pulls ALL spans for those traces (including siblings and children that may not match the filter)

Time-series index lag

Arize uses two storage tiers:

  • Primary trace store (indexed by trace_id) — spans are written here immediately on ingestion. --trace-id direct lookups (ax spans export PROJECT_ID --trace-id TRACE_ID) hit this store and are always up to date.
  • Time-series query index (used by --days, --start-time, --end-time) — built asynchronously from the primary store and lags 6–12 hours. Queries scoped by time range will miss very recent traces.

Implication: If you already have a trace_id, use ax spans export PROJECT_ID --trace-id TRACE_ID — it's faster and immediately consistent. Use time-range queries only for historical exploration, and set --start-time at least 12 hours in the past to guarantee results are indexed.

Filter Syntax Reference

SQL-like expressions passed to --filter.

Common filterable columns

Column

Type

Description

Example Values

name

string

Span name

'ChatCompletion', 'retrieve_docs'

status_code

string

Status

'OK', 'ERROR', 'UNSET'

latency_ms

number

Duration in ms

100, 5000

parent_id

string

Parent span ID

null for root spans

context.trace_id

string

Trace ID

context.span_id

string

Span ID

attributes.session.id

string

Session ID

attributes.openinference.span.kind

string

Span kind

'LLM', 'CHAIN', 'TOOL', 'AGENT', 'RETRIEVER', 'RERANKER', 'EMBEDDING', 'GUARDRAIL', 'EVALUATOR'

attributes.llm.model_name

string

LLM model

'gpt-4o', 'claude-3'

attributes.input.value

string

Span input

attributes.output.value

string

Span output

attributes.error.type

string

Error type

'ValueError', 'TimeoutError'

attributes.error.message

string

Error message

event.attributes

string

Error tracebacks

Use CONTAINS (not exact match)

Operators

=, !=, <, <=, >, >=, AND, OR, IN, CONTAINS, LIKE, IS NULL, IS NOT NULL

Examples

status_code = 'ERROR'

latency_ms > 5000

name = 'ChatCompletion' AND status_code = 'ERROR'

attributes.llm.model_name = 'gpt-4o'

attributes.openinference.span.kind IN ('LLM', 'AGENT')

attributes.error.type LIKE '%Transport%'

event.attributes CONTAINS 'TimeoutError'

Tips

  • Prefer IN over multiple OR conditions: name IN ('a', 'b', 'c') not name = 'a' OR name = 'b' OR name = 'c'
  • Start broad with LIKE, then switch to = or IN once you know exact values
  • Use CONTAINS for event.attributes (error tracebacks) -- exact match is unreliable on complex text
  • Always wrap string values in single quotes

Workflows

Debug a failing trace

  • ax traces export PROJECT --filter "status_code = 'ERROR'" -l 50 --output-dir .arize-tmp-traces
  • Read the output file, look for spans with status_code: ERROR
  • Check attributes.error.type and attributes.error.message on error spans

Download a conversation session

  • ax spans export PROJECT --session-id SESSION_ID --output-dir .arize-tmp-traces
  • Spans are ordered by start_time, grouped by context.trace_id
  • If you only have a trace_id, export that trace first, then look for attributes.session.id in the output to get the session ID

Export for offline analysis

ax spans export PROJECT --trace-id TRACE_ID --stdout | jq '.[]'

Troubleshooting rules

  • If ax traces export fails before querying spans because of project-name resolution, retry with a base64 project ID.
  • If ax spaces list is unsupported, treat ax projects list -o json as the fallback discovery surface.
  • If a user-provided --space is rejected by the CLI but the API key still lists projects without it, report the mismatch instead of silently swapping identifiers.
  • If exporter verification is the goal and the CLI path is unreliable, use the app's runtime/exporter logs plus the latest local trace_id to distinguish local instrumentation success from Arize-side ingestion failure.

Span Column Reference (OpenInference Semantic Conventions)

Core Identity and Timing

Column

Description

name

Span operation name (e.g., ChatCompletion, retrieve_docs)

context.trace_id

Trace ID -- all spans in a trace share this

context.span_id

Unique span ID

parent_id

Parent span ID. null for root spans (= traces)

start_time

When the span started (ISO 8601)

end_time

When the span ended

latency_ms

Duration in milliseconds

status_code

OK, ERROR, UNSET

status_message

Optional message (usually set on errors)

attributes.openinference.span.kind

LLM, CHAIN, TOOL, AGENT, RETRIEVER, RERANKER, EMBEDDING, GUARDRAIL, EVALUATOR

Where to Find Prompts and LLM I/O

Generic input/output (all span kinds):

Column

What it contains

attributes.input.value

The input to the operation. For LLM spans, often the full prompt or serialized messages JSON. For chain/agent spans, the user's question.

attributes.input.mime_type

Format hint: text/plain or application/json

attributes.output.value

The output. For LLM spans, the model's response. For chain/agent spans, the final answer.

attributes.output.mime_type

Format hint for output

LLM-specific message arrays (structured chat format):

Column

What it contains

attributes.llm.input_messages

Structured input messages array (system, user, assistant, tool). Where chat prompts live in role-based format.

attributes.llm.input_messages.roles

Array of roles: system, user, assistant, tool

attributes.llm.input_messages.contents

Array of message content strings

attributes.llm.output_messages

Structured output messages from the model

attributes.llm.output_messages.contents

Model response content

attributes.llm.output_messages.tool_calls.function.names

Tool calls the model wants to make

attributes.llm.output_messages.tool_calls.function.arguments

Arguments for those tool calls

Prompt templates:

Column

What it contains

attributes.llm.prompt_template.template

The prompt template with variable placeholders (e.g., "Answer {question} using {context}")

attributes.llm.prompt_template.variables

Template variable values (JSON object)

Finding prompts by span kind:

  • LLM span: Check attributes.llm.input_messages for structured chat messages, OR attributes.input.value for serialized prompt. Check attributes.llm.prompt_template.template for the template.
  • Chain/Agent span: Check attributes.input.value for the user's question. Actual LLM prompts are on child LLM spans.
  • Tool span: Check attributes.input.value for tool input, attributes.output.value for tool result.

LLM Model and Cost

Column

Description

attributes.llm.model_name

Model identifier (e.g., gpt-4o, claude-3-opus-20240229)

attributes.llm.invocation_parameters

Model parameters JSON (temperature, max_tokens, top_p, etc.)

attributes.llm.token_count.prompt

Input token count

attributes.llm.token_count.completion

Output token count

attributes.llm.token_count.total

Total tokens

attributes.llm.cost.prompt

Input cost in USD

attributes.llm.cost.completion

Output cost in USD

attributes.llm.cost.total

Total cost in USD

Tool Spans

Column

Description

attributes.tool.name

Tool/function name

attributes.tool.description

Tool description

attributes.tool.parameters

Tool parameter schema (JSON)

Retriever Spans

Column

Description

attributes.retrieval.documents

Retrieved documents array

attributes.retrieval.documents.ids

Document IDs

attributes.retrieval.documents.scores

Relevance scores

attributes.retrieval.documents.contents

Document text content

attributes.retrieval.documents.metadatas

Document metadata

Reranker Spans

Column

Description

attributes.reranker.query

The query being reranked

attributes.reranker.model_name

Reranker model

attributes.reranker.top_k

Number of results

attributes.reranker.input_documents.*

Input documents (ids, scores, contents, metadatas)

attributes.reranker.output_documents.*

Reranked output documents

Session, User, and Custom Metadata

Column

Description

attributes.session.id

Session/conversation ID -- groups traces into multi-turn sessions

attributes.user.id

End-user identifier

attributes.metadata.*

Custom key-value metadata. Any key under this prefix is user-defined (e.g., attributes.metadata.user_email). Filterable.

Errors and Exceptions

Column

Description

attributes.exception.type

Exception class name (e.g., ValueError, TimeoutError)

attributes.exception.message

Exception message text

event.attributes

Error tracebacks and detailed event data. Use CONTAINS for filtering.

Evaluations and Annotations

Column

Description

annotation.<name>.label

Human or auto-eval label (e.g., correct, incorrect)

annotation.<name>.score

Numeric score (e.g., 0.95)

annotation.<name>.text

Freeform annotation text

Embeddings

Column

Description

attributes.embedding.model_name

Embedding model name

attributes.embedding.texts

Text chunks that were embedded

Troubleshooting

Problem

Solution

ax: command not found

See references/ax-setup.md

SSL: CERTIFICATE_VERIFY_FAILED

macOS: export SSL_CERT_FILE=/etc/ssl/cert.pem. Linux: export SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt. Windows: $env:SSL_CERT_FILE = (python -c "import certifi; print(certifi.where())")

No such command on a subcommand that should exist

The installed ax is outdated. Reinstall: uv tool install --force --reinstall arize-ax-cli (requires shell access to install packages)

No profile found

No profile is configured. See references/ax-profiles.md to create one.

401 Unauthorized with valid API key

For ax traces export with a project name, add --space SPACE. For ax spans export, try resolving to a base64 project ID: ax projects list -l 100 -o json and use the project's id. If the key itself is wrong or expired, fix the profile using references/ax-profiles.md.

No spans found

Expand --days (default 30), verify project ID

Results don't include recent traces

Time-range queries lag 6–12h. Use --trace-id for immediate lookups of known traces. For time-range queries, set --start-time at least 12h in the past to ensure spans are indexed.

Filter error or invalid filter expression

Check column name spelling (e.g., attributes.openinference.span.kind not span_kind), wrap string values in single quotes, use CONTAINS for free-text fields

unknown attribute in filter

The attribute path is wrong or not indexed. Try browsing a small sample first to see actual column names: ax spans export PROJECT -l 5 --stdout | jq '.[0] | keys'

Timeout on large export

Use --days 7 to narrow the time range

Related Skills

  • arize-dataset: After collecting trace data, create labeled datasets for evaluation → use arize-dataset
  • arize-experiment: Run experiments comparing prompt versions against a dataset → use arize-experiment
  • arize-prompt-optimization: Use trace data to improve prompts → use arize-prompt-optimization
  • arize-link: Turn trace IDs from exported data into clickable Arize UI URLs → use arize-link

Save Credentials for Future Use

See references/ax-profiles.md § Save Credentials for Future Use.

BrowserAct

Let your agent run on any real-world website

Bypass CAPTCHA & anti-bot for free. Start local, scale to cloud.

Explore BrowserAct Skills →

Stop writing automation&scrapers

Install the CLI. Run your first Skill in 30 seconds. Scale when you're ready.

Start free
free · no credit card