arize-dataset

Creates, manages, and queries Arize datasets and examples. Covers dataset CRUD, appending examples, exporting data, and file-based dataset creation using the…

INSTALLATION
npx skills add https://github.com/arize-ai/arize-skills --skill arize-dataset
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md


If an ax command fails, troubleshoot based on the error:

  • command not found or version error → see references/ax-setup.md
  • 401 Unauthorized / missing API key → run ax profiles show to inspect the current profile. If the profile is missing or the API key is wrong, follow references/ax-profiles.md to create/update it. If the user doesn't have their key, direct them to https://app.arize.com/admin > API Keys
  • Space unknown → run ax spaces list to pick by name, or ask the user
  • Project unclear → ask the user, or run ax projects list -o json --limit 100 and present as selectable options
  • Security: Never read .env files or search the filesystem for credentials. Use ax profiles for Arize credentials and ax ai-integrations for LLM provider keys. If credentials are not available through these channels, ask the user.

List Datasets: ax datasets list

Browse datasets in a space. Output goes to stdout.

ax datasets list

ax datasets list --space SPACE --limit 20

ax datasets list --cursor CURSOR_TOKEN

ax datasets list -o json

Flags

| Flag | Type | Default | Description |
|---|---|---|---|
| --space | string | from profile | Filter by space |
| --limit, -l | int | 15 | Max results (1-100) |
| --cursor | string | none | Pagination cursor from previous response |
| -o, --output | string | table | Output format: table, json, csv, parquet, or file path |
| -p, --profile | string | default | Configuration profile |

Get Dataset: ax datasets get

Quick metadata lookup -- returns dataset name, space, timestamps, and version list.

ax datasets get NAME_OR_ID

ax datasets get NAME_OR_ID -o json

ax datasets get NAME_OR_ID --space SPACE   # required when using dataset name instead of ID

Flags

| Flag | Type | Default | Description |
|---|---|---|---|
| NAME_OR_ID | string | required | Dataset name or ID (positional) |
| --space | string | none | Space name or ID (required if using dataset name instead of ID) |
| -o, --output | string | table | Output format |
| -p, --profile | string | default | Configuration profile |

Response fields

| Field | Type | Description |
|---|---|---|
| id | string | Dataset ID |
| name | string | Dataset name |
| space_id | string | Space this dataset belongs to |
| created_at | datetime | When the dataset was created |
| updated_at | datetime | Last modification time |
| versions | array | List of dataset versions (id, name, dataset_id, created_at, updated_at) |

Export Dataset: ax datasets export

Download a dataset's examples to a file. A default export appears to be capped at 500 examples (see the auto-escalation rule below); use --all for an unlimited bulk export of larger datasets.

ax datasets export NAME_OR_ID

# -> dataset_abc123_20260305_141500/examples.json

ax datasets export NAME_OR_ID --all

ax datasets export NAME_OR_ID --version-id VERSION_ID

ax datasets export NAME_OR_ID --output-dir ./data

ax datasets export NAME_OR_ID --stdout

ax datasets export NAME_OR_ID --stdout | jq '.[0]'

ax datasets export NAME_OR_ID --space SPACE   # required when using dataset name instead of ID

Flags

| Flag | Type | Default | Description |
|---|---|---|---|
| NAME_OR_ID | string | required | Dataset name or ID (positional) |
| --space | string | none | Space name or ID (required if using dataset name instead of ID) |
| --version-id | string | latest | Export a specific dataset version |
| --all | bool | false | Unlimited bulk export (use for datasets > 500 examples) |
| --output-dir | string | . | Output directory |
| --stdout | bool | false | Print JSON to stdout instead of file |
| -p, --profile | string | default | Configuration profile |

Agent auto-escalation rule: If an export returns exactly 500 examples, the result is likely truncated — re-run with --all to get the full dataset.
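
For agents that drive the CLI from a script, the escalation rule above might be sketched in Python as follows. The helper names (`export_argv`, `looks_truncated`) are hypothetical, only flags documented in this section are used, and the 500-example cap is taken from the rule above rather than verified against the CLI.

```python
import json

# Assumption: the default export cap is 500, as stated in the rule above.
DEFAULT_EXPORT_CAP = 500


def export_argv(name_or_id, space=None, all_rows=False):
    """Build the argv for `ax datasets export ... --stdout` (does not run it)."""
    argv = ["ax", "datasets", "export", name_or_id, "--stdout"]
    if space is not None:
        argv += ["--space", space]
    if all_rows:
        argv.append("--all")
    return argv


def looks_truncated(stdout_text, cap=DEFAULT_EXPORT_CAP):
    """Return True when the exported JSON array holds exactly `cap` examples."""
    return len(json.loads(stdout_text)) == cap
```

A caller could pass `export_argv(...)` to `subprocess.run(..., capture_output=True, text=True, check=True)`, then re-run with `all_rows=True` whenever `looks_truncated` returns True.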

Export completeness verification: After exporting, confirm the row count matches what the server reports:

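# Get the server-reported count from dataset metadata
# Note: example_count is not among the version fields listed under Response fields
# (id, name, dataset_id, created_at, updated_at). If jq prints null here, count a
# full export instead: ax datasets export DATASET_NAME --space SPACE --all --stdout | jq 'length'

ax datasets get DATASET_NAME --space SPACE -o json | jq '.versions[-1] | {version: .id, examples: .example_count}'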

# Compare to what was exported

jq 'length' dataset_*/examples.json

# If counts differ, re-export with --all

Output is a JSON array of example objects. Each example has system fields (id, created_at, updated_at) plus all user-defined fields:

[
  {
    "id": "ex_001",
    "created_at": "2026-01-15T10:00:00Z",
    "updated_at": "2026-01-15T10:00:00Z",
    "question": "What is 2+2?",
    "answer": "4",
    "topic": "math"
  }
]

Create Dataset: ax datasets create

Create a new dataset from a data file.

ax datasets create --name "My Dataset" --space SPACE --file data.csv

ax datasets create --name "My Dataset" --space SPACE --file data.json

ax datasets create --name "My Dataset" --space SPACE --file data.jsonl

ax datasets create --name "My Dataset" --space SPACE --file data.parquet

Flags

| Flag | Type | Required | Description |
|---|---|---|---|
| --name, -n | string | yes | Dataset name |
| --space | string | yes | Space to create the dataset in |
| --file, -f | path | yes | Data file: CSV, JSON, JSONL, or Parquet |
| -o, --output | string | no | Output format for the returned dataset metadata |
| -p, --profile | string | no | Configuration profile |

Passing data via stdin

Use --file - to pipe data directly — no temp file needed:

echo '[{"question": "What is 2+2?", "answer": "4"}]' | ax datasets create --name "my-dataset" --space SPACE --file -

# Or with a heredoc

ax datasets create --name "my-dataset" --space SPACE --file - << 'EOF'
[{"question": "What is 2+2?", "answer": "4"}]
EOF
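
When the rows are generated in code rather than in a shell, the same stdin approach can be sketched in Python. This is a minimal illustration, not part of the skill: `create_from_rows` is a hypothetical helper that only prepares the command and the payload, and the actual invocation is left commented out because it needs the ax CLI and a configured profile.

```python
import json


def create_from_rows(name, space, rows):
    """Return (argv, stdin_text) for `ax datasets create --file -`."""
    if not rows:
        raise ValueError("at least one example is required")
    argv = ["ax", "datasets", "create", "--name", name, "--space", space, "--file", "-"]
    return argv, json.dumps(rows)


argv, payload = create_from_rows(
    "my-dataset", "SPACE", [{"question": "What is 2+2?", "answer": "4"}]
)

# To run it for real (requires the ax CLI and a configured profile):
# import subprocess
# subprocess.run(argv, input=payload, text=True, check=True)
```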

To add rows to an existing dataset, use ax datasets append --json '[...]' instead — no file needed.

Supported file formats

| Format | Extension | Notes |
|---|---|---|
| CSV | .csv | Column headers become field names |
| JSON | .json | Array of objects |
| JSON Lines | .jsonl | One object per line (NOT a JSON array) |
| Parquet | .parquet | Column names become field names; preserves types |

Format gotchas:

  • CSV: Loses type information — dates become strings, null becomes empty string. Use JSON/Parquet to preserve types.
  • JSONL: Each line is a separate JSON object. A JSON array ([{...}, {...}]) in a .jsonl file will fail — use .json extension instead.
  • Parquet: Preserves column types. Requires pandas/pyarrow to read locally: pd.read_parquet("examples.parquet").
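
The JSONL gotcha above can be checked locally before uploading. The following is a small sketch using only the Python standard library; `to_jsonl` and `jsonl_problems` are hypothetical helper names, and skipping blank lines is a choice made here, not documented CLI behavior.

```python
import json


def to_jsonl(rows):
    """Serialize a list of objects as JSON Lines: one object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)


def jsonl_problems(text):
    """Return a list of problems found in JSONL text (empty list means OK)."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are ignored by this checker
        try:
            value = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {lineno}: not valid JSON")
            continue
        if not isinstance(value, dict):
            problems.append(
                f"line {lineno}: expected an object, got {type(value).__name__}"
            )
    return problems
```

A JSON array pasted into a .jsonl file shows up as a single line whose value is a list, which is the failure case described above.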

Append Examples: ax datasets append

Add examples to an existing dataset. Two input modes -- use whichever fits.

Inline JSON (agent-friendly)

Generate the payload directly -- no temp files needed:

ax datasets append DATASET_NAME --space SPACE --json '[{"question": "What is 2+2?", "answer": "4"}]'

ax datasets append DATASET_NAME --space SPACE --json '[
  {"question": "What is gravity?", "answer": "A fundamental force..."},
  {"question": "What is light?", "answer": "Electromagnetic radiation..."}
]'

From a file

ax datasets append DATASET_NAME --space SPACE --file new_examples.csv

ax datasets append DATASET_NAME --space SPACE --file additions.json

To a specific version

ax datasets append DATASET_NAME --space SPACE --json '[{"q": "..."}]' --version-id VERSION_ID

Flags

| Flag | Type | Required | Description |
|---|---|---|---|
| NAME_OR_ID | string | yes | Dataset name or ID (positional); add --space when using name |
| --space | string | no | Space name or ID (required if using dataset name instead of ID) |
| --json | string | mutex | JSON array of example objects |
| --file, -f | path | mutex | Data file (CSV, JSON, JSONL, Parquet) |
| --version-id | string | no | Append to a specific version (default: latest) |
| -o, --output | string | no | Output format for the returned dataset metadata |
| -p, --profile | string | no | Configuration profile |

Exactly one of --json or --file is required.

Validation

  • Each example must be a JSON object with at least one user-defined field
  • Maximum 100,000 examples per request
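
These rules can be approximated client-side before calling append. The sketch below is an assumption-laden pre-check, not the server's validator: `validate_examples` is a hypothetical name, and "user-defined" is interpreted as any field other than the system fields id, created_at, and updated_at.

```python
# Assumption: these are the only system-managed fields, per this document.
MANAGED_FIELDS = {"id", "created_at", "updated_at"}
MAX_EXAMPLES_PER_REQUEST = 100_000


def validate_examples(examples):
    """Return a list of problems with an append payload (empty list means OK)."""
    if not isinstance(examples, list) or not examples:
        return ["payload must be a non-empty JSON array"]
    problems = []
    if len(examples) > MAX_EXAMPLES_PER_REQUEST:
        problems.append(
            f"{len(examples)} examples exceeds the {MAX_EXAMPLES_PER_REQUEST} limit"
        )
    for index, example in enumerate(examples):
        if not isinstance(example, dict):
            problems.append(f"example {index}: not a JSON object")
        elif not set(example) - MANAGED_FIELDS:
            problems.append(f"example {index}: no user-defined fields")
    return problems
```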

Schema validation before append: If the dataset already has examples, inspect its schema before appending to avoid silent field mismatches:

# Check existing field names in the dataset

ax datasets export DATASET_NAME --space SPACE --stdout | jq '.[0] | keys'

# Verify your new data has matching field names

echo '[{"question": "..."}]' | jq '.[0] | keys'

# Both outputs should show the same user-defined fields

Fields are free-form: extra fields in new examples are added, and missing fields become null. However, typos in field names (e.g., queston vs question) create new columns silently -- verify spelling before appending.
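
One way to catch such typos automatically is to compare new field names against the existing ones and flag near-misses. This is a heuristic sketch using Python's difflib; `suspect_fields` and the 0.8 similarity cutoff are illustrative choices, not part of the ax CLI.

```python
import difflib

SYSTEM_FIELDS = {"id", "created_at", "updated_at"}


def suspect_fields(existing_example, new_example, cutoff=0.8):
    """Map each new field that closely resembles an existing field to that field."""
    known = sorted(set(existing_example) - SYSTEM_FIELDS)
    unknown = sorted(set(new_example) - SYSTEM_FIELDS - set(known))
    suspects = {}
    for field in unknown:
        close = difflib.get_close_matches(field, known, n=1, cutoff=cutoff)
        if close:
            suspects[field] = close[0]
    return suspects
```

Fields that are new but not similar to any existing name are left alone, since adding a genuinely new column is allowed.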

Delete Dataset: ax datasets delete

ax datasets delete NAME_OR_ID

ax datasets delete NAME_OR_ID --space SPACE   # required when using dataset name instead of ID

ax datasets delete NAME_OR_ID --force   # skip confirmation prompt

Flags

| Flag | Type | Default | Description |
|---|---|---|---|
| NAME_OR_ID | string | required | Dataset name or ID (positional) |
| --space | string | none | Space name or ID (required if using dataset name instead of ID) |
| --force, -f | bool | false | Skip confirmation prompt |
| -p, --profile | string | default | Configuration profile |

Workflows

Find a dataset by name

The get, export, append, and delete commands take a dataset name or ID as the positional argument. When passing a name rather than an ID, add --space SPACE:

# Use name directly

ax datasets get "eval-set-v1" --space SPACE

ax datasets export "eval-set-v1" --space SPACE

# Or resolve name to ID via list if you need the base64 ID

ax datasets list -o json | jq '.[] | select(.name == "eval-set-v1") | .id'
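
Where jq is unavailable, the same lookup can be sketched in Python. This assumes, as the jq filter above implies, that `ax datasets list -o json` prints a top-level array of objects with `name` and `id` keys; `find_dataset_ids` is a hypothetical helper.

```python
import json


def find_dataset_ids(list_json, name):
    """Return the IDs of all datasets in the listing whose name matches exactly."""
    return [d["id"] for d in json.loads(list_json) if d.get("name") == name]
```

Returning a list rather than a single value makes it visible when the name is missing (empty list) or ambiguous (more than one ID).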

Create a dataset from file for evaluation

  • Prepare a CSV/JSON/Parquet file with your evaluation columns (e.g., input, expected_output)
  • If generating data inline, pipe it via stdin using --file - (see the Create Dataset section)
  • ax datasets create --name "eval-set-v1" --space SPACE --file eval_data.csv
  • Verify: ax datasets get DATASET_NAME --space SPACE
  • Use the dataset name to run experiments

Add examples to an existing dataset

# Find the dataset

ax datasets list --space SPACE

# Append inline or from a file using the dataset name (see Append Examples section for full syntax)

ax datasets append DATASET_NAME --space SPACE --json '[{"question": "...", "answer": "..."}]'

ax datasets append DATASET_NAME --space SPACE --file additional_examples.csv

Download dataset for offline analysis

  • Find the dataset name: ax datasets list --space SPACE
  • Download to a file: ax datasets export DATASET_NAME --space SPACE
  • Parse the JSON: jq '.[] | .question' dataset_*/examples.json

Export a specific version

# List versions

ax datasets get DATASET_NAME --space SPACE -o json | jq '.versions'

# Export that version

ax datasets export DATASET_NAME --space SPACE --version-id VERSION_ID

Iterate on a dataset

  • Export current version: ax datasets export DATASET_NAME --space SPACE
  • Modify the examples locally
  • Append new rows: ax datasets append DATASET_NAME --space SPACE --file new_rows.csv
  • Or create a new dataset from the revised data: ax datasets create --name "eval-set-v2" --space SPACE --file updated_data.json
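
Exported examples carry the system fields id, created_at, and updated_at, and the Troubleshooting section says these must be removed from create/append payloads. A minimal sketch of that clean-up step, with `strip_managed_fields` as a hypothetical helper name:

```python
# System-managed fields as listed in this document.
MANAGED_FIELDS = ("id", "created_at", "updated_at")


def strip_managed_fields(examples):
    """Return copies of the examples without system-managed fields."""
    return [
        {key: value for key, value in example.items() if key not in MANAGED_FIELDS}
        for example in examples
    ]
```

Apply it to the list loaded from examples.json before writing the file that is passed to create or append.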

Pipe export to other tools

# Count examples

ax datasets export DATASET_NAME --space SPACE --stdout | jq 'length'

# Extract a single field

ax datasets export DATASET_NAME --space SPACE --stdout | jq '.[].question'

# Convert to CSV with jq

ax datasets export DATASET_NAME --space SPACE --stdout | jq -r '.[] | [.question, .answer] | @csv'
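
The jq one-liner emits rows without a header. If a header row is wanted, a Python alternative using the standard csv module could look like this sketch; `examples_to_csv` is a hypothetical helper, and nested objects or arrays would be written as their Python string form, so it suits flat fields only.

```python
import csv
import io


def examples_to_csv(examples, fields):
    """Render the chosen fields of each example as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fields,
        extrasaction="ignore",  # drop fields not listed in `fields`
        restval="",             # missing fields become empty cells
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(examples)
    return buffer.getvalue()
```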

Dataset Example Schema

Examples are free-form JSON objects. There is no fixed schema -- columns are whatever fields you provide. System-managed fields are added by the server:

| Field | Type | Managed by | Notes |
|---|---|---|---|
| id | string | server | Auto-generated UUID. Required on update, forbidden on create/append |
| created_at | datetime | server | Immutable creation timestamp |
| updated_at | datetime | server | Auto-updated on modification |
| (any user field) | any JSON type | user | String, number, boolean, null, nested object, array |

Related Skills

  • arize-trace: Export production spans to understand what data to put in datasets
  • arize-experiment: Run evaluations against this dataset (the usual next step)
  • arize-prompt-optimization: Use dataset and experiment results to improve prompts

Troubleshooting

| Problem | Solution |
|---|---|
| ax: command not found | See references/ax-setup.md |
| 401 Unauthorized | API key is wrong, expired, or doesn't have access to this space. Fix the profile using references/ax-profiles.md. |
| No profile found | No profile is configured. See references/ax-profiles.md to create one. |
| Dataset not found | Verify dataset ID with ax datasets list |
| File format error | Supported: CSV, JSON, JSONL, Parquet. Use --file - to read from stdin. |
| platform-managed column | Remove id, created_at, updated_at from create/append payloads |
| reserved column | Remove time, count, or any source_record_* field |
| Provide either --json or --file | Append requires exactly one input source |
| Examples array is empty | Ensure your JSON array or file contains at least one example |
| not a JSON object | Each element in the --json array must be a {...} object, not a string or number |

Save Credentials for Future Use

See references/ax-profiles.md § Save Credentials for Future Use.
