Name: launchdarkly-experiment-setup
Author: launchdarkly

launchdarkly-experiment-setup

Set up and run experiments in LaunchDarkly. Create experiments with metrics and treatments, start iterations to collect data, and monitor results.

INSTALLATION

npx skills add https://github.com/launchdarkly/agent-skills --skill launchdarkly-experiment-setup

Run this command from your project or agent environment. If your version of the skills CLI uses different options, adjust the flags accordingly.

SKILL.md

Core Concepts

What Are Experiments?

Experiments in LaunchDarkly let you measure the impact of feature flag variations on key metrics. An experiment consists of:

Treatments: The flag variations being compared (control vs. test)

Metrics: What you're measuring (conversion rate, latency, revenue, etc.)

Iterations: Data collection periods — start an iteration to begin collecting data

Holdout (optional): A percentage of traffic excluded from the experiment for baseline measurement

Experiment Lifecycle

1. Create the experiment with metrics and treatments

2. Start an iteration to begin data collection

3. Monitor results as data accumulates

4. Stop the iteration when you have statistical significance

5. Ship the winning variation

Core Principles

Metrics First: Ensure your metrics exist before creating the experiment

Clear Hypothesis: Know what you expect to improve and by how much

Proper Controls: Always include a control treatment (the current behavior)

Sufficient Sample Size: Let experiments run long enough to reach statistical significance

One Change at a Time: Test one variable per experiment for clear attribution

Workflow

Step 1: Prepare Metrics

Before creating an experiment, ensure the metrics you want to measure exist:

Use list-metrics to check for existing metrics

If needed, use create-metric to create new ones

Note the metric keys — you'll need them for the experiment

Common metric types:

Goal        | Metric Type       | Example
------------|-------------------|--------------------
Conversion  | Custom conversion | checkout-completed
Performance | Custom numeric    | page-load-time-ms
Engagement  | Custom conversion | feature-clicked
Revenue     | Custom numeric    | order-value
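
The "metrics first" check in this step can be sketched as a small helper. This is only an illustration: `missing_metric_keys` is a hypothetical name, and the shape of the `list-metrics` output (a list of objects with a `key` field) is an assumption, so adapt it to whatever the tool actually returns.

```python
def missing_metric_keys(required, existing_metrics):
    """Return the required metric keys that are absent from existing_metrics, in order."""
    existing = {m["key"] for m in existing_metrics}
    return [k for k in required if k not in existing]


# Assumed shape of list-metrics output, reduced to the one field this sketch needs.
existing = [{"key": "checkout-completed"}, {"key": "page-load-time-ms"}]

# Any key printed here would need create-metric before the experiment is created.
print(missing_metric_keys(["checkout-completed", "checkout-time-seconds"], existing))
# → ['checkout-time-seconds']
```

Each key the helper returns would be created with create-metric before moving on to Step 2.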

Step 2: Create the Experiment

Use create-experiment with:

projectKey and environmentKey -- where to run the experiment

name -- descriptive name for the experiment

flagKey -- the feature flag being experimented on

metrics -- array of metric objects with key and isGroup fields

treatments -- array of treatments, each with a name, baseline flag, and parameters

holdout (optional) -- percentage of traffic to exclude

{
  "projectKey": "my-project",
  "environmentKey": "production",
  "name": "Checkout Flow v2 Experiment",
  "flagKey": "checkout-flow-v2",
  "metrics": [
    {"key": "checkout-completed", "isGroup": false},
    {"key": "checkout-time-seconds", "isGroup": false}
  ],
  "treatments": [
    {
      "name": "Control",
      "baseline": true,
      "parameters": {
        "flagKey": "checkout-flow-v2",
        "variationId": "variation-a-id"
      }
    },
    {
      "name": "New Checkout",
      "baseline": false,
      "parameters": {
        "flagKey": "checkout-flow-v2",
        "variationId": "variation-b-id"
      }
    }
  ]
}
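
Before calling create-experiment, it may help to sanity-check the payload locally. The sketch below is not part of the skill: `validate_experiment` is a hypothetical helper, and two of its rules are assumptions inferred from this document rather than documented constraints (exactly one treatment marked as baseline, and every treatment's `parameters.flagKey` matching the top-level `flagKey`, as in the example above).

```python
def validate_experiment(payload):
    """Return a list of problems found in a create-experiment payload (empty if none)."""
    problems = []
    required = ("projectKey", "environmentKey", "name", "flagKey", "metrics", "treatments")
    for field in required:
        if not payload.get(field):
            problems.append(f"missing or empty field: {field}")

    # Each metric object is described as having key and isGroup fields.
    for metric in payload.get("metrics", []):
        if "key" not in metric or "isGroup" not in metric:
            problems.append(f"metric needs key and isGroup: {metric}")

    # Assumption: exactly one treatment is the baseline (control).
    treatments = payload.get("treatments", [])
    baselines = [t for t in treatments if t.get("baseline") is True]
    if len(baselines) != 1:
        problems.append(f"expected exactly one baseline treatment, found {len(baselines)}")

    # Assumption: treatments point at the same flag as the experiment itself.
    for t in treatments:
        if t.get("parameters", {}).get("flagKey") != payload.get("flagKey"):
            problems.append(f"treatment {t.get('name')!r} targets a different flag")
    return problems


payload = {
    "projectKey": "my-project",
    "environmentKey": "production",
    "name": "Checkout Flow v2 Experiment",
    "flagKey": "checkout-flow-v2",
    "metrics": [{"key": "checkout-completed", "isGroup": False}],
    "treatments": [
        {"name": "Control", "baseline": True,
         "parameters": {"flagKey": "checkout-flow-v2", "variationId": "variation-a-id"}},
        {"name": "New Checkout", "baseline": False,
         "parameters": {"flagKey": "checkout-flow-v2", "variationId": "variation-b-id"}},
    ],
}

print(validate_experiment(payload))  # → []
```

Returning a list of problems instead of raising on the first one lets an agent report every issue in a single pass.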

Step 3: Start Data Collection

Use start-experiment-iteration to begin collecting data:

{
  "projectKey": "my-project",
  "environmentKey": "production",
  "experimentKey": "checkout-flow-v2-experiment"
}

Optionally set reshuffle: true to redistribute traffic across treatments.
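
As a rough sketch, the arguments for start-experiment-iteration could be assembled like this. The helper name `start_iteration_args` is hypothetical, and placing `reshuffle` as a top-level argument is an assumption based on the sentence above, so check the tool's schema before relying on it.

```python
import json


def start_iteration_args(project_key, environment_key, experiment_key, reshuffle=None):
    """Build the argument object for start-experiment-iteration."""
    args = {
        "projectKey": project_key,
        "environmentKey": environment_key,
        "experimentKey": experiment_key,
    }
    # Only include reshuffle when the caller set it explicitly, so the
    # tool's own default applies otherwise. (Top-level placement is assumed.)
    if reshuffle is not None:
        args["reshuffle"] = reshuffle
    return args


print(json.dumps(
    start_iteration_args("my-project", "production", "checkout-flow-v2-experiment", reshuffle=True),
    indent=2,
))
```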

Step 4: Verify

Use get-experiment to confirm the experiment is running

Check that all treatments are listed correctly

Verify metrics are attached

Confirm the iteration status shows as active

Report results:

Experiment created and iteration started

N treatments with M metrics configured

Data collection is active
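
The report above can be generated from what the verification step found. This is a minimal sketch: `setup_report` is a hypothetical helper, and how `iteration_active` is derived from the get-experiment response is left to the caller because the response shape is not specified here.

```python
def setup_report(treatments, metrics, iteration_active):
    """Format the three-line setup report from verified experiment details."""
    lines = []
    if iteration_active:
        lines.append("Experiment created and iteration started")
    else:
        lines.append("Experiment created, but no active iteration was found")
    lines.append(f"{len(treatments)} treatments with {len(metrics)} metrics configured")
    lines.append("Data collection is active" if iteration_active
                 else "Data collection is not active")
    return "\n".join(lines)


# Treatment and metric names taken from the Step 2 example.
print(setup_report(
    treatments=["Control", "New Checkout"],
    metrics=["checkout-completed", "checkout-time-seconds"],
    iteration_active=True,
))
```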

Edge Cases

Situation                             | Action
--------------------------------------|-----------------------------------------------------------------
Metric doesn't exist                  | Create it first with create-metric
Flag has no variations                | Create flag variations before setting up treatments
Experiment already exists             | Use list-experiments to find it, then get-experiment for details
Need to change metrics mid-experiment | Stop the current iteration, update, then start a new one

What NOT to Do

Don't start an experiment without clearly defined metrics

Don't stop experiments too early — wait for statistical significance

Don't run multiple experiments on the same flag simultaneously without careful holdout design

Don't forget to set a baseline treatment — one treatment must be marked baseline: true