experiment-design

Design experiment plans with progressive stages — initial implementation, baseline tuning, creative research, and ablation studies. Plan baselines, datasets,…

INSTALLATION
npx skills add https://github.com/lingzhi227/agent-research-skills --skill experiment-design
Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

Generates a baseline list, an ablation matrix, a hyperparameter grid, and a metric selection. Stdlib-only (no third-party dependencies).

4-Stage Progressive Framework (from AI-Scientist-v2)

Stage 1: Initial Implementation

  • Focus on getting a basic working implementation
  • Use a simple dataset
  • Aim for basic functional correctness
  • Completion: at least one working (non-buggy) implementation

Stage 2: Baseline Tuning

  • Tune hyperparameters (learning rate, epochs, batch size)
  • Do NOT change model architecture
  • Test on at least TWO datasets
  • Completion: stable training curves, improvement over Stage 1
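The "do NOT change model architecture" rule in Stage 2 can be checked mechanically by comparing configs. The following is a minimal sketch, not the skill's actual implementation: the names `TUNABLE_KEYS`, `changed_keys`, and `is_valid_stage2_change`, and the config key `hidden_dim`, are hypothetical and assume each run is described by a flat dict.

```python
# Hypothetical: the only keys Stage 2 is allowed to vary (from the bullets above).
TUNABLE_KEYS = {"lr", "epochs", "batch_size"}

def changed_keys(base, candidate):
    """Return the set of keys whose values differ between two flat configs."""
    return {k for k in base.keys() | candidate.keys() if base.get(k) != candidate.get(k)}

def is_valid_stage2_change(base, candidate):
    """True if the candidate differs from the base only in tunable hyperparameters."""
    return changed_keys(base, candidate) <= TUNABLE_KEYS

stage1_best = {"lr": 1e-3, "epochs": 10, "batch_size": 32, "hidden_dim": 128}
tuned = {**stage1_best, "lr": 1e-4, "batch_size": 64}
rearchitected = {**stage1_best, "hidden_dim": 256}

print(is_valid_stage2_change(stage1_best, tuned))
print(is_valid_stage2_change(stage1_best, rearchitected))
```

Here `tuned` passes because only `lr` and `batch_size` changed, while `rearchitected` is rejected because it alters an architecture key.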

Stage 3: Creative Research

  • Explore novel improvements and insights
  • Be creative and think outside the box
  • Test on at least THREE datasets
  • Completion: demonstrated novel improvement

Stage 4: Ablation Studies

  • Systematic component analysis
  • Each ablation tests a different aspect
  • Use same datasets as Stage 3
  • Completion: all planned ablations done
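One way to make sure each ablation tests a different aspect is a leave-one-out matrix: a full run plus one run per disabled component. This is a sketch under that assumption, not necessarily how the skill builds its matrix; `build_ablation_matrix` is a hypothetical name and the component names are placeholders.

```python
def build_ablation_matrix(components):
    """Return the full configuration plus one leave-one-out config per component."""
    matrix = [{"name": "full", "enabled": {c: True for c in components}}]
    for removed in components:
        matrix.append({
            "name": f"no_{removed}",
            # Every component stays on except the one being ablated.
            "enabled": {c: c != removed for c in components},
        })
    return matrix

ablations = build_ablation_matrix(["component_A", "component_B"])
for run in ablations:
    print(run["name"], run["enabled"])
```

Stage 4 is then complete when every entry in the matrix has been run on the Stage 3 datasets.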

Output Format

{
  "stages": [
    {
      "name": "initial_implementation",
      "goals": ["Basic working baseline", "Simple dataset"],
      "max_iterations": 5,
      "completion_criteria": "Working implementation with non-zero accuracy"
    }
  ],
  "baselines": ["Method A", "Method B"],
  "datasets": ["Dataset1", "Dataset2", "Dataset3"],
  "metrics": ["accuracy", "F1", "inference_time"],
  "ablation_components": ["component_A", "component_B"],
  "hyperparameter_grid": {
    "lr": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128]
  },
  "num_seeds": 3
}
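To show how a plan in this format might be consumed, the sketch below expands `hyperparameter_grid` and `num_seeds` into a flat list of runs using only the standard library. It is an illustration, not the skill's own code: `expand_runs` is a hypothetical helper, and numbering seeds from 0 is an assumption.

```python
import itertools
import json

# A fragment of the plan format above, reduced to the fields this sketch uses.
plan_json = """
{
  "hyperparameter_grid": {"lr": [1e-4, 1e-3, 1e-2], "batch_size": [32, 64, 128]},
  "num_seeds": 3
}
"""

def expand_runs(plan):
    """Return one run config per (grid point, seed) combination."""
    grid = plan["hyperparameter_grid"]
    names = sorted(grid)  # fixed order so the run list is reproducible
    runs = []
    for values in itertools.product(*(grid[n] for n in names)):
        for seed in range(plan["num_seeds"]):  # assumption: seeds are 0..num_seeds-1
            runs.append({**dict(zip(names, values)), "seed": seed})
    return runs

plan = json.loads(plan_json)
runs = expand_runs(plan)
print(len(runs))  # 3 lr values x 3 batch sizes x 3 seeds = 27
print(runs[0])
```

The run count grows multiplicatively with every grid dimension, so it is worth computing before committing to a grid.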

Rules

  • Always start simple (Stage 1) before complex experiments
  • Each stage builds on the best result from the previous stage
  • Evaluate every configuration with multiple seeds (num_seeds) so that comparisons can be tested for statistical significance
  • Document every experiment run in notes.txt
  • Generate figures for training curves and comparisons
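The multi-seed and notes.txt rules can be combined in a small logging helper. This is a minimal sketch under stated assumptions: `summarize_seeds` and `log_run` are hypothetical names, the line format is invented for illustration, the stage name `baseline_tuning` is assumed by analogy with `initial_implementation`, and the scores are placeholder values rather than real results.

```python
import statistics
import tempfile
from pathlib import Path

def summarize_seeds(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std

def log_run(notes_path, stage, config, scores):
    """Append a one-line summary of a run to the notes file and return it."""
    mean, std = summarize_seeds(scores)
    line = f"[{stage}] config={config} seeds={len(scores)} mean={mean:.4f} std={std:.4f}\n"
    with open(notes_path, "a", encoding="utf-8") as f:
        f.write(line)
    return line

# Written to a temporary directory here; a real project would use ./notes.txt.
notes = Path(tempfile.mkdtemp()) / "notes.txt"
line = log_run(notes, "baseline_tuning", {"lr": 1e-3}, [0.80, 0.82, 0.84])
print(line, end="")
```

Appending rather than overwriting keeps the full history, so each stage can look up the best result from the stage before it.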
