SKILL.md

E2E Behavior Validation for Frontend Modifications

Core Principle: Test Product Behavior, Not UI States

CRITICAL: Tests must verify that product features WORK correctly, not just that UI elements render.

What NOT to test (UI States):

❌ "Dropdown opens when clicked"

❌ "Modal appears after button click"

❌ "Loading spinner shows during request"

❌ "Form fields are visible"

❌ "Sidebar collapses"

What TO test (Product Behavior):

✅ "Selecting an LLM provider configures the agent to use that provider"

✅ "Creating a new agent persists it and shows in the agents list"

✅ "Running a tool with parameters returns the expected output"

✅ "Chat messages stream correctly and maintain conversation context"

✅ "Workflow execution triggers tools in the correct order"

Prerequisites

Requires Playwright MCP server. If the browser_navigate tool is unavailable, instruct the user to add it:

claude mcp add playwright -- npx @playwright/mcp@latest

Step 1: Understand the Feature Intent

Before writing ANY test, answer these questions:

What user problem does this feature solve?

What is the expected outcome when the feature works correctly?

What data flows through the system? (user input → API → state → UI)

What should persist after page reload?

What downstream effects should this action have?

Document these answers as comments in your test file.

Step 2: Build and Start

pnpm build:cli

cd packages/playground/e2e/kitchen-sink &#x26;&#x26; pnpm dev

Verify server at http://localhost:4111

Step 3: Map Feature to Behavior Tests

Feature-to-Test Mapping Guide

Feature Category

What to Test

Example Assertion

Agent Configuration

Config changes affect agent behavior

Send message → verify response uses selected model

LLM Provider Selection

Selected provider is used in requests

Intercept API call → verify provider in request payload

Tool Execution

Tool runs with correct params & returns result

Execute tool → verify output matches expected transformation

Workflow Execution

Steps execute in order, data flows between steps

Run workflow → verify each step's output feeds next step

Chat/Streaming

Messages persist, context maintained across turns

Multi-turn conversation → verify context awareness

MCP Server Tools

Server tools are callable and return data

Call MCP tool → verify response structure and content

Memory/Persistence

Data survives page reload

Create item → reload → verify item exists

Error Handling

Errors surface correctly to user

Trigger error condition → verify error message + recovery

Step 4: Write Behavior-Focused Tests

Test Structure Template

import { test, expect, Page } from '@playwright/test';

import { resetStorage } from '../__utils__/reset-storage';

import { selectFixture } from '../__utils__/select-fixture';

import { nanoid } from 'nanoid';

/**

 * FEATURE: [Name of feature]

 * USER STORY: As a user, I want to [action] so that [outcome]

 * BEHAVIOR UNDER TEST: [Specific behavior being validated]

 */

test.describe('[Feature Name] - Behavior Tests', () => {

  let page: Page;

  test.beforeEach(async ({ browser }) => {

    const context = await browser.newContext();

    page = await context.newPage();

  });

  test.afterEach(async () => {

    await resetStorage(page);

  });

  test('should [verb describing behavior] when [trigger condition]', async () => {

    // ARRANGE: Set up preconditions

    // - Navigate to the feature

    // - Configure any required state

    // ACT: Perform the user action that triggers the behavior

    // ASSERT: Verify the OUTCOME, not the UI state

    // - Check data persistence

    // - Verify downstream effects

    // - Confirm API calls made correctly

  });

});

Behavior Test Patterns

#### Pattern 1: Configuration Affects Behavior

test('selecting LLM provider should use that provider for agent responses', async () => {

  // ARRANGE

  await page.goto('/agents/my-agent/chat');

  // Intercept API to verify provider

  let capturedProvider: string | null = null;

  await page.route('**/api/chat', route => {

    const body = JSON.parse(route.request().postData() || '{}');

    capturedProvider = body.provider;

    route.continue();

  });

  // ACT: Select a different provider

  await page.getByTestId('provider-selector').click();

  await page.getByRole('option', { name: 'OpenAI' }).click();

  // Send a message to trigger the agent

  await page.getByTestId('chat-input').fill('Hello');

  await page.getByTestId('send-button').click();

  // ASSERT: Verify the selected provider was used

  await expect.poll(() => capturedProvider).toBe('openai');

});

#### Pattern 2: Data Persistence

test('created agent should persist after page reload', async () => {

  // ARRANGE

  await page.goto('/agents');

  const agentName = `Test Agent ${nanoid()}`;

  // ACT: Create new agent

  await page.getByTestId('create-agent-button').click();

  await page.getByTestId('agent-name-input').fill(agentName);

  await page.getByTestId('save-agent-button').click();

  // Wait for creation to complete

  await expect(page.getByText(agentName)).toBeVisible();

  // ASSERT: Verify persistence

  await page.reload();

  await expect(page.getByText(agentName)).toBeVisible({ timeout: 10000 });

});

#### Pattern 3: Tool Execution Produces Correct Output

test('weather tool should return formatted weather data', async () => {

  // ARRANGE

  await selectFixture(page, 'weather-success');

  await page.goto('/tools/weather-tool');

  // ACT: Execute tool with parameters

  await page.getByTestId('param-city').fill('San Francisco');

  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Verify OUTPUT content, not just that output appears

  const output = page.getByTestId('tool-output');

  await expect(output).toContainText('temperature');

  await expect(output).toContainText('San Francisco');

  // Verify structured data if applicable

  const outputText = await output.textContent();

  const outputData = JSON.parse(outputText || '{}');

  expect(outputData).toHaveProperty('temperature');

  expect(outputData).toHaveProperty('conditions');

});

#### Pattern 4: Workflow Step Chaining

test('workflow should pass data between steps correctly', async () => {

  // ARRANGE

  await selectFixture(page, 'workflow-multi-step');

  const sessionId = nanoid();

  await page.goto(`/workflows/data-pipeline?session=${sessionId}`);

  // ACT: Trigger workflow execution

  await page.getByTestId('workflow-input').fill('test input data');

  await page.getByTestId('run-workflow-button').click();

  // ASSERT: Verify each step received correct input from previous step

  // Wait for completion

  await expect(page.getByTestId('workflow-status')).toHaveText('completed', { timeout: 30000 });

  // Check step outputs show data transformation chain

  const step1Output = await page.getByTestId('step-1-output').textContent();

  const step2Output = await page.getByTestId('step-2-output').textContent();

  // Verify step 2 received step 1's output as input

  expect(step2Output).toContain(step1Output);

});

#### Pattern 5: Streaming Chat with Context

test('chat should maintain conversation context across messages', async () => {

  // ARRANGE

  await selectFixture(page, 'contextual-chat');

  const chatId = nanoid();

  await page.goto(`/agents/assistant/chat/${chatId}`);

  // ACT: Multi-turn conversation

  await page.getByTestId('chat-input').fill('My name is Alice');

  await page.getByTestId('send-button').click();

  await expect(page.getByTestId('assistant-message').last()).toBeVisible({ timeout: 20000 });

  await page.getByTestId('chat-input').fill('What is my name?');

  await page.getByTestId('send-button').click();

  // ASSERT: Verify context was maintained

  const response = page.getByTestId('assistant-message').last();

  await expect(response).toContainText('Alice', { timeout: 20000 });

});

#### Pattern 6: Error Recovery

test('should show actionable error and allow retry when API fails', async () => {

  // ARRANGE: Set up failure fixture

  await selectFixture(page, 'api-failure');

  await page.goto('/tools/flaky-tool');

  // ACT: Trigger the error

  await page.getByTestId('execute-tool-button').click();

  // ASSERT: Error is shown with recovery option

  await expect(page.getByTestId('error-message')).toContainText('failed');

  await expect(page.getByTestId('retry-button')).toBeVisible();

  // Switch to success fixture and retry

  await selectFixture(page, 'api-success');

  await page.getByTestId('retry-button').click();

  // Verify recovery worked

  await expect(page.getByTestId('tool-output')).toBeVisible({ timeout: 10000 });

  await expect(page.getByTestId('error-message')).not.toBeVisible();

});

Step 5: Update Existing Tests

When a test file already exists:

Read the existing tests to understand current coverage

Identify if tests are UI-focused or behavior-focused

Refactor UI-focused tests to verify behavior instead:

Refactoring Example

BEFORE (UI-focused):

test('dropdown opens when clicked', async () => {

  await page.getByTestId('model-dropdown').click();

  await expect(page.getByRole('listbox')).toBeVisible();

});

AFTER (Behavior-focused):

test('selecting model from dropdown updates agent configuration', async () => {

  // Open dropdown and select model

  await page.getByTestId('model-dropdown').click();

  await page.getByRole('option', { name: 'GPT-4' }).click();

  // Verify the selection persists and affects behavior

  await page.reload();

  await expect(page.getByTestId('model-dropdown')).toHaveText('GPT-4');

  // Optionally: verify the model is used in actual requests

  // (via request interception or checking response metadata)

});

Step 6: Kitchen-Sink Fixtures for Behavior Testing

Fixtures should represent realistic scenarios, not just mock data:

Fixture Naming Convention

<feature>-<scenario>.fixture.ts

Examples:

- agent-with-tools.fixture.ts

- chat-multi-turn-context.fixture.ts

- workflow-parallel-execution.fixture.ts

- tool-validation-error.fixture.ts

- mcp-server-timeout.fixture.ts

Fixture Content Requirements

Each fixture must define:

Scenario description (what behavior it enables testing)

Expected outcomes (what assertions should pass)

Edge cases covered (error states, empty states, etc.)

// fixtures/agent-provider-switch.fixture.ts

export const agentProviderSwitch = {

  name: 'agent-provider-switch',

  description: 'Tests that switching LLM providers changes agent behavior',

  // Mock responses for different providers

  responses: {

    openai: { content: 'Response from OpenAI', model: 'gpt-4' },

    anthropic: { content: 'Response from Anthropic', model: 'claude-3' },

  },

  expectedBehavior: {

    // When provider is switched, subsequent messages use new provider

    providerSwitchAffectsNextMessage: true,

    // Provider selection persists across page reload

    providerPersistsOnReload: true,

  },

};

Step 7: Run and Validate

cd packages/playground &#x26;&#x26; pnpm test:e2e

Test Quality Checklist

Before considering tests complete, verify:

Each test has a clear user story comment

Tests verify OUTCOMES, not intermediate UI states

Tests would FAIL if the feature broke (not just if UI changed)

Persistence is verified via page.reload() where applicable

Error scenarios are covered

Tests use appropriate timeouts for async operations

Fixtures represent realistic usage scenarios

Quick Reference

Step

Command/Action

Build

pnpm build:cli

Start

cd packages/playground/e2e/kitchen-sink && pnpm dev

App URL

http://localhost:4111

Routes

@packages/playground/src/App.tsx

Run tests

cd packages/playground && pnpm test:e2e

Test dir

packages/playground/e2e/tests/

Fixtures

packages/playground/e2e/kitchen-sink/fixtures/

Anti-Patterns to Avoid

❌ Don't

✅ Do Instead

Test that modal opens

Test that modal action completes and persists

Test that button is clickable

Test that clicking button produces expected result

Test loading spinner appears

Test that loaded data is correct

Test form validation message shows

Test that invalid form cannot submit AND valid form succeeds

Test dropdown has options

Test that selecting option changes system behavior

Test sidebar navigation works

Test that navigated page has correct data/functionality

Assert element is visible

Assert element contains expected data/state

e2e-tests-studio

SKILL.md

E2E Behavior Validation for Frontend Modifications

Core Principle: Test Product Behavior, Not UI States

What NOT to test (UI States):

What TO test (Product Behavior):

Prerequisites

Step 1: Understand the Feature Intent

Step 2: Build and Start

Step 3: Map Feature to Behavior Tests

Feature-to-Test Mapping Guide

Step 4: Write Behavior-Focused Tests

Test Structure Template

Behavior Test Patterns

Step 5: Update Existing Tests

Refactoring Example

Step 6: Kitchen-Sink Fixtures for Behavior Testing

Fixture Naming Convention

Fixture Content Requirements

Step 7: Run and Validate

Test Quality Checklist

Quick Reference

Anti-Patterns to Avoid

Stop writing automation&scrapers

e2e-tests-studio

SKILL.md

E2E Behavior Validation for Frontend Modifications

Core Principle: Test Product Behavior, Not UI States

What NOT to test (UI States):

What TO test (Product Behavior):

Prerequisites

Step 1: Understand the Feature Intent

Step 2: Build and Start

Step 3: Map Feature to Behavior Tests

Feature-to-Test Mapping Guide

Step 4: Write Behavior-Focused Tests

Test Structure Template

Behavior Test Patterns

Step 5: Update Existing Tests

Refactoring Example

Step 6: Kitchen-Sink Fixtures for Behavior Testing

Fixture Naming Convention

Fixture Content Requirements

Step 7: Run and Validate

Test Quality Checklist

Quick Reference

Anti-Patterns to Avoid

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers