SKILL.md

TDD Guide

Test-driven development skill for generating tests, analyzing coverage, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, and Vitest.

Workflows

Generate Tests from Code

Provide source code (TypeScript, JavaScript, Python, Java)

Specify target framework (Jest, Pytest, JUnit, Vitest)

Run test_generator.py with requirements

Review generated test stubs

Validation: Tests compile and cover the happy path, error cases, and edge cases

Analyze Coverage Gaps

Generate coverage report from test runner (npm test -- --coverage)

Run coverage_analyzer.py on LCOV/JSON/XML report

Review prioritized gaps (P0/P1/P2)

Generate missing tests for uncovered paths

Validation: Coverage meets target threshold (typically 80%+)

TDD New Feature

Write failing test first (RED)

Run tdd_workflow.py --phase red to validate

Implement minimal code to pass (GREEN)

Run tdd_workflow.py --phase green to validate

Refactor while keeping tests green (REFACTOR)

Validation: All tests pass after each cycle
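The RED and GREEN steps above can be illustrated with a small self-contained sketch. It uses the standard library's unittest rather than tdd_workflow.py, whose internals are not described here. The names slugify_red, slugify_green, and run_phase are hypothetical and exist only for this illustration.

```python
import unittest


def slugify_red(title: str) -> str:
    """RED phase: the function exists but has no behavior yet."""
    raise NotImplementedError


def slugify_green(title: str) -> str:
    """GREEN phase: the simplest code that satisfies the test."""
    return "-".join(title.lower().split())


def run_phase(func) -> bool:
    """Run the same test against an implementation; True means it passed."""

    class SlugifyTest(unittest.TestCase):
        def test_spaces_become_hyphens(self):
            self.assertEqual(func("Hello TDD World"), "hello-tdd-world")

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TestResult()
    suite.run(result)
    return result.wasSuccessful()
```

Calling run_phase(slugify_red) fails, which confirms the test can detect missing behavior, and run_phase(slugify_green) passes. Refactoring would then change slugify_green while the same test keeps passing.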

Examples

Test Generation — Input → Output (Pytest)

**Input source function (math_utils.py):**

def divide(a: float, b: float) -> float:

    if b == 0:

        raise ValueError("Cannot divide by zero")

    return a / b

Command:

python scripts/test_generator.py --input math_utils.py --framework pytest

**Generated test output (test_math_utils.py):**

import pytest

from math_utils import divide

class TestDivide:

    def test_divide_positive_numbers(self):

        assert divide(10, 2) == 5.0

    def test_divide_negative_numerator(self):

        assert divide(-10, 2) == -5.0

    def test_divide_float_result(self):

        assert divide(1, 3) == pytest.approx(1 / 3)

    def test_divide_by_zero_raises_value_error(self):

        with pytest.raises(ValueError, match="Cannot divide by zero"):

            divide(10, 0)

    def test_divide_zero_numerator(self):

        assert divide(0, 5) == 0.0
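How test_generator.py derives test names is not documented here. One possible approach, shown as a rough sketch, is to walk the source with the standard library's ast module and plan one happy-path test per public function plus one test per exception type it raises. The function names plan_tests and camel_to_snake are assumptions made for this example, not part of the actual script.

```python
import ast
import re


def camel_to_snake(name: str) -> str:
    """Convert an exception class name such as ValueError to value_error."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def plan_tests(source: str) -> dict[str, list[str]]:
    """Map each public function to a list of suggested test names."""
    plan = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef) or node.name.startswith("_"):
            continue
        names = [f"test_{node.name}_happy_path"]
        for child in ast.walk(node):
            if isinstance(child, ast.Raise) and child.exc is not None:
                # `raise ValueError("...")` is a Call; `raise ValueError` is a Name.
                exc = child.exc.func if isinstance(child.exc, ast.Call) else child.exc
                if isinstance(exc, ast.Name):
                    names.append(f"test_{node.name}_raises_{camel_to_snake(exc.id)}")
        plan[node.name] = names
    return plan


DIVIDE_SOURCE = '''
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
'''
```

For the divide example, plan_tests(DIVIDE_SOURCE) suggests a happy-path test and a test for the ValueError branch. Edge cases such as a zero numerator still need domain judgment and are not inferred by this sketch.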

Coverage Analysis — Sample P0/P1/P2 Output

Command:

python scripts/coverage_analyzer.py --report lcov.info --threshold 80

Sample output:

Coverage Report — Overall: 63% (threshold: 80%)

P0 — Critical gaps (uncovered error paths):

  auth/login.py:42-58   handle_expired_token()       0% covered

  payments/process.py:91-110  handle_payment_failure()   0% covered

P1 — High-value gaps (core logic branches):

  users/service.py:77   update_profile() — else branch  0% covered

  orders/cart.py:134    apply_discount() — zero-qty guard  0% covered

P2 — Low-risk gaps (utility / helper functions):

  utils/formatting.py:12  format_currency()            0% covered

Recommended: Generate tests for P0 items first, then P1 and P2, until overall coverage reaches the 80% threshold.
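To make the threshold check concrete, here is a minimal sketch of reading line coverage from an LCOV tracefile. It relies only on the standard SF:, DA:, and end_of_record records and is not the implementation of coverage_analyzer.py. The names FileCoverage, parse_lcov, overall_percent, and files_below are hypothetical, and the P0/P1/P2 prioritization is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class FileCoverage:
    path: str
    lines_found: int = 0
    lines_hit: int = 0

    @property
    def percent(self) -> float:
        # A file with no instrumented lines is treated as fully covered.
        return 100.0 * self.lines_hit / self.lines_found if self.lines_found else 100.0


def parse_lcov(text: str) -> list[FileCoverage]:
    """Collect per-file line coverage from SF/DA/end_of_record records."""
    files, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = FileCoverage(path=line[3:])
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<execution count>[,<checksum>]
            _, hits = line[3:].split(",")[:2]
            current.lines_found += 1
            if int(hits) > 0:
                current.lines_hit += 1
        elif line == "end_of_record" and current is not None:
            files.append(current)
            current = None
    return files


def overall_percent(files: list[FileCoverage]) -> float:
    found = sum(f.lines_found for f in files)
    hit = sum(f.lines_hit for f in files)
    return 100.0 * hit / found if found else 100.0


def files_below(files: list[FileCoverage], threshold: float) -> list[str]:
    return [f.path for f in files if f.percent < threshold]


SAMPLE_LCOV = """\
SF:auth/login.py
DA:1,1
DA:2,0
DA:3,0
DA:4,0
end_of_record
SF:utils/formatting.py
DA:1,3
DA:2,3
end_of_record
"""
```

With the sample data, auth/login.py has one of four lines hit and utils/formatting.py has both lines hit, so only the first file falls below an 80% threshold.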

Key Tools

| Tool | Purpose | Usage |
| --- | --- | --- |
| test_generator.py | Generate test cases from code/requirements | python scripts/test_generator.py --input source.py --framework pytest |
| coverage_analyzer.py | Parse and analyze coverage reports | python scripts/coverage_analyzer.py --report lcov.info --threshold 80 |
| tdd_workflow.py | Guide red-green-refactor cycles | python scripts/tdd_workflow.py --phase red --test test_auth.py |
| fixture_generator.py | Generate test data and mocks | python scripts/fixture_generator.py --entity User --count 5 |

Additional scripts: framework_adapter.py (convert between frameworks), metrics_calculator.py (quality metrics), format_detector.py (detect language/framework), output_formatter.py (CLI/desktop/CI output).

Input Requirements

For Test Generation:

Source code (file path or pasted content)

Target framework (Jest, Pytest, JUnit, Vitest)

Coverage scope (unit, integration, edge cases)

For Coverage Analysis:

Coverage report file (LCOV, JSON, or XML format)

Optional: Source code for context

Optional: Target threshold percentage

For TDD Workflow:

Feature requirements or user story

Current phase (RED, GREEN, REFACTOR)

Test code and implementation status

Spec-First Workflow

TDD is most effective when driven by a written spec. The flow:

Write or receive a spec — stored in specs/<feature>.md

Extract acceptance criteria — each criterion becomes one or more test cases

Write failing tests (RED) — one test per acceptance criterion

Implement minimal code (GREEN) — satisfy each test in order

Refactor — clean up while all tests stay green

Spec Directory Convention

project/

├── specs/

│   ├── user-auth.md          # Feature spec with acceptance criteria

│   ├── payment-processing.md

│   └── notification-system.md

├── tests/

│   ├── test_user_auth.py     # Tests derived from specs/user-auth.md

│   ├── test_payments.py

│   └── test_notifications.py

└── src/

Extracting Tests from Specs

Each acceptance criterion in a spec maps to at least one test:

| Spec Criterion | Test Case |
| --- | --- |
| "User can log in with valid credentials" | test_login_valid_credentials_returns_token |
| "Invalid password returns 401" | test_login_invalid_password_returns_401 |
| "Account locks after 5 failed attempts" | test_login_locks_after_five_failures |

Tip: Number your acceptance criteria in the spec, and reference the number in a comment or docstring in each test for traceability (# AC-3: Account locks after 5 failed attempts).

Cross-reference: See engineering/spec-driven-workflow for the full spec methodology, including spec templates and review checklists.

Red-Green-Refactor Examples Per Language

TypeScript / Jest

// test/cart.test.ts

describe("Cart", () => {

  describe("addItem", () => {

    it("should add a new item to an empty cart", () => {

      const cart = new Cart();

      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });

      expect(cart.items).toHaveLength(1);

      expect(cart.items[0].id).toBe("sku-1");

    });

    it("should increment quantity when adding an existing item", () => {

      const cart = new Cart();

      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 1 });

      cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 2 });

      expect(cart.items).toHaveLength(1);

      expect(cart.items[0].qty).toBe(3);

    });

    it("should throw when quantity is zero or negative", () => {

      const cart = new Cart();

      expect(() =>

        cart.addItem({ id: "sku-1", name: "Widget", price: 9.99, qty: 0 })

      ).toThrow("Quantity must be positive");

    });

  });

});

Python / Pytest (Advanced Patterns)

# tests/conftest.py — shared fixtures

import pytest

from app.db import create_engine, Session

@pytest.fixture(scope="session")

def db_engine():

    engine = create_engine("sqlite:///:memory:")

    yield engine

    engine.dispose()

@pytest.fixture

def db_session(db_engine):

    session = Session(bind=db_engine)

    yield session

    session.rollback()

    session.close()

# tests/test_pricing.py — parametrize for multiple cases

import pytest

from app.pricing import calculate_discount

@pytest.mark.parametrize("subtotal, expected_discount", [

    (50.0, 0.0),       # Below threshold — no discount

    (100.0, 5.0),      # 5% tier

    (250.0, 25.0),     # 10% tier

    (500.0, 75.0),     # 15% tier

])

def test_calculate_discount(subtotal, expected_discount):

    assert calculate_discount(subtotal) == pytest.approx(expected_discount)
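The parametrized test above fixes only four data points, so a GREEN-phase implementation has to guess where the tiers begin. The sketch below is one minimal version that passes those cases, assuming each tier starts at the tested subtotal (100, 250, 500). The real boundaries are a domain decision and should come from the spec.

```python
def calculate_discount(subtotal: float) -> float:
    """Return the discount amount for a subtotal.

    Tier boundaries are assumptions inferred from the test table:
    5% from 100, 10% from 250, 15% from 500.
    """
    if subtotal >= 500:
        rate = 0.15
    elif subtotal >= 250:
        rate = 0.10
    elif subtotal >= 100:
        rate = 0.05
    else:
        rate = 0.0
    return subtotal * rate
```

A natural next RED step would be boundary cases such as 99.99 and 249.99, which the four existing cases do not pin down.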

Go — Table-Driven Tests

// cart_test.go

package cart

import "testing"

func TestApplyDiscount(t *testing.T) {

    tests := []struct {

        name     string

        subtotal float64

        want     float64

    }{

        {"no discount below threshold", 50.0, 0.0},

        {"5 percent tier", 100.0, 5.0},

        {"10 percent tier", 250.0, 25.0},

        {"15 percent tier", 500.0, 75.0},

        {"zero subtotal", 0.0, 0.0},

    }

    for _, tt := range tests {

        t.Run(tt.name, func(t *testing.T) {

            got := ApplyDiscount(tt.subtotal)

            if got != tt.want {

                t.Errorf("ApplyDiscount(%v) = %v, want %v", tt.subtotal, got, tt.want)

            }

        })

    }

}

Bounded Autonomy Rules

When generating tests autonomously, follow these rules to decide when to stop and ask the user:

Stop and Ask When

Ambiguous requirements — the spec or user story has conflicting or unclear acceptance criteria

Missing edge cases — you cannot determine boundary values without domain knowledge (e.g., max allowed transaction amount)

Test count exceeds 50 — large test suites need human review before committing; present a summary and ask which areas to prioritize

External dependencies unclear — the feature relies on third-party APIs or services with undocumented behavior

Security-sensitive logic — authentication, authorization, encryption, or payment flows require human sign-off on test scenarios

Continue Autonomously When

Clear spec with numbered acceptance criteria — each criterion maps directly to tests

Straightforward CRUD operations — create, read, update, delete with well-defined models

Well-defined API contracts — OpenAPI spec or typed interfaces available

Pure functions — deterministic input/output with no side effects

Existing test patterns — the codebase already has similar tests to follow
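The stop conditions above can be expressed as a simple checklist function. This is only a sketch of the decision logic: GenerationContext and reasons_to_ask are hypothetical names, and how each flag gets set (for example, detecting ambiguity in a spec) is left open.

```python
from dataclasses import dataclass

MAX_AUTONOMOUS_TESTS = 50  # from the "Test count exceeds 50" rule


@dataclass
class GenerationContext:
    """Hypothetical summary of what is known before generating tests."""
    ambiguous_requirements: bool = False
    unknown_boundary_values: bool = False
    planned_test_count: int = 0
    undocumented_external_deps: bool = False
    security_sensitive: bool = False


def reasons_to_ask(ctx: GenerationContext) -> list[str]:
    """Return the stop conditions that apply; an empty list means continue."""
    reasons = []
    if ctx.ambiguous_requirements:
        reasons.append("ambiguous requirements")
    if ctx.unknown_boundary_values:
        reasons.append("missing edge cases")
    if ctx.planned_test_count > MAX_AUTONOMOUS_TESTS:
        reasons.append("test count exceeds 50")
    if ctx.undocumented_external_deps:
        reasons.append("external dependencies unclear")
    if ctx.security_sensitive:
        reasons.append("security-sensitive logic")
    return reasons
```

Returning the list of reasons, rather than a single boolean, makes it easy to tell the user exactly why generation paused.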

Property-Based Testing

Property-based testing generates random inputs to verify invariants instead of relying on hand-picked examples. Use it when the input space is large and the expected behavior can be described as a property.

Python — Hypothesis

from hypothesis import given, strategies as st

from app.serializers import serialize, deserialize

@given(st.text())

def test_roundtrip_serialization(data):

    """Serialization followed by deserialization returns the original."""

    assert deserialize(serialize(data)) == data

@given(st.integers(), st.integers())

def test_addition_is_commutative(a, b):

    assert a + b == b + a

TypeScript — fast-check

import fc from "fast-check";

import { encode, decode } from "./codec";

test("encode/decode roundtrip", () => {

  fc.assert(

    fc.property(fc.string(), (input) => {

      expect(decode(encode(input))).toBe(input);

    })

  );

});

When to Use Property-Based Over Example-Based

| Use Property-Based | Example |
| --- | --- |
| Data transformations | Serialize/deserialize roundtrips |
| Mathematical properties | Commutativity, associativity, idempotency |
| Encoding/decoding | Base64, URL encoding, compression |
| Sorting and filtering | Output is sorted, length preserved |
| Parser correctness | Valid input always parses without error |
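For the "Sorting and filtering" row, the properties can be checked even without a property-testing library. The sketch below hand-rolls the idea with the standard library's random module instead of Hypothesis, so it lacks Hypothesis features such as shrinking of failing inputs. The function name check_sort_properties is made up for this example.

```python
import random
from collections import Counter


def check_sort_properties(sort_fn, trials: int = 200, seed: int = 0) -> bool:
    """Check sort invariants on randomly generated integer lists.

    Raises AssertionError (with the offending input) on the first violation.
    """
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        data = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 30))]
        result = sort_fn(list(data))
        # Property 1: output is in non-decreasing order.
        assert all(a <= b for a, b in zip(result, result[1:])), data
        # Property 2: length is preserved.
        assert len(result) == len(data), data
        # Property 3: output is a permutation of the input.
        assert Counter(result) == Counter(data), data
    return True
```

The built-in sorted satisfies all three properties, while an implementation that drops or adds elements fails the length check.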

Mutation Testing

Mutation testing modifies your production code (creates "mutants") and checks whether your tests catch the changes. If a mutant survives (tests still pass), your tests have a gap that coverage alone cannot reveal.

Tools

| Language | Tool | Command |
| --- | --- | --- |
| TypeScript/JavaScript | Stryker | npx stryker run |
| Python | mutmut | mutmut run --paths-to-mutate=src/ |
| Java | PIT | mvn org.pitest:pitest-maven:mutationCoverage |

Why Mutation Testing Matters

100% line coverage != good tests — coverage tells you code was executed, not that it was verified

Catches weak assertions — tests that run code but assert nothing meaningful

Finds missing boundary tests — mutants that change < to <= expose off-by-one gaps

Quantifiable quality metric — mutation score (% mutants killed) is a stronger signal than coverage %

Recommendation: Run mutation testing on critical paths (auth, payments, data processing) even if overall coverage is high. Target 85%+ mutation score on P0 modules.
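The < to <= case mentioned above can be shown by hand. In this sketch the mutant is written manually rather than produced by a tool such as mutmut, and is_minor, the two suites, and mutation_score are hypothetical names. Both suites execute every line of is_minor, yet only one of them kills the mutant.

```python
def is_minor(age: int) -> bool:
    return age < 18


def is_minor_mutant(age: int) -> bool:
    return age <= 18  # mutation: "<" changed to "<="


def weak_suite(fn) -> bool:
    """Full line coverage, but no test at the boundary."""
    return fn(5) is True and fn(30) is False


def boundary_suite(fn) -> bool:
    """Same checks plus the boundary value 18."""
    return weak_suite(fn) and fn(18) is False


def mutation_score(suite, original, mutants) -> float:
    """Percentage of mutants the suite kills (detects as failing)."""
    assert suite(original), "suite must pass on the original code"
    killed = sum(1 for mutant in mutants if not suite(mutant))
    return 100.0 * killed / len(mutants)
```

Here the weak suite scores 0% because the mutant survives, and adding the single boundary assertion raises the score to 100%.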

Cross-References

| Skill | Relationship |
| --- | --- |
| engineering/spec-driven-workflow | Spec → acceptance criteria → test extraction pipeline |
| engineering-team/focused-fix | Phase 5 (Verify) uses TDD to confirm the fix with a regression test |
| engineering-team/senior-qa | Broader QA strategy; TDD is one layer in the test pyramid |
| engineering-team/code-reviewer | Review generated tests for assertion quality and coverage completeness |
| engineering-team/senior-fullstack | Project scaffolders include testing infrastructure compatible with TDD workflows |

Limitations

| Scope | Details |
| --- | --- |
| Unit test focus | Integration and E2E tests require different patterns |
| Static analysis | Cannot execute tests or measure runtime behavior |
| Language support | Best for TypeScript, JavaScript, Python, Java |
| Report formats | LCOV, JSON, XML only; other formats need conversion |
| Generated tests | Provide scaffolding; require human review for complex logic |

When to use other tools:

E2E testing: Playwright, Cypress, Selenium

Performance testing: k6, JMeter, Locust

Security testing: OWASP ZAP, Burp Suite
