SKILL.md

Database Schema Designer

Design production-ready database schemas with best practices built-in.

Quick Start

Just describe your data model:

design a schema for an e-commerce platform with users, products, orders

You'll get a complete SQL schema like:

CREATE TABLE users (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  email VARCHAR(255) UNIQUE NOT NULL,
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE orders (
  id BIGINT AUTO_INCREMENT PRIMARY KEY,
  user_id BIGINT NOT NULL,
  total DECIMAL(10,2) NOT NULL,
  -- MySQL (8.x and earlier) parses but ignores inline REFERENCES, so declare the FK at table level
  FOREIGN KEY (user_id) REFERENCES users(id),
  INDEX idx_orders_user (user_id)
);

What to include in your request:

Entities (users, products, orders)

Key relationships (users have orders, orders have items)

Scale hints (high-traffic, millions of records)

Database preference (SQL/NoSQL) - defaults to SQL if not specified

Triggers

| Trigger | Example |
|---------|---------|
| design schema | "design a schema for user authentication" |
| database design | "database design for multi-tenant SaaS" |
| create tables | "create tables for a blog system" |
| schema for | "schema for inventory management" |
| model data | "model data for real-time analytics" |
| I need a database | "I need a database for tracking orders" |
| design NoSQL | "design NoSQL schema for product catalog" |

Key Terms

| Term | Definition |
|------|------------|
| Normalization | Organizing data to reduce redundancy (1NF → 2NF → 3NF) |
| 3NF | Third Normal Form - no transitive dependencies between columns |
| OLTP | Online Transaction Processing - write-heavy, needs normalization |
| OLAP | Online Analytical Processing - read-heavy, benefits from denormalization |
| Foreign Key (FK) | Column that references another table's primary key |
| Index | Data structure that speeds up queries (at cost of slower writes) |
| Access Pattern | How your app reads/writes data (queries, joins, filters) |
| Denormalization | Intentionally duplicating data to speed up reads |

Quick Reference

| Task | Approach | Key Consideration |
|------|----------|-------------------|
| New schema | Normalize to 3NF first | Domain modeling over UI |
| SQL vs NoSQL | Access patterns decide | Read/write ratio matters |
| Primary keys | INT or UUID | UUID for distributed systems |
| Foreign keys | Always constrain | ON DELETE strategy critical |
| Indexes | FKs + WHERE columns | Column order matters |
| Migrations | Always reversible | Backward compatible first |

Process Overview

Your Data Requirements
    |
    v
+-----------------------------------------------------+
| Phase 1: ANALYSIS                                   |
| * Identify entities and relationships               |
| * Determine access patterns (read vs write heavy)   |
| * Choose SQL or NoSQL based on requirements         |
+-----------------------------------------------------+
    |
    v
+-----------------------------------------------------+
| Phase 2: DESIGN                                     |
| * Normalize to 3NF (SQL) or embed/reference (NoSQL) |
| * Define primary keys and foreign keys              |
| * Choose appropriate data types                     |
| * Add constraints (UNIQUE, CHECK, NOT NULL)         |
+-----------------------------------------------------+
    |
    v
+-----------------------------------------------------+
| Phase 3: OPTIMIZE                                   |
| * Plan indexing strategy                            |
| * Consider denormalization for read-heavy queries   |
| * Add timestamps (created_at, updated_at)           |
+-----------------------------------------------------+
    |
    v
+-----------------------------------------------------+
| Phase 4: MIGRATE                                    |
| * Generate migration scripts (up + down)            |
| * Ensure backward compatibility                     |
| * Plan zero-downtime deployment                     |
+-----------------------------------------------------+
    |
    v
Production-Ready Schema

Commands

| Command | When to Use | Action |
|---------|-------------|--------|
| design schema for {domain} | Starting fresh | Full schema generation |
| normalize {table} | Fixing existing table | Apply normalization rules |
| add indexes for {table} | Performance issues | Generate index strategy |
| migration for {change} | Schema evolution | Create reversible migration |
| review schema | Code review | Audit existing schema |

Workflow: Start with design schema → iterate with normalize → optimize with add indexes → evolve with migration

Core Principles

| Principle | Why | Implementation |
|-----------|-----|----------------|
| Model the Domain | UI changes, domain doesn't | Entity names reflect business concepts |
| Data Integrity First | Corruption is costly to fix | Constraints at database level |
| Optimize for Access Pattern | Can't optimize for both | OLTP: normalized, OLAP: denormalized |
| Plan for Scale | Retrofitting is painful | Index strategy + partitioning plan |

Anti-Patterns

| Avoid | Why | Instead |
|-------|-----|---------|
| VARCHAR(255) everywhere | Hides intent, allows oversized values | Size appropriately per field |
| FLOAT for money | Rounding errors | DECIMAL(10,2) |
| Missing FK constraints | Orphaned data | Always define foreign keys |
| No indexes on FKs | Slow JOINs | Index every foreign key |
| Storing dates as strings | Can't compare or sort reliably | DATE, TIMESTAMP types |
| SELECT * in queries | Fetches unnecessary data | Explicit column lists |
| Non-reversible migrations | Can't roll back | Always write DOWN migration |
| Adding NOT NULL without default | Fails on existing rows | Add nullable, backfill, then constrain |

Verification Checklist

After designing a schema:

Every table has a primary key

All relationships have foreign key constraints

ON DELETE strategy defined for each FK

Indexes exist on all foreign keys

Indexes exist on frequently queried columns

Appropriate data types (DECIMAL for money, etc.)

NOT NULL on required fields

UNIQUE constraints where needed

CHECK constraints for validation

created_at and updated_at timestamps

Migration scripts are reversible

Tested on staging with production data

Normal Forms

| Form | Rule | Violation Example |
|------|------|-------------------|
| 1NF | Atomic values, no repeating groups | product_ids = '1,2,3' |
| 2NF | 1NF + no partial dependencies | customer_name in order_items |
| 3NF | 2NF + no transitive dependencies | country derived from postal_code |

1st Normal Form (1NF)

-- BAD: Multiple values in column

CREATE TABLE orders (

  id INT PRIMARY KEY,

  product_ids VARCHAR(255)  -- '101,102,103'

);

-- GOOD: Separate table for items

CREATE TABLE orders (

  id INT PRIMARY KEY,

  customer_id INT

);

CREATE TABLE order_items (

  id INT PRIMARY KEY,

  order_id INT REFERENCES orders(id),

  product_id INT

);
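Once the items live in their own table, a question like "which orders contain product 102?" becomes a plain equality filter instead of a string search. The following is a minimal sketch using Python's built-in sqlite3 module, chosen here only to keep the example self-contained; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(id),
        product_id INTEGER
    );
    -- Hypothetical sample data
    INSERT INTO orders VALUES (1, 10), (2, 11);
    INSERT INTO order_items (order_id, product_id)
    VALUES (1, 101), (1, 102), (2, 102), (2, 103);
""")

# One row per item, so finding orders by product is an equality filter.
rows = conn.execute(
    "SELECT order_id FROM order_items WHERE product_id = ? ORDER BY order_id",
    (102,),
).fetchall()
orders_with_102 = [order_id for (order_id,) in rows]
print(orders_with_102)
```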

2nd Normal Form (2NF)

-- BAD: customer_name depends only on order_id, which is just part of the composite key
CREATE TABLE order_items (
  order_id INT,
  product_id INT,
  customer_name VARCHAR(100),  -- Partial dependency!
  PRIMARY KEY (order_id, product_id)
);

-- GOOD: Customer data in separate table
CREATE TABLE customers (
  id INT PRIMARY KEY,
  name VARCHAR(100)
);

3rd Normal Form (3NF)

-- BAD: country depends on postal_code

CREATE TABLE customers (

  id INT PRIMARY KEY,

  postal_code VARCHAR(10),

  country VARCHAR(50)  -- Transitive dependency!

);

-- GOOD: Separate postal_codes table

CREATE TABLE postal_codes (

  code VARCHAR(10) PRIMARY KEY,

  country VARCHAR(50)

);
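One practical effect of moving country into postal_codes is that a correction touches a single row and every customer sees it through a JOIN. A minimal sketch with Python's sqlite3 module follows; the customers table here keeps only the postal_code reference, and the postal codes and country names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE postal_codes (code TEXT PRIMARY KEY, country TEXT NOT NULL);
    CREATE TABLE customers (
        id INTEGER PRIMARY KEY,
        postal_code TEXT REFERENCES postal_codes(code)
    );
    -- Hypothetical sample data
    INSERT INTO postal_codes VALUES ('10115', 'Germany'), ('75001', 'France');
    INSERT INTO customers VALUES (1, '10115'), (2, '75001'), (3, '10115');
""")

# Changing a country is a single-row update, not one update per customer.
conn.execute("UPDATE postal_codes SET country = 'DE' WHERE code = '10115'")

countries = conn.execute("""
    SELECT c.id, p.country
    FROM customers AS c
    JOIN postal_codes AS p ON p.code = c.postal_code
    ORDER BY c.id
""").fetchall()
print(countries)
```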

When to Denormalize

| Scenario | Denormalization Strategy |
|----------|--------------------------|
| Read-heavy reporting | Pre-calculated aggregates |
| Expensive JOINs | Cached derived columns |
| Analytics dashboards | Materialized views |

-- Denormalized for performance

CREATE TABLE orders (

  id INT PRIMARY KEY,

  customer_id INT,

  total_amount DECIMAL(10,2),  -- Calculated

  item_count INT               -- Calculated

);
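Calculated columns like these only help if something keeps them in sync with the source rows. One possible approach, sketched below with Python's sqlite3 module, is to recompute them from order_items whenever an order changes. The refresh_order_totals helper, the order_items layout and the use of integer cents (SQLite has no exact DECIMAL type) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        total_cents INTEGER NOT NULL DEFAULT 0,  -- calculated
        item_count INTEGER NOT NULL DEFAULT 0    -- calculated
    );
    CREATE TABLE order_items (
        id INTEGER PRIMARY KEY,
        order_id INTEGER NOT NULL REFERENCES orders(id),
        price_cents INTEGER NOT NULL,
        quantity INTEGER NOT NULL
    );
    -- Hypothetical sample data
    INSERT INTO orders (id, customer_id) VALUES (1, 10);
    INSERT INTO order_items (order_id, price_cents, quantity)
    VALUES (1, 2999, 2), (1, 4999, 1);
""")

def refresh_order_totals(conn, order_id):
    """Recompute the cached columns from the source rows (hypothetical helper)."""
    conn.execute("""
        UPDATE orders
        SET total_cents = (SELECT COALESCE(SUM(price_cents * quantity), 0)
                           FROM order_items WHERE order_id = :id),
            item_count = (SELECT COALESCE(SUM(quantity), 0)
                          FROM order_items WHERE order_id = :id)
        WHERE id = :id
    """, {"id": order_id})

refresh_order_totals(conn, 1)
cached = conn.execute(
    "SELECT total_cents, item_count FROM orders WHERE id = 1"
).fetchone()
print(cached)
```

Reads of the order now avoid the aggregate, at the cost of an extra write whenever items change.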

String Types

| Type | Use Case | Example |
|------|----------|---------|
| CHAR(n) | Fixed length | State codes, ISO country codes |
| VARCHAR(n) | Variable length | Names, emails |
| TEXT | Long content | Articles, descriptions |

-- Good sizing

email VARCHAR(255)

phone VARCHAR(20)

country_code CHAR(2)

Numeric Types

| Type | Range | Use Case |
|------|-------|----------|
| TINYINT | -128 to 127 | Age, status codes |
| SMALLINT | -32K to 32K | Quantities |
| INT | -2.1B to 2.1B | IDs, counts |
| BIGINT | Very large | Large IDs, timestamps |
| DECIMAL(p,s) | Exact precision | Money |
| FLOAT/DOUBLE | Approximate | Scientific data |

-- ALWAYS use DECIMAL for money

price DECIMAL(10, 2)  -- $99,999,999.99

-- NEVER use FLOAT for money

price FLOAT  -- Rounding errors!
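The rounding problem is easy to reproduce outside the database. The short Python sketch below compares binary floating point with the standard library's Decimal type, which behaves like a DECIMAL column for this purpose; the amounts are arbitrary.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.10 or 0.20 exactly.
float_total = 0.10 + 0.20
# Decimal keeps exact base-10 digits, like a DECIMAL column.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.30)               # False
print(decimal_total == Decimal("0.30"))  # True
```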

Date/Time Types

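DATE        -- 2025-10-31
TIME        -- 14:30:00
DATETIME    -- 2025-10-31 14:30:00 (MySQL: stored as given, no time zone conversion)
TIMESTAMP   -- MySQL: converted from the session time zone to UTC on write, and back on read

-- Always store in UTC
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
-- ON UPDATE is MySQL-specific; PostgreSQL needs a trigger for the same effect
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP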
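On the application side, "store in UTC" usually means normalizing timezone-aware values before they reach the database. A minimal Python sketch follows, assuming the input arrives with a +02:00 offset; the date and offset are made up.

```python
from datetime import datetime, timedelta, timezone

# A local wall-clock time with a +02:00 offset (hypothetical input).
local = datetime(2025, 10, 31, 14, 30, tzinfo=timezone(timedelta(hours=2)))

# Normalize to UTC before writing; convert back only for display.
stored = local.astimezone(timezone.utc)
print(stored.isoformat())  # 2025-10-31T12:30:00+00:00
```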

Boolean

-- PostgreSQL

is_active BOOLEAN DEFAULT TRUE

-- MySQL

is_active TINYINT(1) DEFAULT 1

When to Create Indexes

| Always Index | Reason |
|--------------|--------|
| Foreign keys | Speed up JOINs |
| WHERE clause columns | Speed up filtering |
| ORDER BY columns | Speed up sorting |
| Unique constraints | Enforced uniqueness |

-- Foreign key index

CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Query pattern index

CREATE INDEX idx_orders_status_date ON orders(status, created_at);

Index Types

| Type | Best For | Example |
|------|----------|---------|
| B-Tree | Ranges, equality | price > 100 |
| Hash | Exact matches only | email = 'x@y.com' |
| Full-text | Text search | MATCH AGAINST |
| Partial | Subset of rows | WHERE is_active = true |

Composite Index Order

CREATE INDEX idx_customer_status ON orders(customer_id, status);

-- Uses index (customer_id is the leading column)
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';

-- Generally cannot seek on the index (status is not a leading column)
SELECT * FROM orders WHERE status = 'pending';

Rule: Lead with the column most often filtered on its own; among equality filters, put the most selective first, and place range or sort columns last.
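The leading-column behavior can be checked with a query planner. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in; other databases print different plans, and some can still scan the index in other ways. The extra total column and the plan helper are assumptions added for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        status TEXT,
        total INTEGER  -- extra column so the index does not cover SELECT *
    );
    CREATE INDEX idx_customer_status ON orders(customer_id, status);
""")

def plan(sql):
    """Return SQLite's query plan details joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text

by_customer = plan("SELECT * FROM orders WHERE customer_id = 123")
by_status = plan("SELECT * FROM orders WHERE status = 'pending'")
print(by_customer)  # expected to mention idx_customer_status
print(by_status)    # expected to be a scan of the orders table
```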

Index Pitfalls

| Pitfall | Problem | Solution |
|---------|---------|----------|
| Over-indexing | Slow writes | Only index what's queried |
| Wrong column order | Unused index | Match query patterns |
| Missing FK indexes | Slow JOINs | Always index FKs |

Primary Keys

-- Auto-increment (simple)

id INT AUTO_INCREMENT PRIMARY KEY

-- UUID (distributed systems)

id CHAR(36) PRIMARY KEY DEFAULT (UUID())

-- Composite (junction tables)

PRIMARY KEY (student_id, course_id)

Foreign Keys

-- Choose one ON DELETE action per foreign key; the alternatives are shown for comparison
FOREIGN KEY (customer_id) REFERENCES customers(id)
  ON DELETE CASCADE     -- Delete children with parent
  -- ON DELETE RESTRICT -- Prevent deletion if referenced
  -- ON DELETE SET NULL -- Set to NULL when parent deleted (column must be nullable)
  ON UPDATE CASCADE     -- Update children when parent changes

| Strategy | Use When |
|----------|----------|
| CASCADE | Dependent data (order_items) |
| RESTRICT | Important references (prevent accidents) |
| SET NULL | Optional relationships |
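The three strategies behave quite differently when a parent row is deleted. The sketch below exercises each one with Python's sqlite3 module; the addresses, reviews and invoices tables are hypothetical stand-ins for dependent, optional and important references. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE addresses (   -- dependent data
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE CASCADE
    );
    CREATE TABLE reviews (     -- optional relationship
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id) ON DELETE SET NULL
    );
    CREATE TABLE invoices (    -- important reference
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE RESTRICT
    );
    -- Hypothetical sample data
    INSERT INTO customers VALUES (1), (2);
    INSERT INTO addresses VALUES (1, 1);
    INSERT INTO reviews VALUES (1, 1);
    INSERT INTO invoices VALUES (1, 2);
""")

# Customer 1 has an address (CASCADE) and a review (SET NULL).
conn.execute("DELETE FROM customers WHERE id = 1")
addresses_left = conn.execute("SELECT COUNT(*) FROM addresses").fetchone()[0]
review_owner = conn.execute("SELECT customer_id FROM reviews").fetchone()[0]

# Customer 2 has an invoice (RESTRICT), so the delete is refused.
try:
    conn.execute("DELETE FROM customers WHERE id = 2")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

print(addresses_left, review_owner, restricted)
```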

Other Constraints

-- Unique

email VARCHAR(255) UNIQUE NOT NULL

-- Composite unique

UNIQUE (student_id, course_id)

-- Check

price DECIMAL(10,2) CHECK (price >= 0)

discount INT CHECK (discount BETWEEN 0 AND 100)

-- Not null

name VARCHAR(100) NOT NULL
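Constraints declared this way are enforced by the database no matter which application writes the row. A minimal sketch with Python's sqlite3 module follows; the products table, its sku column and the integer price_cents column are assumptions for this example, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        sku TEXT UNIQUE NOT NULL,
        price_cents INTEGER NOT NULL CHECK (price_cents >= 0),
        discount INTEGER NOT NULL DEFAULT 0 CHECK (discount BETWEEN 0 AND 100)
    )
""")
conn.execute("INSERT INTO products (sku, price_cents) VALUES ('A-1', 500)")

bad_rows = [
    ("A-1", 700, 0),    # duplicate sku violates UNIQUE
    ("B-2", -1, 0),     # negative price violates CHECK
    ("C-3", 700, 150),  # discount out of range violates CHECK
]
rejected = 0
for row in bad_rows:
    try:
        conn.execute(
            "INSERT INTO products (sku, price_cents, discount) VALUES (?, ?, ?)",
            row,
        )
    except sqlite3.IntegrityError:
        rejected += 1

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(rejected, row_count)
```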

One-to-Many

CREATE TABLE orders (

  id INT PRIMARY KEY,

  customer_id INT NOT NULL REFERENCES customers(id)

);

CREATE TABLE order_items (

  id INT PRIMARY KEY,

  order_id INT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,

  product_id INT NOT NULL,

  quantity INT NOT NULL

);

Many-to-Many

-- Junction table

CREATE TABLE enrollments (

  student_id INT REFERENCES students(id) ON DELETE CASCADE,

  course_id INT REFERENCES courses(id) ON DELETE CASCADE,

  enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,

  PRIMARY KEY (student_id, course_id)

);

Self-Referencing

CREATE TABLE employees (

  id INT PRIMARY KEY,

  name VARCHAR(100) NOT NULL,

  manager_id INT REFERENCES employees(id)

);
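A self-referencing table is typically queried with a recursive common table expression. The sketch below walks from one employee up to the top of the hierarchy using Python's sqlite3 module; the employee names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        manager_id INTEGER REFERENCES employees(id)
    );
    -- Hypothetical sample data: Ada manages Grace, Grace manages Linus
    INSERT INTO employees VALUES
        (1, 'Ada', NULL),
        (2, 'Grace', 1),
        (3, 'Linus', 2);
""")

# Start at one employee and follow manager_id until it is NULL.
rows = conn.execute("""
    WITH RECURSIVE chain(id, name, manager_id, depth) AS (
        SELECT id, name, manager_id, 0 FROM employees WHERE id = ?
        UNION ALL
        SELECT e.id, e.name, e.manager_id, chain.depth + 1
        FROM employees AS e
        JOIN chain ON e.id = chain.manager_id
    )
    SELECT name FROM chain ORDER BY depth
""", (3,)).fetchall()
management_chain = [name for (name,) in rows]
print(management_chain)
```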

Polymorphic

-- Approach 1: Separate FKs (stronger integrity)

CREATE TABLE comments (

  id INT PRIMARY KEY,

  content TEXT NOT NULL,

  post_id INT REFERENCES posts(id),

  photo_id INT REFERENCES photos(id),

  CHECK (

    (post_id IS NOT NULL AND photo_id IS NULL) OR

    (post_id IS NULL AND photo_id IS NOT NULL)

  )

);

-- Approach 2: Type + ID (flexible, weaker integrity)

CREATE TABLE comments (

  id INT PRIMARY KEY,

  content TEXT NOT NULL,

  commentable_type VARCHAR(50) NOT NULL,

  commentable_id INT NOT NULL

);

Embedding vs Referencing

| Factor | Embed | Reference |
|--------|-------|-----------|
| Access pattern | Read together | Read separately |
| Relationship | 1:few | 1:many |
| Document size | Small | Approaching 16MB |
| Update frequency | Rarely | Frequently |

Embedded Document

{

  "_id": "order_123",

  "customer": {

    "id": "cust_456",

    "name": "Jane Smith",

    "email": "jane@example.com"

  },

  "items": [

    { "product_id": "prod_789", "quantity": 2, "price": 29.99 }

  ],

  "total": 109.97

}

Referenced Document

{

  "_id": "order_123",

  "customer_id": "cust_456",

  "item_ids": ["item_1", "item_2"],

  "total": 109.97

}
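The trade-off of the referenced form is that the reader has to resolve each ID with a further lookup. The sketch below shows this with plain Python dictionaries standing in for collections, so no database driver is needed; the resolve helper and the contents of the items collection are hypothetical.

```python
# In-memory stand-ins for three collections (hypothetical sample data).
customers = {"cust_456": {"name": "Jane Smith", "email": "jane@example.com"}}
items = {
    "item_1": {"product_id": "prod_789", "quantity": 2},
    "item_2": {"product_id": "prod_790", "quantity": 1},
}
order = {
    "_id": "order_123",
    "customer_id": "cust_456",
    "item_ids": ["item_1", "item_2"],
}

def resolve(order):
    """Rebuild the embedded shape; each lookup stands in for an extra query."""
    customer = customers[order["customer_id"]]
    return {
        "_id": order["_id"],
        "customer": {"id": order["customer_id"], **customer},
        "items": [items[item_id] for item_id in order["item_ids"]],
    }

resolved = resolve(order)
print(resolved["customer"]["name"], len(resolved["items"]))
```

With the embedded form the same data arrives in one read, which is why the access pattern should drive the choice.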

MongoDB Indexes

// Single field

db.users.createIndex({ email: 1 }, { unique: true });

// Composite

db.orders.createIndex({ customer_id: 1, created_at: -1 });

// Text search

db.articles.createIndex({ title: "text", content: "text" });

// Geospatial

db.stores.createIndex({ location: "2dsphere" });

Migration Best Practices

| Practice | Why |
|----------|-----|
| Always reversible | Need to roll back |
| Backward compatible | Zero-downtime deploys |
| Schema before data | Separate concerns |
| Test on staging | Catch issues early |

Adding a Column (Zero-Downtime)

-- Step 1: Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);

-- Step 2: Deploy code that writes to new column

-- Step 3: Backfill existing rows (in batches on large tables)
UPDATE users SET phone = '' WHERE phone IS NULL;

-- Step 4: Make required (if needed)
-- MySQL syntax; PostgreSQL uses: ALTER TABLE users ALTER COLUMN phone SET NOT NULL;
ALTER TABLE users MODIFY phone VARCHAR(20) NOT NULL;
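On a large table, the backfill step is often run in small batches so that each transaction stays short. The following is a rough sketch with Python's sqlite3 module; the backfill_phone helper, the batch size and the 25 sample rows are assumptions for this example, and other databases need their own batching syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
# Hypothetical existing rows that predate the new column.
conn.executemany("INSERT INTO users (phone) VALUES (?)", [(None,)] * 25)
conn.commit()

def backfill_phone(conn, batch_size=10):
    """Fill NULL phones in small batches; returns the number of batches run."""
    batches = 0
    while True:
        cur = conn.execute(
            """
            UPDATE users SET phone = ''
            WHERE id IN (SELECT id FROM users WHERE phone IS NULL LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()  # commit per batch to keep transactions short
        if cur.rowcount == 0:
            return batches
        batches += 1

batches_run = backfill_phone(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE phone IS NULL"
).fetchone()[0]
print(batches_run, remaining)
```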

Renaming a Column (Zero-Downtime)

-- Step 1: Add new column
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);

-- Step 2: Deploy code that writes to both columns

-- Step 3: Copy existing data
UPDATE users SET email_address = email WHERE email_address IS NULL;

-- Step 4: Deploy code reading from new column

-- Step 5: Deploy code that stops writing to old column

-- Step 6: Drop old column
ALTER TABLE users DROP COLUMN email;

Migration Template

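-- Migration: YYYYMMDDHHMMSS_description.sql
-- Note: MySQL commits DDL implicitly, so BEGIN/COMMIT only makes the migration
-- atomic on databases with transactional DDL (e.g. PostgreSQL)

-- UP
BEGIN;
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
CREATE INDEX idx_users_phone ON users(phone);
COMMIT;

-- DOWN
BEGIN;
DROP INDEX idx_users_phone ON users;  -- MySQL syntax; PostgreSQL: DROP INDEX idx_users_phone;
ALTER TABLE users DROP COLUMN phone;
COMMIT;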

Query Analysis

EXPLAIN SELECT * FROM orders

WHERE customer_id = 123 AND status = 'pending';

| Look For | Meaning |
|----------|---------|
| type: ALL | Full table scan (bad) |
| type: ref | Index used (good) |
| key: NULL | No index used |
| rows: high | Many rows scanned |

N+1 Query Problem

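# BAD: N+1 queries (one query for the orders, then one more per order)
orders = db.query("SELECT * FROM orders")
for order in orders:
    # Bind the value as a parameter rather than formatting it into the SQL
    # (placeholder syntax varies by driver)
    customer = db.query("SELECT * FROM customers WHERE id = %s", (order.customer_id,))

# GOOD: Single JOIN
results = db.query("""
    SELECT orders.*, customers.name
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
""")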

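The difference in round trips can be counted directly. The sketch below is a runnable variant of the comparison above using Python's sqlite3 module; the run wrapper, the counter and the sample rows are assumptions added so each query can be tallied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    );
    -- Hypothetical sample data
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1), (2, 2), (3, 1);
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Execute a query and count it as one round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# N+1: one query for the orders, then one more per order.
for _order_id, customer_id in run("SELECT id, customer_id FROM orders"):
    run("SELECT name FROM customers WHERE id = ?", (customer_id,))
n_plus_one_queries = counter["queries"]

# JOIN: the same data in a single round trip.
counter["queries"] = 0
joined = run("""
    SELECT orders.id, customers.name
    FROM orders
    JOIN customers ON orders.customer_id = customers.id
    ORDER BY orders.id
""")
join_queries = counter["queries"]
print(n_plus_one_queries, join_queries)
```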
Optimization Techniques

| Technique | When to Use |
|-----------|-------------|
| Add indexes | Slow WHERE/ORDER BY |
| Denormalize | Expensive JOINs |
| Pagination | Large result sets |
| Caching | Repeated queries |
| Read replicas | Read-heavy load |
| Partitioning | Very large tables |
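For the pagination row, one common variant is keyset pagination, where each page resumes after the last key seen instead of using a growing OFFSET. The sketch below uses Python's sqlite3 module; the fetch_page helper, the page size and the seven sample rows are assumptions for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
# Hypothetical sample data: seven pending orders with ids 1..7.
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, 'pending')",
    [(i,) for i in range(1, 8)],
)

def fetch_page(conn, after_id=0, page_size=3):
    """Return up to page_size ids greater than after_id, in key order."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    return [row[0] for row in rows]

pages = []
last_id = 0
while True:
    page = fetch_page(conn, after_id=last_id)
    if not page:
        break
    pages.append(page)
    last_id = page[-1]  # resume after the last key seen
print(pages)
```

Because each page is a seek on the primary key, the cost per page stays roughly constant as the table grows.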

Extension Points

Database-Specific Patterns: Add MySQL vs PostgreSQL vs SQLite variations

Advanced Patterns: Time-series, event sourcing, CQRS, multi-tenancy

ORM Integration: TypeORM, Prisma, SQLAlchemy patterns

Monitoring: Query performance tracking, slow query alerts
