SKILL.md

$27

Index Optimization

Index Gap Analysis: Identification of missing indexes on foreign keys and query patterns

Composite Index Strategy: Optimal column ordering for multi-column indexes

Index Redundancy Detection: Elimination of overlapping and unused indexes

Performance Impact Modeling: Selectivity estimation and query cost analysis

Index Type Selection: B-tree, hash, partial, covering, and specialized indexes

Migration Management

Zero-Downtime Migrations: Expand-contract pattern implementation

Schema Evolution: Safe column additions, deletions, and type changes

Data Migration Scripts: Automated data transformation and validation

Rollback Strategy: Complete reversal capabilities with validation

Execution Planning: Ordered migration steps with dependency resolution

Database Design Principles

→ See references/database-design-reference.md for details

Best Practices

Schema Design

Use meaningful names: Clear, consistent naming conventions

Choose appropriate data types: Right-sized columns for storage efficiency

Define proper constraints: Foreign keys, check constraints, unique indexes

Consider future growth: Plan for scale from the beginning

Document relationships: Clear foreign key relationships and business rules

Performance Optimization

Index strategically: Cover common query patterns without over-indexing

Monitor query performance: Regular analysis of slow queries

Partition large tables: Improve query performance and maintenance

Use appropriate isolation levels: Balance consistency with performance

Implement connection pooling: Efficient resource utilization

Security Considerations

Principle of least privilege: Grant minimal necessary permissions

Encrypt sensitive data: At rest and in transit

Audit access patterns: Monitor and log database access

Validate inputs: Prevent SQL injection attacks

Regular security updates: Keep database software current

Query Generation Patterns

SELECT with JOINs

-- INNER JOIN: only matching rows

SELECT o.id, c.name, o.total

FROM orders o

INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches

SELECT c.name, COUNT(o.id) AS order_count

FROM customers c

LEFT JOIN orders o ON o.customer_id = c.id

GROUP BY c.name;

-- Self-join: hierarchical data (employees/managers)

SELECT e.name AS employee, m.name AS manager

FROM employees e

LEFT JOIN employees m ON m.id = e.manager_id;

Common Table Expressions (CTEs)

-- Recursive CTE for org chart

WITH RECURSIVE org AS (

  SELECT id, name, manager_id, 1 AS depth

  FROM employees WHERE manager_id IS NULL

  UNION ALL

  SELECT e.id, e.name, e.manager_id, o.depth + 1

  FROM employees e INNER JOIN org o ON o.id = e.manager_id

)

SELECT * FROM org ORDER BY depth, name;

Window Functions

-- ROW_NUMBER for pagination / dedup

SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn

FROM orders;

-- RANK with gaps, DENSE_RANK without gaps

SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows

SELECT date, revenue,

  revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change

FROM daily_sales;

Aggregation Patterns

-- FILTER clause (PostgreSQL) for conditional aggregation

SELECT

  COUNT(*) AS total,

  COUNT(*) FILTER (WHERE status = 'active') AS active,

  AVG(amount) FILTER (WHERE amount > 0) AS avg_positive

FROM accounts;

-- GROUPING SETS for multi-level rollups

SELECT region, product, SUM(revenue)

FROM sales

GROUP BY GROUPING SETS ((region, product), (region), ());

Migration Patterns

Up/Down Migration Scripts

Every migration must have a reversible counterpart. Name files with a timestamp prefix for ordering:

migrations/

├── 20260101_000001_create_users.up.sql

├── 20260101_000001_create_users.down.sql

├── 20260115_000002_add_users_email_index.up.sql

└── 20260115_000002_add_users_email_index.down.sql

Zero-Downtime Migrations (Expand/Contract)

Use the expand-contract pattern to avoid locking or breaking running code:

Expand — add the new column/table (nullable, with default)

Migrate data — backfill in batches; dual-write from application

Transition — application reads from new column; stop writing to old

Contract — drop old column in a follow-up migration

Data Backfill Strategies

-- Batch update to avoid long-running locks

UPDATE users SET email_normalized = LOWER(email)

WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);

-- Repeat in a loop until 0 rows affected

Rollback Procedures

Always test the down.sql in staging before deploying up.sql to production

Keep rollback window short — if the contract step has run, rollback requires a new forward migration

For irreversible changes (dropping columns with data), take a logical backup first

Performance Optimization

Indexing Strategies

Index Type

Use Case

Example

B-tree (default)

Equality, range, ORDER BY

CREATE INDEX idx_users_email ON users(email);

GIN

Full-text search, JSONB, arrays

CREATE INDEX idx_docs_body ON docs USING gin(to_tsvector('english', body));

GiST

Geometry, range types, nearest-neighbor

CREATE INDEX idx_locations ON places USING gist(coords);

Partial

Subset of rows (reduce size)

CREATE INDEX idx_active ON users(email) WHERE active = true;

Covering

Index-only scans

CREATE INDEX idx_cov ON orders(customer_id) INCLUDE (total, created_at);

EXPLAIN Plan Reading

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

Key signals to watch:

Seq Scan on large tables — missing index

Nested Loop with high row estimates — consider hash/merge join or add index

Buffers shared read much higher than hit — working set exceeds memory

N+1 Query Detection

Symptoms: application issues one query per row (e.g., fetching related records in a loop).

Fixes:

Use JOIN or subquery to fetch in one round-trip

ORM eager loading (select_related / includes / with)

DataLoader pattern for GraphQL resolvers

Connection Pooling

Tool

Protocol

Best For

PgBouncer

PostgreSQL

Transaction/statement pooling, low overhead

ProxySQL

MySQL

Query routing, read/write splitting

Built-in pool (HikariCP, SQLAlchemy pool)

Any

Application-level pooling

Rule of thumb: Set pool size to (2 * CPU cores) + disk spindles. For cloud SSDs, start with 2 * vCPUs and tune.

Read Replicas and Query Routing

Route all SELECT queries to replicas; writes to primary

Account for replication lag (typically <1s for async, 0 for sync)

Use pg_last_wal_replay_lsn() to detect lag before reading critical data

Multi-Database Decision Matrix

Criteria

PostgreSQL

MySQL

SQLite

SQL Server

Best for

Complex queries, JSONB, extensions

Web apps, read-heavy workloads

Embedded, dev/test, edge

Enterprise .NET stacks

JSON support

Excellent (JSONB + GIN)

Good (JSON type)

Minimal

Good (OPENJSON)

Replication

Streaming, logical

Group replication, InnoDB cluster

N/A

Always On AG

Licensing

Open source (PostgreSQL License)

Open source (GPL) / commercial

Public domain

Commercial

Max practical size

Multi-TB

~1 TB (single-writer)

Multi-TB

When to choose:

PostgreSQL — default choice for new projects; best extensibility and standards compliance

MySQL — existing MySQL ecosystem; simple read-heavy web applications

SQLite — mobile apps, CLI tools, unit test databases, IoT/edge

SQL Server — mandated by enterprise policy; deep .NET/Azure integration

NoSQL Considerations

Database

Model

Use When

MongoDB

Document

Schema flexibility, rapid prototyping, content management

Redis

Key-value / cache

Session store, rate limiting, leaderboards, pub/sub

DynamoDB

Wide-column

Serverless AWS apps, single-digit-ms latency at any scale

Use SQL as default. Reach for NoSQL only when the access pattern clearly benefits from it.

Sharding & Replication

Horizontal vs Vertical Partitioning

Vertical partitioning: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.

Horizontal partitioning (sharding): Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.

Sharding Strategies

Strategy

How It Works

Pros

Cons

Hash

shard = hash(key) % N

Even distribution

Resharding is expensive

Range

Shard by date or ID range

Simple, good for time-series

Hot spots on latest shard

Geographic

Shard by user region

Data locality, compliance

Cross-region queries are hard

Replication Patterns

Pattern

Consistency

Latency

Use Case

Synchronous

Strong

Higher write latency

Financial transactions

Asynchronous

Eventual

Low write latency

Read-heavy web apps

Semi-synchronous

At-least-one replica confirmed

Moderate

Balance of safety and speed

Cross-References

sql-database-assistant — query writing, optimization, and debugging for day-to-day SQL work

database-schema-designer — ERD modeling, normalization analysis, and schema generation

migration-architect — large-scale migration planning across database engines or major schema overhauls

senior-backend — application-layer patterns (connection pooling, ORM best practices)

senior-devops — infrastructure provisioning for database clusters and replicas

Conclusion

Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.

The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.

database-designer

SKILL.md

Index Optimization

Migration Management

Database Design Principles

Best Practices

Schema Design

Performance Optimization

Security Considerations

Query Generation Patterns

SELECT with JOINs

Common Table Expressions (CTEs)

Window Functions

Aggregation Patterns

Migration Patterns

Up/Down Migration Scripts

Zero-Downtime Migrations (Expand/Contract)

Data Backfill Strategies

Rollback Procedures

Performance Optimization

Indexing Strategies

EXPLAIN Plan Reading

N+1 Query Detection

Connection Pooling

Read Replicas and Query Routing

Multi-Database Decision Matrix

NoSQL Considerations

Sharding & Replication

Horizontal vs Vertical Partitioning

Sharding Strategies

Replication Patterns

Cross-References

Conclusion

Stop writing automation&scrapers

database-designer

SKILL.md

Index Optimization

Migration Management

Database Design Principles

Best Practices

Schema Design

Performance Optimization

Security Considerations

Query Generation Patterns

SELECT with JOINs

Common Table Expressions (CTEs)

Window Functions

Aggregation Patterns

Migration Patterns

Up/Down Migration Scripts

Zero-Downtime Migrations (Expand/Contract)

Data Backfill Strategies

Rollback Procedures

Performance Optimization

Indexing Strategies

EXPLAIN Plan Reading

N+1 Query Detection

Connection Pooling

Read Replicas and Query Routing

Multi-Database Decision Matrix

NoSQL Considerations

Sharding &#x26; Replication

Horizontal vs Vertical Partitioning

Sharding Strategies

Replication Patterns

Cross-References

Conclusion

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers

Sharding & Replication