Name: clickhouse-best-practices
Author: clickhouse

SKILL.md

$2a

Agent Connectivity & Query Workflow

Before querying ClickHouse, agents must establish a connection and follow the discovery workflow:

rules/agent-connect-mcp.md - Connection setup (MCP + CLI), credential discovery, output format selection

rules/agent-discovery-schema.md - CRITICAL: 7-step schema discovery workflow

rules/agent-query-safety.md - CRITICAL: LIMIT, timeouts, progressive exploration

Every agent session should follow this sequence:

Connect — establish connection via MCP or CLI (see agent-connect-mcp)

Discover — databases → tables → columns + comments → sort keys → skip indexes → sample → EXPLAIN

Plan — use sort key and skip index knowledge to write efficient WHERE clauses

Execute — run queries with LIMIT and timeouts

Recover — on timeout/memory errors, narrow filters and retry (see agent-query-safety)

Subagent architecture notes

If your system dispatches ClickHouse tasks to specialized subagents:

Schema discovery + query execution: any model — the steps are procedural

EXPLAIN analysis + query optimization: benefits from mid-tier reasoning

Schema design review against all 28 rules: benefits from mid-tier reasoning

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

Read these rule files in order:

rules/schema-pk-plan-before-creation.md - ORDER BY is immutable

rules/schema-pk-cardinality-order.md - Column ordering in keys

rules/schema-pk-prioritize-filters.md - Filter column inclusion

rules/schema-types-native-types.md - Proper type selection

rules/schema-types-minimize-bitwidth.md - Numeric type sizing

rules/schema-types-lowcardinality.md - LowCardinality usage

rules/schema-types-avoid-nullable.md - Nullable vs DEFAULT

rules/schema-partition-low-cardinality.md - Partition count limits

rules/schema-partition-lifecycle.md - Partitioning purpose

Check for:

PRIMARY KEY / ORDER BY column order (low-to-high cardinality)

Data types match actual data ranges

LowCardinality applied to appropriate string columns

Partition key cardinality bounded (100-1,000 values)

ReplacingMergeTree has version column if used

For Query Reviews (SELECT, JOIN, aggregations)

Read these rule files:

rules/query-join-choose-algorithm.md - Algorithm selection

rules/query-join-filter-before.md - Pre-join filtering

rules/query-join-use-any.md - ANY vs regular JOIN

rules/query-index-skipping-indices.md - Secondary index usage

rules/schema-pk-filter-on-orderby.md - Filter alignment with ORDER BY

Check for:

Filters use ORDER BY prefix columns

JOINs filter tables before joining (not after)

Correct JOIN algorithm for table sizes

Skipping indices for non-ORDER BY filter columns

For Insert Strategy Reviews (data ingestion, updates, deletes)

Read these rule files:

rules/insert-batch-size.md - Batch sizing requirements

rules/insert-mutation-avoid-update.md - UPDATE alternatives

rules/insert-mutation-avoid-delete.md - DELETE alternatives

rules/insert-async-small-batches.md - Async insert usage

rules/insert-optimize-avoid-final.md - OPTIMIZE TABLE risks

Check for:

Batch size 10K-100K rows per INSERT

No ALTER TABLE UPDATE for frequent changes

ReplacingMergeTree or CollapsingMergeTree for update patterns

Async inserts enabled for high-frequency small batches

Output Format

Structure your response as follows:

## Rules Checked

- `rule-name-1` - Compliant / Violation found

- `rule-name-2` - Compliant / Violation found

...

## Findings

### Violations

- **`rule-name`**: Description of the issue

  - Current: [what the code does]

  - Required: [what it should do]

  - Fix: [specific correction]

### Compliant

- `rule-name`: Brief note on why it's correct

## Recommendations

[Prioritized list of changes, citing rules]

Rule Categories by Priority

Priority

Quick Reference

Schema Design - Primary Key (CRITICAL)

schema-pk-plan-before-creation - Plan ORDER BY before table creation (immutable)

schema-pk-cardinality-order - Order columns low-to-high cardinality

schema-pk-prioritize-filters - Include frequently filtered columns

schema-pk-filter-on-orderby - Query filters must use ORDER BY prefix

Schema Design - Data Types (CRITICAL)

schema-types-native-types - Use native types, not String for everything

schema-types-minimize-bitwidth - Use smallest numeric type that fits

schema-types-lowcardinality - LowCardinality for <10K unique strings

schema-types-enum - Enum for finite value sets with validation

schema-types-avoid-nullable - Avoid Nullable; use DEFAULT instead

Schema Design - Partitioning (HIGH)

schema-partition-low-cardinality - Keep partition count 100-1,000

schema-partition-lifecycle - Use partitioning for data lifecycle, not queries

schema-partition-query-tradeoffs - Understand partition pruning trade-offs

schema-partition-start-without - Consider starting without partitioning

Schema Design - JSON (MEDIUM)

schema-json-when-to-use - JSON for dynamic schemas; typed columns for known

Query Optimization - JOINs (CRITICAL)

query-join-choose-algorithm - Select algorithm based on table sizes

query-join-use-any - ANY JOIN when only one match needed

query-join-filter-before - Filter tables before joining

query-join-consider-alternatives - Dictionaries/denormalization vs JOIN

query-join-null-handling - join_use_nulls=0 for default values

Query Optimization - Indices (HIGH)

query-index-skipping-indices - Skipping indices for non-ORDER BY filters

Query Optimization - Materialized Views (HIGH)

query-mv-incremental - Incremental MVs for real-time aggregations

query-mv-refreshable - Refreshable MVs for complex joins

Insert Strategy - Batching (CRITICAL)

insert-batch-size - Batch 10K-100K rows per INSERT

Insert Strategy - Async (HIGH)

insert-async-small-batches - Async inserts for high-frequency small batches

insert-format-native - Native format for best performance

Insert Strategy - Mutations (CRITICAL)

insert-mutation-avoid-update - ReplacingMergeTree instead of ALTER UPDATE

insert-mutation-avoid-delete - Lightweight DELETE or DROP PARTITION

Insert Strategy - Optimization (HIGH)

insert-optimize-avoid-final - Let background merges work

Agent Integration - Discovery (CRITICAL)

agent-discovery-schema - Always discover schema before querying

Agent Integration - Safety (CRITICAL)

agent-query-safety - LIMIT, timeouts, progressive exploration

Agent Integration - Connectivity + Formats (HIGH)

agent-connect-mcp - MCP + CLI setup, credential discovery, output format selection

When to Apply

This skill activates when you encounter:

AI agent connecting to ClickHouse (MCP, CLI, HTTP)

Agent workflow design for ClickHouse

Schema discovery or exploration requests

CREATE TABLE statements

ALTER TABLE modifications

ORDER BY or PRIMARY KEY discussions

Data type selection questions

Slow query troubleshooting

JOIN optimization requests

Data ingestion pipeline design

Update/delete strategy questions

ReplacingMergeTree or other specialized engine usage

Partitioning strategy decisions

Rule File Structure

Each rule file in rules/ contains:

YAML frontmatter: title, impact level, tags

Brief explanation: Why this rule matters

Incorrect example: Anti-pattern with explanation

Correct example: Best practice with explanation

Additional context: Trade-offs, when to apply, references

Full Compiled Document

For the complete guide with all rules expanded inline: AGENTS.md

Use AGENTS.md when you need to check multiple rules quickly without reading individual files.

clickhouse-best-practices

SKILL.md

Agent Connectivity &#x26; Query Workflow

Subagent architecture notes

Review Procedures

For Schema Reviews (CREATE TABLE, ALTER TABLE)

For Query Reviews (SELECT, JOIN, aggregations)

For Insert Strategy Reviews (data ingestion, updates, deletes)

Output Format

Rule Categories by Priority

Quick Reference

Schema Design - Primary Key (CRITICAL)

Schema Design - Data Types (CRITICAL)

Schema Design - Partitioning (HIGH)

Schema Design - JSON (MEDIUM)

Query Optimization - JOINs (CRITICAL)

Query Optimization - Indices (HIGH)

Query Optimization - Materialized Views (HIGH)

Insert Strategy - Batching (CRITICAL)

Insert Strategy - Async (HIGH)

Insert Strategy - Mutations (CRITICAL)

Insert Strategy - Optimization (HIGH)

Agent Integration - Discovery (CRITICAL)

Agent Integration - Safety (CRITICAL)

Agent Integration - Connectivity + Formats (HIGH)

When to Apply

Rule File Structure

Full Compiled Document

Let your agent run on any real-world website

Related skills

Stop writing automation&scrapers

Agent Connectivity & Query Workflow