SKILL.md
$2a
Agent Connectivity & Query Workflow
Before querying ClickHouse, agents must establish a connection and follow the discovery workflow:
rules/agent-connect-mcp.md- Connection setup (MCP + CLI), credential discovery, output format selection
rules/agent-discovery-schema.md- CRITICAL: 7-step schema discovery workflow
rules/agent-query-safety.md- CRITICAL: LIMIT, timeouts, progressive exploration
Every agent session should follow this sequence:
- Connect — establish connection via MCP or CLI (see
agent-connect-mcp)
- Discover — databases → tables → columns + comments → sort keys → skip indexes → sample → EXPLAIN
- Plan — use sort key and skip index knowledge to write efficient WHERE clauses
- Execute — run queries with LIMIT and timeouts
- Recover — on timeout/memory errors, narrow filters and retry (see
agent-query-safety)
Subagent architecture notes
If your system dispatches ClickHouse tasks to specialized subagents:
- Schema discovery + query execution: any model — the steps are procedural
- EXPLAIN analysis + query optimization: benefits from mid-tier reasoning
- Schema design review against all 28 rules: benefits from mid-tier reasoning
Review Procedures
For Schema Reviews (CREATE TABLE, ALTER TABLE)
Read these rule files in order:
rules/schema-pk-plan-before-creation.md- ORDER BY is immutable
rules/schema-pk-cardinality-order.md- Column ordering in keys
rules/schema-pk-prioritize-filters.md- Filter column inclusion
rules/schema-types-native-types.md- Proper type selection
rules/schema-types-minimize-bitwidth.md- Numeric type sizing
rules/schema-types-lowcardinality.md- LowCardinality usage
rules/schema-types-avoid-nullable.md- Nullable vs DEFAULT
rules/schema-partition-low-cardinality.md- Partition count limits
rules/schema-partition-lifecycle.md- Partitioning purpose
Check for:
- PRIMARY KEY / ORDER BY column order (low-to-high cardinality)
- Data types match actual data ranges
- LowCardinality applied to appropriate string columns
- Partition key cardinality bounded (100-1,000 values)
- ReplacingMergeTree has version column if used
For Query Reviews (SELECT, JOIN, aggregations)
Read these rule files:
rules/query-join-choose-algorithm.md- Algorithm selection
rules/query-join-filter-before.md- Pre-join filtering
rules/query-join-use-any.md- ANY vs regular JOIN
rules/query-index-skipping-indices.md- Secondary index usage
rules/schema-pk-filter-on-orderby.md- Filter alignment with ORDER BY
Check for:
- Filters use ORDER BY prefix columns
- JOINs filter tables before joining (not after)
- Correct JOIN algorithm for table sizes
- Skipping indices for non-ORDER BY filter columns
For Insert Strategy Reviews (data ingestion, updates, deletes)
Read these rule files:
rules/insert-batch-size.md- Batch sizing requirements
rules/insert-mutation-avoid-update.md- UPDATE alternatives
rules/insert-mutation-avoid-delete.md- DELETE alternatives
rules/insert-async-small-batches.md- Async insert usage
rules/insert-optimize-avoid-final.md- OPTIMIZE TABLE risks
Check for:
- Batch size 10K-100K rows per INSERT
- No ALTER TABLE UPDATE for frequent changes
- ReplacingMergeTree or CollapsingMergeTree for update patterns
- Async inserts enabled for high-frequency small batches
Output Format
Structure your response as follows:
## Rules Checked
- `rule-name-1` - Compliant / Violation found
- `rule-name-2` - Compliant / Violation found
...
## Findings
### Violations
- **`rule-name`**: Description of the issue
- Current: [what the code does]
- Required: [what it should do]
- Fix: [specific correction]
### Compliant
- `rule-name`: Brief note on why it's correct
## Recommendations
[Prioritized list of changes, citing rules]
Rule Categories by Priority
Priority
Category
Impact
Prefix
Rule Count
1
Primary Key Selection
CRITICAL
schema-pk-
4
2
Data Type Selection
CRITICAL
schema-types-
5
3
JOIN Optimization
CRITICAL
query-join-
5
4
Insert Batching
CRITICAL
insert-batch-
1
5
Mutation Avoidance
CRITICAL
insert-mutation-
2
6
Partitioning Strategy
HIGH
schema-partition-
4
7
Skipping Indices
HIGH
query-index-
1
8
Materialized Views
HIGH
query-mv-
2
9
Async Inserts
HIGH
insert-async-
2
10
OPTIMIZE Avoidance
HIGH
insert-optimize-
1
11
JSON Usage
MEDIUM
schema-json-
1
12
Agent Schema Discovery
CRITICAL
agent-discovery-
1
13
Agent Query Safety
CRITICAL
agent-query-
1
14
Agent Connectivity + Formats
HIGH
agent-connect-
1
Quick Reference
Schema Design - Primary Key (CRITICAL)
schema-pk-plan-before-creation- Plan ORDER BY before table creation (immutable)
schema-pk-cardinality-order- Order columns low-to-high cardinality
schema-pk-prioritize-filters- Include frequently filtered columns
schema-pk-filter-on-orderby- Query filters must use ORDER BY prefix
Schema Design - Data Types (CRITICAL)
schema-types-native-types- Use native types, not String for everything
schema-types-minimize-bitwidth- Use smallest numeric type that fits
schema-types-lowcardinality- LowCardinality for <10K unique strings
schema-types-enum- Enum for finite value sets with validation
schema-types-avoid-nullable- Avoid Nullable; use DEFAULT instead
Schema Design - Partitioning (HIGH)
schema-partition-low-cardinality- Keep partition count 100-1,000
schema-partition-lifecycle- Use partitioning for data lifecycle, not queries
schema-partition-query-tradeoffs- Understand partition pruning trade-offs
schema-partition-start-without- Consider starting without partitioning
Schema Design - JSON (MEDIUM)
schema-json-when-to-use- JSON for dynamic schemas; typed columns for known
Query Optimization - JOINs (CRITICAL)
query-join-choose-algorithm- Select algorithm based on table sizes
query-join-use-any- ANY JOIN when only one match needed
query-join-filter-before- Filter tables before joining
query-join-consider-alternatives- Dictionaries/denormalization vs JOIN
query-join-null-handling- join_use_nulls=0 for default values
Query Optimization - Indices (HIGH)
query-index-skipping-indices- Skipping indices for non-ORDER BY filters
Query Optimization - Materialized Views (HIGH)
query-mv-incremental- Incremental MVs for real-time aggregations
query-mv-refreshable- Refreshable MVs for complex joins
Insert Strategy - Batching (CRITICAL)
insert-batch-size- Batch 10K-100K rows per INSERT
Insert Strategy - Async (HIGH)
insert-async-small-batches- Async inserts for high-frequency small batches
insert-format-native- Native format for best performance
Insert Strategy - Mutations (CRITICAL)
insert-mutation-avoid-update- ReplacingMergeTree instead of ALTER UPDATE
insert-mutation-avoid-delete- Lightweight DELETE or DROP PARTITION
Insert Strategy - Optimization (HIGH)
insert-optimize-avoid-final- Let background merges work
Agent Integration - Discovery (CRITICAL)
agent-discovery-schema- Always discover schema before querying
Agent Integration - Safety (CRITICAL)
agent-query-safety- LIMIT, timeouts, progressive exploration
Agent Integration - Connectivity + Formats (HIGH)
agent-connect-mcp- MCP + CLI setup, credential discovery, output format selection
When to Apply
This skill activates when you encounter:
-
AI agent connecting to ClickHouse (MCP, CLI, HTTP)
-
Agent workflow design for ClickHouse
-
Schema discovery or exploration requests
-
CREATE TABLE statements
-
ALTER TABLE modifications
-
ORDER BY or PRIMARY KEY discussions
-
Data type selection questions
-
Slow query troubleshooting
-
JOIN optimization requests
-
Data ingestion pipeline design
-
Update/delete strategy questions
-
ReplacingMergeTree or other specialized engine usage
-
Partitioning strategy decisions
Rule File Structure
Each rule file in rules/ contains:
- YAML frontmatter: title, impact level, tags
- Brief explanation: Why this rule matters
- Incorrect example: Anti-pattern with explanation
- Correct example: Best practice with explanation
- Additional context: Trade-offs, when to apply, references
Full Compiled Document
For the complete guide with all rules expanded inline: AGENTS.md
Use AGENTS.md when you need to check multiple rules quickly without reading individual files.