SKILL.md
- Capture execution context — runtime+version, provider(s), backend, execution path, environment criticality.
- Diagnose failure mode(s) using the routing table below. If intent spans categories, load both references.
- Load only the matching reference file(s) — do not preload depth the task does not need.
- Propose fix with risk controls — why this addresses the mode, what could still go wrong, guardrails (tests/approvals/rollback).
- Generate artifacts: HCL, migration blocks (moved, import), CI changes, policy rules.
- Validate before finalizing — run validation commands tailored to risk tier.
- Emit the Response Contract at the end.
Diagnose Before You Generate
| Failure category | Symptoms | Primary references |
| --- | --- | --- |
| Identity churn | Resource addresses shift after refactor, count index churn, missing moved blocks | Code Patterns: count vs for_each, Code Patterns: moved blocks, Code Patterns: LLM mistakes |
| Secret exposure | Secrets in defaults, state, logs, CI artifacts | Security & Compliance, Code Patterns: write-only, State Management |
| Blast radius | Oversized stacks, shared prod/non-prod state, unsafe applies | State Management, Module Patterns |
| CI drift | Local plan ≠ CI plan, apply without reviewed artifact, unpinned versions | CI/CD Workflows, Code Patterns: versions |
| Compliance gaps | Missing policy stage, no approval model, no evidence retention | Security & Compliance, CI/CD Workflows |
| Testing blind spots | Plan-only validation of computed values, set-type indexing, mock/real confusion | |
| State corruption / recovery | Stuck lock, backend migration, drift reconciliation | |
| Provider upgrade risk | Breaking-change provider bump, unpinned modules | Code Patterns: versions, Module Patterns |
| Provider lifecycle | Removing a provider with resources still in state, orphaned resources, removed block usage | State Management: Provider Removal |
| Navigation / safe-rename blind spots | Cannot locate symbol defs/refs semantically, value-symbol rename done as blind text replace, grep-only refactor missing refs, hallucinated rg shim | |
When to Use This Skill
Activate when: creating or reviewing Terraform/OpenTofu configurations or modules, setting up or debugging tests, structuring multi-environment deployments, implementing IaC CI/CD, choosing module patterns or state organization, configuring or migrating remote state backends.
Don't use for: basic HCL syntax questions Claude already knows, provider API reference (link to docs), cloud-platform questions unrelated to Terraform/OpenTofu.
Core Principles
Module Hierarchy
| Type | When to Use | Scope |
| --- | --- | --- |
| Resource module | Single logical group of connected resources | VPC + subnets, SG + rules |
| Infrastructure module | Collection of resource modules for a purpose | Multiple resource modules in one region/account |
| Composition | Complete infrastructure | Spans multiple regions/accounts |
Flow: resource → resource module → infrastructure module → composition.
Directory Layout
```
environments/   # prod/, staging/, dev/: per-env configurations
modules/        # networking/, compute/, data/: reusable modules
examples/       # minimal/, complete/: docs + integration fixtures
```
Separate environments from modules. Use examples/ as both documentation and test fixtures. Keep modules small and single-responsibility.
See Module Patterns for architecture principles, naming conventions, variable/output contracts.
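As a rough illustration of the environments/modules split, an environment configuration can be little more than wiring between reusable modules. This is a sketch only: the file path, module input names, and the private_subnet_ids output are hypothetical and not taken from the Module Patterns reference.

```hcl
# environments/prod/main.tf (hypothetical example)

module "networking" {
  source = "../../modules/networking"

  # Input names are assumptions made for illustration.
  environment    = "prod"
  vpc_cidr_block = "10.0.0.0/16"
}

module "compute" {
  source = "../../modules/compute"

  environment = "prod"

  # Assumes the networking module exposes a private_subnet_ids output.
  subnet_ids = module.networking.private_subnet_ids
}
```

Keeping the environment directory this thin means the same modules can be exercised from examples/ without copying logic.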
Naming Conventions (summary)
- Descriptive resource names (aws_instance.web_server, not aws_instance.main)
- Reserve the name "this" for genuine singleton resources only
- Prefix variables with context (vpc_cidr_block, not cidr)
- Standard files: main.tf, variables.tf, outputs.tf, versions.tf
See Module Patterns: Variable Naming and Code Patterns: Block Ordering for examples.
Block Ordering (summary)
Resource blocks: count/for_each first → arguments → tags → depends_on → lifecycle.
Variable blocks: description → type → default → validation → nullable → sensitive.
See Code Patterns: Block Ordering & Structure for the full rules and examples.
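The ordering summarized above could look like this in practice. It is a minimal sketch with illustrative names (create_vpc, vpc_cidr_block, example-vpc); the full rules live in the referenced Code Patterns file and may be stricter.

```hcl
variable "create_vpc" {
  description = "Whether to create the VPC."
  type        = bool
  default     = true
}

variable "vpc_cidr_block" {
  description = "IPv4 CIDR block for the VPC."
  type        = string
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrhost(var.vpc_cidr_block, 0))
    error_message = "vpc_cidr_block must be a valid CIDR block."
  }

  nullable = false
  # sensitive = true would come last, and only for secret inputs.
}

resource "aws_vpc" "this" {
  count = var.create_vpc ? 1 : 0 # count / for_each first

  cidr_block           = var.vpc_cidr_block # arguments
  enable_dns_hostnames = true

  tags = {
    Name = "example-vpc"
  }

  # depends_on would sit here if an explicit dependency were needed.

  lifecycle {
    prevent_destroy = true
  }
}
```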
Testing Strategy
Decision Matrix: Which Testing Approach?
| Situation | Approach | Tools | Cost |
| --- | --- | --- | --- |
| Quick syntax check | Static analysis | validate, fmt | Free |
| Pre-commit validation | Static + lint | validate, tflint, trivy, checkov | Free |
| Terraform 1.6+, simple logic | Native test framework | terraform test | Free-Low |
| Pre-1.6, or Go expertise | Integration testing | Terratest | Low-Med |
| Security/compliance focus | Policy as code | OPA, Sentinel | Free |
| Cost-sensitive workflow | Mock providers (1.7+) | Native tests + mocks | Free |
| Multi-cloud, complex | Full integration | Terratest + real infra | Med-High |
Native Test Rules (1.6+)
Before writing test code: validate resource schemas via Terraform MCP so assertions target real attributes.
- command = plan: fast, for input-derived values only
- command = apply: required for computed values (ARNs, generated names) and set-type nested blocks
- Set-type blocks cannot be indexed with [0]; use for expressions or materialize via command = apply
- Common set types: S3 encryption rules, lifecycle transitions, IAM policy statements
See Testing Frameworks for static-analysis pipelines, native-test patterns, Terratest integration, mock providers, and the full LLM-mistake checklist.
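The rules above might translate into a test file along these lines. This is a sketch under assumptions: the module under test is presumed to declare var.bucket_name, aws_s3_bucket.this, and aws_s3_bucket_server_side_encryption_configuration.this. Confirm the attribute names against the provider schema before relying on them.

```hcl
# tests/module_test.tftest.hcl (hypothetical module under test)

variables {
  bucket_name = "example-test-bucket"
}

run "bucket_name_passes_through" {
  command = plan # input-derived value, known at plan time

  assert {
    condition     = aws_s3_bucket.this.bucket == var.bucket_name
    error_message = "Bucket name should match the bucket_name input."
  }
}

run "computed_and_set_values" {
  command = apply # creates real resources; the ARN is only known after apply

  assert {
    condition     = startswith(aws_s3_bucket.this.arn, "arn:aws:s3:::")
    error_message = "Bucket ARN should be an S3 ARN."
  }

  assert {
    # rule is a set-type block, so iterate over it instead of indexing with [0].
    condition = alltrue([
      for r in aws_s3_bucket_server_side_encryption_configuration.this.rule :
      r.apply_server_side_encryption_by_default[0].sse_algorithm == "aws:kms"
    ])
    error_message = "Every encryption rule should use aws:kms."
  }
}
```

The second run block costs real infrastructure time, so it is a candidate for the scheduled or main-branch pipeline rather than every pull request.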
Count vs For_Each — Quick Rule
| Scenario | Use | Why |
| --- | --- | --- |
| Boolean condition (create / don't) | count = condition ? 1 : 0 | Optional singleton toggle |
| Items may be reordered or removed | for_each = toset(list) | Stable resource addresses |
| Reference by key | for_each = map | Named access |
| Multiple named resources | for_each | Better identity stability |
Never use list index as long-lived identity — removing a middle element reshuffles every address after it. For the decision matrix, safe migration playbook, moved block patterns, and known-at-plan failure cases, see Code Patterns: count vs for_each.
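A minimal sketch of moving from index-based to key-based identity, assuming a module that previously created aws_subnet.app with count over a list. The variable names and keys are illustrative; the moved blocks map each old index to its new key so the plan shows no destroy/recreate.

```hcl
variable "vpc_id" {
  description = "ID of the VPC that holds the subnets."
  type        = string
}

variable "subnet_cidrs" {
  description = "Subnet CIDR blocks keyed by a stable name."
  type        = map(string)
  default = {
    "app-a" = "10.0.1.0/24"
    "app-b" = "10.0.2.0/24"
  }
}

resource "aws_subnet" "app" {
  for_each = var.subnet_cidrs # keys become the resource addresses

  vpc_id     = var.vpc_id
  cidr_block = each.value

  tags = {
    Name = each.key
  }
}

# Previously: count = length(var.subnet_cidr_list)
moved {
  from = aws_subnet.app[0]
  to   = aws_subnet.app["app-a"]
}

moved {
  from = aws_subnet.app[1]
  to   = aws_subnet.app["app-b"]
}
```

After this change, removing "app-a" from the map affects only aws_subnet.app["app-a"]; the other address stays put.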
Locals for Dependency Management
Using try() in a local to prefer a conditional resource's attribute over its parent is a specialized but high-value pattern — it forces correct deletion order without explicit depends_on. Common use: VPC + secondary CIDR associations + subnets.
See Code Patterns: Locals for Dependency Management for the full pattern and worked example.
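One way the pattern could look for the VPC + secondary CIDR case is sketched below. Resource and variable names are illustrative, and the worked example in the referenced file may differ in detail.

```hcl
variable "secondary_cidr_block" {
  description = "Optional secondary IPv4 CIDR block for the VPC."
  type        = string
  default     = null
}

resource "aws_vpc" "this" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_vpc_ipv4_cidr_block_association" "this" {
  count = var.secondary_cidr_block != null ? 1 : 0

  vpc_id     = aws_vpc.this.id
  cidr_block = var.secondary_cidr_block
}

locals {
  # Prefer the association's vpc_id when it exists, so anything using
  # local.vpc_id depends on the association and is destroyed before it.
  # Falls back to the VPC itself when no secondary CIDR is configured.
  vpc_id = try(aws_vpc_ipv4_cidr_block_association.this[0].vpc_id, aws_vpc.this.id)
}

resource "aws_subnet" "secondary" {
  count = var.secondary_cidr_block != null ? 1 : 0

  vpc_id     = local.vpc_id
  cidr_block = cidrsubnet(var.secondary_cidr_block, 4, 0)
}
```

Because the subnet reads local.vpc_id rather than aws_vpc.this.id, the dependency graph orders subnet deletion before the CIDR association is removed, with no depends_on involved.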
Module Development
Standard layout:
```
my-module/
├── README.md        # Usage documentation
├── main.tf          # Primary resources
├── variables.tf     # Typed inputs with descriptions
├── outputs.tf       # Output values
├── versions.tf      # required_version + required_providers
├── examples/
│   ├── minimal/
│   └── complete/
└── tests/
    └── module_test.tftest.hcl   # or Go for Terratest
```
Variable contracts: always description, always explicit type, use validation for complex constraints, use sensitive = true for secrets, prefer optional() with typed defaults (1.3+) over untyped map(any).
Output contracts: always description, mark sensitive outputs, expose stable subsets (not whole provider objects).
See Module Patterns for the full contract patterns, module release checklist, and LLM-mistake checklist.
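The variable and output contracts above might be expressed as follows. This is a sketch only: the database object shape, the api_token input, and the assumption that the module defines aws_db_instance.this are all illustrative.

```hcl
variable "database" {
  description = "Database settings for the module."
  type = object({
    name              = string
    instance_class    = optional(string, "db.t3.micro")
    allocated_storage = optional(number, 20)
    multi_az          = optional(bool, false)
  })

  validation {
    condition     = var.database.allocated_storage >= 20
    error_message = "database.allocated_storage must be at least 20 GiB."
  }
}

variable "api_token" {
  description = "Token for the hypothetical upstream API."
  type        = string
  sensitive   = true # masks plan/apply output only; the value can still reach state
}

# Expose a stable, documented subset instead of the whole resource object.
output "database_endpoint" {
  description = "Connection endpoint of the database instance."
  value       = aws_db_instance.this.endpoint
}
```

Callers can then pass only `{ name = "orders" }` and inherit the typed defaults, which map(any) cannot provide.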
CI/CD
Pipeline stages: validate → test → plan → apply (with environment protection).
Cost control: mock providers on PR validation, real-cloud integration only on main or scheduled, tag test resources, auto-cleanup.
Drift prevention: pin runtime and providers, commit .terraform.lock.hcl, apply the reviewed plan artifact from the plan stage (do not re-run plan inside the apply job), run policy/security stage on every path to apply.
See CI/CD Workflows for GitHub Actions, GitLab CI, and Atlantis templates plus the LLM-mistake checklist.
Security & Compliance
Essential checks:
```bash
trivy config .
checkov -d .
```
Don't: store secrets in variables or .tfvars, use default VPC, skip encryption, open security groups to 0.0.0.0/0, use inline ingress/egress blocks in aws_security_group.
Do: source secrets from AWS Secrets Manager / Parameter Store or use write_only arguments on 1.11+, create dedicated VPCs, enforce encryption at rest and TLS, least-privilege SGs, use separate aws_vpc_security_group_{ingress,egress}_rule resources (AWS provider v5+).
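A minimal sketch of the separate-rule approach, assuming AWS provider v5+ and a var.vpc_id input. The names and CIDR ranges are placeholders.

```hcl
variable "vpc_id" {
  description = "ID of the VPC for the security group."
  type        = string
}

resource "aws_security_group" "web" {
  name_prefix = "web-"
  vpc_id      = var.vpc_id
  # No inline ingress/egress blocks; rules are separate resources below.
}

resource "aws_vpc_security_group_ingress_rule" "web_https" {
  security_group_id = aws_security_group.web.id
  description       = "HTTPS from the internal network range"

  cidr_ipv4   = "10.0.0.0/16" # scoped range instead of 0.0.0.0/0
  from_port   = 443
  to_port     = 443
  ip_protocol = "tcp"
}

resource "aws_vpc_security_group_egress_rule" "web_https_out" {
  security_group_id = aws_security_group.web.id
  description       = "HTTPS to the internal network range"

  cidr_ipv4   = "10.0.0.0/16"
  from_port   = 443
  to_port     = 443
  ip_protocol = "tcp"
}
```

Each rule gets its own address in state, so adding or removing one rule does not rewrite the others.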
Marking a variable sensitive = true masks display only — the value still lives in state. Use write_only / *_wo on 1.11+, or keep secret material out of Terraform entirely via runtime lookups.
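The write-only route could look roughly like this. It is a sketch under assumptions: Terraform 1.11+, an AWS provider release that offers password_wo and password_wo_version on aws_db_instance, and a password supplied at run time through an ephemeral variable. Check the provider documentation for the exact argument names in your version.

```hcl
variable "db_password" {
  description = "Master password, supplied at run time (for example via TF_VAR_db_password)."
  type        = string
  sensitive   = true
  ephemeral   = true # not persisted to plan or state
}

resource "aws_db_instance" "this" {
  identifier        = "example-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app_admin"

  # Write-only argument: sent to the provider, never stored in state.
  password_wo = var.db_password

  # Bump this number to signal that the password should be re-sent.
  password_wo_version = 1

  skip_final_snapshot = true
}
```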
See Security & Compliance for trivy/checkov pipelines, state-file hardening, compliance mappings, and the LLM-mistake checklist.
State Management
Never use local state in teams or production. A properly configured remote backend provides state locking, encryption at rest, versioning, audit logging, and safe collaboration.
Minimum Viable Backend (AWS S3, 1.10+)
```hcl
terraform {
  backend "s3" {
    bucket       = "my-terraform-state"
    key          = "prod/vpc/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # Native S3 locking, 1.10+
  }
}
```
On Terraform < 1.10, use dynamodb_table = "terraform-state-lock" instead of use_lockfile. Azure Storage, GCS, and Terraform Cloud all offer built-in locking — see the State Management reference for syntax.
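For runtimes below 1.10, the equivalent backend block with DynamoDB locking might look like this. It is a sketch reusing the bucket and table names from the text; the DynamoDB table is assumed to exist already.

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/vpc/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock" # table needs a string partition key named LockID
  }
}
```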
State Organization
| Pattern | Use When | Example Path |
| --- | --- | --- |
| Per environment | Different teams per env | prod/terraform.tfstate, staging/... |
| Per component | Independent lifecycles | prod/vpc/, prod/eks/, prod/rds/ |
| Hybrid (recommended) | Both benefits | prod/networking/, prod/compute/, staging/networking/ |
Split state when: different teams, different update cadences, or >500 resources. Combine when: tightly coupled resources, <100 resources, same lifecycle.
See State Management for locking, migration, multi-team isolation, disaster recovery, and the LLM-mistake checklist.
Version Management
| Component | Strategy | Example |
| --- | --- | --- |
| Terraform runtime | Pin minor | required_version = "~> 1.9.0" |
| Providers | Pin major | version = "~> 5.0" |
| Modules (prod) | Pin exact | version = "5.1.2" |
| Modules (dev) | Allow patch | version = "~> 5.1.0" |
Commit .terraform.lock.hcl intentionally. Keep provider/runtime upgrades in a separate PR from functional changes. See Code Patterns: Version Management for constraint syntax and upgrade workflow.
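A versions.tf sketch showing how the pessimistic operator behaves: `~>` lets only the rightmost component written in the constraint increase. The module source below is a hypothetical private-registry address, and its inputs are omitted.

```hcl
terraform {
  # >= 1.9.0, < 1.10.0: patch releases only (pins the minor version).
  # Writing "~> 1.9" instead would allow any 1.x release from 1.9 onward.
  required_version = "~> 1.9.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # >= 5.0, < 6.0: pins the major version
    }
  }
}

module "network" {
  source  = "app.terraform.io/example-org/network/aws" # hypothetical registry address
  version = "5.1.2"                                    # exact pin for production

  # In a dev environment, "~> 5.1.0" would accept 5.1.x patch releases.
}
```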
Modern Terraform Features (1.0+)
| Feature | Min version | Common use |
| --- | --- | --- |
| try() | 0.13+ | Safe fallbacks, replaces element(concat()) |
| nullable = false | 1.1+ | Prevent null silently overriding defaults |
| moved blocks | 1.1+ | Refactor without destroy/recreate |
| optional() with defaults | 1.3+ | Typed object attributes |
| import blocks | 1.5+ | Declarative imports, reviewable in VCS |
| check blocks | 1.5+ | Runtime assertions |
| Native terraform test | 1.6+ | Built-in test framework |
| Mock providers | 1.7+ | Cost-free unit testing |
| removed blocks | 1.7+ | Declarative resource removal |
| Provider-defined functions | 1.8+ | Provider-specific transformations (requires provider to declare functions) |
| Cross-variable validation | 1.9+ | Reference other var.* in validation blocks |
| write_only arguments | 1.11+ | Secrets never stored in state |
| S3 native lock-file | 1.10+ | State locking without DynamoDB |
Before emitting a feature, verify the runtime floor. See Code Patterns: Feature Guard Table for the full table with common LLM error patterns per feature.
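To make a few of these features concrete, here is a hedged sketch of import, removed, and check blocks side by side. The bucket name and the aws_instance.legacy address are hypothetical, and each block carries its runtime floor in a comment.

```hcl
# 1.5+: adopt an existing bucket into state; the import is reviewed like any other change.
import {
  to = aws_s3_bucket.logs
  id = "example-logs-bucket" # hypothetical existing bucket
}

resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket"
}

# 1.7+: stop managing a resource without destroying the real infrastructure.
# The matching resource block must already be deleted from the configuration.
removed {
  from = aws_instance.legacy

  lifecycle {
    destroy = false
  }
}

# 1.5+: evaluated on every plan and apply; a failed check is reported as a
# warning and does not block the run.
check "logs_bucket_name" {
  assert {
    condition     = startswith(aws_s3_bucket.logs.bucket, "example-")
    error_message = "Logs bucket name should start with example-."
  }
}
```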
Runtime-Specific Guidance
- Terraform 1.0-1.5: static analysis + plan validation, with Terratest for integration (no native tests). OpenTofu's first release line is 1.6, so this tier applies to Terraform only.
- 1.6+: native terraform test / tofu test available; migrate simple unit tests, keep Terratest for complex integration.
- 1.7+: mock providers cut test cost; mock for unit tests, real runs for final integration.
- 1.10+: S3 native lock-file (use_lockfile) is the correct default for new configurations; DynamoDB locking is no longer required.
- 1.11+: write_only arguments for secret handling keep credentials out of state.
- Terraform vs OpenTofu: both supported. For licensing, governance, and feature delta, see Quick Reference: Terraform vs OpenTofu.
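For the 1.7+ tier, a mocked unit test might look like the sketch below. It assumes a module that declares var.bucket_name and aws_s3_bucket.this; no cloud credentials or real resources are involved.

```hcl
# tests/unit.tftest.hcl (hypothetical module under test)

# Replaces the real AWS provider for every run block in this file.
mock_provider "aws" {}

variables {
  bucket_name = "example-unit-test"
}

run "bucket_name_passes_through" {
  command = plan

  assert {
    condition     = aws_s3_bucket.this.bucket == var.bucket_name
    error_message = "Bucket name should match the bucket_name input."
  }
}
```

Mocked providers return synthetic values for computed attributes, so assertions on ARN formats or generated IDs belong in a real-infrastructure run instead.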
Code Intelligence (terraform-ls)
Semantic navigation for HCL. terraform-ls is optional; without it every row below degrades to a disclosed rg + Read fallback.
This section is the self-contained terraform-ls layer of a generic code-intelligence discipline; apply the rows below directly. Recommended companion: the code-intelligence plugin (same antonbabenko/agent-plugins marketplace), which carries the generic discipline (position anchoring, degradation gate, disclosure format, anti-phantom-shim) and ships /code-intelligence:doctor for readiness checks. If it is installed, defer to its generic protocol; this skill remains fully usable without it.
| Goal | Use | Tradeoff |
| --- | --- | --- |
| Find definition / all references | terraform-ls goToDefinition / findReferences | Needs init + a position anchor |
| Rename value symbol (var/local/output/provider alias) | Manual: findReferences -> per-file fresh Read -> edit -> validate | No rename provider |
| Rename resource/module address | moved block + plan shows 0 destroy | Text rename forces destroy/recreate |
| Exact text / known name / .tfvars / non-HCL | rg + Read | No semantic scope |
✅ Supported: goToDefinition, findReferences, documentSymbol, hover, workspaceSymbol.
❌ Unsupported: goToImplementation, call hierarchy, rename provider. Do not call these then report their absence as a finding.
- ✅ Prereq: local terraform/tofu on PATH, terraform init run; cold start may need one retry.
- ✅ LSP calls are position-anchored (file:line:character) - anchor with rg first, never symbol-name-only.
- ❌ Do not claim "LSP broken, using rg" until the Degradation Gate passes; disclose any tool substitution on the first line.
Depth: Code Intelligence.
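For the resource-address rename row, a minimal sketch of the moved-block approach follows. The addresses reuse the naming example from earlier in this file, and var.ami_id is an assumed input.

```hcl
variable "ami_id" {
  description = "AMI ID for the web server."
  type        = string
}

# Renamed from aws_instance.main.
resource "aws_instance" "web_server" {
  ami           = var.ami_id
  instance_type = "t3.micro"
}

# Tells the runtime that the existing object in state now lives at the new address.
moved {
  from = aws_instance.main
  to   = aws_instance.web_server
}
```

With the moved block in place, the plan should report the address change and 0 resources to destroy; a plan that shows a destroy/create pair means a reference was missed.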
Reference Files
Progressive disclosure — essentials here, depth on demand:
- Testing Frameworks: static analysis, native tests, Terratest, mock providers
- Module Patterns: structure, variable/output contracts, terraform_remote_state rules, release checklist
- CI/CD Workflows: GitHub Actions, GitLab CI, Atlantis, cost control
- Security & Compliance: trivy/checkov, secrets handling, compliance mappings
- State Management: backends, locking, migration, multi-team, recovery
- Code Patterns: block ordering, count/for_each deep dive, modern features, version management, locals
- Code Intelligence: terraform-ls capabilities, position-anchored calls, manual rename, degradation gate
- Quick Reference: command cheat sheets, flowcharts, troubleshooting
License
Apache License 2.0. See LICENSE for full terms.
Copyright © 2026 Anton Babenko