SKILL.md
$27
"ORIENT" [style=filled, fillcolor="#e8e8ff"];
"PLAN" [style=filled, fillcolor="#fff8e0"];
"IMPLEMENT" [style=filled, fillcolor="#ffe8e8"];
"VERIFY" [style=filled, fillcolor="#e8ffe8"];
"COMMIT" [style=filled, fillcolor="#e8e8ff"];
"ORIENT" -> "PLAN";
"PLAN" -> "IMPLEMENT";
"IMPLEMENT" -> "VERIFY";
"VERIFY" -> "IMPLEMENT" [label="fix", style=dashed];
"VERIFY" -> "COMMIT" [label="pass"];
"COMMIT" -> "IMPLEMENT" [label="next chunk", style=dashed];
}
**ORIENT.** Read existing code before touching anything. `Grep → Read → Read` is the dominant opening across the dataset. Sessions that read 10+ files before the first edit require fewer fix iterations downstream. Blind changes are the most expensive way to start.
**PLAN.** Scale-dependent (see below). Trivial fixes don't need a plan; features benefit from a task list; epics earn a research swarm. The decision is "what's proportional," not "always plan."
**IMPLEMENT.** Work in batches of roughly 2-3 edits, then verify. Follow the dependency chain. Edit existing files 9:1 over creating new ones; that's the observed ratio in successful sessions. Fix errors as they surface; accumulating them creates cascade-debugging.
**VERIFY.** Typecheck is the primary gate, fast and cheap; run it between batches. Tests fit naturally after feature-complete. Full suite before commit.
**COMMIT.** Atomic chunks, committed as you go. Verify, stage specific files, commit, loop back to the next chunk. Many small commits per session is the pattern that consistently outperforms one mega-commit at the end. See **Commit Cadence** below for message anatomy.
---
## Code Discipline
Principles that shape how you move through the loop, not steps to execute. Adapted from Karpathy's [observations](https://x.com/karpathy/status/2015883857489522876) on LLM coding pitfalls: models "make wrong assumptions on your behalf and just run along with them without checking ... really like to overcomplicate code and APIs, bloat abstractions ... implement a bloated construction over 1000 lines when 100 would do."
These principles bias toward caution because that's where models drift. For trivial fixes, the calculus inverts; apply judgment, not the full discipline.
### Think before coding
Don't assume. Don't hide confusion. Surface tradeoffs.
| Situation | Action |
| --------------------------------------- | --------------------------------- |
| Multiple interpretations of the request | Present them; don't pick silently |
| A simpler approach is plausible | Say so; push back when warranted |
| Something is unclear | Stop; name what's confusing; ask |
| You hold a load-bearing assumption | State it explicitly |
| Inconsistency between request and code | Surface it before proceeding |
ORIENT (read the code) is the prerequisite. This principle is what to do with what you find: name the gaps, don't paper over them.
### Simplicity first
Minimum code that solves the problem. Nothing speculative.
| Don't | Do |
| ---------------------------------------------- | -------------------------------------------- |
| Add features beyond what was asked | Solve exactly the stated problem |
| Build abstractions for single-use code | Inline first; abstract when reused |
| Add "flexibility" or configurability not asked | Hardcode now; parameterize on demand |
| Handle errors for impossible scenarios | Trust internal invariants; validate at edges |
| Write 200 lines when 50 would do | Rewrite tighter |
The test: would a senior engineer call this overcomplicated? If yes, simplify.
### Surgical changes
Touch only what you must. Clean up only your own mess.
| Rule | Why |
| ------------------------------------------------------ | ----------------------------------------------- |
| Don't "improve" adjacent code, comments, or formatting | Pollutes the diff; outside your scope |
| Don't refactor code that isn't broken | Scope creep expands blast radius |
| Match existing style even if you'd do it differently | Local consistency beats your preferences |
| Notice unrelated dead code → mention, don't delete | Other branches/agents may rely on it |
| Remove imports/vars/funcs _your_ changes orphaned | Clean up after yourself |
| Leave pre-existing dead code alone | Outside your remit unless explicitly asked |
| Don't touch comments you don't understand | Karpathy: "side effects ... orthogonal to task" |
The test: every changed line should trace directly to the user's request.
### Goal-driven execution
Define verifiable success. Loop until it passes.
| Vague task | Verifiable goal |
| ---------------- | --------------------------------------------------- |
| "Add validation" | Write tests for invalid inputs, then make them pass |
| "Fix the bug" | Write a test that reproduces it, then make it pass |
| "Refactor X" | Ensure the same tests pass before and after |
| "Make it work" | Reject, name the actual signal that proves it works |
For multi-step work, state the plan with verification per step:
- [Step] → verify: [check]
- [Step] → verify: [check]
- [Step] → verify: [check]
Strong success criteria let you loop independently. Weak criteria require constant clarification.
---
## Scale Selection
Strategy changes dramatically based on scope. Pick the right weight class:
| Scale | Edits | Strategy |
| -------------------------- | ------- | -------------------------------------------------------- |
| **Trivial** (config, typo) | 1-5 | Read -> Edit -> Verify -> Commit |
| **Small fix** | 5-20 | Grep error -> Read -> Fix -> Test -> Commit |
| **Feature** | 50-200 | Plan -> Layer-by-layer impl -> Verify per layer |
| **Subsystem** | 300-500 | Task planning -> Wave dispatch -> Layer-by-layer |
| **Epic** | 1000+ | Research swarm -> Spec -> Parallel agents -> Integration |
**Skip planning when:** Scope is clear, single-file change, fix describable in one sentence.
**Plan when:** Multiple files, unfamiliar code, uncertain approach.
---
## Dependency Chain
Build things in this order. Validated across fullstack, Rust, and monorepo projects:
Types/Models -> Backend Logic -> API Routes -> Frontend Types -> Hooks/Client -> UI Components -> Tests
**Fullstack (Python + TypeScript):**
1. Database model + migration
2. Service/business logic layer
3. API routes (FastAPI or tRPC)
4. Frontend API client
5. React hooks wrapping API calls
6. UI components consuming hooks
7. Lint -> typecheck -> test -> commit
**Rust:**
1. Error types (`thiserror` enum with `#[from]`)
2. Type definitions (structs, enums)
3. Core logic (`impl` blocks)
4. Module wiring (`mod.rs` re-exports)
5. `cargo check` -> `cargo clippy` -> `cargo test`
**Key finding:** Database migrations are written AFTER the code that needs them. Frontend drives backend changes as often as the reverse.
---
## Verification Cadence
The single most impactful practice in the dataset. Tight loops here make the rest of the skill mostly unnecessary; loose loops make every other guideline harder to follow.
| Gate | Typically | Speed |
| ---------------------- | -------------------------- | ------------------- |
| **Typecheck** | Between edit batches | Fast (primary gate) |
| **Lint (autofix)** | After implementation batch | Fast |
| **Tests (specific)** | After feature complete | Medium |
| **Tests (full suite)** | Before commit | Slow |
| **Build** | Before PR/deploy only | Slowest |
### The edit-verify-fix cycle
The pattern that consistently produces clean sessions: **~3 changes → verify → 1 fix**. Sweet spot, not a hard ratio. Sometimes one edit warrants its own verify, sometimes five edits group cleanly.
The pattern that produces debugging spirals: **2 changes → typecheck → 15 cascading fixes**. Prevent type cascades by grepping consumers before modifying shared types; once the cascade starts, separate fix domains (see Error Recovery).
**Combined gates save time:** `turbo lint:fix typecheck --filter=pkg` runs both in one shot. Scope verification to affected packages, never the full monorepo.
**Practical tips:**
- Run `lint:fix` BEFORE `lint` check to reduce iterations
- `cargo check` over `cargo build` (2-3x faster, same error detection)
- Truncate verbose output: `2>&1 | tail -20`
- Wrap tests with timeout: `timeout 120 uv run pytest`
---
## Decision Trees
### Read vs Edit
Familiar file you edited this session?
Yes -> Edit directly (verify after)
No -> Read it this session?
Yes -> Edit
No -> Read first (79% of quick fixes start with reading)
### Subagents vs Direct Work
Self-contained with a clear deliverable?
Yes -> Produces verbose output (tests, logs, research)?
Yes -> Subagent (keeps context clean)
No -> Need frequent back-and-forth?
Yes -> Direct
No -> Subagent
No -> Direct (iterative refinement needs shared context)
### Refactoring Approach
Can changes be made incrementally?
Yes -> Move first, THEN consolidate (separate commits)
New code alongside old, remove old only after tests pass
No -> Analysis phase first (parallel review agents)
Gap analysis: old vs new function-by-function
Implement gaps as focused tasks
### Bug Fix vs Feature vs Refactor
| Type | Cadence | Typical Cycles |
| ------------ | ------------------------------------------------------------------------ | -------------- |
| **Bug fix** | Grep error -> Read 2-5 files -> Edit 1-3 files -> Test -> Commit | 1-2 |
| **Feature** | Plan -> Models -> API -> Frontend -> Test -> Commit | 5-15 |
| **Refactor** | Audit -> Gap analysis -> Incremental migration -> Verify parity | 10-30+ |
| **Upgrade** | Research changelog -> Identify breaking changes -> Bump -> Fix consumers | Variable |
---
## Error Recovery
**65% of debugging sessions resolve in 1-2 iterations.** The remaining 35% risk spiraling into 6+. The patterns below are calibrated to keep you in the first bucket.
### Quick resolution
- Read the relevant code first (79% success correlation in the dataset)
- Form an explicit hypothesis: "the issue is X because Y"
- Make one targeted fix
- Verify the fix actually worked before moving on
### Spiral prevention
- **Separate error domains.** Fix all type errors before chasing test failures; interleaving the two is how cascades compound.
- **3-strike heuristic.** Three failed attempts on the same error is the signal to change approach entirely or escalate, not to try variation #4.
- **Cascade depth > 3.** Pause, enumerate all remaining issues, then fix in dependency order rather than reactive whack-a-mole.
- **Context rot.** After ~15-20 iterations, `/clear` and start fresh. A clean session with a better prompt usually outperforms accumulated corrections.
### Two-correction rule
Correcting the same issue twice is the signal to `/clear` and restart. Context noise compounds faster than fixes resolve it.
---
## Commit Cadence
Commit each logical chunk as it lands and verifies. Many small commits per session is the norm, never accumulate hours of unrelated work into one mega-commit. The COMMIT step loops back to IMPLEMENT for the next chunk.
### When to commit
| Trigger | Action |
| ----------------------------------------------- | --------------- |
| Logical chunk done, verification passes | Commit |
| Move/rename complete, before behavioral changes | Commit (move) |
| Behavioral change works after the move commit | Commit (change) |
| Refactor extracted, callers still pass | Commit |
| About to switch to a different concern | Commit current |
| Verification fails or edit is speculative | Don't commit |
If a reviewer would want it as a separate diff, it's a separate commit.
### Mirror local style
Before the first commit in any repo, run `git log -10 --oneline` and mirror the pattern. **Mirror format, not quality**, terse history doesn't lower your bar. Default to Conventional Commits when no pattern exists.
| Pattern | Example |
| -------------------- | ------------------------------ |
| Conventional Commits | `feat(api): add token refresh` |
| Gitmoji | `✨ Add token refresh` |
| Ticket prefix | `[ENG-1234] Add token refresh` |
| Module prefix | `auth: add token refresh` |
| Plain | `Add token refresh` |
Conventional Commit types: `feat` (capability), `fix` (bug), `refactor` (no behavior change), `perf`, `test`, `docs`, `style` (formatting only), `chore` (tooling/deps), `build`, `ci`. Scope is optional but encouraged when the change is localized.
### Message anatomy
**Subject:** imperative mood, ≤76 chars, no trailing period, no filenames. "Fix null deref in token refresh" beats "Fix bug." For Conventional Commits, no emoji in the subject, it breaks parsers.
**Body** (always include one): wrap at 76 chars, separated from subject by a blank line. Explain _why_, the diff shows _what_. State facts: banish "likely", "probably", "might", "seems", "appears to". If you don't know what a change does, read more before committing. Two sentences usually suffices; mention load-bearing context a future bisect would want.
### HEREDOC + Co-Author
Always pass messages via HEREDOC to preserve formatting. Add a `Co-Authored-By` trailer that names the model, "Claude" alone doesn't disambiguate across multi-agent sessions.
git commit -m "$(cat <<'EOF'
fix(auth): guard against null session in token refresh
Refresh racing with logout was dereferencing a freed session, surfacing
as a 500 with no log trail. Early return plus a single warn log makes
the failure mode visible without spamming on every refresh.
Co-Authored-By: Nova (Claude Opus 4.7) <noreply@anthropic.com>
EOF
)"
### Examples
Bad
Good
`fix: bug`
`fix(api): resolve null deref in token refresh`
`update stuff`
`chore(deps): bump axios to 1.7.4`
`WIP`
`feat(auth): scaffold magic-link sign-in flow`
`Added new file for users`
`feat(users): add bulk import endpoint`
`feat: it works now`
`feat(search): add fuzzy matching to user lookup`
### Multi-agent staging
Other agents may be working in parallel:
git status # See the full picture first
git diff --staged # Review what you're about to commit
git add <specific-files> # Only files you personally touched