SKILL.md

/printing-press-retro

Name: printing-press-retro
Author: mvanhorn

Analyze a Printing Press session to find ways to improve the system that produces

CLIs — the Go binary, templates, skills, and catalog. Not fixes to the specific CLI

that was just printed, but improvements so the next CLI comes out stronger.

**It is a non-goal for the Printing Press to produce flawless CLIs without manual

tweaks.** That's the nature of the system. We expect agents to reason over the

generated CLI, customize for the specific API, build novel features, and iterate.

Some hand-built work in every run is normal.

The retro's job is to find the subset of manual work where **the machine could

have realistically raised the floor** — given the agent a better starting point,

prevented the issue entirely, or eliminated friction that would recur on the

next CLI. Two clear cases qualify:

**The machine could have completely prevented the issue, and the pattern is

generalizable across many printed CLIs.** File it.

The machine could have raised the floor meaningfully — better default,

partial scaffold, helper that absorbs the boilerplate — **across multiple

CLIs you can name with evidence.** File it.

Otherwise, the manual work is normal iteration and should not generate a finding.

Some items make it back as machine fixes; not all. The retro is the filter that

distinguishes the two.

The retro creates a GitHub issue on the printing-press repo with the findings

that survive triage and the adversarial check, plus artifacts, so maintainers

(or an AI agent) can fix the Printing Press.

Terminology

The Printing Press: The whole system that produces CLIs. Use this name in all

user-facing output (issues, retros, prompts). It has four subsystems:

Generator — templates that emit Go code (internal/generator/)

Scorer — tools that grade the output: verify, dogfood, scorecard

Skills — SKILL.md instructions that guide Claude during generation

Binary — the Go CLI itself: commands, flags, parsers (cmd/printing-press/)

Printed CLI: A CLI produced by the Printing Press for a specific API (e.g.,

notion-pp-cli). Printed-CLI fixes only help that one CLI.

Use "the Printing Press" when talking about the system. Use the subsystem name when

pointing a developer at what to fix — "fix the scorer" and "fix the generator" are

different PRs.

Cardinal rules

Issue bodies and retro docs are public surfaces. Redact every real secret and PII before quoting. Manuscripts contain credentials, account identifiers, real emails, and live API response data — that's why references/secret-scrubbing.md scrubs them before artifact upload. Issue body text goes straight to a public GitHub issue, and the retro doc itself is preserved in manuscript proofs and may be uploaded as a zip. When you quote scanner output, dogfood payloads, Greptile review comments, or API response bodies as "evidence," replace the sensitive substring with <REDACTED:<kind>> BEFORE pasting. This applies hardest to findings about secret/PII leaks: the natural impulse is to quote the actual leaked value to prove the leak exists — that re-leaks it in a public issue. Phase 5 (retro doc write) and Phase 6 (pre-post scrub) enforce this mechanically; this rule is the human-readable charter behind the mechanical enforcement. See references/secret-scrubbing.md "Layer 0" for the redaction patterns and substitution shapes.

Default is "don't change the machine." The Printing Press is mature — 30+ CLIs printed, most templates exercised across many shapes. The burden of proof is on the finding, not on the Skip path. Most things you encountered while printing one CLI are that CLI's quirks, iteration noise, or upstream API behavior — not generator gaps. Propose a machine change only when cross-CLI evidence is concrete and the finding survives the Phase 3 adversarial check (Step G).

A retro of three sharp findings is more valuable than ten mixed-quality findings. Each filed finding spends maintainer attention. If you find yourself writing "every finding warrants action" or producing zero drops and zero skips, stop and re-triage — that outcome is the failure mode this skill exists to prevent.

The retro proposes Printing Press changes that help multiple printed CLIs. Don't propose direct edits to the one CLI that just shipped, and don't propose machine changes whose value is unique to this CLI's quirks — those are printed-CLI fixes wearing a generator costume.

Never upload un-scrubbed artifacts. All artifacts go through the secrets scrub before upload.

Never modify source directories. Manuscripts and library directories are read-only. Scrub operations work on temporary copies.

Never skip the secrets scrub, even if the generation pipeline already ran one. Defense in depth.

Never work around a scorer bug in the Printing Press. If a scoring tool penalizes something incorrectly, the fix goes in the scoring tool.

Setup

# Path-only setup — no binary detection required.

# The retro skill reads manuscripts and runs gh/curl. It does not invoke the

# printing-press binary. This avoids aborting for users who installed the

# plugin but not the Go binary.

_scope_dir="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"

_scope_dir="$(cd "$_scope_dir" &#x26;&#x26; pwd -P)"

PRESS_HOME="$HOME/printing-press"

PRESS_MANUSCRIPTS="$PRESS_HOME/manuscripts"

PRESS_LIBRARY="$PRESS_HOME/library"

RETRO_SCRATCH_DIR="/tmp/printing-press/retro"

mkdir -p "$PRESS_MANUSCRIPTS" "$PRESS_LIBRARY" "$RETRO_SCRATCH_DIR"

# Detect whether we're inside the printing-press repo

IN_REPO=false

if [ -f "$_scope_dir/cmd/printing-press/main.go" ]; then

  IN_REPO=true

  REPO_ROOT="$_scope_dir"

  echo "Running from printing-press repo: $REPO_ROOT"

fi

Guard rails

Nothing to retro

if [ ! -d "$PRESS_MANUSCRIPTS" ] || [ -z "$(ls -A "$PRESS_MANUSCRIPTS" 2>/dev/null)" ]; then

  echo "No manuscripts found. Run /printing-press first to generate a CLI."

  exit 1

fi

Resolve which API

If the user passed an API name as an argument, use that. Validate for path traversal:

# Reject names with /, \, or ..

if echo "$USER_API_NAME" | grep -qE '[/\\]|\.\.'; then

  echo "Invalid API name: '$USER_API_NAME'. Names cannot contain path separators or '..'."

  exit 1

fi

# Verify resolved path stays under PRESS_MANUSCRIPTS

RESOLVED="$(cd "$PRESS_MANUSCRIPTS/$USER_API_NAME" 2>/dev/null &#x26;&#x26; pwd -P)"

case "$RESOLVED" in

  "$PRESS_MANUSCRIPTS"/*) ;; # OK

  *) echo "Invalid API name: path resolves outside manuscripts directory."; exit 1 ;;

esac

If no API name was provided and multiple APIs exist, list them with their most recent

run dates and ask the user to choose:

echo "Multiple APIs found in manuscripts:"

for api_dir in "$PRESS_MANUSCRIPTS"/*/; do

  api_name=$(basename "$api_dir")

  latest=$(ls -t "$api_dir" 2>/dev/null | head -1)

  echo "  - $api_name (latest run: $latest)"

done

Use AskUserQuestion to let the user pick.

Resolve which run

If the API has multiple runs, default to the most recent. If the user specified a

run ID, use that. Otherwise:

API_DIR="$PRESS_MANUSCRIPTS/$API_NAME"

RUN_ID=$(ls -t "$API_DIR" 2>/dev/null | head -1)

RUN_DIR="$API_DIR/$RUN_ID"

echo "Retro for: $API_NAME (run $RUN_ID)"

echo "Manuscripts: $RUN_DIR"

Resolve CLI directory

API_SLUG="$API_NAME"

CLI_NAME="${API_SLUG}-pp-cli"

CLI_DIR="$PRESS_LIBRARY/$CLI_NAME"

if [ ! -d "$CLI_DIR" ]; then

  # Try without -pp-cli suffix (legacy naming)

  CLI_DIR="$PRESS_LIBRARY/$API_NAME"

fi

if [ ! -d "$CLI_DIR" ]; then

  echo "WARNING: CLI directory not found at $PRESS_LIBRARY/$CLI_NAME"

  echo "Proceeding with manuscripts only — CLI source will not be included in artifacts."

  CLI_DIR=""

fi

When to run

Best results come from running in the same conversation where the CLI was generated

(post-shipcheck) — the retro can mine the full conversation history for errors,

retries, manual edits, and discoveries.

If running in a fresh conversation, the retro proceeds with manuscript evidence only.

Phase 2 marks session-dependent findings as "evidence: manuscripts only."

Phase 1: Gather evidence

Read all artifacts from the run:

Research brief — $RUN_DIR/research/*brief*

Absorb manifest — $RUN_DIR/research/*absorb*

Shipcheck proof — $RUN_DIR/proofs/*shipcheck*

Build log — $RUN_DIR/proofs/*build-log* (if exists)

Live smoke log — $RUN_DIR/proofs/*live-smoke* (if exists)

The generated CLI — $CLI_DIR/ (if available)

Also gather the scorecard, verify pass rate, and dogfood report (from the shipcheck

proof or by re-running the tools if IN_REPO is true and the binary is available).

Phase 2: Mine the session

Scan the conversation history for six categories of signal and produce a candidate

list. The candidate list is not the finding list — Phase 2.5 triage will cull it

and Phase 3 will further drop weak survivors. Most candidates will not survive.

While collecting, distinguish:

Iteration noise — one-off retries, typos, normal trial-and-error during a

long generation. Skip these even at the candidate stage; they don't survive triage.

Per-CLI quirks — behavior tied to this API's shape (auth oddity, undocumented

endpoint, vendor-specific envelope) that wouldn't recur on another spec. Add to

the candidate list with a "looks per-CLI" tag — most will be dropped at triage.

Systemic friction — patterns that would plausibly recur on the next CLI

(template gap, default that needs to change, skill instruction that misled you).

These are what the retro exists to surface.

If running in a fresh conversation without generation history: Note this and

proceed with manuscript evidence only. Focus on what the manuscripts reveal — scorecard

gaps, verify failures, dogfood issues, and obvious template patterns in the CLI source.

Mark session-dependent findings as "evidence: manuscripts only."

2a. Errors and retries

Any time a command failed and was re-run, a build broke, or the Printing Press produced

code that didn't compile. What broke and what fixed it?

2b. Manual code edits

Manual edits during iteration are normal — agents reason over the generated CLI

and tweak. A single edit to handle this CLI's quirk is the workflow.

For each manual edit, ask: could the machine have raised the floor here?

Could the machine have completely prevented this edit? Default was wrong for

most APIs, template emitted broken code, parser missed a common pattern. If

yes AND the same edit would be needed on multiple CLIs you can name with

evidence → candidate.

Could the machine have given a better starting point that made the edit

smaller, simpler, or skippable in common cases? Even if you'd still tweak,

raising the floor compounds across future CLIs. If yes AND generalizable →

candidate.

Was this just per-API customization the agent was expected to do? Drop.

Was this iteration noise (typo, retry, transient confusion)? Drop.

The triage question is whether the machine raising the floor would compound

across future CLIs — not whether this one CLI would have shipped a few lines

lighter.

2c. Features built from scratch

Hand-built features (transcendence commands, novel commands, helper packages for

secondary APIs) are part of the workflow — agents build the domain-specific

value layer on top of the API surface the machine emits. Building features by

hand is not by itself a finding.

For each hand-built feature, ask: **could the machine have raised the floor for

this kind of feature?**

Could the machine have emitted a working default version, even if you'd still

customize it? (E.g., every list+detail API benefits from a summary

aggregation that the machine could scaffold from the spec.) Candidate, if

generalizable across multiple named APIs.

Could the machine have emitted scaffolding, types, or helpers that would have

cut the build effort meaningfully? (E.g., a typed secondary-client template

for combo CLIs, a fanout-aggregation helper.) Candidate, if generalizable.

Is this genuinely custom domain logic the machine couldn't realistically

generate from a spec? (E.g., booking a slot is custom orchestration; the

machine can emit the underlying endpoints but not the choreography.) Drop —

the SKILL is the right place to share the recipe, not the generator.

The "raises the floor" test separates "machine fix" from "SKILL recipe": if the

machine's contribution would still leave significant per-CLI work, the recipe

belongs in the SKILL so the next agent knows the pattern; if the machine could

absorb the boilerplate cleanly, it's a generator template.

2d. Recurring friction

Work that happens on every generation, not just this one. For each: **is this

inherent to the approach, or can the Printing Press eliminate it?**

Propose at least two possible fixes at different levels (generator templates, binary

post-processing, skill instruction) and assess which is most durable.

2e. Discovered optimizations

Improvements noticed during the session — UX ideas, performance improvements, new

command patterns, output format improvements. Could this optimization be detected

automatically and applied by the Printing Press?

2f. Scorer accuracy audit

Before proposing Printing Press fixes to improve scores, check whether the scoring

itself is correct. **Changing the Printing Press to satisfy a broken scorer is worse

than doing nothing.**

For each score penalty from dogfood, verify, and scorecard:

Trace the scorer's logic. Read the scoring tool's source code to understand

exactly what it checks. Don't guess.

Test the scorer's assumption against reality. Does the CLI actually have the

problem the scorer claims?

Classify the penalty:

Scorer is correct — the CLI genuinely has this problem.

Scorer is wrong — the CLI is fine; the scoring tool has a bug.

Scorer is partially right — both could be better.

Common scorer bugs: name derivation mismatches, grep-based detection missing patterns,

file exclusions too broad, section-counting heuristics.

The scorer audit is not optional. Every finding from a score penalty must have a

"Scorer correct?" assessment before proposing a fix direction.

2g. Combo CLI priority audit

Only runs when the briefing named 2+ sources. Check $RUN_DIR/source-priority.json

(from the Multi-Source Priority Gate in the main skill). If it doesn't exist but the

briefing or user command clearly listed multiple services, that's itself a finding:

the priority gate didn't fire when it should have.

For runs with a source-priority.json, cross-reference it against the absorb manifest

and the shipped CLI:

Command count per source. Count commands attributed to each named source in the

manifest. The primary should have at least as many as any secondary. If it has

fewer, that's a priority inversion and becomes a finding — even if the user

approved the manifest, it means the skill's discovery path for the primary failed

silently.

Auth scoping. If the primary was declared free in the priority gate but the

shipped CLI requires a paid key for the primary's headline commands, that's a

finding — the economics check either didn't run or didn't route the paid key

correctly to secondary-only scope.

README leadership. The primary should lead the README and --help. If a

secondary is the first thing the user sees, flag it.

Each of these is a skill instruction gap category finding. The durable fix lives

in skills/printing-press/SKILL.md (the Multi-Source Priority Gate, the Priority

inversion check before Phase Gate 1.5, and the brief's ## Source Priority section)

or in the generator if README ordering is template-driven.

Phase 2.5: Triage candidates

Before Phase 3 spends deep analysis on each candidate, run a fast triage to drop

candidates that don't justify the deeper look. Most candidates should die here.

The retro is a filter, not a funnel — if everything from Phase 2 makes it to Phase

3 unchanged, triage isn't doing its job.

For each candidate, ask in order:

Was this iteration noise? Normal trial-and-error during generation —

one-off retry, typo recovery, agent forgetting a flag, transient network blip. Drop.

Is this a printed-CLI fix? The fix lives in ~/printing-press/library/<api>/

and helps only this one CLI. If the proposed change is "edit this command in

this CLI" or "regenerate after fixing the spec," it's not a retro finding — it's

a polish pass on that CLI. Drop.

Is this an upstream API quirk? The vendor returns null instead of 404, or

ignores a query param the docs claim to honor, or has rate limits the spec

doesn't declare. The Printing Press doesn't fix vendors. If the only fix is

"work around this in the generator for every CLI," that's almost always wrong;

if it's "let one CLI work around it," that's a printed-CLI fix. Drop.

Is the only evidence "I noticed this once"? A one-time observation that you

can't connect to a recurring pattern across other CLIs is a candidate for Drop,

not a P3. P3 means "low priority systemic finding," not "I want to record this

somewhere."

Does the same finding appear in 2+ prior retros without being implemented?

Don't re-raise at the same priority. Either drop it (the cost-benefit math has

been "no" twice and the retro is becoming a wishlist), or reframe as a smaller

incremental fix that addresses part of the friction. Search:

grep -l "<finding keywords>" ~/printing-press/manuscripts/*/proofs/*-retro-*.md

Survivors of these five questions go to Phase 3. Dropped candidates are recorded

as one-line entries in the retro's "Dropped at triage" section — they exist for

your own discipline check and for the maintainer to see triage actually ran.

Anti-pattern to avoid. A recent Pagliacci retro produced "Skip: None. Every

finding warrants action." That sentence is the failure mode this triage exists

to prevent. Two of those findings (snake_case in Use:, root.go Short: rewrite

that the SKILL already documents as a manual step) were classic per-CLI / instructional

candidates that should have been dropped here. If you find yourself writing

"every finding warrants action," stop and re-run triage.

Phase 3: Classify findings

For each candidate that survived Phase 2.5 triage, answer these seven questions.

Question 5 has seven sub-steps (A through G); Step G is the adversarial check.

Findings that fail Step G drop out — they don't get a priority, they don't go in

the Do/Skip tables, they go on the dropped-candidates list with the reason.

1. What happened? One sentence — the symptom, not the fix.

2. Is the scorer correct? (mandatory for score-penalty findings)

Scorer correct → fix the Printing Press (templates, binary, or skill)

Scorer wrong → fix the scoring tool, not the Printing Press

Both → fix both, label which is primary

3. What category?

Phase 4: Prioritize

Sort survivors of Phase 3 into three buckets:

Do — survived Phase 3 Step G with a clear case-for. Assign a priority (P1,

P2, P3) based on frequency, fallback reliability, and complexity. Scorer bugs

are just findings like any other — rank them by impact alongside template gaps

and parser issues.

Skip — survived Phase 2.5 triage but didn't clear Phase 3 (Step B couldn't

name 3 APIs with evidence, Step D recurrence-cost disqualified, or Step G's

case-against was stronger). State the specific step that failed. These are

listed in the retro so the maintainer can see what was considered and rejected.

Drop — rejected at Phase 2.5 triage as iteration noise, printed-CLI fix,

upstream API quirk, unproven one-off, or recurring-not-implemented. Listed as

one-liners only — they don't need full analysis, they need a record so triage

is auditable.

No numerical scoring formulas. State the priority reasoning in words.

Sanity check before moving to Phase 5. Look at the bucket distribution.

Almost every retro should have some drops and some skips. A retro with

"all Do, no Skip, no Drop" is the failure mode — re-run triage and Step G on

the weakest findings. Likewise, if every Do is P1, you're not prioritizing,

you're inflating; force yourself to identify the weakest "Do" and ask whether

it really beats the Skip bar.

Phase 5: Write the retro

The retro document is the durable audit trail — keep all fields below. The

GitHub issue body in Phase 6 will use a slim subset (action-shaped fields

only); the full triage rationale lives here, in the doc that gets uploaded as

an artifact and linked from the issue. See

references/issue-template.md for the issue-body shape.

Write the full retro document using this template:

# Printing Press Retro: <API name>

## Session Stats

- API: <name>

- Spec source: <catalog/browser-sniffed/docs/HAR>

- Scorecard: <score>/100 (<grade>)

- Verify pass rate: <X>%

- Fix loops: <N>

- Manual code edits: <N>

- Features built from scratch: <N>

## Findings

### 1. <Title> (<category>)

- **What happened:** ...

- **Scorer correct?** Yes / No / Partially. [details]

- **Root cause:** Component + what's specifically wrong

- **Cross-API check:** Would this recur?

- **Frequency:** every API / most / subclass:<name> / this API only

- **Fallback if the Printing Press doesn't fix it:** ...

- **Worth a Printing Press fix?** ...

- **Inherent or fixable:** ...

- **Durable fix:** ...

- **Test:** How to verify (positive + negative)

- **Evidence:** Session moment that surfaced this

- **Related prior retros:** *(from Phase 3 Step D; "None" if no matches)*

  - `<api-slug>` retro #<issue-num-if-known> — `aligned` / `contradicts` / `extends`. <one-sentence note on what changed or what's shared>

  - ...

## Prioritized Improvements

### P1 — High priority

| Finding | Title | Component | Frequency | Fallback Reliability | Complexity | Guards |

|---------|-------|-----------|-----------|---------------------|------------|--------|

### P2 — Medium priority

| Finding | Title | Component | Frequency | Fallback Reliability | Complexity | Guards |

|---------|-------|-----------|-----------|---------------------|------------|--------|

### P3 — Low priority

| Finding | Title | Component | Frequency | Fallback Reliability | Complexity | Guards |

|---------|-------|-----------|-----------|---------------------|------------|--------|

*Omit empty priority sections.*

### Skip

| Finding | Title | Why it didn't make it (Step B / Step D / Step G) |

|---------|-------|--------------------------------------------------|

*Findings that survived Phase 2.5 triage but failed Phase 3 — name the specific

step that failed (e.g., "Step B: only 2 APIs with evidence" / "Step G: case-against

stronger; mostly per-CLI"). Empty if every Phase 3 candidate filed.*

### Dropped at triage

| Candidate | One-liner | Drop reason |

|-----------|-----------|-------------|

*Candidates rejected at Phase 2.5. One line each. Reasons: `iteration-noise` /

`printed-CLI` / `API-quirk` / `unproven-one-off` / `raised-N-times`. If this

section is empty, re-check Phase 2.5 — almost every retro has some.*

## Work Units

(see Phase 5.5)

## Anti-patterns

- ...

## What the Printing Press Got Right

- ...

Save the retro to manuscript proofs (always) and to the temp retro scratch

directory (always). Do not save retro documents under the source repo's

docs/retros/ directory; the skill must work the same way for users who do not

have the repo checked out, and retro documents are issue artifacts rather than

durable repo docs.

RETRO_STAMP="$(date +%Y%m%d-%H%M%S)"

RETRO_PROOF_PATH="$PRESS_MANUSCRIPTS/$API_NAME/$RUN_ID/proofs/$RETRO_STAMP-retro-$CLI_NAME.md"

RETRO_SCRATCH_DIR="/tmp/printing-press/retro"

RETRO_SCRATCH_PATH="$RETRO_SCRATCH_DIR/$RETRO_STAMP-$API_NAME-retro.md"

mkdir -p "$(dirname "$RETRO_PROOF_PATH")" "$RETRO_SCRATCH_DIR"

Write the full retro document to $RETRO_PROOF_PATH, then copy that file to

$RETRO_SCRATCH_PATH. This must complete before Phase 6 Step 1 copies the

manuscripts directory to staging.

Scrub the retro doc immediately after writing

The retro doc is preserved in manuscripts/<api>/<run>/proofs/ (durable),

copied to /tmp/printing-press/retro/ (scratch), and read by future runs'

Phase 3 Step D dedup scan. If a finding's "What we observed" block pasted

unredacted scanner output, dogfood payloads, or Greptile review comments, the

secret/PII propagates into all three locations. Run the Layer 0 body scrub

from references/secret-scrubbing.md immediately after writing the doc, so

the scrubbed version becomes canonical:

# Define scrub_body once at the top of the Phase 5/6 bash blocks (full source

# in references/secret-scrubbing.md Layer 0). Then:

RETRO_PROOF_PATH_SCRUBBED="${RETRO_PROOF_PATH}.scrubbed.md"

if ! scrub_body "$RETRO_PROOF_PATH" "$RETRO_PROOF_PATH_SCRUBBED"; then

  echo "" >&#x26;2

  echo "ERROR: retro doc contains an unredacted vendor-prefix secret." >&#x26;2

  echo "Open $RETRO_PROOF_PATH, redact each match reported above using" >&#x26;2

  echo "  <REDACTED:<vendor>-<kind>:<first4>...<last4>:<len>ch>" >&#x26;2

  echo "per references/secret-scrubbing.md Layer 0, then re-run /printing-press-retro." >&#x26;2

  exit 1

fi

mv "$RETRO_PROOF_PATH_SCRUBBED" "$RETRO_PROOF_PATH"

cp "$RETRO_PROOF_PATH" "$RETRO_SCRATCH_PATH"

Hard-fail behavior is intentional: vendor-prefix secrets are unrecoverable

leaks once a retro doc gets archived or uploaded. The agent must hand-redact

and re-run rather than silently shipping the leak. PII patterns (real emails,

phones, account inboxes) auto-redact in place because the substitution is

lossless for the retro's purpose.

Phase 5.5: Plannable work units

Group related findings into coherent work units a planner could pick up directly.

For each "Do" finding or group of related findings:

### WU-1: <Title> (from F1, F3, ...)

- **Priority:** P1 / P2 / P3 *(max priority among absorbed findings — P1 if any

  absorbed finding is P1, else P2 if any is P2, else P3)*

- **Component:** generator / openapi-parser / spec-parser / scorer / skill / catalog

  *(must match one of the six fixed component slugs; drives the `comp:*` label

  applied to the issue when filed)*

- **Goal:** One sentence describing the outcome

- **Target:** <component and area, e.g., "Generator templates in internal/generator/">

- **Acceptance criteria:**

  - positive test: ...

  - negative test: ...

- **Scope boundary:** What this does NOT include

- **Dependencies:** Other work units that must complete first

- **Complexity:** small / medium / large

The six fixed component slugs are: generator (internal/generator/),

openapi-parser (internal/openapi/), spec-parser (internal/spec/),

scorer (verify / dogfood / scorecard), skill (skills/printing-press/SKILL.md),

catalog (catalog/). If a WU genuinely spans two, pick the primary one — the

component where the durable fix will land. Pick exactly one; don't multi-label.

**If running from inside the printing-press repo (IN_REPO=true):**

Resolve target file paths using Glob and Grep tool invocations on $REPO_ROOT to

make work units more precise. E.g., use Glob to find internal/generator/*.go files,

Grep to find where sync code is generated.

**If running externally (IN_REPO=false):**

Describe target components by name (e.g., "Generator templates in internal/generator/")

and acceptance criteria without resolved file paths. The fixer will resolve paths when

they pick up the work.

Phase 5.6: Issue gate — are there Printing Press improvements?

After prioritization and work units are written, decide whether GitHub issues are

warranted. Each WU becomes one flat top-level issue (no parent, no sub-issue

hierarchy). The purpose of filing is to give someone (human or agent) something

to fix in the Printing Press. If every finding is specific to this one printed

CLI with nothing to change in the Printing Press, filing is noise — there's

nothing to act on.

Skip filing if:

Every finding landed in "Skip"

All findings are printed-CLI-specific (manual edits that only apply to this one API

and wouldn't recur across other CLIs)

The "Do" table is empty

File issues (one per WU) if:

There is at least one "Do" finding — i.e., something a maintainer or agent could

act on in the Printing Press (templates, binary, skills, or scoring tools)

Use judgment. A retro that found three things but all three are "this API has a weird

auth scheme no other API uses" is not worth filing. A retro that found one small

template gap that would help every future CLI is worth filing.

If filing is skipped, still save the retro locally (manuscript proofs +

/tmp/printing-press/retro/), present the findings to the user, then jump

directly to Phase 6 Step 6 (present results — adjusted to show local-only paths).

Phase 6: Package, upload, and present

Step 1: Package artifacts into staging folder

Read and apply references/artifact-packaging.md

through Step 4 only (create staging dir, copy, scrub, zip). Do not upload or

clean up yet — the staging folder stays alive until the end of Phase 6.

The staging folder ($STAGING_DIR) now contains the scrubbed copies and the zips.

This is both the review target and the upload source.

Step 2: Compute filing plan + confirm before publishing

This step only runs if the Phase 5.6 issue gate passed (there are Printing Press findings to act on).

Before showing the confirm prompt, run references/issue-template.md

Steps 1, 2, and 2.5 to ensure labels exist, sort the work units, and

compute the per-WU filing plan via the dedup scan against open

retro-tagged issues. Each WU ends up classified as either:

File new — no matching open issue

Comment on #N — Step 2.5 found a same match; the new evidence will be added as a comment instead of filing a duplicate

File new with related issues — Step 2.5 found one or more related-area matches; the new issue's body will reference them via #N in the Related issues block

The dedup scan does not need to be bulletproof. Bias toward "file new"

when uncertain — duplicates are recoverable, miscomments on the wrong

issue are uglier.

Then show the user a summary including the filing plan and ask for

confirmation via AskUserQuestion.

Ready to submit your retro.

Here's what will happen on mvanhorn/cli-printing-press:

Filing plan:

Title

Plan

Notes

File new (P1, comp:)

No match

Comment on #234

Matches ""

File new + reference #189

Adjacent open issue

Each new issue carries retro, priority:P<n>, comp:<slug> labels —

agents filter related work across retros with gh issue list --label comp:<slug> or gh issue list --label priority:P1.

Scrubbed artifact zips uploaded to catbox.moe and linked from each new issue:

Retro document — full triage rationale, drops, skips, what went right

Manuscripts () — research brief, shipcheck proof, build logs

CLI source () — the generated Go code (no binary, no vendor/) (omit if not available)

Everything is staged at <$STAGING_DIR> if you'd like to inspect the files first.

Options:

Submit — execute the filing plan

Let me review the files first — I'll check the staging folder, then come back

Save locally only — skip filing, keep the manuscript proof and temp copy

If the user picks "Let me review the files first," acknowledge and wait. When they

come back, re-ask with Submit / Save locally only.

If the user picks "Save locally only," skip Steps 3 and 4 — the retro is already

saved to manuscript proofs and /tmp/printing-press/retro/. Clean up the staging

folder, then jump to Step 6.

If the user wants to override a dedup decision before submitting (e.g.,

"file new for WU-2 instead of commenting"), accept the override: clear

WU_DEDUP[i] for that WU and proceed.

Step 3: Upload artifacts

Run artifact-packaging.md Step 5 (the catbox upload) using the zips already in

$STAGING_DIR. This produces $MANUSCRIPTS_URL and $CLI_SOURCE_URL.

Step 4: Execute the filing plan

Steps 1, 2, and 2.5 of references/issue-template.md

already ran during Step 2 (filing plan + confirm), so labels exist, WUs are

sorted, and $WU_DEDUP and $WU_RELATED are populated. This step runs

Step 3 of the reference: build bodies and execute the plan in parallel.

The "Execution principles" block at the top of issue-template.md is

mandatory: build issue bodies inline (heredocs into shell variables, not

the Write tool), run the whole step in one Bash invocation, and parallelize

the per-WU gh issue create / gh issue comment calls. Skipping these

costs real wall-clock latency — an N WU retro should finish in a single

round trip's worth of network time, not a serialized stack of them.

Each WU is independent: WUs marked comment:#N get a comment on the

existing issue; WUs marked file-new create a new flat top-level issue. No

parent, no sub-issue REST linking — every new issue stands alone in

GitHub's issue list with its own open/close lifecycle.

Each new issue carries its own priority:P<n> and comp:<slug> labels.

This is what enables gh issue list --label comp:openapi-parser to surface

every retro WU in that area across every retro — labels are the cross-retro

discovery surface, not auto-cross-links inside issue bodies.

Each new issue body's Related issues block combines:

Prior-retro references from Phase 3 Step D (alignments, contradictions, extensions across retros)

related-area issue references from Step 2.5 (open issues in adjacent territory)

Both reach across separate filed work where the #N auto-cross-link is

real signal. The body does not auto-cross-link to sibling WUs in the

same retro; that linkage is noise unless one is genuinely a prerequisite

(captured as free-text Dependencies: instead).

If gh is not authenticated or every per-WU action fails, follow the

graceful degradation path in the issue-template reference: save locally and

print manual filing instructions. Per-WU partial failures (some succeed,

some don't) are surfaced through $FAILED_ISSUES in Step 6.

Step 5: Local scratch copy

Ensure the temp scratch copy exists. This is the human-friendly local path for

reviewing or manually filing the retro when upload or issue creation fails.

if [ -f "$RETRO_PROOF_PATH" ]; then

  mkdir -p "$RETRO_SCRATCH_DIR"

  cp "$RETRO_PROOF_PATH" "$RETRO_SCRATCH_PATH"

fi

Step 6: Present results

After issues are created and comments posted, show the user a summary in

priority order. Group created and commented outcomes — both are real

filed work, but the shape differs.

Retro submitted!

Filed new issue, added comment on existing issues (P1 → P3 order):

New issues:

[P1] — <full $OUTCOME_URL[i]>

[P2] — <full $OUTCOME_URL[i]>

Comments on existing issues:

[P1] → comment on #234 —

findings across work units. New issues are tagged with comp:<slug>

and priority:P<n> labels — agents can filter related work across retros

with gh issue list --label comp:<slug> or gh issue list --label priority:P1.

(if artifacts uploaded) Artifacts: retro doc · manuscripts · CLI source

Local copy: <$RETRO_SCRATCH_PATH>

The [P<n>] annotation here is presentation-only — the issue titles

themselves do not carry a priority prefix (priority lives on the label).

Showing it in the user-facing summary helps the user scan filed work in

priority order without opening each issue.

Omit either subsection (New issues: or Comments on existing issues:)

when empty. A retro that produced only comments (every WU matched an

existing open issue) is a good outcome — it means the issue tracker

already covered the findings and the new evidence reinforces them.

If $FAILED_ISSUES is non-empty (set by references/issue-template.md

Step 3), append a warning block before the closing line:

⚠️ Some actions need attention:

File the missing issue(s) or comment(s) manually using the retro doc at <$RETRO_SCRATCH_PATH>.

If filing wasn't completed (user chose local-only, or gh failed entirely),

show the local save paths and the manual filing instructions printed by

the issue-template fallback path.

Step 7: Clean up staging folder

Run artifact-packaging.md Step 7 to delete $STAGING_DIR.

Rules

Prefer automatic fixes (templates, binary) over instructional fixes (skill).

For recurring friction, always answer "inherent or fixable?" honestly.

Be honest about what went well. Protecting good patterns matters.

Default is don't-file. Bias toward filing only when Phase 3 Step B gave you

three concrete cross-API examples with evidence (not speculation), and the

Step G case-against was clearly weaker than the case-for. "20% of catalog"

without named APIs is optimism. "Every API has multi-word resources" is

hand-waving. The retro is a filter, not a wishlist; an issue overloaded

with weak findings wastes maintainer attention.

When in doubt, drop. A finding you're uncertain about almost certainly

shouldn't be filed. The next CLI's retro will surface it again with stronger

evidence if it's real; if it doesn't, it wasn't.

Look for broader patterns. When something does clear the bar, check

whether this is the first sighting of a behavior you'd encounter again.

When a fix applies to an API subclass, include the condition AND the guard.

No time estimates. Use complexity sizing (small/medium/large).

Be thorough on the findings that survive. Include enough detail that someone

reading months later can understand the finding, the reasoning, and the proposed

fix without the original conversation.

Do not add more phases, documents, or gates to the main printing-press skill.

Propose making existing phases smarter or the Printing Press emit better defaults.

Never quote a leaked secret as "evidence" of a secret-leak finding. The

finding's whole point is that the value should not be public; quoting it in

a public GitHub issue re-leaks it. Use the redacted form from

references/secret-scrubbing.md Layer 0

(<REDACTED:<vendor>-<kind>:<first4>...<last4>:<len>ch>) — the maintainer

can fix the scanner without seeing the value. Phase 5 (retro doc write) and

Phase 6 Step 3 (pre-post scrub) hard-fail when an unredacted vendor-prefix

token is detected; that's the floor, not a substitute for redacting at

write time.

printing-press-retro