SKILL.md

Paper Audit Skill v4.5

paper-audit is deep-review-first. Its core job is to behave like a serious reviewer: find technical, methodological, claim-level, and cross-section issues; keep script-backed findings separate from reviewer judgment; and return a structured issue bundle plus a revision roadmap.

Version 4.5 adds a script-backed PRESUBMISSION layer for final-week mechanical checks (em dashes, AI-tone term frequency, abstract completeness, LaTeX citation/label/equation hygiene, paragraph-shape weak signals, concrete captions). It plugs into the existing modes rather than being a separate public mode.

See references/PRESUBMISSION_GUIDE.md for mode integration.

Use it for audit and review. Do not use it as the first tool for source editing, sentence rewriting, or build fixing.

What This Skill Produces

quick-audit: fast submission-readiness screen with script-backed findings, including PRESUBMISSION

deep-review: reviewer-style structured issue bundle with major/moderate/minor findings

gate: PASS/FAIL decision calibrated for submission blockers; PRESUBMISSION Major/Minor findings remain advisory

re-audit: compare the current issue bundle against a previous audit, including mechanical regression findings

polish: precheck-only handoff into a polishing workflow

The primary product is no longer just a score. For deep-review, the main outputs are:

final_issues.json

overall_assessment.txt

review_report.md

peer_review_report.md

revision_roadmap.md

Do Not Use

direct source surgery on .tex / .typ

compilation debugging as the main task

free-form literature survey writing

paragraph-level related-work rewriting

cosmetic grammar cleanup without an audit goal

Critical Rules

Don't rewrite the paper source — paper-audit is a reviewer, not an editor; switch skills explicitly if the user wants prose changes, so review evidence stays separable from edits.

Don't fabricate references, baselines, or reviewer evidence — invented citations and made-up reviewer voices undermine every other finding in the bundle.

Distinguish [Script] from [LLM] findings — script-backed items have a deterministic anchor the user can rerun, while LLM findings need a quote or section to be falsifiable.

Anchor every reviewer finding to a quote, section, or exact textual location — unanchored complaints become impossible to audit on a re-pass.

Be conservative with OCR noise, formatting quirks, and copy-editing trivia — flagging cosmetic noise inflates the report and buries the real issues.

Read like a careful reader before flagging — understand the author's intended meaning first so the issue captures a real misread, not a strawman.

For literature findings, judge whether the gap is evidence-backed and fairly positioned, and don't rewrite the prose inside paper-audit — keep prose rewrites in the format-specific writing skills where they can be reviewed in isolation.

For PRESUBMISSION, map CRITICAL / MAJOR / MINOR to the Critical / Major / Minor script severities. Only Critical findings or failed checklist items can fail the gate; otherwise mechanical findings would drown out the substantive ones.
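The mapping can be expressed as a small lookup. This is a hedged sketch: the function names and the checklist_item_failed argument are hypothetical. Only the labels and the gate rule come from this rule.

```python
# Script label -> severity used in the audit report.
PRESUBMISSION_SEVERITY = {"CRITICAL": "Critical", "MAJOR": "Major", "MINOR": "Minor"}


def map_presubmission(label: str) -> str:
    """Translate a PRESUBMISSION script label into the report severity."""
    try:
        return PRESUBMISSION_SEVERITY[label.upper()]
    except KeyError:
        raise ValueError(f"unknown PRESUBMISSION label: {label!r}") from None


def can_fail_gate(label: str, checklist_item_failed: bool = False) -> bool:
    """Only Critical findings or failed checklist items may fail the gate;
    Major and Minor findings stay advisory."""
    return checklist_item_failed or map_presubmission(label) == "Critical"
```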

Full mode-integration matrix lives in references/PRESUBMISSION_GUIDE.md.

In PDF mode, do not guess at source-only hygiene. Report only the items the extracted text can prove, and note that LaTeX/Typst source checks were skipped.
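The following sketches how the check list could be split by input type. The check names and their grouping are illustrative stand-ins, not the audit script's actual check identifiers.

```python
from pathlib import Path

# Illustrative check names; the real checks are defined by the audit scripts.
TEXT_CHECKS = ["em_dashes", "ai_tone_terms", "abstract_completeness"]
SOURCE_ONLY_CHECKS = ["citation_hygiene", "label_hygiene", "equation_hygiene"]
SOURCE_SUFFIXES = {".tex", ".typ"}


def plan_checks(paper: str) -> tuple[list[str], list[str]]:
    """Return (checks to run, checks skipped) for the given paper path."""
    if Path(paper).suffix.lower() in SOURCE_SUFFIXES:
        return TEXT_CHECKS + SOURCE_ONLY_CHECKS, []
    # PDF or other non-source input: run only what the text can prove,
    # and report the source-only checks as skipped instead of guessing.
    return list(TEXT_CHECKS), list(SOURCE_ONLY_CHECKS)
```

Returning the skipped list explicitly makes it easy to add the "source checks were skipped" note to the report.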

Mode Selection

| Requested intent | Mode |
| --- | --- |
| "check my paper", "quick audit", "submission readiness", "pre-submission review", "投稿前检查" (pre-submission check) | quick-audit |
| "review my paper", "simulate peer review", "harsh review", "deep review" | deep-review |
| "is this ready to submit", "gate this submission", "blockers only" | gate |
| "did I fix these issues", "re-audit", "compare against old review" | re-audit |
| "polish the writing, but only if safe" | polish |

Legacy aliases still work for one compatibility cycle:

self-check -> quick-audit

review -> deep-review
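This is a rough illustration of the routing: keyword matching over abridged phrases from the table, plus the legacy alias map. It is only an approximation. The skill infers intent from the whole request, not from substrings. The keyword lists and the function name are assumptions.

```python
from typing import Optional

# Checked in order, so more specific intents come before quick-audit.
MODE_KEYWORDS = [
    ("re-audit", ["did i fix", "re-audit", "compare against old review"]),
    ("gate", ["ready to submit", "gate this", "blockers only"]),
    ("deep-review", ["review my paper", "simulate peer review", "harsh review", "deep review"]),
    ("polish", ["polish the writing"]),
    ("quick-audit", ["check my paper", "quick audit", "submission readiness",
                     "pre-submission review", "投稿前检查"]),
]
LEGACY_ALIASES = {"self-check": "quick-audit", "review": "deep-review"}


def resolve_mode(request: str, explicit_mode: Optional[str] = None) -> Optional[str]:
    """An explicit mode (or legacy alias) wins; otherwise guess from keywords."""
    if explicit_mode:
        return LEGACY_ALIASES.get(explicit_mode, explicit_mode)
    text = request.lower()
    for mode, keywords in MODE_KEYWORDS:
        if any(k in text for k in keywords):
            return mode
    return None  # no match: ask the user instead of guessing
```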

For per-mode workflow steps, input resolution rules, presentation surface rules, and committee focus routing, see references/MODE_GUIDE.md.

Review Standard

Read these references before running reviewer-style work:

references/REVIEW_CRITERIA.md

references/DEEP_REVIEW_CRITERIA.md

references/CHECKLIST.md

references/CONSOLIDATION_RULES.md

references/ISSUE_SCHEMA.md

references/PRE_SUBMISSION_RULES.md

references/PRESUBMISSION_GUIDE.md

references/MODE_GUIDE.md

The deep-review workflow uses a 16-part issue taxonomy:

formula / derivation errors

notation inconsistency

prose vs formal object mismatch

numerical inconsistency

missing justification

overclaim or claim inaccuracy

ambiguity that can mislead a careful reader

underspecified methods / missing information

internal contradiction

self-consistency of standards

table structure violations

abstract structural incompleteness

theory contribution deficiency

qualitative methodology opacity

pseudo-innovation / straw man

paragraph-level argument incoherence

Workflow

Every mode follows the same shape: parse $ARGUMENTS, lock the paper path, infer the mode, report style, focus, and language if they are not provided, then run the canonical command. Detailed phase steps are in references/MODE_GUIDE.md.
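The first two steps (parse the arguments and lock the paper path) might look like this with argparse. This is a sketch only. The --mode, --previous-report, and --report-style flags appear in this document. --focus and --lang are hypothetical stand-ins for the focus and language settings.

```python
import argparse
from pathlib import Path

MODES = ["quick-audit", "deep-review", "gate", "re-audit", "polish"]


def parse_arguments(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="paper-audit")
    parser.add_argument("paper")
    parser.add_argument("--mode", choices=MODES)  # None means "infer from intent"
    parser.add_argument("--previous-report")
    parser.add_argument("--report-style")
    parser.add_argument("--focus")  # hypothetical flag name
    parser.add_argument("--lang")   # hypothetical flag name
    args = parser.parse_args(argv)
    # Lock the paper path: resolve it once so every later phase sees the same file.
    args.paper = Path(args.paper).resolve()
    if args.mode == "re-audit" and not args.previous_report:
        parser.error("re-audit requires --previous-report PATH")  # exits with status 2
    return args
```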

quick-audit

```bash
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode quick-audit ...
```

Present findings as Submission Blockers, then Quality Improvements, then the checklist; call out PRESUBMISSION mechanical findings with [Script] provenance. Escalate to deep-review when the user wants reviewer-depth critique.
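This ordering can be sketched as a small renderer. The sketch assumes each finding is a dict carrying the gate_blocker, source_kind, and title fields from the deep-review issue schema. Whether quick-audit findings use exactly these keys is an assumption.

```python
def render_quick_audit(findings: list[dict], checklist: list[str]) -> str:
    """Blockers first, then quality improvements, then the checklist,
    with every finding tagged by provenance."""
    def line(f: dict) -> str:
        tag = "[Script]" if f["source_kind"] == "script" else "[LLM]"
        return f"- {tag} {f['title']}"

    blockers = [line(f) for f in findings if f.get("gate_blocker")]
    improvements = [line(f) for f in findings if not f.get("gate_blocker")]
    sections = [
        ("Submission Blockers", blockers),
        ("Quality Improvements", improvements),
        ("Checklist", [f"- [ ] {item}" for item in checklist]),
    ]
    out = []
    for heading, lines in sections:
        out.append(heading)
        out.extend(lines or ["- none"])
    return "\n".join(out)
```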

deep-review

Five phases (see references/MODE_GUIDE.md for full detail):

Workspace prep:

```bash
uv run python -B "$SKILL_DIR/scripts/prepare_review_workspace.py" <paper> --output-dir ./review_results
```

Phase 0 automated audit:

```bash
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode deep-review ...
```

Phase 3A committee: dispatch 5 committee agents (editor, theory, literature, methodology, logic) and write committee/consensus.md.

Phase 3B section and cross-cutting lanes: section, claims-vs-evidence, notation, evaluation fairness, self-consistency, prior-art, and pre-submission readiness (full/editor focus only).

Consolidation:

```bash
uv run python -B "$SKILL_DIR/scripts/consolidate_review_findings.py" <review_dir>
uv run python -B "$SKILL_DIR/scripts/verify_quotes.py" <review_dir> --write-back
uv run python -B "$SKILL_DIR/scripts/render_deep_review_report.py" <review_dir>
```

When the user explicitly asks for journal-review prose, set --report-style peer-review so that peer_review_report.md becomes the **Primary View** while review_report.md remains the richer evidence bundle.
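This is a minimal sketch of what consolidation and quote verification might do, assuming findings already follow the issue schema. It keeps the most severe duplicate per root_cause_key, unions related_sections, and matches quotes after collapsing whitespace. The merge policy and the normalization are assumptions. The actual rules are in references/CONSOLIDATION_RULES.md and scripts/verify_quotes.py.

```python
import re

SEVERITY_RANK = {"minor": 0, "moderate": 1, "major": 2}


def _normalize(text: str) -> str:
    """Collapse whitespace runs so line breaks in the paper do not defeat matching."""
    return re.sub(r"\s+", " ", text).strip()


def consolidate(findings: list[dict], paper_text: str) -> list[dict]:
    merged: dict[str, dict] = {}
    for f in findings:
        key = f["root_cause_key"]
        if key not in merged:
            merged[key] = dict(f, related_sections=list(f.get("related_sections", [])))
            continue
        kept = merged[key]
        # Union the related sections of every duplicate sharing this root cause.
        for section in f.get("related_sections", []):
            if section not in kept["related_sections"]:
                kept["related_sections"].append(section)
        # Keep the wording of the most severe duplicate.
        if SEVERITY_RANK[f["severity"]] > SEVERITY_RANK[kept["severity"]]:
            merged[key] = dict(f, related_sections=kept["related_sections"])
    haystack = _normalize(paper_text)
    for f in merged.values():
        f["quote_verified"] = _normalize(f["quote"]) in haystack
    return list(merged.values())
```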

gate

```bash
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode gate ...
```

Run EIC Screening (Phase 0.5) with agents/editor_in_chief_agent.md first, then report PASS/FAIL in this order: verdict, EIC screening result, blockers, advisory findings. A desk-reject verdict is a gate blocker. Among PRESUBMISSION findings, only Critical ones block the gate.
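The gate decision can be sketched as a single function. The argument names are hypothetical. The inputs come from this document: the EIC desk-reject verdict, the gate_blocker flag from the issue schema, PRESUBMISSION severities, and failed checklist items.

```python
def gate_verdict(desk_reject: bool, issues: list[dict],
                 presubmission: list[str], failed_checklist: list[str]) -> str:
    """FAIL on a desk reject, any gate blocker, a Critical PRESUBMISSION
    finding, or a failed checklist item; Major/Minor stay advisory."""
    if desk_reject:
        return "FAIL"
    if any(issue.get("gate_blocker") for issue in issues):
        return "FAIL"
    if "Critical" in presubmission or failed_checklist:
        return "FAIL"
    return "PASS"
```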

re-audit

Requires --previous-report PATH.

```bash
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode re-audit --previous-report <path> ...
uv run python -B "$SKILL_DIR/scripts/diff_review_issues.py" <old_final_issues.json> <new_final_issues.json>
```

Present root-cause-aware status labels: FULLY_ADDRESSED, PARTIALLY_ADDRESSED, NOT_ADDRESSED, NEW.
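One plausible way to derive these labels is sketched below, under two assumptions. First, issues are matched across bundles by root_cause_key. Second, "partially addressed" is approximated as "still present at a lower severity". scripts/diff_review_issues.py may use richer logic.

```python
SEVERITY_RANK = {"minor": 0, "moderate": 1, "major": 2}


def diff_statuses(old: list[dict], new: list[dict]) -> dict[str, str]:
    """Map each root_cause_key to a re-audit status label."""
    old_by_key = {i["root_cause_key"]: i for i in old}
    new_by_key = {i["root_cause_key"]: i for i in new}
    statuses = {}
    for key, prev in old_by_key.items():
        cur = new_by_key.get(key)
        if cur is None:
            statuses[key] = "FULLY_ADDRESSED"
        elif SEVERITY_RANK[cur["severity"]] < SEVERITY_RANK[prev["severity"]]:
            statuses[key] = "PARTIALLY_ADDRESSED"
        else:
            statuses[key] = "NOT_ADDRESSED"
    for key in new_by_key.keys() - old_by_key.keys():
        statuses[key] = "NEW"
    return statuses
```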

polish

```bash
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode polish ...
```

If blockers exist, stop and report them. Proceed to polishing only if the precheck is safe.

Output Contract

For deep-review, the final issue schema is:

```json
{
  "title": "short issue title",
  "quote": "exact quote from paper",
  "explanation": "why this matters and what remains problematic",
  "comment_type": "methodology|claim_accuracy|presentation|missing_information",
  "severity": "major|moderate|minor",
  "confidence": "high|medium|low|unverified",
  "source_kind": "script|llm",
  "source_section": "methods",
  "related_sections": ["results", "appendix"],
  "root_cause_key": "shared-normalized-key",
  "review_lane": "claims_vs_evidence",
  "gate_blocker": false,
  "quote_verified": true
}
```
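A lightweight validator for this schema follows, as a sketch. It enforces only the closed enumerations for severity, confidence, and source_kind, plus basic field types. It does not restrict comment_type, because references/ISSUE_SCHEMA.md remains the canonical definition.

```python
ALLOWED = {
    "severity": {"major", "moderate", "minor"},
    "confidence": {"high", "medium", "low", "unverified"},
    "source_kind": {"script", "llm"},
}
STRING_FIELDS = ["title", "quote", "explanation", "comment_type",
                 "source_section", "root_cause_key", "review_lane"]
BOOL_FIELDS = ["gate_blocker", "quote_verified"]


def validate_issue(issue: dict) -> list[str]:
    """Return a list of problems; an empty list means the issue looks well-formed."""
    errors = []
    for field in STRING_FIELDS:
        if not isinstance(issue.get(field), str) or not issue[field]:
            errors.append(f"{field}: expected a non-empty string")
    for field, allowed in ALLOWED.items():
        value = issue.get(field)
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"{field}: {value!r} not in {sorted(allowed)}")
    for field in BOOL_FIELDS:
        if not isinstance(issue.get(field), bool):
            errors.append(f"{field}: expected true/false")
    if not isinstance(issue.get("related_sections"), list):
        errors.append("related_sections: expected a list")
    return errors
```

Each element of a loaded final_issues.json array can be passed through validate_issue before rendering.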

Always prefer:

exact quotes over vague paraphrase

evidence-backed findings over style commentary

issue bundle + roadmap over raw script dumps

References

| File | Purpose |
| --- | --- |
| references/MODE_GUIDE.md | per-mode workflow detail, phase steps, committee focus routing |
| references/PRESUBMISSION_GUIDE.md | PRESUBMISSION mode-integration behavior matrix |
| references/REVIEW_CRITERIA.md | top-level audit scoring and mapping |
| references/DEEP_REVIEW_CRITERIA.md | deep-review-specific issue taxonomy and leniency rules |
| references/CONSOLIDATION_RULES.md | deduplication and root-cause merge policy |
| references/ISSUE_SCHEMA.md | canonical JSON schema |
| references/REVIEW_LANE_GUIDE.md | section lanes and cross-cutting lanes |
| references/PRE_SUBMISSION_RULES.md | final-week mechanical audit rules and term list |
| references/SUBAGENT_TEMPLATES.md | reviewer task templates |
| references/QUICK_REFERENCE.md | CLI and mode cheat sheet |

Scripts

| Script | Purpose |
| --- | --- |
| scripts/audit.py | Phase 0 audit and mode entrypoint |
| scripts/pre_submission_check.py | deterministic PRESUBMISSION mechanical audit layer |
| scripts/prepare_review_workspace.py | create deep-review workspace |
| scripts/build_claim_map.py | extract headline claims and closure targets |
| scripts/consolidate_review_findings.py | deduplicate comment JSONs |
| scripts/verify_quotes.py | verify exact quote presence |
| scripts/render_deep_review_report.py | render final Markdown report |
| scripts/diff_review_issues.py | compare old vs new issue bundles |

Reviewer Lanes

Committee agents (deep-review default):

committee_editor_agent.md

committee_theory_agent.md

committee_literature_agent.md

committee_methodology_agent.md

committee_logic_agent.md

Default deep-review lanes live in agents/:

section_reviewer_agent.md

claims_evidence_reviewer_agent.md

notation_consistency_reviewer_agent.md

evaluation_fairness_reviewer_agent.md

self_consistency_reviewer_agent.md

prior_art_reviewer_agent.md

synthesis_agent.md

editor_in_chief_agent.md: EIC desk-reject screener (used in gate mode)

Specialized deep-review agents (read their files for activation criteria):

critical_reviewer_agent.md: devil's advocate with C3-C5 checks

domain_reviewer_agent.md: domain expertise with A1-A7 assessments

methodology_reviewer_agent.md: methodology rigor with B3-B10 checks

literature_reviewer_agent.md: evidence-based literature verification (optional, --literature-search)

Examples

"Review this manuscript like a serious conference reviewer and tell me the biggest validity risks."

"Run a quick audit on paper.tex and tell me what blocks submission."

"Gate this IEEE submission and separate blockers from recommendations."

"Re-audit this revision against my previous report."

"Audit only the literature positioning and tell me whether the claimed gap is real or fabricated by selective citation."
