SKILL.md
Paper Audit Skill v4.5
paper-audit is deep-review-first. Its core job is to behave like a
serious reviewer: find technical, methodological, claim-level, and
cross-section issues; keep script-backed findings separate from reviewer
judgment; and return a structured issue bundle plus a revision roadmap.
Version 4.5 adds a script-backed PRESUBMISSION layer for final-week
mechanical checks (em dashes, AI-tone term frequency, abstract completeness,
LaTeX citation/label/equation hygiene, paragraph-shape weak signals, concrete
captions). It plugs into existing modes; it is not a separate public mode.
See references/PRESUBMISSION_GUIDE.md for mode integration.
Use it for audit and review. Do not use it as the first tool for source
editing, sentence rewriting, or build fixing.
What This Skill Produces
- quick-audit: fast submission-readiness screen with script-backed findings, including PRESUBMISSION
- deep-review: reviewer-style structured issue bundle with major/moderate/minor findings
- gate: PASS/FAIL decision calibrated for submission blockers; PRESUBMISSION Major/Minor findings remain advisory
- re-audit: compare current issue bundle against a previous audit, including mechanical regression findings
- polish: precheck-only handoff into a polishing workflow
The primary product is no longer just a score. For deep-review, the main
outputs are:
final_issues.json
overall_assessment.txt
review_report.md
peer_review_report.md
revision_roadmap.md
Do Not Use
- direct source surgery on .tex/.typ
- compilation debugging as the main task
- free-form literature survey writing
- paragraph-level related-work rewriting
- cosmetic grammar cleanup without an audit goal
Critical Rules
- Don't rewrite the paper source: paper-audit is a reviewer, not an editor. Switch skills explicitly if the user wants prose changes, so review evidence stays separable from edits.
- Don't fabricate references, baselines, or reviewer evidence: invented citations and made-up reviewer voices undermine every other finding in the bundle.
- Distinguish [Script] from [LLM] findings: script-backed items have a deterministic anchor the user can rerun, while LLM findings need a quote or section to be falsifiable.
- Anchor every reviewer finding to a quote, section, or exact textual location; unanchored complaints become impossible to audit on a re-pass.
- Be conservative with OCR noise, formatting quirks, and copy-editing trivia; flagging cosmetic noise inflates the report and buries the real issues.
- Read like a careful reader before flagging: understand the author's intended meaning first so the issue captures a real misread, not a strawman.
- For literature findings, judge whether the gap is evidence-backed and fairly positioned, and don't rewrite the prose inside paper-audit. Keep prose rewrites in the format-specific writing skills where they can be reviewed in isolation.
- For PRESUBMISSION, map CRITICAL / MAJOR / MINOR to the Critical / Major / Minor script severities. Only Critical findings or failed checklist items can fail gate; otherwise mechanical findings drown out the substantive ones. The full mode-integration matrix lives in references/PRESUBMISSION_GUIDE.md.
- In PDF mode, do not guess source-only hygiene. Report text-proven items
and note that LaTeX/Typst source checks were skipped.
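The provenance and anchoring rules above can be sketched in a few lines. This is a minimal illustration, not the skill's actual code: it assumes findings are dicts that use the source_kind, quote, and source_section fields from the issue schema in the Output Contract, and the helper names provenance_tag and unanchored_llm_findings are hypothetical.

```python
def provenance_tag(issue: dict) -> str:
    """Label a finding as script-backed or reviewer judgment."""
    return "[Script]" if issue.get("source_kind") == "script" else "[LLM]"


def unanchored_llm_findings(issues: list[dict]) -> list[dict]:
    """Return LLM findings that carry neither a quote nor a section anchor."""
    return [
        issue
        for issue in issues
        if issue.get("source_kind") == "llm"
        and not (issue.get("quote") or issue.get("source_section"))
    ]
```

Under these assumptions, anything returned by unanchored_llm_findings would need a quote or a section before it goes into the bundle.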
Mode Selection
| Requested intent | Mode |
| --- | --- |
| "check my paper", "quick audit", "submission readiness", "pre-submission review", "投稿前检查" (Chinese for "pre-submission check") | quick-audit |
| "review my paper", "simulate peer review", "harsh review", "deep review" | deep-review |
| "is this ready to submit", "gate this submission", "blockers only" | gate |
| "did I fix these issues", "re-audit", "compare against old review" | re-audit |
| "polish the writing, but only if safe" | polish |
Legacy aliases still work for one compatibility cycle:
- self-check -> quick-audit
- review -> deep-review
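Alias handling of this kind can be sketched as a small lookup. The example below is an assumption about how resolution might work, not the behavior of scripts/audit.py; LEGACY_ALIASES, MODES, and resolve_mode are hypothetical names.

```python
# Hypothetical alias table built from the two legacy aliases listed above.
LEGACY_ALIASES = {
    "self-check": "quick-audit",
    "review": "deep-review",
}

# The five modes named in this document.
MODES = {"quick-audit", "deep-review", "gate", "re-audit", "polish"}


def resolve_mode(requested: str) -> str:
    """Return the canonical mode name, translating legacy aliases first."""
    name = requested.strip().lower()
    name = LEGACY_ALIASES.get(name, name)
    if name not in MODES:
        raise ValueError(f"unknown mode: {requested!r}")
    return name
```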
For per-mode workflow steps, input resolution rules, presentation surface
rules, and committee focus routing, see references/MODE_GUIDE.md.
Review Standard
Read these references before running reviewer-style work:
references/REVIEW_CRITERIA.md
references/DEEP_REVIEW_CRITERIA.md
references/CHECKLIST.md
references/CONSOLIDATION_RULES.md
references/ISSUE_SCHEMA.md
references/PRE_SUBMISSION_RULES.md
references/PRESUBMISSION_GUIDE.md
references/MODE_GUIDE.md
The deep-review workflow uses a 16-part issue taxonomy:
- formula / derivation errors
- notation inconsistency
- prose vs formal object mismatch
- numerical inconsistency
- missing justification
- overclaim or claim inaccuracy
- ambiguity that can mislead a careful reader
- underspecified methods / missing information
- internal contradiction
- self-consistency of standards
- table structure violations
- abstract structural incompleteness
- theory contribution deficiency
- qualitative methodology opacity
- pseudo-innovation / straw man
- paragraph-level argument incoherence
Workflow
Each mode has the same shape: parse $ARGUMENTS, lock the paper path, infer
mode/report-style/focus/language if not provided, then run the canonical
command. Detailed phase steps are in references/MODE_GUIDE.md.
quick-audit
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode quick-audit ...
Present Submission Blockers -> Quality Improvements -> checklist; call
out PRESUBMISSION mechanical findings with [Script] provenance. Escalate
to deep-review when the user wants reviewer-depth critique.
deep-review
Five phases (see references/MODE_GUIDE.md for full detail):
- Workspace prep:
uv run python -B "$SKILL_DIR/scripts/prepare_review_workspace.py" <paper> --output-dir ./review_results
- Phase 0 automated audit:
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode deep-review ...
- Phase 3A committee — dispatch 5 committee agents (editor, theory,
literature, methodology, logic) and write committee/consensus.md.
- Phase 3B section + cross-cutting lanes — section, claims-vs-evidence,
notation, evaluation fairness, self-consistency, prior-art, and
pre-submission readiness (full/editor focus only).
- Consolidation:
uv run python -B "$SKILL_DIR/scripts/consolidate_review_findings.py" <review_dir>
uv run python -B "$SKILL_DIR/scripts/verify_quotes.py" <review_dir> --write-back
uv run python -B "$SKILL_DIR/scripts/render_deep_review_report.py" <review_dir>
When the user explicitly asks for journal-review prose, set
--report-style peer-review so peer_review_report.md becomes the **Primary
View** while review_report.md stays as the richer evidence bundle.
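The quote-verification step in Consolidation can be pictured as an exact-substring check against the paper text. The sketch below is an illustration under stated assumptions, not the implementation of scripts/verify_quotes.py: it assumes whitespace is normalized before matching, and the function names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def verify_quotes(issues: list[dict], paper_text: str) -> list[dict]:
    """Return copies of the issues with quote_verified set by a substring check."""
    haystack = normalize(paper_text)
    checked = []
    for issue in issues:
        quote = normalize(issue.get("quote", ""))
        # An empty quote never counts as verified.
        verified = bool(quote) and quote in haystack
        checked.append({**issue, "quote_verified": verified})
    return checked
```

Returning copies leaves the input bundle untouched, so a caller can decide separately whether to write the result back.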
gate
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode gate ...
Run EIC Screening (Phase 0.5) using agents/editor_in_chief_agent.md
first, then report PASS/FAIL. Present results in the order verdict -> EIC ->
blockers -> advisory. A desk-reject verdict is a gate blocker. Among
PRESUBMISSION findings, only Critical ones block the gate.
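The gate decision rules stated in this document can be summarized as one boolean test. The sketch below is a hedged illustration, not the logic of scripts/audit.py: the function name, the shape of presubmission_findings, and its severity field are assumptions. Only gate_blocker comes from the issue schema.

```python
def gate_verdict(
    issues: list[dict],
    presubmission_findings: list[dict],
    eic_desk_reject: bool,
    checklist_failed: bool = False,
) -> str:
    """Return "PASS" or "FAIL" from blockers, EIC verdict, and mechanical checks."""
    blockers = [i for i in issues if i.get("gate_blocker")]
    # Assumed shape: each PRESUBMISSION finding has a severity of
    # critical, major, or minor (case-insensitive).
    critical = [
        f for f in presubmission_findings
        if f.get("severity", "").lower() == "critical"
    ]
    failed = eic_desk_reject or checklist_failed or bool(blockers) or bool(critical)
    return "FAIL" if failed else "PASS"
```

Major and Minor PRESUBMISSION findings never affect the verdict here, which matches their advisory status.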
re-audit
Requires --previous-report PATH.
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode re-audit --previous-report <path> ...
uv run python -B "$SKILL_DIR/scripts/diff_review_issues.py" <old_final_issues.json> <new_final_issues.json>
Present root-cause-aware status labels: FULLY_ADDRESSED,
PARTIALLY_ADDRESSED, NOT_ADDRESSED, NEW.
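One way to picture how the four status labels could be assigned is to match issues by root_cause_key. The sketch below is illustrative only and does not describe scripts/diff_review_issues.py. In particular, treating a drop in severity as PARTIALLY_ADDRESSED is an assumption made for this example.

```python
# Severity order taken from the issue schema: minor < moderate < major.
SEVERITY_RANK = {"minor": 1, "moderate": 2, "major": 3}


def diff_issues(old: list[dict], new: list[dict]) -> dict[str, str]:
    """Map each root_cause_key to a re-audit status label."""
    old_by_key = {i["root_cause_key"]: i for i in old}
    new_by_key = {i["root_cause_key"]: i for i in new}
    status = {}
    for key, old_issue in old_by_key.items():
        new_issue = new_by_key.get(key)
        if new_issue is None:
            status[key] = "FULLY_ADDRESSED"
        elif SEVERITY_RANK[new_issue["severity"]] < SEVERITY_RANK[old_issue["severity"]]:
            # Assumption: the issue persists but at a lower severity.
            status[key] = "PARTIALLY_ADDRESSED"
        else:
            status[key] = "NOT_ADDRESSED"
    for key in new_by_key:
        if key not in old_by_key:
            status[key] = "NEW"
    return status
```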
polish
uv run python -B "$SKILL_DIR/scripts/audit.py" <paper> --mode polish ...
If blockers exist, stop and report them. Only proceed into polishing if the
precheck is safe.
Output Contract
For deep-review, the final issue schema is:
{
"title": "short issue title",
"quote": "exact quote from paper",
"explanation": "why this matters and what remains problematic",
"comment_type": "methodology|claim_accuracy|presentation|missing_information",
"severity": "major|moderate|minor",
"confidence": "high|medium|low|unverified",
"source_kind": "script|llm",
"source_section": "methods",
"related_sections": ["results", "appendix"],
"root_cause_key": "shared-normalized-key",
"review_lane": "claims_vs_evidence",
"gate_blocker": false,
"quote_verified": true
}
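A lightweight check of one issue against this schema might look like the sketch below. It is not the canonical validator; references/ISSUE_SCHEMA.md remains authoritative. The sketch assumes every field shown is required and that the pipe-separated strings enumerate the allowed values. validate_issue is a hypothetical helper name.

```python
# Field names and types read off the schema shown above.
REQUIRED_FIELDS = {
    "title": str,
    "quote": str,
    "explanation": str,
    "comment_type": str,
    "severity": str,
    "confidence": str,
    "source_kind": str,
    "source_section": str,
    "related_sections": list,
    "root_cause_key": str,
    "review_lane": str,
    "gate_blocker": bool,
    "quote_verified": bool,
}

# Assumption: the "a|b|c" strings in the schema list the allowed values.
ALLOWED_VALUES = {
    "comment_type": {"methodology", "claim_accuracy", "presentation", "missing_information"},
    "severity": {"major", "moderate", "minor"},
    "confidence": {"high", "medium", "low", "unverified"},
    "source_kind": {"script", "llm"},
}


def validate_issue(issue: dict) -> list[str]:
    """Return a list of problems; an empty list means the issue looks well-formed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in issue:
            problems.append(f"missing field: {field}")
        elif not isinstance(issue[field], expected_type):
            problems.append(f"wrong type for {field}")
    for field, allowed in ALLOWED_VALUES.items():
        value = issue.get(field)
        if isinstance(value, str) and value not in allowed:
            problems.append(f"invalid {field}: {value}")
    return problems
```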
Always prefer:
- exact quotes over vague paraphrase
- evidence-backed findings over style commentary
- issue bundle + roadmap over raw script dumps
References
| File | Purpose |
| --- | --- |
| references/MODE_GUIDE.md | per-mode workflow detail, phase steps, committee focus routing |
| references/PRESUBMISSION_GUIDE.md | PRESUBMISSION mode-integration behavior matrix |
| references/REVIEW_CRITERIA.md | top-level audit scoring and mapping |
| references/DEEP_REVIEW_CRITERIA.md | deep-review-specific issue taxonomy and leniency rules |
| references/CONSOLIDATION_RULES.md | deduplication and root-cause merge policy |
| references/ISSUE_SCHEMA.md | canonical JSON schema |
| references/REVIEW_LANE_GUIDE.md | section lanes and cross-cutting lanes |
| references/PRE_SUBMISSION_RULES.md | final-week mechanical audit rules and term list |
| references/SUBAGENT_TEMPLATES.md | reviewer task templates |
| references/QUICK_REFERENCE.md | CLI and mode cheat sheet |
Scripts
| Script | Purpose |
| --- | --- |
| scripts/audit.py | Phase 0 audit and mode entrypoint |
| scripts/pre_submission_check.py | deterministic PRESUBMISSION mechanical audit layer |
| scripts/prepare_review_workspace.py | create deep-review workspace |
| scripts/build_claim_map.py | extract headline claims and closure targets |
| scripts/consolidate_review_findings.py | deduplicate comment JSONs |
| scripts/verify_quotes.py | verify exact quote presence |
| scripts/render_deep_review_report.py | render final Markdown report |
| scripts/diff_review_issues.py | compare old vs new issue bundles |
Reviewer Lanes
Committee agents (deep-review default):
committee_editor_agent.md
committee_theory_agent.md
committee_literature_agent.md
committee_methodology_agent.md
committee_logic_agent.md
Default deep-review lanes live in agents/:
section_reviewer_agent.md
claims_evidence_reviewer_agent.md
notation_consistency_reviewer_agent.md
evaluation_fairness_reviewer_agent.md
self_consistency_reviewer_agent.md
prior_art_reviewer_agent.md
synthesis_agent.md
editor_in_chief_agent.md: EIC desk-reject screener (used in gate mode)
Specialized deep-review agents (read their files for activation criteria):
critical_reviewer_agent.md: devil's advocate with C3-C5 checks
domain_reviewer_agent.md: domain expertise with A1-A7 assessments
methodology_reviewer_agent.md: methodology rigor with B3-B10 checks
literature_reviewer_agent.md: evidence-based literature verification
(optional, --literature-search)
Examples
- "Review this manuscript like a serious conference reviewer and tell me the
biggest validity risks."
- "Run a quick audit on paper.tex and tell me what blocks submission."
- "Gate this IEEE submission and separate blockers from recommendations."
- "Re-audit this revision against my previous report."
- "Audit only the literature positioning and tell me whether the claimed gap
is real or fabricated by selective citation."