Name: paper2code
Author: prathamlearnstocode

paper2code

Converts an arxiv paper into a minimal, citation-anchored Python implementation. Trigger when user runs /paper2code with an arxiv URL or paper ID, says…

INSTALLATION

npx skills add https://github.com/prathamlearnstocode/paper2code --skill paper2code

Run in your project or agent environment. Adjust flags if your CLI version differs.

SKILL.md

$27

Install dependencies

Run via Bash:

pip install pymupdf4llm pdfplumber requests pyyaml

Execute pipeline

Stage 1 — Paper Acquisition and Parsing

Read and follow: pipeline/01_paper_acquisition.md

Run the helper script to fetch and parse the paper:

python skills/paper2code/scripts/fetch_paper.py {ARXIV_ID} .paper2code_work/{ARXIV_ID}/

Then run structure extraction:

python skills/paper2code/scripts/extract_structure.py .paper2code_work/{ARXIV_ID}/paper_text.md .paper2code_work/{ARXIV_ID}/

Verify the outputs exist before proceeding. If extraction failed, follow the fallback protocol in pipeline/01_paper_acquisition.md.

The script also searches for official code repositories (in the paper text and on the arxiv page) and saves any found links to paper_metadata.json under the official_code key. Verify these links before relying on them — see Step 8 in pipeline/01_paper_acquisition.md.

Stage 2 — Contribution Identification

Read and follow: pipeline/02_contribution_identification.md

Read the parsed paper sections. Identify the single core contribution. Classify the paper type. Write the contribution statement. Save it to .paper2code_work/{ARXIV_ID}/contribution.md.

Stage 3 — Ambiguity Audit

Read and follow: pipeline/03_ambiguity_audit.md

Before reading this stage, also read: guardrails/hallucination_prevention.md

Go through every implementation-relevant detail. Classify each as SPECIFIED, PARTIALLY_SPECIFIED, or UNSPECIFIED. Save the audit to .paper2code_work/{ARXIV_ID}/ambiguity_audit.md.

Stage 4 — Code Generation

Read and follow: pipeline/04_code_generation.md

Before writing code, read:

guardrails/scope_enforcement.md — to determine what's in and out of scope

guardrails/badly_written_papers.md — if the paper is vague or inconsistent

The relevant knowledge files in knowledge/ for the paper's domain

The scaffold templates in scaffolds/ for the expected file structure

Determine the paper_slug from the paper title (lowercase, underscores, no special chars).

Generate all files under {paper_slug}/ in the current working directory.

Stage 5 — Walkthrough Notebook

Read and follow: pipeline/05_walkthrough_notebook.md

Generate the walkthrough notebook that connects paper sections to code with runnable sanity checks. Save to {paper_slug}/notebooks/walkthrough.ipynb.

Cleanup

Remove the .paper2code_work/ directory after successful completion.

Final output

Print a summary:

✓ paper2code complete for: {paper_title}

  Output directory: {paper_slug}/

  Files generated: {list of files}

  Unspecified choices: {count} (see REPRODUCTION_NOTES.md)

  Mode: {MODE} | Framework: {FRAMEWORK}

Mode-specific behavior

minimal (default): Core contribution only. Training loop only if contribution involves training. No data pipeline beyond Dataset skeleton.

full: Core contribution + full training loop + data pipeline + evaluation pipeline. More code, same citation rigor.

educational: Same as minimal but with extra inline comments explaining ML concepts, expanded walkthrough notebook with theory sections, and a PAPER_GUIDE.md that walks through the paper section by section.

Guardrails — always active

These apply at ALL stages. Read them if you haven't already:

guardrails/hallucination_prevention.md — the most important file in this skill

guardrails/scope_enforcement.md — what to implement and what to skip

guardrails/badly_written_papers.md — what to do when the paper is unclear

Knowledge base — consult as needed

Before implementing any of these components, read the corresponding knowledge file:

Transformer layers, attention, positional encoding → knowledge/transformer_components.md

Optimizers, LR schedules, batch size semantics → knowledge/training_recipes.md

Cross-entropy, contrastive loss, diffusion loss, ELBO → knowledge/loss_functions.md

Framework-specific pitfalls, notation mismatches → knowledge/paper_to_code_mistakes.md