minimal-run-and-audit

Name: minimal-run-and-audit
Author: lllllllama

RigorPilot trusted execution and reporting skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize…

INSTALLATION

npx skills add https://github.com/lllllllama/ai-paper-reproduction-skill --skill minimal-run-and-audit

Run this command in your project or agent environment. Adjust the flags if your CLI version differs.

SKILL.md

minimal-run-and-audit

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should make run evidence auditable without turning every command into a rigid protocol.

When to apply

After a reproduction target and setup plan exist.

When the main skill needs execution evidence and normalized outputs.

When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.

When the user already knows what command should be attempted and wants execution plus reporting only.

When not to apply

During initial repo scanning.

When environment or assets are still undefined enough to make execution meaningless.

When the task is a literature lookup rather than repository execution.

When the user is still deciding which reproduction target should count as the main run.

Clear boundaries

This skill owns normalized reporting for an attempted command.

It may receive execution evidence from the main skill or a thin helper.

It does not choose the overall target on its own.

It does not perform broad paper analysis.

It does not own training startup, resume, or long-running training state.

It should not normalize risky code edits into acceptable practice.

It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning.
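
As a rough illustration of the last boundary, the sketch below flags changed files whose paths suggest they touch evaluation, preprocessing, checkpoints, or metrics, so they can be surfaced rather than hidden. The keyword table, the function name `flag_sensitive_paths`, and the example paths are all hypothetical and not part of this skill. A path heuristic cannot detect every change to scientific meaning, so treat it as a prompt for review only.

```python
# Hypothetical keyword table: the areas come from the boundary above,
# the substrings are illustrative assumptions.
SENSITIVE_KEYWORDS = {
    "evaluation": ("eval",),
    "preprocessing": ("preprocess", "transform"),
    "checkpoints": ("checkpoint", "ckpt"),
    "metrics": ("metric",),
}


def flag_sensitive_paths(changed_paths):
    """Map each changed path to the sensitive areas its name hints at."""
    flagged = {}
    for path in changed_paths:
        lowered = path.lower()
        hits = [
            area
            for area, keywords in SENSITIVE_KEYWORDS.items()
            if any(keyword in lowered for keyword in keywords)
        ]
        if hits:
            flagged[path] = hits
    return flagged


# Example paths are made up for illustration.
changed = ["README.md", "tools/eval_metrics.py", "data/preprocess.py"]
for path, areas in flag_sensitive_paths(changed).items():
    print(f"{path}: review for {', '.join(areas)}")
```

Anything flagged this way would still need a human or agent judgment before it is recorded as a change in scientific meaning.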

Input expectations

selected reproduction goal

runnable commands or smoke commands

environment and asset assumptions

optional patch metadata
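
One way to picture these inputs is as a single record that is checked before anything runs. The following is a minimal sketch, assuming a hypothetical `RunRequest` class. The field names and the example command are placeholders, not an interface defined by this skill.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRequest:
    """Hypothetical container for the four inputs listed above."""

    goal: str                      # selected reproduction goal
    commands: list[str]            # runnable commands or smoke commands
    assumptions: dict[str, str] = field(default_factory=dict)  # environment and asset assumptions
    patch_metadata: Optional[dict] = None                      # optional patch metadata

    def missing_inputs(self) -> list[str]:
        """Return the names of required inputs that are empty."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.commands:
            missing.append("commands")
        return missing


# Placeholder values for illustration only.
request = RunRequest(
    goal="documented evaluation run",
    commands=["python eval.py --config configs/example.yaml"],
    assumptions={"python": "3.10", "checkpoint": "downloaded per README"},
)
print(request.missing_inputs())
```

An empty result means the goal and at least one command are present; it says nothing about whether the assumptions actually hold.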

Output expectations

execution result summary

standardized repro_outputs/ files

SCIENTIFIC_CHANGELOG.md for changed scientific meaning and evidence status

COMPARABILITY_REPORT.md for README/paper/baseline comparability

clear distinction between verified, partial, and blocked states

PATCHES.md when repo files changed
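
The three states and the conditional PATCHES.md file can be sketched as follows. The classification rule here is an assumption made for illustration (the actual policy is in references/reporting-policy.md), and `RunStatus`, `classify`, and `planned_reports` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class RunStatus(Enum):
    VERIFIED = "verified"
    PARTIAL = "partial"
    BLOCKED = "blocked"


def classify(started: bool, exit_code: Optional[int], expected_outputs_seen: bool) -> RunStatus:
    """Assumed rule: blocked if the command never started, verified only if it
    exited cleanly and the expected outputs were observed, partial otherwise."""
    if not started:
        return RunStatus.BLOCKED
    if exit_code == 0 and expected_outputs_seen:
        return RunStatus.VERIFIED
    return RunStatus.PARTIAL


def planned_reports(repo_files_changed: bool) -> list[str]:
    """List the report files named above; PATCHES.md only when repo files changed."""
    reports = ["SCIENTIFIC_CHANGELOG.md", "COMPARABILITY_REPORT.md"]
    if repo_files_changed:
        reports.append("PATCHES.md")
    return reports


print(classify(started=True, exit_code=0, expected_outputs_seen=False).value)
print(planned_reports(repo_files_changed=True))
```

Keeping "exited with code 0" separate from "expected outputs were observed" avoids reporting a clean exit as a verified result.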

Notes

Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.
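
The interfaces of scripts/run_command.py and scripts/write_outputs.py are not shown here, so the following is not their implementation. It is a minimal sketch of capturing execution evidence for one command with Python's standard `subprocess` module, and `RunEvidence` and `run_and_capture` are hypothetical names.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunEvidence:
    command: list[str]
    started: bool              # False when the executable could not be launched
    exit_code: Optional[int]   # None when not started or timed out
    stdout: str
    stderr: str
    duration_s: float
    timed_out: bool


def _as_text(value) -> str:
    """Normalize captured output, which may be None, bytes, or str."""
    if value is None:
        return ""
    if isinstance(value, bytes):
        return value.decode(errors="replace")
    return value


def run_and_capture(command: list[str], timeout_s: float = 600.0) -> RunEvidence:
    """Run one command and record what happened without raising."""
    start = time.monotonic()
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired as exc:
        return RunEvidence(command, True, None, _as_text(exc.stdout),
                           _as_text(exc.stderr), time.monotonic() - start, True)
    except OSError as exc:  # e.g. executable not found
        return RunEvidence(command, False, None, "", str(exc),
                           time.monotonic() - start, False)
    return RunEvidence(command, True, proc.returncode, proc.stdout,
                       proc.stderr, time.monotonic() - start, False)


# Smoke example that only needs the current Python interpreter.
evidence = run_and_capture([sys.executable, "-c", "print('smoke ok')"])
print(evidence.exit_code, evidence.stdout.strip())
```

Recording launch failures and timeouts as data rather than exceptions keeps a failed attempt available for the report.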
