SKILL.md

Append Mode

Process only:

Files not in the manifest (new session rollouts, new index files)

Files whose modification time is newer than ingested_at in the manifest

Use this mode for regular syncs.

Full Mode

Process everything regardless of manifest. Use after wiki-rebuild or if the user explicitly asks for a full re-ingest.

Codex Data Layout

Codex stores local artifacts under ~/.codex/.

~/.codex/
├── sessions/                          # Session rollout logs by date
│   └── YYYY/MM/DD/
│       └── rollout-<timestamp>-<id>.jsonl
├── archived_sessions/                 # Archived rollout logs
├── session_index.jsonl                # Lightweight index of thread id/name/updated_at
├── history.jsonl                      # Local transcript history (if persistence enabled)
├── config.toml                        # User config (contains history settings)
└── state_*.sqlite / logs_*.sqlite     # Runtime DBs (usually skip)

Key data sources ranked by value

session_index.jsonl — best inventory source for IDs, titles, and freshness

sessions/**/rollout-*.jsonl — rich structured transcript events

history.jsonl — useful fallback/timeline aid if enabled

Avoid ingesting SQLite internals unless the user explicitly asks.

Step 1: Survey and Compute Delta

Scan CODEX_HISTORY_PATH and compare against .manifest.json:

~/.codex/session_index.jsonl

~/.codex/sessions/**/rollout-*.jsonl

~/.codex/archived_sessions/** (optional; only if user asks for archived history)

~/.codex/history.jsonl (optional fallback)

Classify each file:

New — not in manifest

Modified — in manifest but file is newer than ingested_at

Unchanged — already ingested and unchanged

Report a concise delta summary before deep parsing.
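
The classification above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: it assumes the manifest maps file paths to entries with an ISO 8601 ingested_at string, and the names parse_ts, classify, and compute_delta are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are treated as UTC (assumption)."""
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def classify(path: Path, manifest_files: dict) -> str:
    """Return 'new', 'modified', or 'unchanged' for one source file."""
    entry = manifest_files.get(str(path))
    if entry is None:
        return "new"  # not in manifest
    mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
    # Modified means the file changed after it was last ingested.
    return "modified" if mtime > parse_ts(entry["ingested_at"]) else "unchanged"


def compute_delta(paths, manifest_files: dict) -> dict:
    """Bucket every candidate path so a summary can be reported before parsing."""
    delta = {"new": [], "modified": [], "unchanged": []}
    for p in paths:
        delta[classify(Path(p), manifest_files)].append(str(p))
    return delta
```

The lengths of the three buckets are enough for the concise delta summary; in append mode only the new and modified buckets would go on to deep parsing.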

Step 2: Parse Session Index First

session_index.jsonl typically has entries like:

{"id":"...","thread_name":"...","updated_at":"..."}

Use it to:

Build a canonical session inventory

Prioritize recent/high-signal sessions

Map rollout IDs to human-readable thread names
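
As a rough sketch of building that inventory, the snippet below reads the index line by line and keeps the latest entry per id. It assumes only the three fields shown above and that updated_at values share one ISO 8601 format, so they sort correctly as strings; load_session_index and most_recent are hypothetical helper names.

```python
import json


def _stamp(entry: dict) -> str:
    """Sort key; missing or null updated_at sorts first."""
    return str(entry.get("updated_at") or "")


def load_session_index(path: str) -> dict:
    """Return {id: entry}, keeping the most recently updated entry per id."""
    inventory = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate truncated or malformed lines
            if not isinstance(entry, dict) or "id" not in entry:
                continue
            prev = inventory.get(entry["id"])
            if prev is None or _stamp(entry) >= _stamp(prev):
                inventory[entry["id"]] = entry
    return inventory


def most_recent(inventory: dict, limit: int = 10) -> list:
    """Most recently updated sessions first, for prioritization."""
    return sorted(inventory.values(), key=_stamp, reverse=True)[:limit]
```

The resulting mapping can also be used to look up thread_name when a rollout file's id is known.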

Step 3: Parse Rollout JSONL Safely

Each rollout-*.jsonl line is an event envelope with:

{
  "timestamp": "...",
  "type": "session_meta|turn_context|event_msg|response_item",
  "payload": { ... }
}

Extraction rules

Prioritize user intent and assistant-visible outputs

Favor response_item records with user/assistant message content

Use event_msg selectively for meaningful milestones; ignore pure telemetry

Treat session_meta as metadata (cwd, model, ids), not user knowledge

Skip/noise filters

Token accounting events

Tool plumbing with no semantic content

Raw command output unless it contains reusable decisions/patterns

Repeated plan snapshots unless they add novel decisions
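
One way to apply these rules at the envelope level is sketched below. It relies only on the type and payload keys shown above. Because payload fields are not specified here, the noise test is a caller-supplied predicate; dropping turn_context records is an assumption of this sketch, and split_events is a hypothetical name.

```python
import json

CONTENT_TYPES = {"response_item", "event_msg"}


def iter_events(lines):
    """Yield well-formed event envelopes, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and isinstance(event.get("payload"), dict):
            yield event


def split_events(lines, is_noise=lambda event: False):
    """Separate session metadata from candidate content events.

    is_noise is supplied by the caller, e.g. to drop token accounting
    or tool plumbing once the payload fields are known.
    """
    meta, content = [], []
    for event in iter_events(lines):
        kind = event.get("type")
        if kind == "session_meta":
            meta.append(event["payload"])  # metadata, not user knowledge
        elif kind in CONTENT_TYPES and not is_noise(event):
            content.append(event)
        # Anything else (including turn_context) is ignored in this sketch.
    return meta, content
```

Passing an open file object as lines keeps memory use flat on large rollouts, since the file is read one line at a time.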

Critical privacy filter

Rollout logs can include injected instructions, tool payloads, and sensitive text. Do not ingest verbatim system/developer prompts or secrets.

Remove API keys, tokens, passwords, credentials

Redact private identifiers unless relevant and approved

Summarize instead of quoting raw transcripts
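
A first-pass redaction step might look like the sketch below. The patterns are illustrative assumptions only and are nowhere near exhaustive; a pass like this supplements the "summarize instead of quoting" rule and does not replace it.

```python
import re

# Illustrative patterns only (assumptions); real secret formats vary widely.
SECRET_PATTERNS = [
    # key=value or key: value pairs whose key name suggests a credential
    re.compile(r"(?i)\b(api[_-]?key|token|password|passwd|secret)\b(\s*[:=]\s*)(\S+)"),
    # HTTP bearer tokens
    re.compile(r"(?i)\b(bearer)(\s+)([A-Za-z0-9._~+/=-]{8,})"),
]


def redact(text: str) -> str:
    """Replace the value part of anything that looks like a credential."""
    for pattern in SECRET_PATTERNS:
        # Keep the key and separator so the summary still reads naturally.
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
    return text
```

For example, redact("password=hunter2 ok") yields "password=[REDACTED] ok". Text that still looks sensitive after this pass should be dropped or summarized by hand.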

Step 4: Cluster by Topic

Do not create one wiki page per session.

Group by stable topics across many sessions

Split mixed sessions into separate themes

Merge recurring concepts across dates/projects

Use cwd from metadata to infer project scope
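
Inferring project scope from cwd could be as simple as taking the last path component, as in this sketch. It assumes POSIX-style paths and session records shaped like {"id": ..., "cwd": ...}, which is a hypothetical shape for illustration. Topic clustering within a project still needs judgment and is not covered here.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def project_from_cwd(cwd: str) -> str:
    """Use the final directory name as the project name; 'unknown' if absent."""
    name = PurePosixPath(cwd.rstrip("/")).name
    return name or "unknown"


def group_by_project(sessions) -> dict:
    """Map project name -> list of session ids, preserving input order."""
    groups = defaultdict(list)
    for session in sessions:
        groups[project_from_cwd(session.get("cwd", ""))].append(session["id"])
    return dict(groups)
```

Two different directories with the same final name land in the same group, so ambiguous names are worth checking against the full path before merging pages.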

Step 5: Distill into Wiki Pages

Route extracted knowledge using existing wiki conventions:

Project-specific architecture/process -> projects/<name>/...

General concepts -> concepts/

Recurring techniques/debug playbooks -> skills/

Tools/services -> entities/

Cross-session patterns -> synthesis/

For each impacted project, create/update projects/<name>/<name>.md (project name as filename, never _project.md).

Writing rules

Distill knowledge, not chronology

Avoid "on date X we discussed..." unless date context is essential

Add summary: frontmatter on each new/updated page (1-2 sentences, <= 200 chars)

Add confidence and lifecycle fields to every new page:

base_confidence: 0.42

lifecycle: draft

lifecycle_changed: <ISO date today>

Leave lifecycle unchanged on update.

Add provenance markers:

^[extracted] when directly grounded in explicit session content

^[inferred] when synthesizing patterns across events/sessions

^[ambiguous] when sessions conflict

Add/update provenance: frontmatter mix for each changed page
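
The required fields for a new page can be generated along these lines. This is a sketch under assumptions: it emits the frontmatter as YAML between --- delimiters, quotes the summary with JSON string syntax (which is also valid YAML), and leaves out the provenance mix because its format is not described here. The function name is hypothetical.

```python
import json
from datetime import date
from typing import Optional

MAX_SUMMARY_CHARS = 200


def new_page_frontmatter(summary: str, today: Optional[date] = None) -> str:
    """Build frontmatter for a NEW page; updates must keep lifecycle as-is."""
    summary = " ".join(summary.split())  # collapse whitespace and newlines
    if len(summary) > MAX_SUMMARY_CHARS:
        raise ValueError(f"summary exceeds {MAX_SUMMARY_CHARS} chars")
    today = today or date.today()
    lines = [
        "---",
        f"summary: {json.dumps(summary)}",
        "base_confidence: 0.42",
        "lifecycle: draft",
        f"lifecycle_changed: {today.isoformat()}",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Raising on an over-long summary, rather than truncating, forces the summary to be rewritten instead of cut mid-sentence.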

Step 6: Update Manifest, Log, and Index

Update .manifest.json

For each processed source file:

ingested_at, size_bytes, modified_at

source_type: codex_rollout | codex_index | codex_history

project: inferred project name (when applicable)

pages_created, pages_updated
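
A per-file manifest entry with these fields might be assembled as in the sketch below. The field names come from the list above; storing timestamps as ISO 8601 UTC strings and pages_created/pages_updated as lists of page paths are assumptions, and manifest_entry is a hypothetical helper.

```python
import os
from datetime import datetime, timezone

SOURCE_TYPES = {"codex_rollout", "codex_index", "codex_history"}


def manifest_entry(path, source_type, pages_created, pages_updated, project=None):
    """Describe one processed source file for .manifest.json."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source_type: {source_type}")
    st = os.stat(path)
    entry = {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": st.st_size,
        "modified_at": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "source_type": source_type,
        "pages_created": list(pages_created),
        "pages_updated": list(pages_updated),
    }
    if project:  # only when a project could be inferred
        entry["project"] = project
    return entry
```

Recording ingested_at at write time is what lets the next append-mode run detect files modified since this ingest.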

Add/update a top-level project/session summary block:

{
  "project-name": {
    "source_path": "~/.codex/sessions/...",
    "last_ingested": "TIMESTAMP",
    "sessions_ingested": 12,
    "sessions_total": 40,
    "index_updated_at": "TIMESTAMP"
  }
}

Update special files

Update index.md and log.md:

- [TIMESTAMP] CODEX_HISTORY_INGEST sessions=N pages_updated=X pages_created=Y mode=append|full
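
The log entry above can be produced with a small formatter such as this sketch. The TIMESTAMP format is not specified here, so ISO 8601 UTC is an assumption, and log_line is a hypothetical name.

```python
from datetime import datetime, timezone


def log_line(sessions, pages_updated, pages_created, mode, now=None):
    """Format one log.md entry for a Codex history ingest."""
    if mode not in ("append", "full"):
        raise ValueError("mode must be 'append' or 'full'")
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # assumed timestamp format
    return (
        f"- [{stamp}] CODEX_HISTORY_INGEST sessions={sessions} "
        f"pages_updated={pages_updated} pages_created={pages_created} mode={mode}"
    )
```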

**hot.md**: Read $OBSIDIAN_VAULT_PATH/hot.md (create it from the template in wiki-ingest if missing). Update Recent Activity with a one-line summary, e.g. "Ingested 12 Codex sessions; surfaced recurring patterns in CLI tooling and shell scripting." Keep only the last 3 operations, and refresh the updated timestamp.

Privacy and Compliance

Distill and synthesize; avoid raw transcript dumps

Default to redaction for anything that looks sensitive

Ask the user before storing personal/sensitive details

Keep references to other people minimal and purpose-bound

Reference

See references/codex-data-format.md for field-level parsing notes and extraction guidance.

QMD Refresh After Vault Writes

QMD is a search index, not the source of truth. If $QMD_WIKI_COLLECTION is empty or unset, skip this step. Run it only after this skill has written or rewritten vault markdown. If QMD refresh fails, do not roll back the vault changes; report the QMD status separately.

Use $QMD_CLI if set; otherwise use qmd.

${QMD_CLI:-qmd} update

If the output says vectors are needed or embeddings may be stale, run:

${QMD_CLI:-qmd} embed

Verify the collection with either:

${QMD_CLI:-qmd} ls "$QMD_WIKI_COLLECTION"

or, when a specific page path is known:

${QMD_CLI:-qmd} get "qmd://$QMD_WIKI_COLLECTION/<page>.md" -l 5

Record one of:

QMD refreshed: update + embed + verified

QMD refreshed: update only + verified

QMD skipped: QMD_WIKI_COLLECTION unset

QMD skipped: qmd CLI unavailable

QMD failed: <short error summary>
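
For agents that drive this step from a script, the decision flow above could be wrapped roughly as follows. This is a sketch with assumptions stated plainly: it only runs the update, embed, and ls subcommands shown above, the needs_embed keyword check is a guess at how the "vectors needed" message might be detected, and QMD_CLI is assumed to be a single executable name without arguments. The run and which parameters exist so the flow can be exercised without the real CLI.

```python
import os
import shutil
import subprocess


def needs_embed(output: str) -> bool:
    """Loose keyword match (assumption); adjust to the CLI's real wording."""
    text = output.lower()
    return "vectors" in text or "embeddings" in text


def refresh_qmd(env=os.environ, run=subprocess.run, which=shutil.which) -> str:
    """Refresh the QMD index and return the status line to record."""
    collection = env.get("QMD_WIKI_COLLECTION", "")
    if not collection:
        return "QMD skipped: QMD_WIKI_COLLECTION unset"
    cli = env.get("QMD_CLI") or "qmd"  # mirrors ${QMD_CLI:-qmd}
    if which(cli) is None:
        return "QMD skipped: qmd CLI unavailable"
    try:
        update = run([cli, "update"], capture_output=True, text=True, check=True)
        embedded = needs_embed(update.stdout + update.stderr)
        if embedded:
            run([cli, "embed"], capture_output=True, text=True, check=True)
        # Verify the collection is listable after the refresh.
        run([cli, "ls", collection], capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        # Vault changes are never rolled back; only the status is reported.
        return f"QMD failed: {str(exc)[:80]}"
    steps = "update + embed" if embedded else "update only"
    return f"QMD refreshed: {steps} + verified"
```

The function never raises on a QMD failure, which matches the rule that a failed refresh is reported separately and does not undo vault writes.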
