SKILL.md


The wiki lives at the path configured via OBSIDIAN_VAULT_PATH in .env.

Layer 3: The Schema (this skill + config)

The rules governing how the wiki is structured — categories, conventions, page templates, and operational workflows. The schema tells the LLM how to maintain the wiki.

Wiki Organization

The vault has two levels of structure: categories (what kind of knowledge) and projects (where the knowledge came from).

The manifest (.manifest.json) tracks every source file that has been ingested: its path, timestamps, and the wiki pages it produced. This is the backbone of the delta system. See the wiki-status skill for the full schema.

The manifest enables:

- Delta computation: what's new or modified since last ingest
- Append mode: only process the delta, not everything
- Audit: which source produced which wiki page
- Staleness detection: source changed but wiki page hasn't been updated
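As a rough illustration of delta computation, the sketch below compares checksums of files on disk against a recorded list. The two-column, tab-separated manifest is a simplifying assumption made for this example only; the real .manifest.json schema is defined by the wiki-status skill. The function name source_delta is hypothetical.

```shell
# Hypothetical sketch: report sources that are new or modified since last ingest.
# Assumption: <manifest.tsv> holds "checksum<TAB>path" lines. The real manifest
# is .manifest.json, whose schema lives in the wiki-status skill.
source_delta() {  # usage: source_delta <manifest.tsv> <sources-dir>
  find "$2" -type f | sort | while IFS= read -r path; do
    sum=$(cksum < "$path" | cut -d ' ' -f 1)
    # Look up the checksum recorded for this path, if any.
    recorded=$(awk -F '\t' -v p="$path" '$2 == p { print $1 }' "$1")
    if [ -z "$recorded" ]; then
      echo "new $path"
    elif [ "$recorded" != "$sum" ]; then
      echo "modified $path"
    fi
  done
}
```

Unchanged sources produce no output, so an empty result means there is nothing for append mode to process.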

Page Template

When creating a new wiki page, use this structure:

---
title: Page Title
category: concepts
tags: [ml, architecture]
aliases: [alternate name]
relationships:
  - target: "[[concepts/related-concept]]"
    type: extends
sources: [papers/attention.pdf]
summary: One or two sentences, ≤200 chars, so a reader (or another skill) can preview this page without opening it.
provenance:
  extracted: 0.72
  inferred: 0.25
  ambiguous: 0.03
base_confidence: 0.65
lifecycle: draft
lifecycle_changed: 2024-03-15
tier: supporting
created: 2024-03-15T10:30:00Z
updated: 2024-03-15T10:30:00Z
---

# Page Title

One-paragraph summary of what this page covers.

## Key Ideas

- The source's central claim, paraphrased directly.

- A generalization the source implies but doesn't state outright. ^[inferred]

- A figure two sources disagree on. ^[ambiguous]

Use [[wikilinks]] to connect to related pages.

## Open Questions

Things that are unresolved or need more sources.

## Sources

- [[references/attention-is-all-you-need]] — Original paper

Provenance Markers

Every claim on a wiki page has one of three provenance states. Mark them inline so the reader (and future ingest passes) can tell signal from synthesis.

| State | Marker | Meaning |
| --- | --- | --- |
| Extracted | (no marker; default) | A paraphrase of something a source actually says. |
| Inferred | ^[inferred] suffix | An LLM-synthesized claim: a connection, generalization, or implication the source doesn't state directly. |
| Ambiguous | ^[ambiguous] suffix | Sources disagree, or the source is unclear. |

Example:

- Transformers parallelize across positions, unlike RNNs.

- This is why they scale better on modern hardware. ^[inferred]

- GPT-4 was trained on roughly 13T tokens. ^[ambiguous]

Why this syntax:

- ^[...] is footnote-adjacent in Obsidian: it renders cleanly and never collides with [[wikilinks]].
- Inline (suffix), so a single bullet stays a single bullet.
- Default = extracted, so existing pages without markers stay valid.

Frontmatter summary: Optionally surface the rough mix at the page level so the user can scan for speculation-heavy pages without reading them:

provenance:
  extracted: 0.72   # rough fraction of sentences/bullets with no marker
  inferred: 0.25
  ambiguous: 0.03

These are best-effort numbers written by the ingest skill at create/update time. wiki-lint recomputes them and flags drift. The block is optional — pages without it are treated as fully extracted by convention.
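A rough sketch of how the mix could be estimated from a page body follows. It counts bullet lines only, which is an assumption made for brevity (the text above mentions sentences as well), and the helper name provenance_mix is hypothetical rather than part of any skill.

```shell
# Hypothetical helper: estimate the provenance mix of a page from its bullets.
# Assumption: only lines starting with "- " are counted; the real ingest and
# lint skills may also count sentences.
provenance_mix() {  # usage: provenance_mix <page.md>
  awk '
    /^- / {
      total++
      if (index($0, "^[inferred]")) { inferred++ }
      else if (index($0, "^[ambiguous]")) { ambiguous++ }
    }
    END {
      # No bullets at all: treat the page as fully extracted, per the convention.
      if (total == 0) { print "extracted: 1.00 inferred: 0.00 ambiguous: 0.00"; exit }
      printf "extracted: %.2f inferred: %.2f ambiguous: %.2f\n",
        (total - inferred - ambiguous) / total, inferred / total, ambiguous / total
    }
  ' "$1"
}
```

Indented frontmatter list items (such as relationships entries) do not start at column 0 and are therefore not counted.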

Typed Relationships

Plain [[wikilinks]] in page bodies carry no semantic weight — they indicate "related to" but not how. The optional relationships: frontmatter block adds typed, directional edges to the knowledge graph.

The relationships: block

relationships:
  - target: "[[Transformer Architecture]]"
    type: extends
  - target: "[[LSTM]]"
    type: contradicts
  - target: "[[Attention Mechanism]]"
    type: implements

Each entry has two required fields:

- target: a link to the related page, written in the format set by OBSIDIAN_LINK_FORMAT
- type: one of the allowed semantic types below

Allowed relationship types

| Type | Meaning | Example |
| --- | --- | --- |
| extends | This page builds on or generalises the target | GPT extends Transformer Architecture |
| implements | This page is a concrete realisation of the target concept | BERT implements Masked Language Modelling |
| contradicts | This page's claims conflict with or refute the target | Evidence A contradicts Evidence B |
| derived_from | This page is based on or adapted from the target | Fine-tuning is derived from Transfer Learning |
| uses | This page depends on or relies on the target | RAG uses Vector Databases |
| replaces | This page supersedes or deprecates the target | GPT-4 replaces GPT-3 |
| related_to | Catch-all: related but no stronger directional type applies | Concept A is related to Concept B |

Rules

Optional field — omit the block entirely if no typed relationships are known. Untagged wikilinks remain valid and are treated as related_to by wiki-export.

Don't duplicate — if [[foo]] already appears as an inline wikilink, the relationships: entry just enriches it with a type; it is not a second link.

Direction matters — the page declaring the entry is the source; target is the destination. Only declare relationships from this page's perspective.

Don't fabricate — only add a typed entry when the source material makes the relationship direction and type clear. When in doubt, use related_to or omit.

Skills that read relationships:: wiki-export (emits typed edges), cross-linker (writes typed entries when inferring links), wiki-query (may surface type in answers).
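A skill that writes or lints relationships: entries could guard against unknown types with a check like the one below. This is a minimal sketch; the function name valid_relationship_type is hypothetical, and the list simply mirrors the allowed types in this section.

```shell
# Hypothetical helper: succeed only for the relationship types allowed above.
valid_relationship_type() {  # usage: valid_relationship_type <type>
  case $1 in
    extends|implements|contradicts|derived_from|uses|replaces|related_to) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: fall back to related_to when a proposed type is not recognised.
proposed="inspired_by"
valid_relationship_type "$proposed" || proposed="related_to"
echo "type: $proposed"
```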

Confidence and Lifecycle

Every page carries two orthogonal trust signals plus an optional supersession link.

Required fields

base_confidence: 0.65          # [0.0, 1.0]; time-independent quality estimate. Stored once, recomputed on content change.
lifecycle: draft               # draft | reviewed | verified | disputed | archived
lifecycle_changed: 2024-03-15  # ISO date of last state transition
# lifecycle_reason: "..."      # optional free-text: why the state changed; surfaced by wiki-query
# superseded_by: "[[new-page]]" # wikilink; only when lifecycle=archived

lifecycle_reason and superseded_by are optional. Never fabricate them.

Confidence formula

base_confidence = source_count_score * 0.5 + source_quality_score * 0.5
source_count_score   = min(distinct_source_ids / 3, 1.0)
source_quality_score = avg(quality score per distinct source_id)

Source-quality scores (use the highest-matching bucket):

| Bucket | Score | Examples |
| --- | --- | --- |
| paper | 1.0 | arXiv, conference proceedings |
| official | 0.9 | *.gov, vendor docs |
| documentation | 0.85 | well-maintained third-party docs |
| book | 0.8 | books, technical references |
| repository | 0.75 | GitHub READMEs, codebases |
| blog | 0.55 | personal blogs |
| session_transcript | 0.5 | conversation history |
| forum | 0.4 | Stack Overflow, HN, Reddit |
| unknown | 0.4 | catch-all |
| llm_generated | 0.3 | LLM self-reflections |

A **source_id** is a stable per-source identifier. It prevents counting three copies of the same blog as three distinct sources:

| Source type | source_id rule |
| --- | --- |
| Academic paper | DOI > arXiv ID > <author>-<year>-<slug> |
| GitHub repo | github.com/<owner>/<repo> |
| Documentation site | <canonical-host>/<product> |
| Blog post | <host>/<author> |
| Session transcript | <agent>/<session-id> |
| Other | <canonical-url> |
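The confidence formula in this section can be worked through with a small helper. This is a sketch under stated assumptions: the function name base_confidence is hypothetical, the caller passes one quality score per distinct source_id (deduplication has already happened), and returning 0.00 for zero sources is an assumption the text does not specify.

```shell
# Sketch of the confidence formula. Arguments: one quality score per DISTINCT
# source_id. Output is rounded to two decimals.
base_confidence() {  # usage: base_confidence <score> [<score> ...]
  printf '%s\n' "$@" | awk '
    NF { n++; sum += $1 }
    END {
      if (n == 0) { print "0.00"; exit }          # assumption: no sources, no confidence
      count_score = (n / 3 < 1) ? n / 3 : 1       # min(distinct_source_ids / 3, 1.0)
      quality_score = sum / n                     # avg(quality score per source_id)
      printf "%.2f\n", count_score * 0.5 + quality_score * 0.5
    }'
}

base_confidence 0.5            # one session transcript
base_confidence 1.0 0.9 0.55   # paper + official docs + blog
```

A single source contributes a count score of 1/3, so one session transcript gives 0.5 × 0.33 + 0.5 × 0.5 ≈ 0.42. Three or more distinct sources saturate the count score at 1.0.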

Per-skill defaults (ingest skills compute this automatically):

| Skill | base_confidence | lifecycle |
| --- | --- | --- |
| ingest-url | 0.17 + 0.5 × classify(url) | draft |
| wiki-ingest (single doc) | per-source classifier | draft |
| wiki-ingest (multi-doc) | min(N/3,1)×0.5 + avg_q×0.5 | draft |
| wiki-research | varies, often 0.85+ | draft |
| wiki-capture | 0.42 | draft |
| *-history-ingest | 0.42 | draft |
| wiki-update | 0.59 | draft |
| wiki-synthesize | min(input_pages.base_confidence) | draft |
| data-ingest | 0.37 | draft |

Lifecycle state machine

Five states. **stale is not a state** — it is a computed overlay: is_stale = (today − updated) > 90 days.

| State | Entered by | Notes |
| --- | --- | --- |
| draft | Any ingest skill on first write | Default for all new pages |
| reviewed | Human edit only | |
| verified | Human edit only | Time alone never demotes verified pages |
| disputed | Manual edit only | Overrides every state except archived in display |
| archived | Manual edit, or ingest skill setting superseded_by | Terminal |

Only ingest skills set draft. All other transitions require a human editor, with one exception: an ingest skill may move a page to archived when it sets superseded_by. Update lifecycle_changed whenever the state changes.
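Because stale is computed rather than stored, any skill can derive it from the updated: field. The sketch below is one way to do that without relying on GNU or BSD date flags; the function name is_stale is hypothetical, and the day count uses standard civil-calendar arithmetic.

```shell
# Sketch of the stale overlay: is_stale = (today - updated) > 90 days.
# Accepts YYYY-MM-DD or a full ISO timestamp; only the date part is used.
is_stale() {  # usage: is_stale <updated> [today]
  updated=$1
  today=${2:-$(date -u +%Y-%m-%d)}
  awk -v a="$updated" -v b="$today" '
    # Number of days since a fixed origin for a Gregorian calendar date.
    function days(s,   p, y, m, d, era, yoe, doy) {
      split(s, p, "-"); y = p[1] + 0; m = p[2] + 0; d = p[3] + 0
      if (m <= 2) y--
      era = int(y / 400); yoe = y - era * 400
      doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
      return era * 146097 + yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
    }
    BEGIN { exit !(days(b) - days(a) > 90) }'
}

if is_stale "2024-03-15T10:30:00Z" "2024-06-14"; then
  echo "stale"   # 91 days have passed
fi
```

The function only reports the overlay through its exit status. It never changes lifecycle, which matches the rule that time alone does not demote a page.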

Importance Tiering

The tier: field controls which pages get updated on each ingest pass and their priority in retrieval. As wikis grow, re-reading every page on every ingest wastes tokens — tiering lets ingest and query skills focus effort where it matters most.

Three tiers

| Tier | Meaning | Ingest behavior | Query priority |
| --- | --- | --- | --- |
| core | Load-bearing pages that many other pages depend on (high incoming-link count or bridge position). Always worth updating. | Always update if the source is even marginally relevant | Surfaced first in index and full-read passes |
| supporting (default) | Standard wiki pages with moderate connectivity | Update when the source has clear new claims for this page | Standard priority |
| peripheral | Low-connectivity pages: rarely linked, narrowly scoped | Skip unless the source is primarily about this topic | Last resort; skipped when trimming to context budget |

Assignment rules

New pages: default to tier: supporting

**Promote to core:** when a page accumulates ≥5 incoming wikilinks or is flagged as a bridge by wiki-status insights mode

**Demote to peripheral:** when a page has ≤1 incoming link and hasn't been updated in 90+ days

Human override always wins — edit tier: manually to lock a page at any level

Existing pages without tier: are treated as supporting (backward compatible — no migration needed)
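The promotion and demotion thresholds above can be expressed as a small decision function. This is only a sketch of the suggestion logic: the name suggest_tier and its argument order are hypothetical, and a manually set tier: should still take precedence over whatever it prints.

```shell
# Hypothetical helper: suggest a tier from graph metrics, following the rules above.
suggest_tier() {  # usage: suggest_tier <incoming_links> <days_since_update> [is_bridge: yes|no]
  links=$1; age_days=$2; bridge=${3:-no}
  if [ "$links" -ge 5 ] || [ "$bridge" = yes ]; then
    echo core
  elif [ "$links" -le 1 ] && [ "$age_days" -ge 90 ]; then
    echo peripheral
  else
    echo supporting
  fi
}

suggest_tier 7 12        # prints: core
suggest_tier 1 120       # prints: peripheral
suggest_tier 1 30        # prints: supporting
```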

Who manages tier

wiki-ingest reads tier: to decide whether to update a page on the current pass

wiki-query uses tier: to order candidates in the index pass and trim to context budget

wiki-status insights mode computes graph metrics and suggests tier assignments — it never writes them automatically

wiki-lint flags missing tier: on newly created pages (Phase 2 enforcement, same timeline as base_confidence)

Retrieval Primitives

Reading the vault is the dominant cost of every read-side skill. Use the cheapest primitive that can answer the question and escalate only when the cheaper one is insufficient. Any skill that needs content from the vault should follow this table rather than jumping straight to full-page reads.

| Need | Primitive | Relative cost |
| --- | --- | --- |
| Does a page exist? What's its title/category/tags? | Read index.md; Grep frontmatter blocks (scope with a pattern that targets ^--- blocks at file heads) | Cheapest |
| 1–2 sentence preview of a page | Read the summary: field in its frontmatter | Cheap |
| A specific claim or section inside a page | Grep -A <n> -B <n> "<term>" <file>, which returns only the matching lines plus context | Medium |
| Whole-page content | Read <file> | Expensive; last resort |
| Relationships across pages | Grep "\[\[.*?\]\]" across the vault, or walk wikilinks from a known page | Case-by-case |

The rule: escalate only when the cheaper primitive can't answer the question. If you can answer from summary: fields alone, don't read page bodies. If a grepped section with -A 10 -B 2 gives you the claim, don't read the whole page. A 500-line page opened to read 15 lines is 485 lines of wasted tokens.

Why this matters: a 20-page vault lets you get away with full-vault scans. A 200-page vault does not. The primitives above are how the skills framework scales to large vaults without a database.

Skills that consume this table: wiki-query, cross-linker, wiki-lint, wiki-status (insights mode). Any new skill that reads the vault should cite this section rather than reinvent the pattern.
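Outside an agent's built-in Read and Grep tools, the two middle rungs of the escalation ladder might look like the sketch below. Both function names are hypothetical. The -A and -B context flags are available in the common grep implementations but are not part of POSIX.

```shell
# Cheap: print the summary: field from a page's frontmatter without reading the body.
page_summary() {  # usage: page_summary <page.md>
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }   # frontmatter opens on line 1
    in_fm && $0 == "---"   { exit }              # closing fence: stop reading
    in_fm && /^summary:/   { sub(/^summary:[ ]*/, ""); print; exit }
  ' "$1"
}

# Medium: print only the lines around a literal term, not the whole page.
page_section() {  # usage: page_section <term> <page.md>
  grep -F -A 10 -B 2 -- "$1" "$2"
}
```

page_summary stops at the closing --- fence, so a 500-line page costs at most a few dozen lines of reading when only the preview is needed.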

QMD Index Freshness

QMD is an optional search index layered on top of the vault. The markdown vault is the source of truth. Any skill that writes wiki markdown should refresh QMD after the vault write completes, but only when QMD_WIKI_COLLECTION is configured and the local QMD transport is available. If QMD refresh fails, keep the vault changes and report the QMD status separately.

Use the cheapest verification path that proves the new content is visible: qmd update, qmd embed only if vectors are stale or missing, then a targeted qmd get or qmd ls check for one written page or the collection root. Read-only skills should not refresh QMD.

Core Principles

Compile, don't retrieve. The wiki is pre-compiled knowledge. When you ingest a source, update every relevant page — don't just create a summary of the source.

Compound over time. Each ingest should make the wiki smarter, not just bigger. Merge new information into existing pages, resolve contradictions, strengthen cross-references.

Provenance matters. Every claim should trace to a source. When updating a page, note which source prompted the update.

Mark inferences. Default sentences are extracted. Mark synthesized claims with ^[inferred] and contested claims with ^[ambiguous]. A wiki that hides its guessing rots silently; one that marks it stays trustworthy.

Human curates, LLM maintains. The human decides what sources to add and what questions to ask. The LLM handles the bookkeeping — updating cross-references, maintaining consistency, noting contradictions.

Obsidian is the IDE. The user browses and explores the wiki in Obsidian. Everything must be valid Obsidian markdown with working wikilinks.

Link Format

All internal links connecting wiki pages are controlled by OBSIDIAN_LINK_FORMAT from the resolved config (default: wikilink).

| Setting | Syntax | Example |
| --- | --- | --- |
| wikilink (default) | [[path/to/page]] or [[path/to/page\|display text]] | [[concepts/foo\|foo]] |
| markdown | [display text](relative/path.md) | [foo](../concepts/foo.md) |

Generating markdown-format links

When OBSIDIAN_LINK_FORMAT=markdown:

Compute the path from the current file's directory to the **target .md file** using .. to climb up as needed.

Use the page title or a natural phrase as display text.

Always include the .md extension.

| Current file | Target | Relative link |
| --- | --- | --- |
| index.md | concepts/foo.md | [foo](concepts/foo.md) |
| concepts/foo.md | entities/bar.md | [bar](../entities/bar.md) |
| projects/my-project/my-project.md | concepts/foo.md | [foo](../../concepts/foo.md) |
| projects/my-project/concepts/arch.md | entities/bar.md | [bar](../../../entities/bar.md) |

The [[path\|display text]] wikilink form maps to [display text](relative/path.md) in Markdown mode.
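The relative-path rule for markdown-format links can be sketched in plain shell. The helper name md_link is hypothetical, and both paths are assumed to be vault-relative with no leading ./ or trailing slashes.

```shell
# Hypothetical helper: build a markdown-format link from one vault page to another.
md_link() {  # usage: md_link <current-file> <target-file> <display-text>
  from_dir=$(dirname "$1"); target=$2; up=""
  [ "$from_dir" = "." ] && from_dir=""
  # Strip leading directories that both paths share.
  while [ -n "$from_dir" ]; do
    first=${from_dir%%/*}
    case $target in
      "$first"/*)
        target=${target#*/}
        case $from_dir in */*) from_dir=${from_dir#*/} ;; *) from_dir="" ;; esac ;;
      *) break ;;
    esac
  done
  # Each directory left on the current file's side needs one "..".
  while [ -n "$from_dir" ]; do
    up="../$up"
    case $from_dir in */*) from_dir=${from_dir#*/} ;; *) from_dir="" ;; esac
  done
  printf '[%s](%s%s)\n' "$3" "$up" "$target"
}

md_link projects/my-project/concepts/arch.md entities/bar.md bar
# prints: [bar](../../../entities/bar.md)
```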

Scope: this setting affects only newly written or updated links. Existing vault content is never automatically migrated — users who want to convert old links can run the cross-linker or wiki-lint skill.

Every write skill reads OBSIDIAN_LINK_FORMAT from config before generating links and applies the correct format.

Config Resolution Protocol

**All skills must resolve config using this algorithm — do not hard-code .env or ~/.obsidian-wiki/config directly.** This ensures single-vault, multi-vault, project-local, and VPS setups all work correctly.

Resolution order

1. Walk up from CWD: look for a .env file in the current directory, then each parent, up to $HOME. Stop at the first .env that contains OBSIDIAN_VAULT_PATH.
2. Global config: if no local .env is found, read ~/.obsidian-wiki/config.
3. Prompt setup: if neither exists, tell the user, "No config found. Run wiki-setup to initialize your wiki."

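find_config() {
  dir="$PWD"
  # Walk up from the current directory, stopping before $HOME or the filesystem root.
  while [[ "$dir" != "$HOME" && "$dir" != "/" ]]; do
    # The first .env that mentions OBSIDIAN_VAULT_PATH wins.
    [[ -f "$dir/.env" ]] && grep -q "OBSIDIAN_VAULT_PATH" "$dir/.env" && { echo "$dir/.env"; return; }
    dir="$(dirname "$dir")"
  done
  # Fall back to the global config.
  [[ -f "$HOME/.obsidian-wiki/config" ]] && { echo "$HOME/.obsidian-wiki/config"; return; }
  # Empty output means no config was found; the caller should prompt for wiki-setup.
  echo ""
}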

Vault-scoped state

Skills that write runtime state (e.g. daily-update) must scope that state to the resolved vault, not to a global path. Use:

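# Hash the vault path. The braces make cut apply to whichever hashing tool ran,
# so VAULT_ID is always the first 8 hex characters.
VAULT_ID=$( { echo "$OBSIDIAN_VAULT_PATH" | md5sum 2>/dev/null || md5 -q - <<< "$OBSIDIAN_VAULT_PATH"; } | cut -c1-8 )
STATE_DIR="$HOME/.obsidian-wiki/state/$VAULT_ID"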

Standard "Before You Start" block

Every skill's setup section should read:

Resolve config — follow the Config Resolution Protocol in llm-wiki/SKILL.md. Walk up from CWD for .env, fall back to ~/.obsidian-wiki/config, else prompt setup. This gives OBSIDIAN_VAULT_PATH and any tool-specific path overrides.

Environment Variables

The wiki is configured through environment variables (see .env.example). The only required variable is the vault path — everything else has sensible defaults.

OBSIDIAN_VAULT_PATH — Where the wiki lives (required)

OBSIDIAN_SOURCES_DIR — Where raw source documents are

OBSIDIAN_CATEGORIES — Comma-separated list of categories

CLAUDE_HISTORY_PATH — Where to find Claude conversation data

CODEX_HISTORY_PATH — Where to find Codex session data

HERMES_HOME — Where to find Hermes agent data

OPENCLAW_HOME — Where to find OpenClaw data

COPILOT_HISTORY_PATH — Where to find Copilot session data

OBSIDIAN_LINK_FORMAT — Internal link syntax: wikilink (default) or markdown

WIKI_TOKEN_WARN_THRESHOLD — Emit a warning in wiki-status when the full-wiki token estimate exceeds this value (default: 100000). Set to 0 to disable. See wiki-status for the token footprint report.

WIKI_STAGED_WRITES — When true, all LLM-written pages go to _staging/<category>/ for human review before promotion. See wiki-setup and wiki-stage-commit for details.

No API keys are needed — the agent running these skills already has LLM access built in.
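Once a config file has been resolved, a skill needs to read individual settings and apply the documented defaults. The sketch below shows one way to do that. It assumes the file uses plain KEY=value lines, optionally double-quoted, and the helper name config_value is hypothetical.

```shell
# Hypothetical helper: read one KEY=value setting from a resolved config file,
# falling back to a default when the key is absent.
config_value() {  # usage: config_value <KEY> <file> [default]
  value=$(sed -n "s/^$1=//p" "$2" | tail -n 1)   # last assignment wins
  value=${value#\"}; value=${value%\"}           # drop surrounding double quotes
  printf '%s\n' "${value:-${3:-}}"
}

# Example (assumes $config_file came from the Config Resolution Protocol):
#   link_format=$(config_value OBSIDIAN_LINK_FORMAT "$config_file" wikilink)
#   warn_at=$(config_value WIKI_TOKEN_WARN_THRESHOLD "$config_file" 100000)
```

Reading values this way avoids sourcing the file, so a stray command in .env is never executed.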

Modes of Operation

The wiki supports three ingest modes:

| Mode | When to use | What happens |
| --- | --- | --- |
| Append | Small delta, incremental updates | Compute delta via manifest, ingest only new/modified sources |
| Rebuild | Major drift, fresh start needed | Archive current wiki to _archives/, clear, reprocess all sources |
| Restore | Need to go back | Bring back a previous archive |

Use wiki-status to see the delta and get a recommendation. Use wiki-rebuild for archive/rebuild/restore operations.

Reference

For details on specific operations, see the companion skills:

wiki-status — Audit what's ingested, compute delta, recommend append vs rebuild

wiki-rebuild — Archive current wiki, rebuild from scratch, or restore from archive

wiki-ingest — Distill source documents into wiki pages

claude-history-ingest — Ingest Claude conversation history

codex-history-ingest — Ingest Codex CLI session history

data-ingest — Ingest any raw text data

wiki-query — Answer questions against the wiki

wiki-lint — Audit and maintain wiki health

wiki-setup — Initialize a new vault
