Metadata-Version: 2.4
Name: cite-citadel
Version: 0.2.0
Summary: cite-citadel — an LLM-maintained, fully-cited personal wiki in the Open Knowledge Format, fed by a coding-agent CLI you already have logged in, with an MCP search server. Every fact is cited to its source; nothing is invented.
Project-URL: Homepage, https://github.com/MarkusNeusinger/cite-citadel
Project-URL: Repository, https://github.com/MarkusNeusinger/cite-citadel
Project-URL: Issues, https://github.com/MarkusNeusinger/cite-citadel/issues
Author: Markus Neusinger
License-Expression: MIT
License-File: LICENSE
Keywords: citations,knowledge-base,llm,markdown,mcp,okf,open-knowledge-format,provenance,wiki
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Documentation
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.12
Requires-Dist: mcp>=1.2
Requires-Dist: pyyaml>=6.0
Description-Content-Type: text/markdown

# **cite**-citadel

[![CI](https://img.shields.io/github/actions/workflow/status/MarkusNeusinger/cite-citadel/ci.yml?branch=main&label=CI)](https://github.com/MarkusNeusinger/cite-citadel/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/cite-citadel)](https://pypi.org/project/cite-citadel/)
[![Python versions](https://img.shields.io/pypi/pyversions/cite-citadel)](https://pypi.org/project/cite-citadel/)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](https://github.com/MarkusNeusinger/cite-citadel/blob/main/LICENSE)

> **A fortress of cited knowledge.** An LLM-maintained, fully-cited personal wiki —
> every fact is attested to its source, nothing is invented.

An LLM-maintained personal wiki in Google's [Open Knowledge Format](https://github.com/GoogleCloudPlatform/knowledge-catalog/blob/main/okf/SPEC.md) (OKF),
with an **MCP server** so an AI can search and read it: a KISS, pure-Python (3.12+) take on Andrej
Karpathy's [LLM-Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

Drop arbitrary files into `raw/` (markdown, code, JSON/CSV, PDF, PowerPoint/Word/Excel —
`.pptx`/`.docx`/`.xlsx` and legacy `.ppt`/`.doc`/`.xls` — even images, in any sub-folder). One
agentic CLI session per source folds it into a cross-linked OKF wiki under `wiki/` — **routing each
fact to the page it best fits** and splitting/merging pages as the corpus grows, rather than making
one page per file. Office files have their text extracted automatically; images are read *visually*;
a file too big for one context window is folded in over several passes; the same document in two
formats (`report.pdf` + `report.pptx`) is ingested once; and any source that can't be ingested is
recorded (with the reason) in `wiki/sources/index.md`. Every fact is cited back to its `raw/`
source, and the model uses **only** what is in `raw/`. An AI client then queries the synthesized
wiki over MCP instead of re-reading your notes.

The CLI is **`citadel`**; the PyPI package is **`cite-citadel`**. The `wiki/` directory **is** the
database — no SQLite, no vector store. Ingest runs through a **coding-agent CLI you already have**
(`claude`, `copilot`, or `gemini`), so it uses your existing subscription and **needs no API key** —
that usage is under your account and your provider's terms (see
[License & third-party tools](#license--third-party-tools)).

**Three guarantees that hold as the wiki grows** (full rules in
[`citadel/rules/schema.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/rules/schema.md)):

- **Stays organized** — ingest merges, splits, and deletes pages by fit; it never piles up one page
  per raw file.
- **Links keep working** — merges/renames repoint inbound cross-links; any dangling link fails
  `citadel lint` / `citadel check`.
- **Honest provenance** — raw facts are restated faithfully and cite their source as `[^sN]`. A fact
  the model adds from its own knowledge must be labeled `[^llmN]`, never disguised as a raw citation.
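
The "links keep working" guarantee can be illustrated with a small standalone check. The sketch below is not the implementation behind `citadel lint` / `citadel check`; `find_dangling_links` is a hypothetical helper, and it assumes cross-links are ordinary relative Markdown links to `.md` files.

```python
import re
from pathlib import Path

# Matches [text](target) and captures the target.
# A rough pattern for illustration, not a full Markdown parser.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def find_dangling_links(wiki_root: Path) -> list[tuple[Path, str]]:
    """Return (page, target) pairs whose relative .md target does not exist."""
    dangling = []
    for page in sorted(wiki_root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            if "://" in target or target.startswith("#"):
                continue  # skip external URLs and in-page anchors
            path_part = target.split("#", 1)[0]  # drop any #fragment
            if not path_part.endswith(".md"):
                continue  # only Markdown targets are checked in this sketch
            if not (page.parent / path_part).exists():
                dangling.append((page.relative_to(wiki_root), target))
    return dangling
```

An empty result means every relative `.md` link resolved; each returned pair names the page that holds the broken link and the link target as written.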

## Install

```bash
uv add cite-citadel            # add to a project
uv tool install cite-citadel   # or install a global `citadel` CLI
pip install cite-citadel       # or plain pip
```

## Quickstart

Ingest runs through a coding-agent CLI you already have — no API key, just your existing subscription.

1. **`citadel init my-wiki && cd my-wiki`** — scaffolds the workspace (the `citadel.toml` marker, a
   `.env`, and empty `raw/` + `wiki/`).
2. **Fill in the generated `.env`.** At minimum set the coding-agent CLI to shell out to —
   `CITADEL_LLM_CLI=claude | copilot | gemini` — which must be installed and logged in (no API key
   needed); optionally pin a model with `CITADEL_INGEST_MODEL`. Every other knob is documented
   inline in that same file.
3. **Drop your source files into `raw/`** (markdown, code, PDF, Office files, images), in any sub-folder.
4. **`citadel ingest`** — one agent session per source folds it into the cross-linked, cited wiki.
5. **Use it** — `citadel search "caffeine"` (also `read`, `status`, `doctor`, `curate`, `view`,
   `lint`, `check`, `tags`) from the shell, or `citadel serve` to expose the wiki to any AI over MCP.
   **Everything the MCP server offers, the CLI offers too** — an AI without MCP access can drive
   citadel through equivalent shell commands.
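
Step 2 is the one most likely to fail silently, for example when the chosen CLI is not on `PATH`. As a rough illustration, a pre-flight check might look like the sketch below. The helpers `parse_env` and `preflight` are hypothetical and not part of cite-citadel, and the sketch assumes the `.env` uses plain `KEY=VALUE` lines.

```python
import shutil

# The three values named in step 2.
SUPPORTED_CLIS = {"claude", "copilot", "gemini"}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def preflight(env_text: str) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    env = parse_env(env_text)
    cli = env.get("CITADEL_LLM_CLI", "")
    if cli not in SUPPORTED_CLIS:
        return [f"CITADEL_LLM_CLI must be one of {sorted(SUPPORTED_CLIS)}, got {cli!r}"]
    if shutil.which(cli) is None:
        return [f"{cli!r} is not on PATH; install it and log in first"]
    return []
```

Calling `preflight(Path(".env").read_text())` from the workspace root (with `pathlib.Path` imported) before `citadel ingest` would surface a missing or misspelled CLI name early. It cannot tell whether the CLI is actually logged in.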

> **Contributing?** Run from a checkout: `uv sync`, then the portable
> `uv run python -m citadel <subcommand>` (identical on Linux/macOS/Windows and needs no `.exe`;
> on Windows, antivirus can quarantine uv's generated `citadel.exe`).

## How it works

Three layers (Karpathy's split; [`citadel/rules/schema.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/rules/schema.md) has the
authoritative rules, which the ingest agent reads — referenced by path — every run):

1. **`raw/`** — immutable sources; ingest reads but never edits them.
2. **`wiki/`** — the LLM-owned OKF bundle: markdown pages with YAML frontmatter, routed **by kind**
   into `concepts/`, `objects/`, `systems/`, `persons/`, `organizations/`, `projects/`,
   `abbreviations/`, `misc/`, densely cross-linked, each fact carrying a citation. The reserved
   `index.md`, `log.md`, and `sources/index.md` are generated, not authored.
3. **[`citadel/rules/`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/rules/README.md)** — the schema/rules layer: `schema.md` (the
   format contract) + `core.md` (agent behavior) + per-lifecycle `tasks/`, per-file-type
   `formats/`, and agent-judged `genres/` briefs. Editing them changes how the wiki is built with
   **no code change**. The rules live in the package so a pip install carries them; the repo-root
   `SCHEMA.md`/`AGENT_INGEST.md` are just pointer stubs.
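
Because `wiki/` is a plain directory tree, an inventory of layer 2 is just a directory walk. A minimal sketch, assuming pages are `.md` files somewhere under the kind folders listed above; `pages_by_kind` is a hypothetical helper, not a citadel API:

```python
from pathlib import Path

# Kind folders as listed in layer 2 above.
KINDS = (
    "concepts", "objects", "systems", "persons",
    "organizations", "projects", "abbreviations", "misc",
)


def pages_by_kind(wiki_root: Path) -> dict[str, int]:
    """Count .md files under each kind folder; a missing folder counts as 0."""
    counts = {}
    for kind in KINDS:
        folder = wiki_root / kind
        counts[kind] = sum(1 for _ in folder.rglob("*.md")) if folder.is_dir() else 0
    return counts
```

The reserved `index.md`, `log.md`, and `sources/index.md` sit outside the kind folders in this sketch's assumed layout, so they are not counted.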

**Per-fact provenance** is the load-bearing rule. Every factual sentence ends with a GitHub-Flavored
Markdown footnote, defined in a trailing `## Sources` section that links to the originating `raw/`
file:

```markdown
Robusta has about twice the caffeine of Arabica.[^s1]

## Sources

[^s1]: [raw/coffee-guide.md](../../raw/coffee-guide.md) — coffee guide (ingested 2026-06-30)
```

This renders on GitHub, is trivially greppable, and needs zero custom tooling. A claim that can't be
cited is dropped, never invented; conflicting sources produce a `> [!CONTRADICTION]` callout. The
`wiki/` folder also opens **as-is** as an [Obsidian](https://obsidian.md) vault.
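
Because the convention is plain GFM footnotes, a consistency check fits in a few lines. The sketch below assumes references look like `[^s1]` or `[^llm1]` and that each definition starts its own line as `[^s1]:`. The helper name `check_footnotes` is hypothetical, and this is not how `citadel lint` is implemented.

```python
import re

# Inline reference such as [^s1] or [^llm2]; the lookahead skips definition lines.
REF_RE = re.compile(r"\[\^((?:s|llm)\d+)\](?!:)")
# Definition at the start of a line, such as "[^s1]: ...".
DEF_RE = re.compile(r"^\[\^((?:s|llm)\d+)\]:", re.MULTILINE)


def check_footnotes(page: str) -> tuple[set[str], set[str]]:
    """Return (referenced but undefined, defined but never referenced)."""
    defined = set(DEF_RE.findall(page))
    referenced = set(REF_RE.findall(page))
    return referenced - defined, defined - referenced


page = """Robusta has about twice the caffeine of Arabica.[^s1]

## Sources

[^s1]: [raw/coffee-guide.md](../../raw/coffee-guide.md)
"""
print(check_footnotes(page))  # (set(), set())
```

Two empty sets mean every reference has a definition and every definition is used. A sentence with no footnote at all is not detected here, since that would require deciding what counts as a factual sentence.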

## Test corpora

Three synthetic corpora live under [`corpora/`](https://github.com/MarkusNeusinger/cite-citadel/tree/main/corpora), each ingestible on its own or all
together. The **showcase** is [`corpora/beverages/`](https://github.com/MarkusNeusinger/cite-citadel/tree/main/corpora/beverages) — a deliberately
overlapping **coffee + tea** corpus of 10 files in mixed styles (reference, prose, lab notes, FAQ,
brand blog) with facts that repeat, contradict one another, or appear in only one place, plus one deliberately false
sourced claim. Two more corpora stress the hardest guarantees:
[`corpora/counterfactual-atlas/`](https://github.com/MarkusNeusinger/cite-citadel/tree/main/corpora/counterfactual-atlas) is a coherent fictional world whose
facts contradict reality, graded on whether they appear **as stated, cited, never corrected**;
[`corpora/project-history/`](https://github.com/MarkusNeusinger/cite-citadel/tree/main/corpora/project-history) is a three-year programme ingested in dated
waves that drives **reconcile / delete / force** and grades temporal supersession, German→English,
and attributed opinions.

Each corpus ships a hidden answer key at `.claude/skills/verify-corpus/<name>/ground-truth.md`
(outside the corpus, so the ingest agent can never see it). The parameterized `verify-corpus` skill
(`verify-corpus <name>|all`) ingests a corpus into a throwaway sandbox and grades the result against
that key — an end-to-end test of the three guarantees.

**See the result without running anything.** Browse the generated showcase wiki on GitHub at
[`corpora/beverages/wiki/index.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/corpora/beverages/wiki/index.md) — GitHub renders the OKF pages natively, so the `[^sN]` citations,
cross-links, glossary, and `> [!CONTRADICTION]` callouts all show inline. For the richer, interactive
view — the cross-link graph, tags, and the cited raw sources embedded — open the **live demo** at
**[markusneusinger.github.io/cite-citadel](https://markusneusinger.github.io/cite-citadel/)**, the
offline single-file viewer regenerated from the showcase wiki on every push.

## MCP server

`citadel serve` exposes eight tools over stdio: seven read-only tools (`wiki_search`, `wiki_read`,
`wiki_index`, `wiki_sources`, `wiki_tags`, `wiki_validate`, `wiki_lint`) and one mutating tool,
`wiki_ingest`. Each carries MCP behavior annotations (`readOnlyHint` etc.) so a client can tell the
readers from the mutating tool. Every MCP tool has a CLI counterpart (`citadel read`,
`citadel index`, `citadel sources`, `citadel lint`, …), so an AI without MCP access can do
everything through the CLI. Wire it into an MCP client (e.g. Claude Desktop):

```json
{
  "mcpServers": {
    "citadel": {
      "command": "citadel",
      "args": ["serve"],
      "env": { "CITADEL_LLM_CLI": "claude", "CITADEL_INGEST_MODEL": "sonnet" }
    }
  }
}
```

An AI can then `wiki_index()` to orient, `wiki_search(...)` to find pages, and `wiki_read(...)` to
pull full cited context — answering from your synthesized wiki instead of re-retrieving documents.

## Reference

- [`citadel/rules/README.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/rules/README.md) — index of the rules tree the ingest agent
  follows: [`schema.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/rules/schema.md) (structure, routing, and provenance rules),
  [`core.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/rules/core.md) (operational behavior), plus the `tasks/`, `formats/`, and
  `genres/` briefs.
- [`citadel/templates/env.example`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/citadel/templates/env.example) — every configuration knob
  (the `citadel init` `.env` template; the repo-root `.env.example` is a pointer stub).
- [`docs/karpathy-llm-wiki.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/docs/karpathy-llm-wiki.md) ·
  [`docs/okf-reference.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/docs/okf-reference.md) — the pattern and the format.
- [`docs/configuration.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/docs/configuration.md) — every `CITADEL_*` config knob.
- `CLAUDE.md` — architecture notes for contributors.
- [`CONTRIBUTING.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/CONTRIBUTING.md) ·
  [`CHANGELOG.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/CHANGELOG.md) ·
  [`SECURITY.md`](https://github.com/MarkusNeusinger/cite-citadel/blob/main/SECURITY.md)

## License & third-party tools

cite-citadel is released under the [MIT License](https://github.com/MarkusNeusinger/cite-citadel/blob/main/LICENSE).

**Not affiliated.** cite-citadel is an independent project — not affiliated with, endorsed by, or
sponsored by Anthropic, GitHub/Microsoft, or Google. "Claude", "GitHub Copilot", and "Gemini" are
their respective owners' trademarks, named only to identify the user-supplied CLI. Full disclaimer:
[NOTICE.md](https://github.com/MarkusNeusinger/cite-citadel/blob/main/NOTICE.md).

**Bring your own CLI — your account, your provider's terms.** Ingest runs *your* authenticated
coding-agent CLI under *your* account, and that usage is governed by **that provider's** terms, not
by cite-citadel:
[Anthropic Consumer Terms](https://www.anthropic.com/legal/consumer-terms) /
[Commercial Terms](https://www.anthropic.com/legal/commercial-terms),
the [GitHub Copilot product-specific terms](https://docs.github.com/en/site-policy/github-terms/github-copilot-product-specific-terms),
and the [Gemini Code Assist / Gemini API terms](https://developers.google.com/gemini-code-assist/resources/terms-of-service).
cite-citadel calls the official binary only — it does **not** proxy, store, or transmit your
credentials. Honest caveat: heavy, unattended, or CI ingest against a **consumer subscription** may
hit rate limits or a provider's automated-use expectations — for that scale prefer the tier the
provider designates for programmatic use.

**Your wiki is yours.** The providers assign output rights to you, and cite-citadel claims nothing
over `wiki/` content — publish the generated wiki freely.
