Scribe.

A local-first TUI agent for any OpenAI-compatible server. On llama.cpp it constrains tool calls with a grammar, grounds answers in cited sources, and reports a deterministic grounding score you can reproduce.


How it works

1. Tool calls are grammar-constrained

Scribe generates a GBNF grammar from your tool schemas and passes it with the request. llama.cpp then masks every token that would violate it, so the output can only be a well-formed call: a real tool name, the required arguments, valid JSON. A malformed call has zero probability of being sampled, not just a low one. If a model emits a bad call without the grammar, that one request is retried with it attached.

Constrained decoding (GBNF) is a known llama.cpp feature; what's specific here is generating the grammar from the tool schemas and using it as an automatic repair path. It constrains the call's form: the model can still pick the wrong tool, it just can't emit a malformed one.
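As a rough illustration of turning tool schemas into a grammar, here is a minimal sketch. It is not Scribe's generator: the `{"name": ..., "arguments": {...}}` wire format, the function names, and the restriction to required primitive arguments in a fixed order are all assumptions made for the example. Only the GBNF syntax itself (rules, quoted literals, character classes) follows llama.cpp's grammar format.

```python
import json

# Shared GBNF rules for whitespace and the primitive JSON types this sketch supports.
BASE_RULES = r'''
ws      ::= [ \t\n]*
string  ::= "\"" ( [^"\\\x00-\x1F] | "\\" ["\\/bfnrt] )* "\""
number  ::= "-"? ( "0" | [1-9] [0-9]* ) ( "." [0-9]+ )?
integer ::= "-"? ( "0" | [1-9] [0-9]* )
boolean ::= "true" | "false"
'''.strip()

SUPPORTED = {"string", "number", "integer", "boolean"}


def json_literal(text: str) -> str:
    """GBNF literal that matches `text` encoded as a JSON string."""
    encoded = json.dumps(text)
    return '"' + encoded.replace("\\", "\\\\").replace('"', '\\"') + '"'


def tool_rule(tool: dict) -> str:
    """Rule body for one tool: fixed name, required arguments in schema order."""
    params = tool.get("parameters", {})
    props = params.get("properties", {})
    fields = []
    for arg in params.get("required", []):
        kind = props[arg].get("type", "string")
        if kind not in SUPPORTED:
            raise ValueError(f"type not handled by this sketch: {kind}")
        fields.append(f'{json_literal(arg)} ws ":" ws {kind}')
    args = ' ws "," ws '.join(fields)
    return (
        f'"{{" ws {json_literal("name")} ws ":" ws {json_literal(tool["name"])} ws "," ws '
        f'{json_literal("arguments")} ws ":" ws "{{" ws {args} ws "}}" ws "}}"'
    )


def tools_to_gbnf(tools: list[dict]) -> str:
    """Build a grammar whose root accepts exactly one call to one known tool."""
    if not tools:
        raise ValueError("at least one tool schema is required")
    names = [f"tool-{i}" for i in range(len(tools))]
    lines = ["root ::= " + " | ".join(names)]
    lines += [f"{name} ::= {tool_rule(tool)}" for name, tool in zip(names, tools)]
    return "\n".join(lines) + "\n" + BASE_RULES + "\n"


if __name__ == "__main__":
    example_tools = [  # hypothetical tool schemas
        {"name": "read_file",
         "parameters": {"properties": {"path": {"type": "string"}}, "required": ["path"]}},
    ]
    print(tools_to_gbnf(example_tools))
```

Because the tool name is a literal in the grammar, a name that is not in the schema list cannot be produced at all; the choice among valid names is still left to the model.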

2. Answers cite sources, or decline

In grounded Q&A, retrieved chunks are passed as numbered sources and the prompt requires a [n] citation per claim. Conflicting sources get a [CONTRADICTION] tag; a question the sources don't cover gets a refusal instead of an invented answer.

Retrieval is hybrid: SQLite FTS5 (exact terms, identifiers) and vector search (paraphrase) fused with Reciprocal Rank Fusion, so both kinds of match surface.
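Reciprocal Rank Fusion itself is simple enough to sketch. The version below assumes each retriever returns an ordered list of chunk ids and uses the commonly cited constant k = 60; whether Scribe uses that constant, and the names `rrf_fuse`, `fts_hits` and `vector_hits`, are assumptions for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each list adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    fts_hits = ["c1", "c2", "c3"]     # hypothetical keyword (FTS5) ranking
    vector_hits = ["c3", "c1", "c4"]  # hypothetical embedding ranking
    print(rrf_fuse([fts_hits, vector_hits]))
```

A chunk that appears in both lists accumulates two terms, so it outranks chunks that only one retriever found, without either retriever's raw scores needing to be comparable.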

3. Grounding has a number

scribe bench computes a Source-Presence Index (SPI) over a checksum-locked held-out suite: citation coverage on answerable tasks, correct refusal on impossible ones. It's deterministic (no LLM judge), so it reproduces. Gemma 4 12B scores SPI 1.00 on the shipped suite; run it on yours.

The suite is small and authored in-repo: treat it as a regression gate, not an independent benchmark. The point is that the metric is reproducible and runs on your own model and data.
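To make "deterministic, no LLM judge" concrete, here is one way such a score could be computed with string matching alone. The README does not give the SPI formula, so everything specific here is an assumption: the per-sentence coverage rule, the refusal phrase, the task fields, and the plain average across tasks.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
REFUSAL_MARKER = "not covered by the sources"  # hypothetical refusal phrase


def citation_coverage(answer: str, n_sources: int) -> float:
    """Fraction of sentences that cite at least one valid source number.

    Assumes citations sit before the closing punctuation, e.g. "... [1]."
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = 0
    for sentence in sentences:
        refs = [int(n) for n in CITATION.findall(sentence)]
        if any(1 <= n <= n_sources for n in refs):
            cited += 1
    return cited / len(sentences)


def task_score(task: dict) -> float:
    """Coverage for answerable tasks; 1.0 or 0.0 for refusing impossible ones."""
    if task["answerable"]:
        return citation_coverage(task["answer"], task["n_sources"])
    return 1.0 if REFUSAL_MARKER in task["answer"].lower() else 0.0


def spi(tasks: list[dict]) -> float:
    """Mean task score over the suite (an assumed aggregate, not Scribe's)."""
    return sum(task_score(t) for t in tasks) / len(tasks)


if __name__ == "__main__":
    suite = [  # hypothetical tasks
        {"answerable": True, "n_sources": 2,
         "answer": "Paris is the capital [1]. It has a river [2]."},
        {"answerable": False, "n_sources": 3,
         "answer": "This is not covered by the sources."},
    ]
    print(f"SPI = {spi(suite):.2f}")
```

The same inputs always give the same number, which is the property that lets a score like this serve as a regression gate.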

Scope, stated plainly: GBNF and constrained decoding date to 2023 in llama.cpp and exist elsewhere as "structured outputs"; the technique isn't new, the integration is. Points 1-3 hold on a llama.cpp backend; Ollama, LM Studio and cloud APIs fall back to a best-effort text parser without the hard guarantee.

What it does

Universal LLM adapter

Point it at any OpenAI-compatible endpoint: local llama.cpp, Ollama, LM Studio, or remote. scribe discover finds them.

Safe code mode

/code with a destructive-command gate, a Python AST gate, a bubblewrap sandbox, and git checkpoint / rollback.
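A Python AST gate can be sketched with the standard `ast` module: parse the code without running it, then walk the tree and reject constructs the policy forbids. The blocked module and call lists below are a hypothetical policy, not Scribe's actual rules.

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "shutil", "socket"}  # hypothetical policy
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}   # hypothetical policy


def ast_gate(source: str) -> list[str]:
    """Return policy violations; an empty list means the code passes the gate."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in BLOCKED_MODULES:
                    problems.append(f"line {node.lineno}: import {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                problems.append(f"line {node.lineno}: from {node.module} import ...")
        elif isinstance(node, ast.Call):
            # Only direct calls by name are caught, e.g. eval("..."), not aliases.
            if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
                problems.append(f"line {node.lineno}: call to {node.func.id}()")
    return problems


if __name__ == "__main__":
    print(ast_gate("import os\nos.remove('x')"))
    print(ast_gate("print(1 + 1)"))
```

A static check like this is easy to evade (aliasing, `getattr`, dynamic imports), so it works as a first filter in front of a sandbox rather than as a replacement for one.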

Cross-session memory

SME (Semantic Memory Engine) recalls your last session, and a WorldModel persona keeps the agent's identity stable across runs.

Open Knowledge Format

scribe wiki distill curates sessions into OKF markdown: frontmatter + links in your git repo. SME/RAG are just derived indexes over the files. Portable, open, no SDK.

Book Studio (web)

A dark, VSCode-style web studio for writing books with your local model: three resizable panes, an integrated terminal, model-drafted chapters, and EPUB / PDF / Markdown export.

Hybrid RAG over your docs

FTS5 + local embeddings (multilingual-e5) fused with RRF, so exact identifiers and paraphrase both retrieve.

Observability

Deterministic ORORO session traces and a machine-readable scribe status --json contract.

Rich TUI & web UI

A terminal interface (Rich + Textual) and a streaming FastAPI web chat with login.

Project vaults

scribe init gives any directory its own isolated RAG / SME stores.

Blind compare

scribe compare A/B tests two models on one prompt without telling you which is which, so you can pick a local model honestly.
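The blinding step can be as small as shuffling which model sits behind which label and keeping the key hidden until after the vote. This is a sketch of that idea only; `blind_labels` and its return shape are hypothetical, not the scribe compare implementation.

```python
import random
from typing import Optional


def blind_labels(
    outputs: dict[str, str], rng: Optional[random.Random] = None
) -> tuple[dict[str, str], dict[str, str]]:
    """Assign two model outputs to labels A and B in random order.

    Returns (shown, key): `shown` maps label -> text for display,
    `key` maps label -> model name and is revealed only after the choice.
    """
    if len(outputs) != 2:
        raise ValueError("blind compare needs exactly two model outputs")
    rng = rng or random.Random()
    models = list(outputs)
    rng.shuffle(models)
    labels = ["A", "B"]
    shown = {label: outputs[model] for label, model in zip(labels, models)}
    key = dict(zip(labels, models))
    return shown, key


if __name__ == "__main__":
    shown, key = blind_labels({"model-x": "First answer", "model-y": "Second answer"})
    for label, text in shown.items():
        print(f"[{label}] {text}")
    # The key is consulted only after the user has picked A or B.
```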

Modular skills

Deep-research, writer, and wiki-memory skills โ€” drop in more as SKILL.md modules.

Quick start

# 1. Clone & install
git clone https://github.com/pedjaurosevic/scribe-ai.git
cd scribe-ai
./scripts/install.sh        # installs the package + config + ~/scribe-workspace

# 2. Start your llama-server (or use any OpenAI-compatible endpoint)
./scripts/start-server.sh

# 3. Find your model server, then chat
scribe discover             # scan local ports (+ --tailscale)
scribe chat                 # … or: scribe web  → http://localhost:8765

# 4. See the grounding number for yourself
scribe bench                # judge-scored fitness + deterministic SPI

Configure the endpoint in ~/.config/scribe/config.toml or via SCRIBE_BASE_URL / SCRIBE_MODEL env vars.

Architecture

┌─────────────────────────────────────────┐
│            SCRIBE TUI / WEB             │
│        Rich · Textual · FastAPI         │
├─────────────────────────────────────────┤
│               CORE KERNEL               │
│ Session · Skills · WorldModel · Traces  │
├─────────────────────────────────────────┤
│            LLM ADAPTER LAYER            │
│  OpenAI-compatible · GBNF tool grammar  │
│      · reasoning gate · discovery       │
├─────────────────────────────────────────┤
│              MEMORY LAYER               │
│ SME (cross-session) · hybrid RAG (RRF)  │
│       FTS5 + vectors · grounding        │
├─────────────────────────────────────────┤
│               TOOLS LAYER               │
│ web · fs · bash · sandbox · checkpoint  │
├─────────────────────────────────────────┤
│              QUALITY GATE               │
│   scribe bench · SPI · fitness suite    │
└─────────────────────────────────────────┘

Philosophy