A local-first TUI agent for any OpenAI-compatible server. On llama.cpp it constrains tool calls with a grammar, grounds answers in cited sources, and reports a deterministic grounding score you can reproduce.
Scribe generates a GBNF grammar from your tool schemas and passes it with the request. llama.cpp then masks every token that would violate it, so the output can only be a well-formed call: a real tool name, the required arguments, valid JSON. A malformed call has zero probability of being sampled, not just a low one. If a model emits a bad call without the grammar, that one request is retried with it attached.
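As a rough illustration of the schema-to-grammar step described above, here is a minimal Python sketch. It is not Scribe's actual generator: the `tools_to_gbnf` helper, the `{"name": ..., "arguments": ...}` call envelope, and the restriction to flat, all-required primitive parameters are assumptions made for the example.

```python
import json

# Simplified GBNF rules for JSON primitives (assumption: flat, primitive-only arguments).
PRIMITIVES = {
    "string": r'''"\"" ([^"\\] | "\\" ["\\/bfnrt])* "\""''',
    "integer": r'''"-"? ("0" | [1-9] [0-9]*)''',
    "number": r'''"-"? ("0" | [1-9] [0-9]*) ("." [0-9]+)?''',
    "boolean": r'''"true" | "false"''',
}


def lit(text: str) -> str:
    """Quote arbitrary text as a GBNF string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def key(text: str) -> str:
    """GBNF literal matching `text` as a JSON string, quotes included."""
    return lit(json.dumps(text))


def tools_to_gbnf(tools: list[dict]) -> str:
    """Build a grammar whose root accepts exactly one call to one listed tool."""
    if not tools:
        raise ValueError("at least one tool is required")
    rules, calls, used = [], [], set()
    for i, tool in enumerate(tools):
        fields = []
        # Hypothetical simplification: every property is required, in schema order.
        for name, spec in tool["parameters"]["properties"].items():
            if spec["type"] not in PRIMITIVES:
                raise ValueError(f"unsupported type: {spec['type']}")
            used.add(spec["type"])
            fields.append(f'{key(name)} ws ":" ws {spec["type"]}')
        inner = ' ws "," ws '.join(fields)
        args = f'"{{" ws {inner} ws "}}"' if fields else '"{" ws "}"'
        rules.append(f"args-{i} ::= {args}")
        # The tool name is a fixed literal, so only real tool names can be sampled.
        rules.append(
            f'call-{i} ::= "{{" ws {key("name")} ws ":" ws {key(tool["name"])} ws "," ws '
            f'{key("arguments")} ws ":" ws args-{i} ws "}}"'
        )
        calls.append(f"call-{i}")
    root = "root ::= " + " | ".join(calls)
    shared = [f"{t} ::= {PRIMITIVES[t]}" for t in sorted(used)]
    shared.append(r"ws ::= [ \t\n]*")
    return "\n".join([root, *rules, *shared])


TOOLS = [
    {"name": "get_weather", "parameters": {"properties": {"city": {"type": "string"}}}},
    {"name": "add", "parameters": {"properties": {"a": {"type": "integer"}, "b": {"type": "integer"}}}},
]
grammar = tools_to_gbnf(TOOLS)
print(grammar)
```

Because each tool name and argument key is emitted as a literal, the grammar leaves the model free to choose between `call-0` and `call-1` but not to invent a third tool or drop a key. Rule names use dashes because GBNF rule names do not allow underscores.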
Constrained decoding (GBNF) is a known llama.cpp feature; what's specific here is generating the grammar from the tool schemas and using it as an automatic repair path. It constrains the call's form: the model can still pick the wrong tool, it just can't emit a malformed one.

In grounded Q&A, retrieved chunks are passed as numbered sources and the prompt requires a [n] citation per claim. Conflicting sources get a [CONTRADICTION] tag; a question the sources don't cover gets a refusal instead of an invented answer.
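The numbered-source prompt described above could be assembled roughly as follows. This is a sketch only: the function name `build_grounded_prompt`, the instruction wording, and the refusal sentence are hypothetical and are not taken from Scribe's actual prompt.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and ask for a [n] citation on every claim."""
    sources = "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    # Hypothetical instruction text; the real wording is not given in this README.
    rules = (
        "Answer using only the numbered sources below.\n"
        "End every claim with a citation such as [1].\n"
        "If two sources disagree, mark the claim with [CONTRADICTION].\n"
        "If the sources do not cover the question, reply exactly: "
        "The sources do not cover this question."
    )
    return f"{rules}\n\nSources:\n{sources}\n\nQuestion: {question}\nAnswer:"


prompt = build_grounded_prompt(
    "How large is the cache?",
    ["The cache holds 128 entries.", "Eviction is least-recently-used."],
)
print(prompt)
```

Numbering the chunks in the prompt is what makes the citations checkable afterwards: a `[2]` in the answer maps back to exactly one retrieved chunk.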
scribe bench computes a Source-Presence Index (SPI) over a checksum-locked held-out suite: citation coverage on answerable tasks, correct refusal on impossible ones. It's deterministic (no LLM judge), so it reproduces. Gemma 4 12B scores SPI 1.00 on the shipped suite; run it on yours.
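The text above describes SPI only as citation coverage on answerable tasks plus correct refusal on impossible ones; the exact formula is not given here. The sketch below shows one way a judge-free score of that shape could be computed. The sentence splitter, the refusal marker string, and the plain average over tasks are all assumptions, not Scribe's implementation.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")
# Hypothetical refusal marker; the real suite may detect refusals differently.
REFUSAL_MARKER = "the sources do not cover"


def citation_coverage(answer: str, n_sources: int) -> float:
    """Fraction of sentences carrying at least one citation, all in range."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    cited = 0
    for sentence in sentences:
        ids = [int(n) for n in CITATION.findall(sentence)]
        if ids and all(1 <= n <= n_sources for n in ids):
            cited += 1
    return cited / len(sentences)


def score_task(answer: str, n_sources: int, answerable: bool) -> float:
    """Coverage for answerable tasks, 1.0 or 0.0 for refusal on impossible ones."""
    if answerable:
        return citation_coverage(answer, n_sources)
    return 1.0 if REFUSAL_MARKER in answer.lower() else 0.0


def suite_score(results: list[tuple[str, int, bool]]) -> float:
    """Plain mean over tasks (assumed aggregation)."""
    return sum(score_task(*r) for r in results) / len(results)


results = [
    ("The cache holds four entries [1]. Eviction is LRU.", 2, True),
    ("The sources do not cover this question.", 2, False),
]
print(suite_score(results))
```

Nothing in the score depends on a model's opinion: the same answers and the same suite always give the same number, which is the property the text relies on when it calls the metric reproducible.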
Scope, stated plainly: GBNF and constrained decoding date to 2023 in llama.cpp and exist elsewhere as "structured outputs"; the technique isn't new, the integration is. Points 1-3 hold on a llama.cpp backend; Ollama, LM Studio and cloud APIs fall back to a best-effort text parser without the hard guarantee.
Point it at any OpenAI-compatible endpoint: local llama.cpp, Ollama, LM Studio, or remote. scribe discover finds them.
/code with a destructive-command gate, a Python AST gate, a bubblewrap sandbox, and git checkpoint / rollback.
SME (Semantic Memory Engine) recalls your last session, and a WorldModel persona keeps the agent's identity stable across runs.
scribe wiki distill curates sessions into OKF markdown: frontmatter + links in your git repo. SME/RAG are just derived indexes over the files. Portable, open, no SDK.
A dark, VSCode-style web studio for writing books with your local model: three resizable panes, an integrated terminal, model-drafted chapters, and EPUB / PDF / Markdown export.
FTS5 + local embeddings (multilingual-e5) fused with RRF, so exact identifiers and paraphrases both retrieve.
Deterministic ORORO session traces and a machine-readable scribe status --json contract.
A terminal interface (Rich + Textual) and a streaming FastAPI web chat with login.
scribe init gives any directory its own isolated RAG / SME stores.
scribe compare A/B tests two models on one prompt without telling you which is which, so you can pick a local model honestly.
Deep-research, writer, and wiki-memory skills; drop in more as SKILL.md modules.
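The hybrid retrieval item above mentions fusing FTS5 and embedding results with RRF (reciprocal rank fusion). For readers unfamiliar with it, a minimal sketch follows. The function name `rrf`, the document ids, and the constant `k = 60` (a commonly used default) are assumptions; the value Scribe uses is not stated here.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]    # e.g. an FTS5 match on an exact identifier
embedding_hits = ["b", "d", "a"]  # e.g. nearest neighbours of a paraphrase
fused = rrf([keyword_hits, embedding_hits])
print(fused)  # ['b', 'a', 'd', 'c']
```

RRF uses only ranks, never raw scores, so a BM25-style keyword score and a cosine similarity can be combined without normalising either one. Documents found by both retrievers ("a" and "b" here) rise above those found by only one.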
# 1. Clone & install
git clone https://github.com/pedjaurosevic/scribe-ai.git
cd scribe-ai
./scripts/install.sh # installs the package + config + ~/scribe-workspace
# 2. Start your llama-server (or use any OpenAI-compatible endpoint)
./scripts/start-server.sh
# 3. Find your model server, then chat
scribe discover # scan local ports (+ --tailscale)
scribe chat # … or: scribe web → http://localhost:8765
# 4. See the grounding number for yourself
scribe bench # judge-scored fitness + deterministic SPI
Configure the endpoint in ~/.config/scribe/config.toml or via
SCRIBE_BASE_URL / SCRIBE_MODEL env vars.