# gnokee

> Memory infrastructure for personal AI — bi-temporal facts, honest
> contradictions, autonomous maintenance, real forgetting. MCP-native,
> multi-tenant, multilingual.

gnokee version: 0.4.0 (v0.5 in flight — see CHANGELOG.md).

This manifest follows the [llms.txt convention](https://llmstxt.org/) so
agents can discover gnokee's MCP surface, identity model, and red lines
without reading the full README. The file ships inside the PyPI sdist and
wheel, and lives in the repo at `static/.well-known/llms.txt`. Self-hosters
serve it directly from disk; gnokee itself does NOT run a web server.

## ADRs

- ADR index: https://github.com/omurlabs/gnokee/tree/main/docs/adr
- ADR-0001 Build on Graphiti: https://github.com/omurlabs/gnokee/blob/main/docs/adr/0001-build-on-graphiti.md
- ADR-0004 gnokee owns retrieval: https://github.com/omurlabs/gnokee/blob/main/docs/adr/0004-gnokee-owns-retrieval.md
- ADR-0009 Typed lab_record: https://github.com/omurlabs/gnokee/blob/main/docs/adr/0009-clinical-labs-need-structured-store.md
- ADR-0010 Typed med_record: https://github.com/omurlabs/gnokee/blob/main/docs/adr/0010-meds-share-lab-record-shape.md
- ADR-0015 Wiki export + llms.txt manifest: https://github.com/omurlabs/gnokee/blob/main/docs/adr/0015-wiki-export-llms-txt.md
- ADR-0016 Per-claim citation envelope: https://github.com/omurlabs/gnokee/blob/main/docs/adr/0016-per-claim-citation-envelope.md

## MCP tools

gnokee exposes its surface over the [Model Context Protocol](https://modelcontextprotocol.io/).
Every tool is multi-tenant — `tenant_id` is mandatory on every call;
gnokee NEVER reads across tenant boundaries.
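
As a rough illustration, a call that carries `tenant_id` can be sketched as an MCP `tools/call` JSON-RPC request. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the helper `build_tool_call` and the `query` argument name are assumptions made for this sketch, not gnokee's documented schema.

```python
import json


def build_tool_call(request_id, tool_name, tenant_id, **arguments):
    """Build an MCP `tools/call` request, refusing a missing tenant_id."""
    if not tenant_id:
        # Mirrors the rule above: no tenant, no call.
        raise ValueError("tenant_id is mandatory on every call")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,
            # tenant_id always travels inside the tool arguments.
            "arguments": {"tenant_id": tenant_id, **arguments},
        },
    }


# `query` is a hypothetical argument name used only for illustration.
request = build_tool_call(1, "gnokee_recall", "tenant-a", query="current dosage")
print(json.dumps(request, indent=2))
```

Failing on the client side before the request leaves the process is a convenience only; the server-side boundary remains the authority.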

- `gnokee_ingest_episode` — Store a fact / event / observation in
  bi-temporal memory. `valid_at` = world-time; `asserted_at` = when the
  consumer learned of it. Supports an idempotency key (`our_id`) so
  re-ingest is a no-op. Returns the episode UUID, any extracted fact
  UUIDs, and surfaced contradictions (gnokee NEVER auto-resolves —
  contradictions are surfaced for the host LLM to handle).
- `gnokee_recall` — Natural-language fact retrieval with bi-temporal
  pinning (`valid_at` world-time + `as_of` source-time). Returns
  Graphiti edge-fact statements, verbatim episode-body excerpts
  (currency / med-keyword signals), and an episode-fallback body lane
  for conversation-shape literals. Carries a `citations` map keyed
  `c1`, `c2`, … (ADR-0016) so the host LLM can emit inline `[cN]`
  markers backed by both bi-temporal axes.
- `gnokee_fact_provenance` — Fetch the original episode body and
  metadata behind a `fact_uuid` returned by `gnokee_recall`.
- `gnokee_lab_query` — Typed clinical-lab read against `lab_record`
  (ADR-0009). Ops: `latest | history | min | max | avg | count |
  abnormal`. Strictly read-only; rows carry both bi-temporal axes and
  `episode_uuid` provenance.
- `gnokee_med_query` — Typed medication-history read against
  `med_record` (ADR-0010). Ops: `active | history | allergies |
  switches`. Same provenance shape as `gnokee_lab_query`.
- `gnokee_wiki_export` — ADR-0015. Markdown view of a tenant's memory
  (`index.md` + `log.md` + `entities/<slug>.md`). Output is markdown
  bytes the host decides what to do with — gnokee NEVER writes to a
  consumer git repo or filesystem on its own.
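
The `citations` map returned by `gnokee_recall` lets a host check its own output before showing it. The sketch below assumes only what is stated above (keys `c1`, `c2`, … and inline `[cN]` markers); the inner shape of each citation entry and the helper name `unbacked_markers` are hypothetical.

```python
import re

# Matches inline markers such as [c1] or [c12].
CITATION_MARKER = re.compile(r"\[(c\d+)\]")


def unbacked_markers(answer, citations):
    """Return markers used in `answer` that have no entry in `citations`."""
    missing = []
    for key in CITATION_MARKER.findall(answer):
        if key not in citations and key not in missing:
            missing.append(key)
    return missing


# Hypothetical citation entries; the real envelope is defined in ADR-0016.
citations = {
    "c1": {"fact_uuid": "hypothetical-uuid-1"},
    "c2": {"fact_uuid": "hypothetical-uuid-2"},
}
answer = "Ferritin was low in March [c1] and recovered by June [c2][c3]."
print(unbacked_markers(answer, citations))  # ['c3']
```

A host could refuse to display, or regenerate, any answer for which this list is non-empty, since gnokee itself never synthesises citations.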

## Identity & multi-tenancy

- `tenant_id` is MANDATORY on every read and every write.
- `group_id` partitions Graphiti — every Cypher predicate carries it.
- gnokee NEVER reads across tenants; cross-tenant correction
  (`corrects:` field) is REJECTED at the boundary.
- Postgres tables enforce row-level security (RLS) on `tenant_id` as
  defence-in-depth, even against an in-process leak.
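
The cross-tenant rejection rule can be sketched in memory as follows. `Episode`, `CrossTenantError`, and `check_correction` are hypothetical names for this illustration, and treating an unknown target the same as a foreign one (so the error does not reveal whether another tenant's episode exists) is this sketch's own assumption, not documented gnokee behaviour.

```python
from dataclasses import dataclass
from typing import Optional


class CrossTenantError(Exception):
    """Raised when a correction points outside the caller's tenant."""


@dataclass(frozen=True)
class Episode:
    uuid: str
    tenant_id: str
    body: str
    corrects: Optional[str] = None  # uuid of the episode being corrected


def check_correction(new, store):
    """Reject `new` if its `corrects` target is not in the same tenant."""
    if new.corrects is None:
        return
    target = store.get(new.corrects)
    # Unknown and foreign targets are handled identically on purpose.
    if target is None or target.tenant_id != new.tenant_id:
        raise CrossTenantError(f"cannot correct {new.corrects} from {new.tenant_id}")


store = {"ep-1": Episode("ep-1", "tenant-a", "Allergy: penicillin")}
check_correction(Episode("ep-2", "tenant-a", "Allergy: none", corrects="ep-1"), store)
try:
    check_correction(Episode("ep-3", "tenant-b", "Allergy: none", corrects="ep-1"), store)
except CrossTenantError as exc:
    print("rejected:", exc)
```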

## Red lines (hard "do not" list)

These are normative — every PR is gated against them.

- **No LLM hosting.** The consumer supplies an OpenAI-compatible
  endpoint. gnokee NEVER paraphrases user-facing answers, NEVER
  synthesises citations, NEVER generates the answer prose itself.
- **No key holding.** Encrypted-body branch (v0.4+): the consumer's KMS
  owns the data-encryption key (DEK). gnokee stores the ciphertext,
  nonce, and `key_id` triple and surfaces it verbatim on recall. gnokee
  NEVER decrypts.
- **No auto-resolve.** Contradictions surface to the host LLM —
  gnokee NEVER picks a winner.
- **No supersession-by-deletion.** Closed facts stay in the store
  with `asserted_until` set + an audit trail. Forgetting is via
  tombstones / crypto erasure, NEVER silent overwrite.
- **No UI.** gnokee is a library + MCP server. Markdown bytes go
  back to the consumer; rendering is the consumer's problem.
- **No federation / ActivityPub / IPFS in core.** If these ever exist,
  they belong in a separate adapter repo; in core they would smuggle a
  network surface into a storage primitive.
- **No "knowledge engine" branding.** The phrase feels too close to
  Pinecone's trademark.
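
The "no supersession-by-deletion" rule can be pictured with a minimal in-memory sketch: closing a fact sets `asserted_until` and appends to an audit trail, and the row is never removed. `Fact`, `FactStore`, and their methods are hypothetical names for illustration; gnokee's actual storage lives in Postgres and Neo4j and will differ.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Fact:
    uuid: str
    statement: str
    asserted_at: datetime
    asserted_until: Optional[datetime] = None  # None means still open


@dataclass
class FactStore:
    facts: dict = field(default_factory=dict)
    audit: list = field(default_factory=list)

    def add(self, fact):
        self.facts[fact.uuid] = fact
        self.audit.append(("add", fact.uuid, fact.asserted_at))

    def close(self, uuid, when, reason):
        """Close a fact by stamping asserted_until; the fact itself stays."""
        fact = self.facts[uuid]
        if fact.asserted_until is not None:
            raise ValueError(f"{uuid} is already closed")
        self.facts[uuid] = replace(fact, asserted_until=when)
        self.audit.append(("close", uuid, when, reason))

    def open_facts(self):
        return [f for f in self.facts.values() if f.asserted_until is None]


store = FactStore()
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t1 = datetime(2024, 6, 1, tzinfo=timezone.utc)
store.add(Fact("f1", "Takes 5 mg daily", asserted_at=t0))
store.close("f1", when=t1, reason="superseded by a newer episode")
print(len(store.facts), len(store.open_facts()), len(store.audit))  # 1 0 2
```

The closed fact remains queryable for point-in-time reads, which is what distinguishes this from a silent overwrite.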

## Backends

- Postgres 17 + pgvector (fact + vector store).
- Neo4j 5 Community via Graphiti 0.29 (bi-temporal knowledge graph).
- HuggingFace TEI (bge-m3 1024-dim embeddings) or Ollama
  `/v1/embeddings` for Mac dev.
- Consumer-supplied OpenAI-compatible LLM endpoint (gpt-4o-mini by
  default; Ollama / LiteLLM / vLLM all work).
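
For orientation, an OpenAI-compatible embeddings call is a JSON `POST` to `/v1/embeddings` with `model` and `input` fields, and the response carries vectors under `data[i].embedding`. The sketch below only builds the request and validates a response shape; it sends nothing. The base URL, the model tag `bge-m3`, and both helper names are assumptions for a local Ollama-style setup.

```python
import json
import urllib.request

EXPECTED_DIM = 1024  # bge-m3 embedding width, as listed above


def embeddings_request(base_url, model, texts):
    """Build (but do not send) an OpenAI-compatible embeddings request."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def check_dimensions(response_body, expected=EXPECTED_DIM):
    """Return the vectors from a parsed response, failing on a width mismatch."""
    vectors = [item["embedding"] for item in response_body["data"]]
    for vector in vectors:
        if len(vector) != expected:
            raise ValueError(f"expected {expected} dims, got {len(vector)}")
    return vectors


# Hypothetical local endpoint and model tag.
req = embeddings_request("http://localhost:11434/", "bge-m3", ["hello"])
print(req.get_method(), req.full_url)
```

Checking the vector width at the boundary catches a misconfigured embedding model before mismatched vectors reach the pgvector column.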

## Discoverability

- PyPI: https://pypi.org/project/gnokee/
- Source: https://github.com/omurlabs/gnokee
- Issues: https://github.com/omurlabs/gnokee/issues
- License: Apache-2.0
