Metadata-Version: 2.4
Name: gnokee
Version: 0.9.0
Summary: Memory infrastructure for personal AI — bi-temporal facts, honest contradictions, autonomous maintenance, real forgetting. MCP-native, multi-tenant, multilingual.
Project-URL: Homepage, https://github.com/gnokeelabs/gnokee
Project-URL: Source, https://github.com/gnokeelabs/gnokee
Project-URL: Issues, https://github.com/gnokeelabs/gnokee/issues
Author: gnokeelabs
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: ai,bi-temporal,graphiti,knowledge-graph,mcp,memory,personal-ai
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Database :: Front-Ends
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: asyncpg>=0.31
Requires-Dist: graphiti-core<0.30,>=0.29
Requires-Dist: httpx>=0.28
Requires-Dist: langdetect>=1.0
Requires-Dist: neo4j<7,>=5.20
Requires-Dist: openai>=2.0
Requires-Dist: pgvector>=0.4
Requires-Dist: pydantic-settings>=2.5
Requires-Dist: pydantic<3,>=2.13
Requires-Dist: structlog>=25.0
Provides-Extra: core
Provides-Extra: dev
Requires-Dist: hatchling>=1.29; extra == 'dev'
Requires-Dist: mypy>=2.0; extra == 'dev'
Requires-Dist: pre-commit>=4; extra == 'dev'
Requires-Dist: pynacl>=1.5; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.3; extra == 'dev'
Requires-Dist: pytest>=9; extra == 'dev'
Requires-Dist: pyyaml>=6.0; extra == 'dev'
Requires-Dist: ruff>=0.15; extra == 'dev'
Requires-Dist: tach>=0.30; extra == 'dev'
Requires-Dist: testcontainers>=4.14; extra == 'dev'
Provides-Extra: ingest
Provides-Extra: mcp
Requires-Dist: mcp<2.0,>=1.27; extra == 'mcp'
Provides-Extra: retrieval
Description-Content-Type: text/markdown

# gnokee

> Memory infrastructure for personal AI — bi-temporal facts, honest contradictions, autonomous maintenance, real forgetting. MCP-native, multi-tenant, multilingual.

[![PyPI](https://img.shields.io/pypi/v/gnokee.svg)](https://pypi.org/project/gnokee/)
[![npm](https://img.shields.io/npm/v/gnokee.svg)](https://www.npmjs.com/package/gnokee)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)

**Status:** v0.7. Release by release:

- **v0.1:** core surface (ingest + recall + MCP).
- **v0.2:** typed clinical-data reads (`gnokee_lab_query` / `gnokee_med_query`).
- **v0.3:** lite-mode ingest for high-cadence sources.
- **v0.4:** encrypted-body branch (`content xor encrypted_body`; gnokee never holds keys).
- **v0.5:** read-side hardening: declarative `gnokee_query` envelope, per-claim citations, recency-proxy confidence, Obsidian adapter, refusal posture.
- **v0.6:** ingest hardening: storage-first ingest + adaptive TPM-aware retry (ADR-0023); recall@5 17.6% → 100% on LongMemEval-S.
- **v0.7:** durability + lineage: ingest crash-recovery journal + `gnokee.resume_ingest` (ADR-0024), the `gnokee_history_of` supersession-DAG MCP tool, and hard removal of the v0.6-deprecated `GNOKEE_INGEST_STRICT` env var.

See [`docs/specs/v0.1.md`](docs/specs/v0.1.md) for the core surface and [`docs/README.md`](docs/README.md) for the full doc map.

```text
┌─────────────────────────────────────────────────────────┐
│  Adapters (thin shims; consumer-owned)                  │
│  files / md-vault / api-push / rss / email / pulse-sdk  │
└────────────────┬────────────────────────────────────────┘
                  ↓ ingest_episode (full or lite)
┌─────────────────────────────────────────────────────────┐
│  gnokee core                                            │
│    • bi-temporal store, supersession, contradiction     │
│    • retrieval: pgvector cosine + Cypher narrow         │
│    • forgetting: tombstones + crypto erasure            │
│    • MCP server (5 tools, token-efficient)              │
└────────────────┬────────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────────────────────────┐
│  Backends (pluggable)                                   │
│  Postgres + pgvector  •  Neo4j Community (Graphiti)     │
│  bge-m3 via TEI / Ollama  •  consumer-supplied LLM      │
└─────────────────────────────────────────────────────────┘
```

## Quickstart

Requirements: Docker, Python 3.10+, and an API key for an OpenAI-compatible LLM (the default model is `gpt-4o-mini`, chosen in the Q4 contradiction-classifier spike; see [Status](#status)).

```bash
# 1. Bring up Postgres + Neo4j + TEI
make up

# 2. Create a virtualenv and install gnokee (editable) with the MCP + dev extras
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[mcp,dev]"

# 3. Configure env
cp .env.example .env
# edit .env to set GNOKEE_OPENAI_API_KEY (or GNOKEE_LLM_API_KEY)

# 4. Apply migrations and run the demo
make migrate
make demo
```

The demo ingests five episodes about a person, runs a recall, and prints any contradictions. Output is plain stdout — gnokee is a library + MCP server, not a UI.

### MCP server

```bash
python -m gnokee.mcp        # stdio (default; for Claude Desktop / Code / Cursor)
GNOKEE_MCP_HTTP=1 python -m gnokee.mcp   # streamable-http (dev only)
```

Drop-in client config snippets for Claude Desktop, Claude Code, and Cursor live under [`examples/03_mcp_clients/`](examples/03_mcp_clients/).
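
The shipped snippets in `examples/03_mcp_clients/` are canonical. As a rough sketch of what such an entry contains, the Python below builds a Claude Desktop `mcpServers` entry that launches the stdio server shown above. The server key `gnokee`, the helper name `claude_desktop_entry`, and the placeholder values are assumptions; check the shipped snippets for the exact fields and add the remaining variables from the environment table below.

```python
import json
import sys


def claude_desktop_entry(pg_dsn: str, neo4j_uri: str, tei_url: str) -> dict:
    """Hypothetical helper: build a Claude Desktop `mcpServers` entry for gnokee."""
    return {
        "mcpServers": {
            "gnokee": {
                # Same as `python -m gnokee.mcp`; stdio is the default transport.
                # sys.executable points at the interpreter running this script (e.g. the venv).
                "command": sys.executable,
                "args": ["-m", "gnokee.mcp"],
                # Backend settings are passed through the environment.
                "env": {
                    "GNOKEE_PG_DSN": pg_dsn,
                    "GNOKEE_NEO4J_URI": neo4j_uri,
                    "GNOKEE_TEI_URL": tei_url,
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values; replace them with your own stack's settings.
    entry = claude_desktop_entry("postgresql://…", "bolt://localhost:7687", "http://localhost:8080")
    print(json.dumps(entry, indent=2))
```

Paste the printed object into the client's config file, merging it with any existing `mcpServers` entries.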

Tools surfaced:

| Tool | Purpose |
| --- | --- |
| `gnokee_ingest_episode` | Store a fact / event / observation in bi-temporal memory. |
| `gnokee_recall` | Natural-language fact retrieval with provenance handles. |
| `gnokee_fact_provenance` | Fetch the original episode body behind a `fact_uuid`. |
| `gnokee_lab_query` (v0.2) | Typed clinical-lab reads: `latest \| history \| min \| max \| avg \| count` over `lab_record`. |
| `gnokee_med_query` (v0.2) | Typed medication-history reads: `active \| history \| allergies \| switches` over `med_record`. |
| `gnokee_wiki_export` (v0.5) | Markdown wiki view of a tenant's memory (`index.md` + `log.md` + `entities/<slug>.md`). [ADR-0015](docs/adr/0015-wiki-export-llms-txt.md). |

Core schemas in [`docs/specs/v0.1.md`](docs/specs/v0.1.md) §6; v0.2 typed reads documented in [ADR-0009](docs/adr/0009-clinical-labs-need-structured-store.md) + [ADR-0010](docs/adr/0010-meds-share-lab-record-shape.md); v0.5 wiki export in [ADR-0015](docs/adr/0015-wiki-export-llms-txt.md).
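
To make the `gnokee_lab_query` operation names concrete, here is a toy in-memory rendering of what each one computes over a single analyte's series. `LabRow` and `lab_query` are hypothetical, and their fields are illustrative only. The real `lab_record` schema and tool arguments are defined in ADR-0009 and the v0.1 spec, not here.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class LabRow:
    """Hypothetical stand-in for a `lab_record` row; field names are illustrative."""
    analyte: str
    value: float
    observed_on: date


def lab_query(rows: list[LabRow], analyte: str, op: str):
    """Toy version of latest | history | min | max | avg | count for one analyte."""
    series = sorted((r for r in rows if r.analyte == analyte), key=lambda r: r.observed_on)
    values = [r.value for r in series]
    if op == "history":
        return series            # full time-ordered series
    if op == "count":
        return len(series)
    if not series:
        return None              # nothing recorded for this analyte
    if op == "latest":
        return series[-1]        # most recent observation (the whole row)
    if op == "min":
        return min(values)
    if op == "max":
        return max(values)
    if op == "avg":
        return mean(values)
    raise ValueError(f"unknown op: {op}")
```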

### `llms.txt` manifest

gnokee ships an [`llms.txt`](https://llmstxt.org/) manifest listing the
MCP tool surface, identity model, and red lines so agents can discover
gnokee without reading the full README. Two static files ride along
with every release:

| File | Purpose |
| --- | --- |
| [`static/.well-known/llms.txt`](static/.well-known/llms.txt) | Full structured manifest (tools, ADR index, identity, red lines). |
| [`static/.well-known/gnokee.txt`](static/.well-known/gnokee.txt) | Pointer file (`manifest: ./llms.txt`) for agents probing `<project>.txt`. |

Both files are bundled into the PyPI sdist and wheel; once installed they
live at `<site-packages>/gnokee/.well-known/{llms.txt,gnokee.txt}` (see the
`[tool.hatch.build.targets.wheel.force-include]` table in `pyproject.toml`).
gnokee itself does NOT run a web server; self-hosters who want the
manifest reachable over HTTP wire their own static-file route at
`/.well-known/llms.txt`. Docker users find the same files at the in-repo
path `static/.well-known/`.

### Environment variables

| Variable | Purpose |
| --- | --- |
| `GNOKEE_PG_DSN` | Postgres connection string (`postgresql://…`) |
| `GNOKEE_NEO4J_URI` / `GNOKEE_NEO4J_USER` / `GNOKEE_NEO4J_PASSWORD` | Neo4j Bolt URI and credentials |
| `GNOKEE_TEI_URL` | TEI base URL (e.g. `http://localhost:8080`) |
| `GNOKEE_EMBED_MODEL` | default `bge-m3` (1024-dim, locked) |
| `GNOKEE_OPENAI_BASE_URL` / `GNOKEE_OPENAI_API_KEY` | consumer-supplied LLM (alias: `GNOKEE_LLM_*`) |
| `GNOKEE_LLM_MODEL` | default `gpt-4o-mini` |
| `GNOKEE_TENANT_DEFAULT` | demo + tests only; production paths require `tenant_id` explicitly |
| `GNOKEE_MCP_HTTP` | `0` for stdio, `1` for streamable-http (dev only) |
| `GNOKEE_LOG_LEVEL` | default `info` |

### Tests

```bash
make test-unit             # no compose required
make up && make test-integration   # needs LLM key + compose stack
```


## What gnokee is

A memory layer that treats facts the way infrastructure treats state: declared, versioned, reconciled, garbage-collected. It ingests episodes from any source (files, APIs, event streams), stores them with bi-temporal validity, detects contradictions at write time and surfaces them at read time, supersedes facts explicitly rather than overwriting them, forgets verifiably when asked, and runs maintenance autonomously.

MCP-native. Multi-tenant from day one. Multilingual by default (bge-m3). Built on top of Graphiti's bi-temporal knowledge graph primitive.

Two ingest modes:

- **Full** (`extract=True`, default): full Graphiti narrative extraction + contradiction detection. Ingest latency p50 ≈ 4.5 s, p95 ≈ 7.1 s. For one-off / low-cadence sources.
- **Lite** (`extract=False`, v0.3+): skips the LLM-bound stages; Postgres + TEI only. Ingest latency p50 ≈ 200 ms. For telemetry / wearable summaries / log scrapes. Pair with `gnokee.maintenance.run_deferred_extraction` to backfill narrative facts on a schedule. See [`docs/integration_patterns.md`](docs/integration_patterns.md).

Encrypted-body branch (v0.4+): pass `encrypted_body=...` instead of `content=...`; gnokee stores ciphertext, never decrypts, never sees the DEK. Recall surfaces ciphertext verbatim — consumer decrypts with their own KMS via `key_id`. See [`docs/architecture.md`](docs/architecture.md) §Encrypted-body lifecycle.

## What gnokee is not

- Not a chat UI (use any MCP client)
- Not an LLM host (use Ollama, LiteLLM, direct APIs)
- Not a sync engine, workflow engine, agent framework, document parser, vault editor, auth provider, federation layer

See [`docs/architecture.md`](docs/architecture.md) for the full refusal list.

## Status

- [x] Namespace claimed (PyPI, npm, GitHub org `gnokeelabs`, domains gnokee.com/.dev/.io)
- [x] Q1 Graphiti spike — ADOPT as storage primitive ([ADR-0001](docs/adr/0001-build-on-graphiti.md))
- [x] Q2 bge-m3 cross-lingual — ADOPT (100% top-1 via direct cosine); retrieval reassigned to gnokee ([ADR-0004](docs/adr/0004-gnokee-owns-retrieval.md))
- [x] Q3 MCP token-efficiency — ADOPT gnokee response shape (60% token reduction; 91.7% top-3 recall vs 50% for raw Graphiti output)
- [x] Q4 contradiction-classifier smoke test — ADOPT gpt-4o-mini (8/10 = 80% accuracy on labeled pairs)
- [x] Q5 forgetting hard-delete propagation — ADOPT (2/2 probes; Neo4j cascade + retrieval-surface clean)
- [x] Q5 storage adapter audit (FalkorDB) — Neo4j for v0.1; FalkorDB swap deferred to v0.2 (Graphiti-internal API differs; gnokee's Cypher is portable)
- [x] v0.1 spec finalized ([`docs/specs/v0.1.md`](docs/specs/v0.1.md))
- [x] v0.1 implementation — ingest + recall + MCP + contradictions, integration tests on real compose stack
- [x] Q7 clinical-labs spike — OFF-RAMP at 13.3%; typed `lab_record` table per [ADR-0009](docs/adr/0009-clinical-labs-need-structured-store.md)
- [x] Q8 med-supersession spike — ADOPT_WITH_GAPS at 53.3%; typed `med_record` table per [ADR-0010](docs/adr/0010-meds-share-lab-record-shape.md)
- [x] Q9 wearable-throughput off-ramp ([ADR-0011](docs/adr/0011-wearables-batch-or-off-ramp.md))
- [x] v0.2 typed clinical reads landed (`gnokee_lab_query` + `gnokee_med_query`); Q7 lifted to 66.7% ADOPT_WITH_GAPS, Q8 lifted to 93.3% ADOPT
- [x] Cross-tool eval suite formalised (vs Mem0 / Graphiti-alone / Zep-OSS / basic-memory):
  - Stage A, 2026-05-09: 10 Q × 10 sessions, 3 SUTs, double-run judge agreed.
  - Stage C-pilot, 2026-05-10: 30 Q × 20 sessions, 3 SUTs, double-run judge agreed. Confirms the Stage A magnitude: all three SUTs scored 0/30 strict on LongMemEval-S; gnokee REJECT for the v0.2.x retrieval surface ([ADR-0012](docs/adr/0012-cross-tool-eval-verdict.md) Accepted).
  - Q10.1: Zep-OSS + basic-memory adapters wired and smoke-tested (5 wired SUTs).
  - The bottleneck moved from retrieval to synth-prompt abstention bias ([#62](https://github.com/gnokeelabs/gnokee/issues/62)); full-pilot Stage C (100 Q × ~47 sessions × Track 1 + Track 2) is tracked under [#63](https://github.com/gnokeelabs/gnokee/issues/63).
- [ ] First tagged release

## Roadmap

| Phase | Milestone |
|---|---|
| Spike | Q1 — Graphiti's bi-temporal model fits Omur + personal corpora |
| v0.1 | Single-tenant, single-binary; ingest + retrieval + MCP; basic forgetting; Apache 2.0 |
| v0.2 | Typed clinical reads (labs + meds) per ADR-0009/0010; structured Postgres siblings to graphiti's narrative |
| v0.3 | Omur consumes gnokee; multi-tenant validation; encrypted-body branch |
| v1.0 | API stability commitment; published evals vs Mem0 / Graphiti-alone / basic-memory |

## Documentation

[`docs/README.md`](docs/README.md) is the entry map — audience-grouped index of every doc in this repo. Quick links:

- **Evaluating gnokee:** [FAQ](docs/faq.md), [alternatives vs Mem0 / Graphiti / Zep / basic-memory](docs/alternatives.md), [architecture](docs/architecture.md), [eval methodology](docs/evals/methodology.md).
- **Integrating gnokee:** [integration patterns](docs/integration_patterns.md), [consumer recipes](docs/integration/consumer-patterns.md), Python + MCP API reference (under `docs/api/`), runnable [`examples/`](examples/).
- **Operating gnokee:** deployment, observability, privacy posture, [troubleshooting](docs/troubleshooting.md) (all under `docs/`).
- **Contributing:** [`CONTRIBUTING.md`](CONTRIBUTING.md), [`AGENTS.md`](AGENTS.md), [release flow](docs/releasing.md), [`SECURITY.md`](SECURITY.md), [`SUPPORT.md`](SUPPORT.md).

## Project name

`gnokee` = *gno* (Greek γνῶσις, "knowledge") + *kee* (English *keep*, custodian). Pun: "no key" — gnokee never holds keys, never decrypts (see `docs/architecture.md` § Privacy). Originally drafted as `Veda`; rebranded after namespace collision.

## License

Apache-2.0. See [`LICENSE`](LICENSE).

## Contributing

Open with guardrails. Bug fixes, docs, tests, eval improvements, and adapters via PR; new MCP tools, breaking changes, and new ADRs require maintainer pre-approval. The [`AGENTS.md`](AGENTS.md) "do not" list (no UI, no LLM hosting, no auto-resolve, no key holding, no supersession-by-deletion) is normative for every PR.

See:

- [`CONTRIBUTING.md`](CONTRIBUTING.md) — dev setup, coding standards, PR workflow.
- [`docs/releasing.md`](docs/releasing.md) — release flow (maintainer-only).
- [`SECURITY.md`](SECURITY.md) — private vulnerability disclosure.
- [`SUPPORT.md`](SUPPORT.md) — where to ask questions.
- [`CODE_OF_CONDUCT.md`](CODE_OF_CONDUCT.md) — Contributor Covenant 2.1.

## Source of truth

Design documents lived in Notion (private) until the first commit; from that point on, the repo is canonical. See [`AGENTS.md`](AGENTS.md) for the convention.
