Metadata-Version: 2.4
Name: kioku-ai
Version: 0.2.4
Summary: Local-first persistent memory layer for AI agents. MCP server + REST API.
Project-URL: Homepage, https://kiokuapp.cloud
Project-URL: Repository, https://github.com/kiokuai/kioku
Project-URL: Documentation, https://docs.kiokuapp.cloud
Project-URL: Bug Tracker, https://github.com/kiokuai/kioku/issues
Project-URL: Changelog, https://github.com/kiokuai/kioku/blob/main/CHANGELOG.md
Author: Kioku
License: MIT
License-File: LICENSE
Keywords: agents,ai,chatgpt,claude,cursor,local-first,mcp,memory,openclaw,rag
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: aiosqlite>=0.19.0
Requires-Dist: anthropic>=0.39
Requires-Dist: apscheduler>=3.10
Requires-Dist: argon2-cffi>=23.1
Requires-Dist: bcrypt>=4.0
Requires-Dist: chromadb>=0.5.0
Requires-Dist: cryptography>=46.0.7
Requires-Dist: defusedxml>=0.7.1
Requires-Dist: fastapi>=0.115
Requires-Dist: google-genai>=1.0
Requires-Dist: graspologic>=3.0
Requires-Dist: httpx>=0.27
Requires-Dist: idna>=3.15
Requires-Dist: ijson>=3.2
Requires-Dist: lxml>=6.1.0
Requires-Dist: mcp[cli]>=1.2.0
Requires-Dist: openai>=1.0
Requires-Dist: openpyxl>=3.1
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: pyjwt>=2.0
Requires-Dist: pypdf>=6.10.2
Requires-Dist: python-dateutil>=2.9
Requires-Dist: python-multipart>=0.0.27
Requires-Dist: scikit-learn>=1.3
Requires-Dist: sentence-transformers>=3.0.0
Requires-Dist: slowapi>=0.1.9
Requires-Dist: starlette>=1.0.1
Requires-Dist: ujson>=5.12.1
Requires-Dist: urllib3>=2.7.0
Requires-Dist: uvicorn[standard]>=0.34
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.39; extra == 'anthropic'
Provides-Extra: cards
Requires-Dist: pillow>=10.0; extra == 'cards'
Provides-Extra: code
Requires-Dist: tree-sitter-language-pack<1.0,>=0.7; extra == 'code'
Requires-Dist: tree-sitter>=0.23; extra == 'code'
Provides-Extra: dev
Requires-Dist: httpx>=0.27; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest-timeout>=2.3; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff<0.16,>=0.15; extra == 'dev'
Requires-Dist: tree-sitter-language-pack<1.0,>=0.7; extra == 'dev'
Requires-Dist: tree-sitter>=0.23; extra == 'dev'
Provides-Extra: encryption
Requires-Dist: argon2-cffi>=23.1; extra == 'encryption'
Requires-Dist: cryptography>=46.0.7; extra == 'encryption'
Provides-Extra: gemini
Requires-Dist: google-genai>=1.0; extra == 'gemini'
Provides-Extra: langchain
Requires-Dist: langchain-core>=1.0; extra == 'langchain'
Provides-Extra: multimodal
Requires-Dist: docling-parse<6.0,>=5.9; extra == 'multimodal'
Requires-Dist: pillow>=10.0; extra == 'multimodal'
Requires-Dist: pymupdf>=1.24.0; extra == 'multimodal'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Provides-Extra: vector
Requires-Dist: chromadb>=0.5.0; extra == 'vector'
Requires-Dist: sentence-transformers>=3.0.0; extra == 'vector'
Description-Content-Type: text/markdown

<div align="center">
  <img src="https://kiokuapp.cloud/kioku-logo.png" alt="Kioku" width="120" height="120">
  <h1>Kioku</h1>
  <p><strong>Cross-model memory, local-first.</strong></p>
  <p>
    <a href="https://pypi.org/project/kioku-ai/"><img src="https://img.shields.io/pypi/v/kioku-ai?color=00CCCC&labelColor=111411" alt="PyPI"></a>
    <a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-52C878?labelColor=111411" alt="MIT License"></a>
    <img src="https://img.shields.io/badge/Python-3.11+-00FFFF?labelColor=111411" alt="Python 3.11+">
  </p>
</div>

Kioku is a local-first memory layer shared across Claude, ChatGPT, Cursor, Gemini, Copilot, and any MCP-aware agent, so you stop re-explaining yourself and stop losing useful work. It runs on your own OpenAI, Anthropic, Gemini, or Perplexity API key, or on a fully local model via Ollama with no key and no data leaving your machine. Either way, your memory stays on your machine by default. Local. Cross-model. Yours.

The Free tier needs no account. Pro (€9/mo, free during early access) adds the knowledge graph, custom and scheduled agents, and end-to-end-encrypted multi-device sync.

## What Kioku is for

Kioku is strongest when you need:

- exact replay, not just vague personalization
- one source-backed context layer across more than one AI tool
- transcript and provenance inspection before trusting a result
- local-first ownership instead of vendor lock-in

Best-fit users:

- developers
- technical founders
- researchers
- legal and finance professionals
- anyone doing high-trust knowledge work across multiple AI tools

## What you can do with Kioku

- create reusable Context Packs from verified sources — decision-first, cited, and provenance-backed
- get reversed or superseded decisions flagged in a Pack, which leads with the current position and marks what changed instead of silently mixing old and new
- import old ChatGPT, Claude, Gemini, and Perplexity history
- run Recall against your own Archive
- inspect transcripts, provenance, and preserved artifacts
- connect Claude Desktop, Cursor, and other MCP clients
- save supported browser chats with the extension
- use Kioku from Python through the SDK that ships inside `kioku-ai`
- add LangChain memory on top of the same backend
- bring your own LLM key (OpenAI, Anthropic, Gemini, Perplexity) or run fully local with Ollama — pick the active provider when you have more than one

For developers and coding agents:

- index a local folder or GitHub repo as code memory (`kioku index-repo`)
- sync external connectors such as GitHub issues/PRs (`kioku connector sync`)
- pull task-aware context and recall scoped to what you're working on
- record a session handoff and resume it later from any MCP client

Pro capabilities (free tier is fully usable without them):

- knowledge graph / Relationship Map across your memories
- a synthesized Profile/persona built from your own data
- end-to-end-encrypted cross-device sync
- digests, proactive nudges, and custom + scheduled agents

## How it works

Kioku front-loads the hard work at write time so reads are fast and source-backed:

1. **Ingest** — conversations, documents (PDF / DOCX / XLSX / images), code, and browser captures stream in through one provider-agnostic pipeline. The raw export is preserved in an Archive; re-imports are idempotent (content-hashed), so the same history never duplicates.
2. **Extract typed units** — each item is broken into small, classified memory units (facts, decisions, preferences, goals, relationships…) and enriched. `memory_type` is an open, LLM-assigned label, not a fixed enum.
3. **Retrieve with multiple signals** — a query is answered by fusing semantic (vector), lexical (BM25/FTS), conversation, and graph signals, then reranking with a cross-encoder — not a single embedding lookup.
4. **Synthesize a Context Pack** — a tight, decision-first brief (~300–500 tokens) built only from retrieved sources, with citations and provenance. It distinguishes settled decisions from open deliberation, and flags decisions that were later reversed.
5. **Inject** — the Pack (or raw recall) is delivered to your AI tool over MCP, the REST API, or the SDK.
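Step 3 fuses several ranked lists into one. This README does not say which fusion formula Kioku uses, so the sketch below uses reciprocal rank fusion (RRF), a common stand-in, to show the general idea. The signal names, the memory IDs, and the constant `k=60` are assumptions, and the cross-encoder rerank step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: dict[str, list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of memory IDs with reciprocal rank fusion.

    Each list adds 1 / (k + rank) for every ID it contains, so an ID that
    several signals rank well beats one that a single signal ranks first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ids in ranked_lists.values():
        for rank, memory_id in enumerate(ids, start=1):
            scores[memory_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical per-signal results for one query (IDs are made up).
    signals = {
        "vector": ["m3", "m1", "m7"],
        "bm25": ["m1", "m9", "m3"],
        "graph": ["m1", "m3"],
    }
    for memory_id, score in reciprocal_rank_fusion(signals)[:3]:
        print(memory_id, round(score, 4))
```

Here `m1` comes out on top because all three signals rank it highly, even though the vector signal alone preferred `m3`. RRF needs only ranks, not comparable scores, which is why it is a popular way to combine vector and BM25 results.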

Background **specialist agents** run on a schedule to keep memory healthy: enrichment/classification, duplicate review, **reversal detection** (a decision changed over time), the knowledge graph build, a synthesized persona, goal/temporal/learning tracking, and retention cleanup.

**Surfaces:** a local FastAPI backend (`127.0.0.1:8742`) + an MCP server (stdio/SSE), a Tauri desktop app, a Chrome extension, a Python SDK (+ LangChain), and an optional Cloudflare worker for accounts, billing, licensing, and encrypted sync. Everything is scoped per user/workspace; content is encryptable at rest and secrets are redacted on ingest.

## Choose your starting path

### 1. Desktop user

If you want to use the app first:

```bash
pip install kioku-ai
kioku warmup
kioku serve-http
```

Then:

1. Open the desktop app
2. Import old history first
3. Open Archive to inspect transcripts and provenance
4. Use Recall when you need the same code, fix, or decision again

### 2. MCP user

If you mainly work in Claude Desktop or Cursor:

```bash
pip install kioku-ai
kioku serve-http
```

Then:

1. Connect Kioku from the desktop app or write the MCP config manually
2. Restart Claude Desktop, Cursor, or your MCP client
3. Ask for:
   - the same code again
   - the prior fix
   - the earlier decision with transcript

### 3. Python / agent user

If you want to use Kioku from code:

```bash
pip install kioku-ai
kioku serve-http
```

Then use the Python SDK:

```python
from kioku_client import KiokuMemory

memory = KiokuMemory()
memory.add("User prefers Python over JavaScript")
results = memory.search("programming preferences")
print(results[0]["content"])  # search() returns flat memory dicts
```

## Installation

### End users

Install Kioku:

```bash
pip install kioku-ai
kioku warmup
kioku verify
```

Start the local backend:

```bash
kioku serve-http
```

By default, the backend listens on `127.0.0.1:8742`.
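The SDK's `memory.health()` is the real health check. For a dependency-free first look from a script, the sketch below only tests whether anything accepts TCP connections on the default address; the function name `backend_is_listening` is hypothetical, and a positive result does not prove the listener is Kioku.

```python
import socket


def backend_is_listening(host: str = "127.0.0.1", port: int = 8742, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    This proves only that a process is listening, not that it is a healthy
    Kioku backend.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    if backend_is_listening():
        print("Something is listening on 127.0.0.1:8742")
    else:
        print("Nothing on 127.0.0.1:8742; start it with `kioku serve-http`")
```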

### Optional multimodal extras

If you want richer local document import:

```bash
pip install "kioku-ai[multimodal]"
```

This enables:

- structured PDF parsing through Docling Parse
- cloud OCR support for screenshots and scanned documents
- richer PDF handling without Java or Tesseract system dependencies

### Optional code-ingestion extras

To index local folders and GitHub repos as code memory:

```bash
pip install "kioku-ai[code]"
```

This pulls in the tree-sitter parsers used to chunk source code by symbol.

### Optional encryption-at-rest extras

To encrypt memory content at rest (AES-256-GCM, passphrase-derived key):

```bash
pip install "kioku-ai[encryption]"
```

Set `KIOKU_ENCRYPTION_ENABLED=true` and `KIOKU_ENCRYPTION_PASSPHRASE=…`. Keyword
search still works on encrypted installs via a keyed blind-token index — your
plaintext terms are never written to disk.
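One standard way to build a keyed blind-token index is to replace each normalized term with an HMAC of that term under a secret key, so the index holds only opaque tokens. The sketch below shows that technique under stated assumptions: the tokenizer, the PBKDF2 parameters, the salt, and all names are hypothetical, and because the `encryption` extra installs argon2-cffi, Kioku's real key derivation may differ.

```python
import hashlib
import hmac
import re
from collections import defaultdict


def derive_index_key(passphrase: str, salt: bytes) -> bytes:
    # Stand-in KDF (stdlib PBKDF2-HMAC-SHA256); a real install would likely
    # use a memory-hard KDF and keep this key separate from the content key.
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)


def blind(term: str, key: bytes) -> str:
    # Deterministic: the same term and key always give the same token, so a
    # query can be matched without decrypting any stored content.
    return hmac.new(key, term.lower().encode(), hashlib.sha256).hexdigest()


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


class BlindIndex:
    """Maps blinded tokens to memory IDs; plaintext terms are never stored."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, memory_id: str, text: str) -> None:
        for term in tokenize(text):
            self._postings[blind(term, self._key)].add(memory_id)

    def search(self, term: str) -> set[str]:
        return set(self._postings.get(blind(term, self._key), set()))


if __name__ == "__main__":
    key = derive_index_key("correct horse battery staple", salt=b"per-install-salt")
    index = BlindIndex(key)
    index.add("m1", "User prefers Python over JavaScript")
    index.add("m2", "Likes dark mode")
    print(index.search("python"))  # {'m1'}
```

Deterministic tokens still reveal which memories share a term; that is the usual trade-off a blind index makes to keep keyword search working.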

### Configure your LLM provider (BYOK)

LLM features (extraction, synthesis, the chatbot, the knowledge graph, and the
agents) run on a key you supply — Kioku never ships a shared key. Any **one** of
these turns on the full feature set:

- **OpenAI**, **Anthropic** (Claude), **Gemini**, **Perplexity** — cloud, key-based
- **Ollama** — fully local, **no key**; point it at your local server (default `http://localhost:11434`)

Keys come from either:

- your environment / `.env` (e.g. `OPENAI_API_KEY=…`) — persists across restarts, or
- the desktop **Settings → API keys** screen — applied instantly without a restart

If both are set for the same provider, the Settings key wins. When you have more
than one provider configured, you choose the **active** one (in Settings, or via
`POST /api/v1/llm/active`); only one is active at a time. With Claude and Gemini,
the chatbot streams responses token by token, just as it does with OpenAI.

### Developers running from this repo

```bash
git clone https://github.com/kiokuai/kioku
cd kioku

python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
kioku warmup
```

Run tests:

```bash
pytest tests/
```

Run the desktop app in development mode:

```bash
cd desktop
npm install
npm run tauri dev
```

## Developer usage

### Python SDK

The Python SDK ships inside `kioku-ai`.

Install:

```bash
pip install kioku-ai
```

Docs:

- [kioku_client/README.md](kioku_client/README.md)

Common operations:

```python
from kioku_client import KiokuMemory

memory = KiokuMemory()

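memory.add("Likes dark mode", memory_type="preference", tags=["ui"])  # store a typed, tagged memory
memory.search("theme preference", limit=5)       # up to 5 matching memories, as flat dicts
memory.get_context("coding style")               # source-backed context for a topic
memory.ask("What editor do I use?")              # answer a question from stored memories
memory.list(memory_type="preference", limit=20)  # browse memories of one type
memory.health()                                  # check that the local backend is healthy
memory.stats()                                   # summary statistics about stored memories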
```
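A common pattern is to fold recall results into a prompt before calling a model. The helper below is a hypothetical sketch that relies only on `search(query, limit=...)` and the `"content"` field shown in the quick-start example; it accepts any object with a compatible `search` method, so it can be tried without a running backend.

```python
from typing import Any, Protocol


class SupportsSearch(Protocol):
    """Anything with a search() like KiokuMemory's (assumed to return a list of dicts)."""

    def search(self, query: str, limit: int = ...) -> list[dict[str, Any]]: ...


def memory_preamble(memory: SupportsSearch, query: str, limit: int = 5) -> str:
    """Format the top search hits as a bulleted block for a system prompt ("" if none)."""
    hits = memory.search(query, limit=limit)
    lines = [f"- {hit['content']}" for hit in hits if hit.get("content")]
    if not lines:
        return ""
    return "Relevant memories:\n" + "\n".join(lines)
```

Against a running backend it would be called as `memory_preamble(KiokuMemory(), "programming preferences")`, with the returned string prepended to the model's system prompt. Returning an empty string on no hits keeps the prompt clean when memory has nothing relevant.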

### LangChain

Install the LangChain extra:

```bash
pip install "kioku-ai[langchain]" langchain-openai
```

Example:

```python
from kioku_client.langchain import KiokuSessionStore
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "{input}"),
])

chain = prompt | llm
sessions = KiokuSessionStore()

chain_with_memory = RunnableWithMessageHistory(
    chain,
    sessions.get_history,
    input_messages_key="input",
    history_messages_key="history",
)

response = chain_with_memory.invoke(
    {"input": "I love building with FastAPI"},
    config={"configurable": {"session_id": "my-session"}},
)
print(response.content)  # the model's reply; this exchange is appended to the "my-session" history
```

### MCP

Kioku can expose a local MCP server for tools like Claude Desktop and Cursor.

Typical flow:

1. `pip install kioku-ai`
2. `kioku serve-http`
3. Connect Kioku through the desktop app or generated MCP config
4. Restart the MCP client

Tools exposed to MCP clients include:

- `remember`, `recall`, `list_memories`, `forget` — core memory
- `get_context`, `get_context_pack`, `attach_context_pack` — source-backed context
- `add_conversation`, `get_persona`, `get_stats` — conversation + profile + stats
- `remember_code`, `developer_recall`, `get_task_context` — code and task-aware context
- `record_handoff`, `resume_session` — session handoff and resume

### Extension and imports

Kioku supports:

- ChatGPT, Claude, Gemini, and Perplexity import
- browser capture for ChatGPT, Claude, Gemini, Copilot, and Perplexity

Recommended rollout order:

1. imports first
2. MCP next
3. browser capture later

## Benchmarks

Kioku currently has two benchmark tracks:

- retrieval benchmark:
  - [benchmarks/search_quality.py](benchmarks/search_quality.py)
- LongMemEval-style benchmark:
  - [benchmarks/longmemeval/run_benchmark.py](benchmarks/longmemeval/run_benchmark.py)

Supporting benchmark assets:

- [benchmarks/README.md](benchmarks/README.md)
- [benchmarks/longmemeval/longmemeval_balanced_60.json](benchmarks/longmemeval/longmemeval_balanced_60.json)
- [benchmarks/longmemeval/longmemeval_oracle.json](benchmarks/longmemeval/longmemeval_oracle.json)

Run them with:

```bash
# deterministic (fast, no model/keys) — regression guard
PYTHONPATH=src .venv/bin/python benchmarks/run_suite.py --suite search
# representative — uses the real local embedding model (no API key)
PYTHONPATH=src .venv/bin/python benchmarks/run_suite.py --suite search --real-embeddings
# LongMemEval needs working model access (see caveat below)
PYTHONPATH=src .venv/bin/python benchmarks/run_suite.py --suite longmemeval --longmemeval-mode full --limit 25
```

### Current retrieval benchmark

Latest run on the current tree (2026-06-14), 23-case search-quality suite.
Numbers are reported with the **real local embedding model**
(`--real-embeddings`) since that reflects what users actually get; the suite
has some run-to-run variance, so a range across repeated runs is given rather
than a single hero figure.

Real embeddings (3 runs):

- pass rate: **87–91%** (`20–21/23`)
- MRR: **0.84–0.87**
- average latency: **~210ms**
- P95 latency: **~160ms**

Deterministic-embedding mode (`run_suite.py --suite search`, the default) is a
stable lower bound at **87% / MRR ~0.79**, but uses hash-based toy embeddings
and shows a one-off ~3s P95 from the first-query model load — it is a
regression guard, not a representative score.
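"Hash-based toy embeddings" usually means feature hashing: each token is hashed into one of a fixed number of buckets, giving a deterministic vector with no model download. The sketch below shows that idea; the dimension, tokenizer, and signed-bucket trick are assumptions, and Kioku's benchmark embedder may be built differently. It uses `hashlib` rather than the built-in `hash()`, because `hash()` on strings is salted per process and would break run-to-run determinism.

```python
import hashlib
import math
import re


def toy_embedding(text: str, dim: int = 256) -> list[float]:
    """Deterministic bag-of-words vector via feature hashing, L2-normalized."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.sha256(token.encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        # A per-token sign keeps bucket collisions from always adding up.
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    # Inputs are unit length (or all zeros), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))
```

Vectors like these capture word overlap only, with no notion of synonyms, which is why this mode is a stable regression guard rather than a representative score.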

Each run writes timestamped JSON to `benchmarks/results/` (git-ignored), e.g.
`benchmark_suite_<ts>.json` and `search_quality_{real,det}_<ts>.json`.

> A previous README figure of 95.7% came from an April 2026 snapshot that
> predates this repository's git history and could not be reproduced on the
> current dependency set. A pre/post comparison on the current tree shows no
> recent code regression: the same suite scores identically before and after
> the latest search changes.

### Current LongMemEval sample result

The LongMemEval harness now drives the **real product pipeline** —
`add_conversation(extract=True)` for ingestion and `manager.synthesize()` for
answering — so the score reflects Kioku's actual extraction + synthesis, not a
bespoke benchmark reimplementation. (The old bespoke prompts remain available via
`--legacy-harness` for comparison.)

Latest run (2026-06-15), full mode, 25-question balanced sample, `gpt-4o-mini`:

- overall accuracy: **60%** (`15/25`), no crashes
- by category: knowledge-update **70%** (`7/10`), multi-session **50%** (`5/10`),
  single-session-assistant **60%** (`3/5`)
- elapsed: **~900s**

What the fidelity work changed (each was a real product fix, verified on
category subsets, that also helps everyday `/ask` answers):

- **Recall** — "remind me about our earlier chat about X" recall roughly doubled
  vs the old bespoke harness; the root cause was extraction dropping detail, now
  preserved (adaptive caps + atomic/detail extraction prompt).
- **Counting / sums** — quantity questions ("how many / how much / how long
  total") now compute and state the explicit total (e.g. "3.5 weeks", "$185")
  instead of just listing the parts. Summation subset went ~1/7 → **6/7**.
- **Temporal updates** — when a fact changes over time, synthesis now sees each
  memory's date and returns the most-recent value (e.g. an improved 5K time, an
  updated mortgage pre-approval) instead of an older one.
- **Distinct-fact preservation** — a dedup fix stops the memory layer merging
  separately-countable facts (a blue bike and a red bike are two memories), which
  previously collapsed "how many X" answers.
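Two of these fixes are easy to picture in code. The sketch below is illustrative only, with hypothetical names and data: it shows presenting memories to synthesis with their dates, newest first, so the most recent value is visible, and a dedup rule that merges only memories whose normalized text is identical, so a blue bike and a red bike stay two memories. Kioku's actual dedup logic and synthesis prompt are not described in this README.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Memory:
    content: str
    recorded: date


def normalize(text: str) -> str:
    # Collapse case and whitespace only; never drop words such as colors or counts.
    return " ".join(text.lower().split())


def dedupe_exact(memories: list[Memory]) -> list[Memory]:
    """Merge only true duplicates (same normalized text), keeping the newest copy."""
    newest: dict[str, Memory] = {}
    for m in memories:
        key = normalize(m.content)
        if key not in newest or m.recorded > newest[key].recorded:
            newest[key] = m
    return list(newest.values())


def dated_context(memories: list[Memory]) -> str:
    """One line per memory, newest first, each prefixed with its ISO date."""
    ordered = sorted(memories, key=lambda m: m.recorded, reverse=True)
    return "\n".join(f"[{m.recorded.isoformat()}] {m.content}" for m in ordered)
```

Fed two 5K times, `dated_context` puts the later (improved) one on the first line, so the synthesizer sees the current value before the stale one.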

Caveats:

- The 25-question sample is **small and noisy** — per-category figures
  (especially the 5-question single-session bucket) swing run-to-run with
  `gpt-4o-mini` non-determinism. Treat this as indicative, not a headline claim.
- A rare, **intermittent** `'int' object is not subscriptable` crash has been
  observed in long (~20+ question) single-process runs; it is caught per-question
  (the run continues) and did not occur in this run. Set
  `KIOKU_BENCH_TRACEBACK=1` to capture it; the robust fix (per-question subprocess
  isolation) is tracked as follow-up.

Result files are written to `benchmarks/results/` and
`benchmarks/longmemeval/results/` (both git-ignored).

## Repo structure

- [src/kioku](src/kioku) — Python backend (memory, search, graph, MCP, importers, connectors)
- [desktop](desktop) — Tauri desktop app
- [extension](extension) — browser extension
- [kioku_client](kioku_client) — Python SDK and LangChain integration
- [website](website) — marketing site
- [infra/workers](infra/workers) — Cloudflare worker for cloud sync, billing, and licensing
- [benchmarks](benchmarks) — benchmark runners and result artifacts
- [verify](verify) — live-verification harness for external key/runtime checks (`npm run verify:live`)

## License

MIT
