Metadata-Version: 2.4
Name: khora
Version: 0.11.2
Summary: Knowledge memory library for long-horizon AI agents — hybrid retrieval over documents, embeddings, and graph relationships
Project-URL: Homepage, https://github.com/DeytaHQ/khora
Project-URL: Documentation, https://github.com/DeytaHQ/khora/blob/main/docs/README.md
Project-URL: Repository, https://github.com/DeytaHQ/khora
Project-URL: Issues, https://github.com/DeytaHQ/khora/issues
Project-URL: Changelog, https://github.com/DeytaHQ/khora/blob/main/CHANGELOG.md
Project-URL: Releases, https://github.com/DeytaHQ/khora/releases
Author: AllTheData Inc.
License-Expression: Apache-2.0
License-File: LICENSE
License-File: NOTICE
Requires-Python: >=3.13
Requires-Dist: alembic>=1.18.4
Requires-Dist: asyncpg>=0.31.0
Requires-Dist: dateparser>=1.2.0
Requires-Dist: jinja2>=3.1.6
Requires-Dist: litellm>=1.81.15
Requires-Dist: loguru>=0.7.3
Requires-Dist: neo4j>=6.1.0
Requires-Dist: opentelemetry-api>=1.27.0
Requires-Dist: pgvector>=0.4.2
Requires-Dist: pydantic-settings>=2.13.1
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: sentence-transformers>=5.2.3
Requires-Dist: sqlalchemy[asyncio]>=2.0.47
Requires-Dist: tenacity>=9.1.4
Requires-Dist: tiktoken>=0.12.0
Provides-Extra: accel
Requires-Dist: rapidfuzz>=3.0.0; extra == 'accel'
Provides-Extra: age
Requires-Dist: asyncpg>=0.31.0; extra == 'age'
Provides-Extra: all-backends
Requires-Dist: aiosqlite>=0.20.0; extra == 'all-backends'
Requires-Dist: asyncpg>=0.31.0; extra == 'all-backends'
Requires-Dist: httpx>=0.28.1; extra == 'all-backends'
Requires-Dist: kuzu>=0.11.3; extra == 'all-backends'
Requires-Dist: lancedb>=0.25; extra == 'all-backends'
Requires-Dist: neo4j>=6.1.0; extra == 'all-backends'
Requires-Dist: pgvector>=0.4.2; extra == 'all-backends'
Requires-Dist: pyarrow>=18.0.0; extra == 'all-backends'
Requires-Dist: surrealdb>=2.0.0a1; extra == 'all-backends'
Requires-Dist: weaviate-client>=4.20.1; extra == 'all-backends'
Provides-Extra: binary-readers
Requires-Dist: openpyxl>=3.1.0; extra == 'binary-readers'
Requires-Dist: pymupdf>=1.25.0; extra == 'binary-readers'
Requires-Dist: python-docx>=1.1.0; extra == 'binary-readers'
Provides-Extra: dev
Requires-Dist: coverage>=7.13.4; extra == 'dev'
Requires-Dist: hypothesis>=6.140.0; extra == 'dev'
Requires-Dist: prek>=0.3.3; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.3.0; extra == 'dev'
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest-mock>=3.15.1; extra == 'dev'
Requires-Dist: pytest-xdist>=3.6; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Requires-Dist: ruff>=0.15.2; extra == 'dev'
Requires-Dist: ty>=0.0.18; extra == 'dev'
Provides-Extra: embedded
Requires-Dist: surrealdb>=2.0.0a1; extra == 'embedded'
Provides-Extra: graph-all
Requires-Dist: kuzu>=0.11.3; extra == 'graph-all'
Requires-Dist: neo4j>=6.1.0; extra == 'graph-all'
Provides-Extra: kuzu
Requires-Dist: kuzu>=0.11.3; extra == 'kuzu'
Provides-Extra: lancedb
Requires-Dist: lancedb>=0.25; extra == 'lancedb'
Requires-Dist: pyarrow>=18.0.0; extra == 'lancedb'
Provides-Extra: logfire
Requires-Dist: logfire>=4.0; extra == 'logfire'
Provides-Extra: memgraph
Requires-Dist: neo4j>=6.1.0; extra == 'memgraph'
Provides-Extra: neo4j
Requires-Dist: neo4j>=6.1.0; extra == 'neo4j'
Provides-Extra: neptune
Requires-Dist: neo4j>=6.1.0; extra == 'neptune'
Provides-Extra: neptune-iam
Requires-Dist: boto3>=1.35.0; extra == 'neptune-iam'
Requires-Dist: neo4j>=6.1.0; extra == 'neptune-iam'
Provides-Extra: nlp
Requires-Dist: spacy>=3.8.0; extra == 'nlp'
Provides-Extra: otel
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.27.0; extra == 'otel'
Requires-Dist: opentelemetry-sdk>=1.27.0; extra == 'otel'
Provides-Extra: otel-grpc
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc>=1.27.0; extra == 'otel-grpc'
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.27.0; extra == 'otel-grpc'
Requires-Dist: opentelemetry-sdk>=1.27.0; extra == 'otel-grpc'
Provides-Extra: parquet
Requires-Dist: pyarrow>=18.0.0; extra == 'parquet'
Provides-Extra: postgres
Requires-Dist: asyncpg>=0.31.0; extra == 'postgres'
Requires-Dist: pgvector>=0.4.2; extra == 'postgres'
Provides-Extra: reranking
Requires-Dist: sentence-transformers>=3.0.0; extra == 'reranking'
Provides-Extra: rust
Requires-Dist: khora-accel==0.11.1; extra == 'rust'
Provides-Extra: sqlite
Requires-Dist: aiosqlite>=0.20.0; extra == 'sqlite'
Provides-Extra: sqlite-lance
Requires-Dist: aiosqlite>=0.20.0; extra == 'sqlite-lance'
Requires-Dist: lancedb>=0.25; extra == 'sqlite-lance'
Requires-Dist: pyarrow>=18.0.0; extra == 'sqlite-lance'
Provides-Extra: surrealdb
Requires-Dist: surrealdb>=2.0.0a1; extra == 'surrealdb'
Provides-Extra: weaviate
Requires-Dist: weaviate-client>=4.20.1; extra == 'weaviate'
Description-Content-Type: text/markdown

# Khora

[![CI](https://github.com/DeytaHQ/khora/actions/workflows/ci.yml/badge.svg)](https://github.com/DeytaHQ/khora/actions/workflows/ci.yml)
[![Release](https://github.com/DeytaHQ/khora/actions/workflows/release.yml/badge.svg)](https://github.com/DeytaHQ/khora/actions/workflows/release.yml)
[![codecov](https://codecov.io/gh/DeytaHQ/khora/branch/main/graph/badge.svg)](https://codecov.io/gh/DeytaHQ/khora)
[![Python 3.13+](https://img.shields.io/badge/python-3.13%2B-blue.svg)](https://www.python.org/downloads/)
[![License: Apache 2.0](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://github.com/DeytaHQ/khora/blob/main/LICENSE)

> *"Khora is the receptacle, the space, the matrix in which all things come to be."* — Plato, *Timaeus*

Khora is a knowledge memory library for long-horizon AI agents, with pluggable retrieval engines and storage backends to fit different workloads. It stores knowledge as documents, embeddings, and graph relationships, and retrieves it through hybrid search (vector + graph + keyword), reranking, and temporal context.

Khora is a **library, not an application**. Tooling lives in sibling packages, not yet released:

- `khora-cli` — CLI tooling for extraction and search.
- `khora-explorer` — tooling for ontology construction and exploration.

## Install

```bash
pip install khora                 # core (PostgreSQL + pgvector)
pip install khora[sqlite-lance]   # [experimental] embedded SQLite + LanceDB
pip install khora[surrealdb]      # [experimental] unified SurrealDB (single store)
pip install khora[all-backends]   # everything: Neo4j, SurrealDB, SQLite+LanceDB, Weaviate, AGE
```

See [docs/configuration.md](https://github.com/DeytaHQ/khora/blob/main/docs/configuration.md) for the full extras list.

## Production stack

The production stack is **PostgreSQL + pgvector + Neo4j**:

- **VectorCypher** (default engine) — runs on PostgreSQL + pgvector + Neo4j.
- **Chronicle** — runs on PostgreSQL + pgvector (no graph DB required).
- **Skeleton** — runs on PostgreSQL + pgvector (no graph DB required).

Set `KHORA_DATABASE_URL` and `KHORA_NEO4J_URL`, run `uv run alembic upgrade head`, then instantiate `Khora()` with no arguments:

```python
import asyncio
from khora import Khora

async def main() -> None:
    async with Khora() as kb:  # reads KHORA_DATABASE_URL / KHORA_NEO4J_URL
        ns = await kb.create_namespace("demo")
        await kb.remember(
            "Marie Curie won the Nobel Prize in Physics in 1903.",
            namespace=ns.namespace_id,
        )
        result = await kb.recall("What did Curie win?", namespace=ns.namespace_id)
        print(result.context_text)

asyncio.run(main())
```
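
For reference, the setup step above might look like this in a local shell; the DSN values are placeholders, not defaults shipped by khora:

```shell
# Placeholder DSNs; substitute your own hosts and credentials
export KHORA_DATABASE_URL="postgresql+asyncpg://khora:secret@localhost:5432/khora"
export KHORA_NEO4J_URL="bolt://localhost:7687"

# Apply the schema migrations before first use
uv run alembic upgrade head
```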

## Batch processing

`submit_batch()` stages documents as PENDING and returns a `BatchHandle` immediately. A background processor picks them up and calls `on_result` per document as each completes.

**The processor is opt-in.** Call `kb.start_pending_processor()` after `connect()` on services that write documents. Read-only services do not need it. The processor can be stopped with `await kb.stop_pending_processor()` and restarted at any time.

```python
async with Khora() as kb:
    kb.start_pending_processor()   # opt-in; write-path services only
    handle = await kb.submit_batch(
        [{"content": "doc 1"}, {"content": "doc 2"}],
        on_result=lambda completed, total, result: print(result),
        namespace=ns_id,
    )
    await handle.wait()
```
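
For illustration, a fuller `on_result` callback using the `(completed, total, result)` signature from the example above might track progress like this (plain Python, no khora API involved):

```python
def make_progress_callback():
    """Build an on_result-style callback that prints a running progress line.

    Matches the (completed, total, result) signature used in the
    submit_batch example above; `result` is ignored here.
    """
    def on_result(completed: int, total: int, result) -> None:
        pct = 100 * completed // total
        print(f"[{pct:3d}%] {completed}/{total} processed")

    return on_result
```

Passing `make_progress_callback()` as `on_result` would then print one line per completed document.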

## Embedded options (experimental)

Khora ships two zero-infrastructure paths. Both are marked **experimental** — fine for demos, evaluation, tests, and small single-user CLIs, but not yet recommended as a production deployment path.

- **SQLite + LanceDB** (`pip install khora[sqlite-lance]`, set `KHORA_STORAGE_BACKEND=sqlite_lance`) — recommended embedded stack. Covers VectorCypher, Skeleton, and Chronicle via dialect-aware Alembic migrations and LanceDB-backed vector search. Documented scale ceiling: **~1M chunks, ~100k entities, ~500k edges, traversal depth ≤3**. Known gaps: no point-in-time queries, partial atomicity in `coordinator.transaction()`, FTS on chunks only. See [configuration.md](https://github.com/DeytaHQ/khora/blob/main/docs/configuration.md#embedded-backends-experimental).
- **SurrealDB** (`pip install khora[surrealdb]`) — unified relational + vector + graph in one store. Python SDK is on the alpha track (`>=2.0.0a1`), and KNN (`<|K|>`) is unreliable in embedded mode (uses brute-force cosine + HNSW fallback). Suitable for experimentation; not recommended for production.

> **Quickstart caveat.** A literal `Khora("memory://")` call passes `"memory://"` as the PostgreSQL URL, not as a backend selector — there is no `memory://` URL scheme parsed by khora itself today. To use the embedded path, set `KHORA_STORAGE_BACKEND=sqlite_lance` (or `surrealdb`) and the corresponding `db_path` / connection settings. Routing a true `memory://` URI to the SQLite+LanceDB stack is tracked for a future release.
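
Selecting the embedded stack therefore comes down to one extra plus one environment variable; a minimal sketch:

```shell
pip install "khora[sqlite-lance]"

# Route storage to the embedded SQLite + LanceDB stack
export KHORA_STORAGE_BACKEND=sqlite_lance
# plus the corresponding db_path / connection settings; see
# docs/configuration.md for the exact setting names
```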

## Observability

Khora emits OpenTelemetry spans and metrics through the OTel API.
The export path is your choice: vanilla OTel SDK (`pip install
khora[otel]`), [Logfire](https://logfire.pydantic.dev/)
(`pip install khora[logfire]`), or nothing (zero-cost no-op). Khora
never installs a `TracerProvider` at import time and never sets
`service.name` — those belong to the host application.

```bash
pip install khora[otel]
export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318"
export OTEL_SERVICE_NAME="my-app"
```

```python
from khora.telemetry import configure_telemetry
configure_telemetry()      # honors OTEL_* env vars
```

See [docs/observability.md](https://github.com/DeytaHQ/khora/blob/main/docs/observability.md) for the full env-var
contract, the precedence rules, vendor recipes (Honeycomb, Datadog,
Tempo, etc.), sampling guidance, and the troubleshooting checklist.
The complete telemetry surface lives in
[`docs/telemetry-contract.json`](https://github.com/DeytaHQ/khora/blob/main/docs/telemetry-contract.json) with the
drift gate enforced by `tests/unit/telemetry/test_contract.py`.

Two separate observability channels live in `khora.telemetry`:

- **Spans + metrics** via the OTel API (this section).
- **Structured `LLMEvent` / `StorageEvent` / `PipelineEvent` rows** to
  a dedicated PostgreSQL database when `KHORA_TELEMETRY_DATABASE_URL`
  is set. Without it, a `NoOpCollector` is used (zero cost). Wired by
  `init_telemetry()`, independent of `configure_telemetry()`.
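
Enabling the second channel is a matter of setting one environment variable before `init_telemetry()` runs; the DSN below is a placeholder:

```shell
# Placeholder DSN; point this at a dedicated telemetry database,
# separate from the main khora database.
export KHORA_TELEMETRY_DATABASE_URL="postgresql+asyncpg://telemetry:secret@localhost:5432/khora_telemetry"
```

With the variable set, `init_telemetry()` wires the collector; without it, events go to the zero-cost `NoOpCollector`.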

Credential fields on `KhoraConfig` (DSNs, passwords) are
`pydantic.SecretStr` — `repr()` and config dumps render as
`'**********'`. Callers that need the cleartext must call
`.get_secret_value()` explicitly.

**Async logging caveat.** Library consumers that import khora without
configuring loguru sinks inherit the default sync stderr sink, which
blocks the event loop on every log call inside `async def`. Either
call `khora.logging_config.setup_logging()` (which configures sinks
with `enqueue=True` and registers an `atexit` drain) or configure
your own loguru sinks with `enqueue=True` explicitly.

## Documentation

Start at [docs/README.md](https://github.com/DeytaHQ/khora/blob/main/docs/README.md). Key entry points:

- [API reference](https://github.com/DeytaHQ/khora/blob/main/docs/api-reference.md) — public `Khora` surface.
- [Configuration](https://github.com/DeytaHQ/khora/blob/main/docs/configuration.md) — `KHORA_*` env vars and `KhoraConfig`.
- [Observability](https://github.com/DeytaHQ/khora/blob/main/docs/observability.md) — OTel spans/metrics, `[otel]`/`[logfire]` paths, `configure_telemetry()`.
- [Architecture](https://github.com/DeytaHQ/khora/blob/main/docs/architecture/overview.md) — how the pieces fit.
- [Engines](https://github.com/DeytaHQ/khora/blob/main/docs/engines/engine-comparison.md) — VectorCypher, Skeleton, Chronicle.
- [Migrations](https://github.com/DeytaHQ/khora/blob/main/docs/migrations.md) — Alembic workflow for library users.
- [Downstream consumers](https://github.com/DeytaHQ/khora/blob/main/docs/consumers.md) — sibling packages and integration guide.

## Development

```bash
make dev         # start PostgreSQL + Neo4j (Docker)
make test        # pytest with coverage
make format      # ruff format + isort
make lint        # ruff + ty typecheck
```

See [CHANGELOG.md](https://github.com/DeytaHQ/khora/blob/main/CHANGELOG.md) for release history.

## License

Copyright 2026 AllTheData Inc.

Licensed under the Apache License, Version 2.0. See [LICENSE](https://github.com/DeytaHQ/khora/blob/main/LICENSE) and [NOTICE](https://github.com/DeytaHQ/khora/blob/main/NOTICE).
