Metadata-Version: 2.4
Name: heavenbase
Version: 0.1.0.5
Summary: HeavenBase: an agent-native polyglot engine for structured data management. Your data, speakable by agents.
Author: Magolor
License-Expression: MIT
Project-URL: Homepage, https://ahvn.top/
Project-URL: Documentation, https://ahvn.top/
Project-URL: Repository, https://github.com/Magolor/HeavenBase
Project-URL: Bug Tracker, https://github.com/Magolor/HeavenBase/issues
Keywords: semantic-layer,entity,vector-db,agents,heavenbase,hvnb
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Database
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pyyaml>=6.0
Requires-Dist: omegaconf>=2.3
Requires-Dist: cloudpickle>=3.0
Requires-Dist: fastmcp>=3.0
Requires-Dist: openai>=1.0
Requires-Dist: anthropic>=0.105
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: typer>=0.12
Requires-Dist: click>=8.1
Requires-Dist: prompt_toolkit>=3.0
Requires-Dist: pandas>=1.5
Provides-Extra: dev
Requires-Dist: black>=24.0; extra == "dev"
Requires-Dist: flake8>=6.0; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: numpy>=1.24; extra == "dev"
Requires-Dist: pillow>=10.0; extra == "dev"
Requires-Dist: pre-commit>=4.0; extra == "dev"
Requires-Dist: virtualenv>=20.0; extra == "dev"
Provides-Extra: interop
Requires-Dist: pydantic>=1.10; extra == "interop"
Requires-Dist: pandas>=2.0; extra == "interop"
Requires-Dist: pyarrow>=12.0; extra == "interop"
Requires-Dist: numpy>=1.24; extra == "interop"
Requires-Dist: sqlalchemy>=2.0; extra == "interop"
Provides-Extra: daft
Requires-Dist: daft>=0.4; extra == "daft"
Requires-Dist: pyarrow>=12.0; extra == "daft"
Provides-Extra: sql
Requires-Dist: sqlalchemy>=2.0; extra == "sql"
Provides-Extra: full
Requires-Dist: sqlalchemy>=2.0; extra == "full"
Requires-Dist: numpy>=1.24; extra == "full"
Requires-Dist: pillow>=10.0; extra == "full"
Requires-Dist: pymongo>=4.0; extra == "full"
Requires-Dist: litellm>=1.0; extra == "full"
Requires-Dist: psycopg2-binary>=2.9; extra == "full"
Requires-Dist: pgvector>=0.3; extra == "full"
Requires-Dist: lancedb>=0.22; extra == "full"
Requires-Dist: pylance>=0.34; extra == "full"
Requires-Dist: elasticsearch>=8.0; extra == "full"
Dynamic: license-file

# HeavenBase

HeavenBase is an agent-native polyglot data engine for structured data management.
You define one logical model, connect the storage systems you already use, and expose one workspace API to applications and agents.

It is not a database replacement. It is the layer that tells agents what data exists, what it means, where it lives, how to query it, and which tool surface they can use safely.

```python
import heavenbase as hb
```

## Release Status

HeavenBase is being prepared for an initial developer release.
The current package has a working workspace facade, entity DSL, query planner, system catalog rows, storage routing, backend handler registry, MCP workspace tools, LLM utilities, prompt storage, config management, interop helpers, and a CLI.

Expect a developer-facing API, useful local defaults, and visible diagnostics.
Do not expect a fully stabilized production control plane yet: provider-native pushdown is still being deepened, some operations intentionally fall back to scan-style execution, and full workspace manifest import/export is still future work.

## Why HeavenBase

Agents fail less often because they cannot reason than because they cannot see the data landscape:

- they do not know a silo exists
- they have stale schema or source maps
- they miss domain vocabulary
- they retrieve only a convenient top-k slice
- they cannot relate data, playbooks, prompts, tools, and memory

HeavenBase addresses that by putting data, metadata, domain knowledge, prompts, tools, and agent memory into one typed workspace. Agents can discover `Catalog` rows for concrete objects, inspect `MetaSchema` rows for structure, and query typed entities through the same API.

## What Ships Today

- `HeavenBase` workspaces for schemas, backends, routing, data, Catalog, MetaSchema, audit, repair, and MCP exposure.
- `Entity` classes with `hb.field(...)`, logical types, defaults, compute hooks, query-compute hooks, JSON schema compilation, and stable `object_id` identity.
- Query surfaces through Python expressions and Mongo-style JSON filters, both lowering into `QuerySpec`.
- Field-level storage placement with strategies such as `InlineColumn`, `JsonField`, `SideTable`, `VectorIndex`, `InvertedIndex`, `GraphEdge`, and `ExternalRef`.
- Built-in backend families for in-memory, file JSON/pickle, SQL, vector stores, and search.
- Handler-based query compilation with `QueryBuilder.explain()` diagnostics for backend, strategy, handler mode, fallback, and unsupported reasons.
- `Catalog` for object discovery and `MetaSchema` for workspace structure.
- Workspace MCP profiles for agent, full/admin, memory, and memstate-style project memory tools.
- `hb.LLM` and `hb llm` for configured chat, embeddings, image generation, sessions, MCP tool loops, caching, and provider/gateway routing.
- `CM_HVNB`, `ConfigEngine`, prompt utilities, SQL helpers, interop helpers, and a small public capability index.

## Install

For the package release:

```bash
pip install heavenbase
hb setup
```

For source development, `uv` is the recommended path:

```bash
git clone https://github.com/Magolor/HeavenBase.git
cd HeavenBase
uv sync --extra dev
uv run hb setup
```

The project pins the default local `uv` interpreter through `.python-version` while keeping package support for Python 3.10 through 3.13.
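
Before installing, it can help to confirm that the active interpreter falls inside the supported range. The check below is a minimal sketch using only the standard library; the `is_supported` helper is illustrative and not part of HeavenBase.

```python
import sys

# Supported range taken from the package metadata: >=3.10,<3.14.
MIN_VERSION = (3, 10)
MAX_EXCLUSIVE = (3, 14)


def is_supported(version=None):
    """Return True when (major, minor) falls inside the supported range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_VERSION <= (major, minor) < MAX_EXCLUSIVE


print(is_supported())
```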

Optional extras are available when you need more providers:

```bash
# From PyPI
pip install "heavenbase[full]"
# From a source checkout
uv sync --extra full
```

## First Workspace

Use the `debug` preset first. It needs no Docker services and creates local SQLite plus in-memory vector/search backends.

```python
import heavenbase as hb


class Product(hb.Entity):
    name = hb.field(hb.ShortText).desc("Display name")
    body = hb.field(hb.LongText).desc("Searchable description")
    price = hb.field(hb.Float).desc("List price")
    tags = hb.field(hb.Array[hb.ShortText]).default([])


ws = hb.HeavenBase("shop", preset="debug")
ws.register(Product)

desk_id = ws.upsert(
    Product,
    {
        "name": "Oak desk",
        "body": "Writing desk with cable tray",
        "price": 129.0,
        "tags": ["office", "furniture"],
    },
)

frame = (
    ws.query(Product)
    .where(Product.price < 150)
    .where(Product.tags.array_contains("office"))
    .select("name", "price")
    .execute()
)

print(desk_id)
print(frame.rows())
```

Every row has exactly one `object_id`. If you omit it and provide `name`, HeavenBase derives a deterministic ID from the entity schema and row name.
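
The exact derivation rule is covered in `docs/ID_SEMANTICS.md`. As a rough illustration of what "deterministic" means here, the sketch below hashes a schema identifier and a row name with the standard library's `uuid.uuid5`. The namespace string, the `derive_object_id` helper, and the choice of UUIDv5 are assumptions for illustration, not HeavenBase's actual algorithm.

```python
import uuid

# Hypothetical namespace; HeavenBase's real ID scheme may differ entirely.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example://object-id-sketch")


def derive_object_id(entity_id: str, name: str) -> str:
    """Derive a stable ID from a schema identifier and a row name."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, f"{entity_id}:{name}"))


first = derive_object_id("Product", "Oak desk")
second = derive_object_id("Product", "Oak desk")
other = derive_object_id("Product", "Pine desk")

print(first == second)  # same inputs give the same ID
print(first == other)   # a different name gives a different ID
```

Under a scheme like this, upserting a row with the same entity and name would target the same object rather than create a duplicate.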

## Query and Explain

Python expressions and JSON filters share the same planner:

```python
cheap = ws.query(Product).where(Product.price < 100).count()

json_frame = ws.query_json(
    Product,
    {
        "filter": {"body": {"$match": "desk"}},
        "select": ["name", "price"],
        "limit": 5,
    },
).execute()

plan = ws.query(Product).where(Product.body.match("desk")).explain()

print(cheap)
print(json_frame.rows())
print(plan["steps"][0]["handler_mode"])
```

Result projection keeps `object_id` so agents can display compact rows and still call `get`, `set`, `delete`, `query`, or `explain` later with stable identity.
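
To make Mongo-style filters and identity-preserving projection concrete, here is a toy evaluator over plain dictionaries. It is a sketch only: `matches`, `project`, and the supported operators (`$lt`, `$gt`, `$eq`, and a substring-based `$match`) are illustrative assumptions and do not reflect HeavenBase's planner or its full operator set.

```python
import operator

# Toy operator table; the real engine's operator set is not shown here.
OPS = {
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$eq": operator.eq,
    # Naive stand-in for text search: case-insensitive substring test.
    "$match": lambda value, needle: needle.lower() in str(value).lower(),
}


def matches(row: dict, flt: dict) -> bool:
    """Return True when the row satisfies every field condition in flt."""
    for field, cond in flt.items():
        # A bare value is shorthand for equality.
        cond = cond if isinstance(cond, dict) else {"$eq": cond}
        for op, arg in cond.items():
            if field not in row or not OPS[op](row[field], arg):
                return False
    return True


def project(row: dict, select: list) -> dict:
    """Keep the selected fields and always retain object_id."""
    keys = ["object_id", *[k for k in select if k != "object_id"]]
    return {k: row[k] for k in keys if k in row}


rows = [
    {"object_id": "p1", "name": "Oak desk", "body": "Writing desk", "price": 129.0},
    {"object_id": "p2", "name": "Lamp", "body": "Reading lamp", "price": 35.0},
]
query = {
    "filter": {"body": {"$match": "desk"}, "price": {"$lt": 150}},
    "select": ["name", "price"],
}

result = [project(r, query["select"]) for r in rows if matches(r, query["filter"])]
print(result)
```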

## Agent Discovery

HeavenBase writes system rows automatically:

```python
catalog_rows = ws.query(hb.Catalog).select("target_entity", "target_id", "name").execute().rows()
schema_rows = (
    ws.query(hb.MetaSchema)
    .where(hb.MetaSchema.kind == "field")
    .where(hb.MetaSchema.subject_id == Product.schema().entity_id)
    .select("field", "dtype", "desc")
    .execute()
    .rows()
)

print(catalog_rows)
print(schema_rows)
```

Use `Catalog` to find concrete objects. Use `MetaSchema` to learn entities, fields, storage placement, backends, capabilities, and extensions.
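
One way an application might use these rows is to render them into a compact text description for an agent prompt. The sketch below assumes `rows()` returns a list of dictionaries keyed by the selected column names, which this document does not confirm; the sample data and the `describe_fields` helper are hypothetical.

```python
def describe_fields(entity_name: str, schema_rows: list) -> str:
    """Render field rows as a short, prompt-friendly description."""
    lines = [f"Entity {entity_name}:"]
    for row in schema_rows:
        desc = row.get("desc") or "no description"
        lines.append(f"- {row['field']} ({row['dtype']}): {desc}")
    return "\n".join(lines)


# Hand-written sample shaped like a select("field", "dtype", "desc") result.
sample_rows = [
    {"field": "name", "dtype": "ShortText", "desc": "Display name"},
    {"field": "price", "dtype": "Float", "desc": "List price"},
    {"field": "tags", "dtype": "Array[ShortText]", "desc": None},
]

text = describe_fields("Product", sample_rows)
print(text)
```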

## Workspace MCP

Any workspace can become an MCP toolkit:

```python
print(ws.to_mcp_json(name="shop-mcp", profile="agent", transport="http", host="127.0.0.1", port=7001))
ws.serve(name="shop-mcp", profile="agent", transport="http", host="127.0.0.1", port=7001)
```

The `agent` profile exposes a compact schema/data surface: `define_entity`, `list_entities`, `describe_entity`, `upsert`, `get`, `set`, `count`, `query`, and `explain`.
Use `profile="full"` for trusted administrative flows, `profile="memory"` for note-style memory, and `profile="memstate"` for versioned project memory.

## CLI

The `hb` command is a thin layer over the same APIs:

```bash
hb --help
hb setup
hb ws list
hb ws presets show debug
hb config get heavenbase.workspace.default_preset
hb cfg set heavenbase.llm.default_preset mock
hb llm chat --preset mock --provider mock --gateway mock --no-stream "hello"
hb llm embed "semantic text" --preview
hb prompt create demo.hello --template "Hello, {name}" --tr-key name
hb prompt render demo.hello --args '{"name":"Ada"}'
```

`hb setup` initializes global HeavenBase config and the default workspace. Runtime state lives under the HeavenBase config root, not in a new project-local folder.

## Backends and Presets

Start with presets:

| Preset | Backends | Use case |
| --- | --- | --- |
| `debug` | SQLite `main`, in-memory `vec`, in-memory `search` | Demos, tests, and first-run development with no Docker. |
| `local-lts` | Postgres `main`, LanceDB `vec`, Elasticsearch `search` | Durable local development with the repo service stack. |
| `web-lts` | Postgres-compatible `main`, pgvector `vec`, hosted search | Managed or shared deployments. |

Use explicit backend maps when deployment placement matters:

```python
ws = hb.HeavenBase(
    "custom-shop",
    backends={
        "main": {"type": "sqlite", "database": ":memory:"},
        "vec": {"type": "inmem"},
    },
)
```

Built-in backend families include `inmem`, `json`, `pickle`, `sqlite`, `duckdb`, `postgres`, `pgvector`, `mysql`, `oceanbase`, `mssql`, `oracle`, `trino`, `starrocks`, `lance`, `chroma`, `milvus`, `pinecone`, and `elasticsearch`. Optional providers require their Python drivers and reachable services.
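
A quick way to see which optional drivers are importable in the current environment is to probe for them with the standard library. This is a sketch: the `available_drivers` helper is hypothetical, and the pairing of backend names with driver modules is an assumption based on the packages listed under the `full` extra.

```python
from importlib.util import find_spec

# Import names for some drivers listed under the "full" extra.
# The backend-to-driver pairing here is an assumption for illustration.
DRIVERS = {
    "postgres": "psycopg2",
    "pgvector": "pgvector",
    "lance": "lancedb",
    "elasticsearch": "elasticsearch",
}


def available_drivers(drivers=DRIVERS) -> dict:
    """Map each backend name to whether its driver module can be found."""
    return {backend: find_spec(module) is not None for backend, module in drivers.items()}


for backend, ok in available_drivers().items():
    print(f"{backend}: {'installed' if ok else 'missing'}")
```

An importable driver only shows that the Python package is present; the backing service still has to be reachable.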

Inspect support from code:

```python
hb.capabilities.backends(hb.Vector, op="near")
ws.capabilities.ops(hb.ShortText, hb.InlineColumn, backend="main")
```

## LLM Utilities

`hb.LLM` resolves presets, model aliases, providers, gateways, request defaults, and cache settings from `CM_HVNB`.

```python
import heavenbase as hb

llm = hb.LLM(preset="mock")
print(llm.chat("Reply with hb-ok"))
```

The default online provider is OpenRouter through an OpenAI-compatible gateway. Configure credentials through environment variables such as `OPENROUTER_API_KEY`, or switch to local/mock providers for tests and demos.
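
For scripts that should run with or without credentials, one option is to fall back to the mock preset when no key is present. The sketch below uses only `os.environ`; the `pick_llm_preset` helper is hypothetical, and returning `None` to mean "use the configured default" is a convention of this example, not a HeavenBase API.

```python
import os


def pick_llm_preset(env=None):
    """Return "mock" when no OpenRouter key is set, else None.

    None is used here to mean "fall back to whatever preset is configured".
    """
    env = os.environ if env is None else env
    return None if env.get("OPENROUTER_API_KEY") else "mock"


print(pick_llm_preset())
```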

## Environment Policy

Edit `requirements*.txt` first. `pyproject.toml` reads them through setuptools dynamic metadata; `bash scripts/sync-env.bash` refreshes `uv.lock`, `poetry.lock`, and `environment-*.yml`.

Use this install priority order:

1. **uv**: `uv.lock` + `uv sync --all-extras` after `bash scripts/sync-env.bash` (this sync installs the runtime and all optional extras).
2. **pip**: `pip install -r requirements.txt` and `pip install -e ".[dev]"` / `pip install -e ".[full]"`.
3. **pyproject**: `pip install -e ".[dev]"` when only package metadata is available.
4. **conda**: generated `environment-*.yml` with `-e ".[<extra>]"`.
5. **poetry**: optional; `poetry install` after `poetry.lock` is refreshed by `bash scripts/sync-env.bash`.

CI should use `bash scripts/sync-env.bash --check` as the generated-file drift gate.

## Development

Use the repo wrappers:

```bash
bash scripts/sync-env.bash
bash scripts/test.bash tests/test_config_spec.py::test_config_engine_forbids_unknowns_and_requires_fields -q
bash scripts/flake.bash -a
```

For the continuous benchmark suite:

```bash
bash scripts/benchmark.bash
```

External database tests are designed to skip or use the Docker stack when services are unavailable. For direct local service setup, use:

```bash
bash ./scripts/docker-restart.bash /d/databases/
```

## Repository Map

| Path | Purpose |
| --- | --- |
| `src/heavenbase/` | Runtime package. |
| `src/heavenbase/workspace/` | Workspace facade, CRUD, query, registry, system rows, presets. |
| `src/heavenbase/entity/` | Entity DSL, field specs, system entities, JSON compiler. |
| `src/heavenbase/query/` | Query expressions, JSON query parsing, query specs. |
| `src/heavenbase/backends/` | Built-in backend implementations and backend registry. |
| `src/heavenbase/handlers/` | Operation handler registry and backend compilers. |
| `src/heavenbase/strategies/` | Storage strategy markers. |
| `src/heavenbase/toolkit/` and `src/heavenbase/mcp/` | Toolkit and MCP surfaces. |
| `src/heavenbase/utils/` | Config, LLM, SQL, serialization, paths, hashing, logging, and runtime helpers. |
| `docs/` | Design notes, capability matrix, plans, and implementation docs. |
| `demos/` | Runnable onboarding and focused backend/provider examples. |
| `tests/` | Core, backend, interop, LLM, MCP, CLI, and thread-safety coverage. |

Start with `demos/00_first_install.py`, then `demos/01_workspace_entity_crud.py`, `demos/02_query_surfaces.py`, and `demos/03_catalog_browse.py`.

## Boundaries

- Capability registration does not guarantee provider-native pushdown. Check `QueryBuilder.explain()` for `handler_mode`, `near_filter_mode`, and fallback reasons.
- Multi-backend writes are coordinated by the workspace, but the workspace is not a distributed transaction system. Keep cross-backend invariants simple.
- File backends are local development tools. Treat pickle stores as trusted-local only.
- Workspace persistence is backend-driven; full workspace manifest import/export is not shipped yet.
- Some docs and demos are implementation references rather than polished onboarding paths. The primary demo path is listed in `demos/README.md`.
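
To act on the first boundary in practice, a caller can summarize which plan steps report which handler mode. The sketch below relies only on the `plan["steps"][i]["handler_mode"]` shape shown in the Query and Explain example; the `steps_by_mode` helper and the mode names in the sample are hypothetical.

```python
def steps_by_mode(plan: dict) -> dict:
    """Group step indexes by their reported handler_mode."""
    grouped = {}
    for index, step in enumerate(plan.get("steps", [])):
        mode = step.get("handler_mode", "unknown")
        grouped.setdefault(mode, []).append(index)
    return grouped


# Hand-written sample; the mode names "native" and "scan" are hypothetical.
sample_plan = {
    "steps": [
        {"handler_mode": "native"},
        {"handler_mode": "scan"},
        {"handler_mode": "native"},
    ]
}

print(steps_by_mode(sample_plan))
```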

## Documentation

- Package docs: `docs/`
- Demos: `demos/README.md`
- Workspace presets: `docs/WORKSPACE_PRESETS.md`
- Configuration: `docs/CONFIG_SPEC.md`
- Identity rules: `docs/ID_SEMANTICS.md`
- MCP: `docs/MCP.md`
- LLM: `docs/LLM.md`
- Capability matrix: `docs/CAPABILITIES.md`
