Metadata-Version: 2.4
Name: obj-db
Version: 1.1.2
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Database :: Database Engines/Servers
Summary: Python binding for the obj embedded document database.
Author: obj contributors
License: MIT OR Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/uname-n/obj
Project-URL: Repository, https://github.com/uname-n/obj

# obj-db (Python)

> The embedded document database for Python. Dependable. Portable. Zero-infrastructure.

[![PyPI](https://img.shields.io/pypi/v/obj-db.svg)](https://pypi.org/project/obj-db/)

Part of [`obj`](https://github.com/uname-n/obj) — a self-contained,
serverless, single-file document database with a stable file format and
full ACID semantics. The wheel is at parity with the Rust surface and
writes a byte-identical file format.

The wheel is published on PyPI as `obj-db` and imported as `obj`. It is built with PyO3 (`abi3-py39`).

```bash
pip install obj-db
```

---

## Quickstart

Wrap a `@dataclass` with `@obj.document` for the typed, ergonomic API.
The codec produces postcard bytes byte-identical to Rust's
`#[derive(Document)]` for the same schema.

```python
from dataclasses import dataclass
import obj

@obj.document(collection="orders", version=1)
@dataclass
class Order:
    customer_id: int
    total: float
    status: str

with obj.Db("app.obj") as db:
    doc_id = db.insert(Order(customer_id=1, total=99.5, status="pending"))
    order = db.get(Order, doc_id)
    for oid, o in db.all(Order):
        ...
```

---

## Three write surfaces

The same CRUD methods (`insert` / `get` / `update` / `upsert` / `delete`
/ `all`) dispatch by argument type:

- **Typed documents** — pass an `@obj.document` instance or class.
  Routes through the schema-driven codec; on-disk bytes match Rust.
- **Dict-native** — pass a collection `str` plus a `dict` for ad-hoc
  writes with no `@document` boilerplate: `db.insert("events", {...})`.
- **Raw bytes** — pass a collection `str` plus `bytes`. obj does not
  serialise for you on this path; encode however you like (json,
  msgpack, postcard, pickle). Mirrors the obj C ABI contract.

`db.update(Cls, id, fn)` is an atomic read-modify-write inside one
transaction (no lost-update window); a raising `fn` rolls it back.
Per-document lazy migration mirrors Rust's `Migrate` trait via a
`history=[...]` arg and a `cls.migrate(doc, from_version)` classmethod.
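
The rollback behaviour of `update` can be illustrated without obj itself. What follows is a minimal in-memory sketch, not obj's implementation: `TinyStore`, `ship`, and `broken` are hypothetical names, and a real engine would rely on a storage transaction rather than a deep copy under a lock.

```python
import copy
import threading


class TinyStore:
    """Hypothetical in-memory stand-in for a document store (not obj's API)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()  # one writer at a time

    def insert(self, doc_id, doc):
        with self._lock:
            self._docs[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id):
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update(self, doc_id, fn):
        """Read, apply fn to a private copy, and write back under one lock."""
        with self._lock:
            working = copy.deepcopy(self._docs[doc_id])
            fn(working)                    # if fn raises, nothing below runs
            self._docs[doc_id] = working   # publish only after fn succeeded


def ship(doc):
    doc["status"] = "shipped"


def broken(doc):
    doc["status"] = "corrupted"            # touches the private copy only
    raise ValueError("simulated failure")


store = TinyStore()
store.insert(1, {"status": "pending"})
store.update(1, ship)
try:
    store.update(1, broken)
except ValueError:
    pass
final_status = store.get(1)["status"]      # the failed update left no trace
```

Because the read, the callback, and the write-back all happen under one lock, no other writer can slip in between them, which is the property the "no lost-update window" wording refers to.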

> obj's encoding is *positional* (schema-driven, no field names in the
> bytes) and the schema registry is process-global per
> `(collection, version)`. Two handles declaring the same collection
> with a different shape raise `obj.InvalidArgumentError` rather than
> risk a silent mis-encode — use distinct names or bump `version=`.
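
To see why a positional encoding needs both sides to agree on the schema, here is a small sketch using the standard-library `struct` module. It is an illustration only: obj uses postcard, not `struct`, and the schema tuples and helper names below are hypothetical.

```python
import struct

# Two hypothetical declarations of the same collection. Neither one writes
# field names into the payload; only the declared order is encoded.
SCHEMA_V1 = ("customer_id", "total")
SCHEMA_ALT = ("total", "customer_id")   # same fields, different order

FORMATS = {"customer_id": "q", "total": "d"}  # 8-byte int, 8-byte float


def encode(schema, doc):
    fmt = "<" + "".join(FORMATS[name] for name in schema)
    return struct.pack(fmt, *(doc[name] for name in schema))


def decode(schema, payload):
    fmt = "<" + "".join(FORMATS[name] for name in schema)
    return dict(zip(schema, struct.unpack(fmt, payload)))


payload = encode(SCHEMA_V1, {"customer_id": 7, "total": 99.5})
good = decode(SCHEMA_V1, payload)
bad = decode(SCHEMA_ALT, payload)   # decodes without error, values are wrong
```

The second decode succeeds but reinterprets the integer bytes as a float and vice versa. That silent mis-decode is the failure mode the registry check is meant to rule out.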

---

## Transactions

`WriteTxn` batches many typed writes into a **single commit / single WAL
fsync**, and reads see the transaction's own uncommitted writes:

```python
with obj.Db("app.obj") as db:
    with db.transaction() as tx:
        for i in range(1000):
            tx.insert(Order(customer_id=i, total=float(i), status="new"))
        tx.update(Order, 1, lambda o: setattr(o, "status", "shipped"))
        tx.insert("audit_log", b"<raw bytes>")   # raw overload still works
        # one commit + one fsync on __exit__
```

`tx.collection(Cls)` binds a class once and exposes typed CRUD scoped to
the transaction.

---

## Secondary indexes

Declare indexes with `typing.Annotated` markers — `obj.Index`,
`obj.Unique`, `obj.Each` (multi-value, on a `list[...]` field) — plus an
`indexes=[obj.Composite((...), name=...)]` decorator arg. This mirrors
Rust's `#[obj(index ...)]` attributes; the index B-trees are maintained
on every write.

```python
from dataclasses import dataclass
from typing import Annotated

import obj

@obj.document(
    collection="orders", version=1,
    indexes=[obj.Composite(("region", "status"), name="by_region_status")],
)
@dataclass
class Order:
    email:  Annotated[str, obj.Unique]      # unique index
    region: Annotated[str, obj.Index]       # standard index
    tags:   Annotated[list[str], obj.Each]  # multi-value index
    status: str = "new"

with obj.Db("app.obj") as db:
    db.insert(Order(email="a@b.com", region="us", tags=["vip"]))
    order = db.find_unique(Order, "email", "a@b.com")          # exact lookup
    for oid, o in db.index_range(Order, "region", "us", "us"): # half-open range
        ...
```

A duplicate `Unique` key raises `obj.InvalidArgumentError` and rolls the
write back atomically.
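
The all-or-nothing outcome of a rejected unique write can be sketched with two dictionaries. This is a simplified model under stated assumptions: `UniqueIndexedStore` and `DuplicateKey` are hypothetical, and obj raises `obj.InvalidArgumentError` and maintains B-trees rather than dicts.

```python
class DuplicateKey(ValueError):
    """Hypothetical error type; obj raises obj.InvalidArgumentError instead."""


class UniqueIndexedStore:
    def __init__(self, unique_field):
        self.unique_field = unique_field
        self.docs = {}       # doc_id -> document
        self.by_key = {}     # unique key -> doc_id
        self._next_id = 1

    def insert(self, doc):
        key = doc[self.unique_field]
        # Check the index before touching any state, so a rejected write
        # leaves both the primary map and the index unchanged.
        if key in self.by_key:
            raise DuplicateKey(f"{self.unique_field}={key!r} already exists")
        doc_id = self._next_id
        self._next_id += 1
        self.docs[doc_id] = dict(doc)
        self.by_key[key] = doc_id
        return doc_id

    def find_unique(self, key):
        doc_id = self.by_key.get(key)
        return None if doc_id is None else self.docs[doc_id]


store = UniqueIndexedStore("email")
first_id = store.insert({"email": "a@b.com", "region": "us"})
try:
    store.insert({"email": "a@b.com", "region": "eu"})
    rejected = False
except DuplicateKey:
    rejected = True
found = store.find_unique("a@b.com")
```

After the rejected insert, the store still holds exactly one document and the lookup returns the original one.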

---

## Querying

`db.query(...)` returns a lazy, immutable builder (each call returns a
fresh `Query`). Pass an `@obj.document` class for typed results or a
collection `str` for dict-native results:

```python
top = (db.query(Order)
         .filter(lambda o: o.status == "shipped")   # AND-combined predicates
         .sort_by(lambda o: o.total)
         .limit(10)
         .fetch())                                   # -> list[tuple[int, Order]]

count = db.query(Order).filter(lambda o: o.total > 100).count()  # no-decode fast path
us    = db.query(Order).index_range("region", "us", "us").fetch()
```

The sort buffer is bounded (`obj.MAX_SORT_BUFFER`, overridable per query
with `.sort_buffer_limit(n)`); an over-cap sort raises rather than
allocating without limit.
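
The two ideas above, an immutable builder and a capped sort buffer, can be sketched in plain Python. This is a toy model, not obj's `Query`: the class, the `SortBufferExceeded` error, and the tiny cap value are all assumptions made for the demonstration.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Optional

MAX_SORT_BUFFER = 4  # hypothetical tiny cap for the demo; obj's value differs


class SortBufferExceeded(RuntimeError):
    """Hypothetical error type for this sketch."""


@dataclass(frozen=True)
class Query:
    rows: tuple
    predicates: tuple = ()
    sort_key: Optional[Callable[[Any], Any]] = None
    max_rows: Optional[int] = None
    buffer_limit: int = MAX_SORT_BUFFER

    # Every builder method returns a new Query; self is never mutated.
    def filter(self, pred):
        return replace(self, predicates=self.predicates + (pred,))

    def sort_by(self, key):
        return replace(self, sort_key=key)

    def limit(self, n):
        return replace(self, max_rows=n)

    def sort_buffer_limit(self, n):
        return replace(self, buffer_limit=n)

    def fetch(self):
        matches = (r for r in self.rows if all(p(r) for p in self.predicates))
        if self.sort_key is not None:
            buffer = []
            for row in matches:
                if len(buffer) >= self.buffer_limit:
                    raise SortBufferExceeded(
                        f"more than {self.buffer_limit} rows to sort"
                    )
                buffer.append(row)
            matches = iter(sorted(buffer, key=self.sort_key))
        out = list(matches)
        return out if self.max_rows is None else out[: self.max_rows]


base = Query(rows=tuple(range(10)))
evens = base.filter(lambda n: n % 2 == 0)      # base is left unchanged
top = evens.sort_by(lambda n: -n).sort_buffer_limit(8).limit(2).fetch()
try:
    evens.sort_by(lambda n: n).fetch()         # 5 rows against a cap of 4
    over_cap = False
except SortBufferExceeded:
    over_cap = True
```

Nothing runs until `fetch()` is called, and the cap is checked while the buffer fills, so an oversized sort fails before it can grow past the limit.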

---

## Multi-file attach

Open another `.obj` file's collections **read-only** under a namespace,
addressed as `"namespace.collection"`:

```python
with obj.Db("app.obj") as db:
    db.attach("archive.obj", "archive")
    archived = list(db.all("archive.orders"))   # also db.get / db.query
    db.detach("archive")
```

Writes to a namespaced collection raise `obj.InvalidArgumentError`. A
namespaced read needs its schema registered under the namespaced name
(declare a class with `collection="archive.orders"`, or read raw bytes);
an unregistered read fails loudly rather than returning garbage.

---

## Async

`obj.AsyncDb` mirrors the blocking `Db` for `asyncio` callers — a thin
wrapper that offloads each call to a thread executor (the GIL is released
around engine work; no new runtime dependency).

```python
import asyncio
import obj

# Order is the indexed dataclass defined under "Secondary indexes".

async def main():
    adb = await obj.AsyncDb.open("app.obj")
    oid = await adb.insert(Order(email="e@f.com", region="us", tags=[]))
    async for oid, o in adb.all(Order):
        ...
    async with adb.transaction() as tx:      # commits on clean exit
        await tx.insert(Order(email="g@h.com", region="eu", tags=[]))
    await adb.close()

asyncio.run(main())
```

Each async transaction pins one worker thread for its lifetime (the txn
handle is not `Send`), so `async with adb.transaction()` is safe to drive
op-by-op.
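
One common way to pin work to a single thread from `asyncio` is a dedicated one-worker executor. The sketch below shows that pattern in isolation; `PinnedTxn` is hypothetical and says nothing about how `obj.AsyncDb` is actually built.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class PinnedTxn:
    """Hypothetical async wrapper that runs every op on one dedicated thread."""

    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)
        self.threads_seen = set()
        self.log = []

    async def _run(self, fn, *args):
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._executor, fn, *args)

    def _insert_blocking(self, value):
        # Runs on the worker thread, never on the event loop thread.
        self.threads_seen.add(threading.get_ident())
        self.log.append(value)

    async def insert(self, value):
        await self._run(self._insert_blocking, value)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self._executor.shutdown(wait=True)   # release the pinned thread
        return False


async def demo():
    async with PinnedTxn() as tx:
        for i in range(5):
            await tx.insert(i)
    return tx


tx = asyncio.run(demo())
main_thread = threading.get_ident()
```

With `max_workers=1`, every awaited operation lands on the same worker thread in submission order, which is what makes a thread-bound handle safe to drive one call at a time.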

---

## Checkpointing

Writes land in a write-ahead log (`<db>.obj-wal`) first; the main file
stays sparse until a checkpoint folds committed WAL pages into it. A
checkpoint fires automatically at ~1000 WAL frames, and a **clean**
`close()` (including a `with` block that exits without raising) folds the
WAL for you — so the `.obj` file is self-contained after normal shutdown.

```python
with obj.Db("app.obj") as db:
    for note in notes:   # notes: any iterable of @obj.document instances
        db.insert(note)
    db.checkpoint()   # fold on demand (optional)
```

`checkpoint()` is a no-op when there is nothing to fold, and is deferred
if a concurrent reader has pinned a snapshot below the WAL end (retry
once it finishes). The close-time fold is best-effort and non-fatal: a
failure never turns a clean `with` block into a raised error, and an exit
via exception skips the fold and propagates your error unchanged.

**Trade-off:** every clean close ends in an `fsync`. Prefer a single
long-lived handle when one-fsync-per-close is a hot-path bottleneck.
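
The relationship between the WAL and the main file can be modelled in a few lines. This is a deliberately simplified in-memory sketch: `TinyWalDb` is hypothetical, the exact threshold of 1000 frames is an assumption standing in for the approximate figure above, and there is no fsync, locking, or reader pinning.

```python
class TinyWalDb:
    """Hypothetical in-memory model of a main file plus a write-ahead log."""

    AUTO_CHECKPOINT_FRAMES = 1000   # assumed exact value for the sketch

    def __init__(self):
        self.main = {}   # stands in for the .obj file
        self.wal = []    # stands in for the .obj-wal file: (page, data) frames

    def write(self, page, data):
        self.wal.append((page, data))
        if len(self.wal) >= self.AUTO_CHECKPOINT_FRAMES:
            self.checkpoint()

    def read(self, page):
        # The newest WAL frame wins; otherwise fall back to the main file.
        for p, data in reversed(self.wal):
            if p == page:
                return data
        return self.main.get(page)

    def checkpoint(self):
        """Fold WAL frames into the main file; return how many were folded."""
        folded = len(self.wal)
        for page, data in self.wal:   # in order, so later frames overwrite
            self.main[page] = data
        self.wal.clear()
        return folded


db = TinyWalDb()
db.write(1, "a")
db.write(1, "b")
db.write(2, "c")
main_before = dict(db.main)      # main file is still empty here
latest = db.read(1)              # served from the WAL
folded = db.checkpoint()
folded_again = db.checkpoint()   # nothing left to fold
```

Reads return the same values before and after the fold; the checkpoint only changes where the data lives.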

---

## Diagnostics

```python
stat = db.stat()
for cs in stat.collections:
    print(cs.name, cs.doc_count, cs.file_size_bytes)
    for idx in cs.indexes:
        print(" ", idx.name, idx.kind, idx.key_paths, idx.status)

# low-level, type-erased dump of a collection's primary B-tree:
for rec in db.dump_raw("orders", max_records=1000):
    rec.id, rec.header, rec.payload   # id, DocumentHeader, raw postcard bytes
```

---

## Exceptions

Every error obj raises is an instance of `obj.ObjError` (the catch-all
base). The subclasses narrow the diagnosis:

| Exception                     | When raised                                              |
|-------------------------------|----------------------------------------------------------|
| `obj.NotFoundError`           | document / collection / index / namespace absent         |
| `obj.BusyError`               | lock contention (pager mutex, writer lock, cross-process) |
| `obj.CorruptionError`         | on-disk format / checksum / B-tree invariant violation   |
| `obj.IntegrityError`          | `Db.integrity_check()` found at least one failure        |
| `obj.InvalidArgumentError`    | caller-side argument problem (encoding, range, type, schema) |
| `obj.EncryptionError`         | missing / wrong / mismatched encryption key              |
| `obj.FeatureUnsupportedError` | file uses a build-time feature this wheel lacks          |

---

## Development

```bash
python3 -m venv .venv && source .venv/bin/activate
pip install maturin pytest

cd crates/obj-py
maturin develop          # build the cdylib + install editable into the venv
pytest tests/ -v
```

The dev loop is "edit Rust → `maturin develop` → `pytest`". For a
release wheel: `maturin build --release` writes `target/wheels/obj-*.whl`.

---

## License

Dual-licensed under [MIT](https://github.com/uname-n/obj/blob/master/LICENSE-MIT)
or [Apache 2.0](https://github.com/uname-n/obj/blob/master/LICENSE-APACHE),
at your option.

