Metadata-Version: 2.4
Name: obj-db
Version: 1.0.2
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Rust
Classifier: Topic :: Database :: Database Engines/Servers
Summary: Python binding for the obj embedded document database.
Author: obj contributors
License: MIT OR Apache-2.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/uname-n/obj
Project-URL: Repository, https://github.com/uname-n/obj

# obj — Python binding

Python bindings for [`obj`](https://github.com/uname-n/obj), the
embedded document database.

The wheel exposes a single extension module named `obj`. The
distribution is named `obj-db` and the Rust crate `obj-py`, but the
import name is always `obj`.

```python
import obj

with obj.Db("app.obj") as db:
    with db.transaction() as tx:
        doc_id = tx.insert("orders", b"<your payload bytes>")

    with db.read_transaction() as tx:
        payload = tx.get("orders", doc_id)
        for (id_, bytes_) in tx.iter_all("orders"):
            ...
```

## Payload contract

`obj-py` ships **two** Python surfaces side by side:

- **Bytes API** on `WriteTxn` / `ReadTxn`. Payloads cross the
  boundary as `bytes` / `bytearray` in and `bytes` out. The
  library does NOT serialise dicts, dataclasses, or JSON for you
  on this path — encode your payloads however you like (`json`,
  `msgpack`, `postcard`, `pickle`, ...) and pass the resulting
  bytes through. This mirrors the obj C ABI's contract.
- **Typed-document API** on `Db` *and* `WriteTxn` (Phase 6.5 +
  issue #1). Wrap a `@dataclass` with
  `@obj.document(collection="orders", version=1)` and the ergonomic
  methods `db.insert(order)` / `db.get(Order, id)` /
  `db.update(Order, id, fn)` / `db.all(Order)` route through a
  schema-driven `Dynamic` codec that produces postcard bytes
  byte-identical to Rust's `#[derive(Document)]` writer for the
  same logical schema. `db.update(...)` is an atomic
  read-modify-write: the read and the write-back happen inside one
  write transaction (no lost-update window), and a raising `fn`
  rolls the change back.

```python
from dataclasses import dataclass
import obj

@obj.document(collection="orders", version=1)
@dataclass
class Order:
    customer_id: int
    total: float
    status: str

with obj.Db("app.obj") as db:
    doc_id = db.insert(Order(customer_id=1, total=99.5, status="pending"))
    order = db.get(Order, doc_id)
    for (oid, o) in db.all(Order):
        ...
```
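
On the Bytes API, encoding is up to the caller. The sketch below shows one way to do it with the standard `json` module; the helper names `encode_order` and `decode_order` are illustrative and not part of `obj`, and any other serializer would work as well.

```python
import json


def encode_order(order: dict) -> bytes:
    # sort_keys + compact separators make equal dicts encode to equal bytes.
    return json.dumps(order, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decode_order(payload) -> dict:
    # json.loads accepts bytes and bytearray directly.
    return json.loads(payload)


payload = encode_order({"customer_id": 1, "total": 99.5, "status": "pending"})
restored = decode_order(payload)

# With a database open, the same bytes would go through the raw path shown
# earlier, for example:
#   doc_id = tx.insert("orders", payload)
#   order = decode_order(tx.get("orders", doc_id))
```

JSON is used here only because it ships with Python; the raw path does not care which format produced the bytes.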

### Typed docs inside an explicit transaction

`WriteTxn` overloads its CRUD methods by argument type, so typed
documents compose with explicit transactions. Pass a `@obj.document`
instance (or class) for the typed path, or a collection `str` plus
`bytes` for the raw path. This lets you batch many typed writes into
a **single commit / single WAL fsync** instead of one transaction per
`db.insert`:

```python
with obj.Db("app.obj") as db:
    with db.transaction() as tx:
        for i in range(1000):
            tx.insert(Order(customer_id=i, total=float(i), status="new"))
        # one commit + one fsync for the whole batch on __exit__

        # reads inside the txn see its own uncommitted writes:
        first = tx.get(Order, 1)
        tx.update(Order, 1, lambda o: setattr(o, "status", "shipped"))
        tx.upsert(Order, 2, Order(customer_id=2, total=2.0, status="done"))
        tx.delete(Order, 3)

        # the raw-bytes overload still works on the same handle:
        tx.insert("audit_log", b"<raw bytes>")
```

The typed `WriteTxn` methods reuse the exact encode/decode pipeline
that `Db` uses, so on-disk bytes are identical regardless of which
surface wrote them. Passing a value that is not a `@obj.document`
to the typed path raises `obj.InvalidArgumentError` with a clear
message.
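
If one transaction for an entire data set is larger than you want to hold open, a middle ground is one transaction per fixed-size batch. The `chunked` helper below is a hypothetical utility, not part of `obj`, and the batch size in the usage comment is an arbitrary choice.

```python
from itertools import islice


def chunked(items, size):
    """Yield lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


batches = list(chunked(range(10), 4))

# Usage with the transaction API shown above (one commit per batch):
#   for batch in chunked(orders, 500):
#       with db.transaction() as tx:
#           for order in batch:
#               tx.insert(order)
```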

For ad-hoc dict-shaped writes (no `@obj.document` boilerplate), use
the dict-native overload: call the same CRUD methods with a
collection `str` as the first argument and a `dict` as the document:

```python
doc_id = db.insert("events", {"event": "click", "user_id": 42})
event = db.get("events", doc_id)
```

Per-document lazy migration mirrors Rust's `Migrate` trait: it is
driven by a `history=[...]` argument and a
`cls.migrate(doc, from_version)` classmethod.
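
The idea behind lazy migration can be illustrated in plain Python without touching the database. This is only a sketch under assumptions: `load_lazily` is a hypothetical stand-in for the library's read path, `doc` is assumed to arrive as a plain dict, and the `history=[...]` wiring is left out because its exact shape is not described here.

```python
from dataclasses import dataclass

CURRENT_VERSION = 2


@dataclass
class Order:
    customer_id: int
    total_cents: int  # v2 stores integer cents; v1 stored a float `total`

    @classmethod
    def migrate(cls, doc, from_version):
        # Assumption: `doc` is the old document as a plain dict.
        if from_version == 1:
            return cls(
                customer_id=doc["customer_id"],
                total_cents=round(doc["total"] * 100),
            )
        raise ValueError(f"no migration from version {from_version}")


def load_lazily(stored_version, doc):
    """Hypothetical read path: upgrade a document only when it is read."""
    if stored_version == CURRENT_VERSION:
        return Order(**doc)
    return Order.migrate(doc, stored_version)


old = load_lazily(1, {"customer_id": 7, "total": 12.5})
new = load_lazily(2, {"customer_id": 7, "total_cents": 1250})
```

Documents written at the current version pass through untouched; older ones are upgraded at read time rather than in a bulk rewrite.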

## Checkpointing

Writes land in a write-ahead log (`<db>.obj-wal`) first; the main
`<db>.obj` file stays sparse until a checkpoint folds the committed
WAL pages into it and resets the WAL back to its 64-byte header. A
checkpoint fires automatically once the WAL reaches ~1000 frames, so
after only a handful of writes the data still lives entirely in the
`-wal` file. Call `db.checkpoint()` to fold it on demand:

```python
with obj.Db("app.obj") as db:
    for note in notes:
        db.insert(note)
    db.checkpoint()   # fold the WAL into app.obj, reset app.obj-wal
```

`checkpoint()` is a harmless no-op when there is nothing to fold. It
is *deferred* (partial / no-op) if a concurrent reader has pinned a
snapshot below the end of the WAL, because the frames that reader
still needs must stay in place; retry once the reader has finished.
`checkpoint()` raises `obj.ObjError` on a read-only handle or on an
I/O failure.
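
Because a fully folded WAL shrinks back to its 64-byte header, one rough way to tell whether a checkpoint was deferred is to look at the size of the `-wal` file. The helpers below are a sketch, not part of `obj`: the names are made up, and treating a missing WAL file as "folded" is an assumption. The demo uses zero-filled stand-in files rather than a real database.

```python
import os
import tempfile

WAL_HEADER_BYTES = 64  # header size stated in the text above


def wal_path(db_path):
    return f"{db_path}-wal"


def wal_is_folded(db_path):
    """True when the WAL is absent or no larger than its header."""
    try:
        return os.path.getsize(wal_path(db_path)) <= WAL_HEADER_BYTES
    except FileNotFoundError:
        return True  # assumption: no WAL file means nothing left to fold


# Demo with stand-in files instead of a real database.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "app.obj")
    missing = wal_is_folded(db_path)
    with open(wal_path(db_path), "wb") as f:
        f.write(b"\0" * WAL_HEADER_BYTES)
    header_only = wal_is_folded(db_path)
    with open(wal_path(db_path), "ab") as f:
        f.write(b"\0" * 4096)  # pretend some frames were appended
    has_frames = wal_is_folded(db_path)
```

After `db.checkpoint()`, a `False` result from `wal_is_folded("app.obj")` would suggest the fold was deferred and is worth retrying later.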

### Checkpoint on clean close

You usually do not need an explicit `checkpoint()`: a **clean**
`close()` — including a `with obj.Db(...) as db:` block that exits
without raising — folds the WAL into the main file for you, so the
`.obj` file is self-contained after a normal shutdown.

```python
with obj.Db("app.obj") as db:
    db.insert(note)
# block exited cleanly -> WAL folded into app.obj, app.obj-wal reset
```

The close-time checkpoint is **best-effort and non-fatal**: a failure
(reader-pinned deferral, I/O error during shutdown, read-only handle)
is swallowed and never turns a successful `with` block into a raised
error — the committed data is already durable in the WAL, so a failed
fold loses nothing.

If the block exits **via an exception**, the checkpoint is skipped and
the exception is propagated unchanged — the close-time fold never masks
your error.

**Trade-off:** every clean close ends in an `fsync`. If you open and
close many short-lived handles on a hot path, that is one fsync per
close; prefer a single long-lived handle (and an occasional explicit
`checkpoint()`) when the per-close fsync is a bottleneck.

## Local development loop

```bash
# One-time setup: a fresh venv + maturin + pytest.
python3 -m venv .venv
source .venv/bin/activate
pip install maturin pytest

# From the workspace root:
cd crates/obj-py
maturin develop          # builds the cdylib + installs it editable
                         # into the active venv.
pytest tests/ -v         # run the Python test suite.
```

`maturin develop` rebuilds the extension module on every invocation;
the typical dev loop is "edit Rust → `maturin develop` → `pytest`".

For a release-style wheel:

```bash
maturin build --release          # writes target/wheels/obj_db-*.whl
pip install target/wheels/obj_db-*.whl
```

## Exception hierarchy

Every error raised by an `obj` operation is an instance of
`obj.ObjError`. The subclasses narrow the diagnosis:

| Exception                  | When raised                                                   |
|----------------------------|----------------------------------------------------------------|
| `obj.NotFoundError`        | document / collection / index / namespace absent              |
| `obj.BusyError`            | lock contention (pager mutex, writer lock, cross-process)     |
| `obj.CorruptionError`      | on-disk format / checksum / B-tree invariant violation        |
| `obj.IntegrityError`       | `Db.integrity_check()` found at least one failure             |
| `obj.InvalidArgumentError` | caller-side argument problem (encoding, range, type)          |

`ObjError` itself is the catch-all base and subclasses `Exception`.
Use `except obj.ObjError` if you don't care which subclass was raised;
catch the narrow ones when you can recover from a specific failure.
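
As an example of recovering from a narrow exception, lock contention is often worth retrying. The `retry` helper below is a generic sketch, not part of `obj`; whether a given `BusyError` is safe to retry depends on your workload, and the attempt count and delays are arbitrary. `FakeBusy` is a stand-in so the snippet runs without a database.

```python
import time


def retry(fn, retry_on, attempts=5, delay=0.05):
    """Call fn(); on `retry_on`, sleep with a doubling delay and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(delay * 2 ** attempt)


class FakeBusy(Exception):
    """Stand-in for obj.BusyError in this self-contained demo."""


calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise FakeBusy()
    return "ok"


result = retry(flaky, retry_on=FakeBusy, delay=0)

# With obj installed, the call would look like:
#   retry(lambda: db.insert(order), retry_on=obj.BusyError)
```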

