Metadata-Version: 2.4
Name: fastdb4py
Version: 0.1.22
Summary: FastCarto database bindings
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Dynamic: license-file

# fastdb

[![PyPI version](https://badge.fury.io/py/fastdb4py.svg)](https://badge.fury.io/py/fastdb4py)
[![npm version](https://badge.fury.io/js/fastdb4ts.svg)](https://badge.fury.io/js/fastdb4ts)
[![Run Tests](https://github.com/world-in-progress/fastdb/actions/workflows/tests.yml/badge.svg)](https://github.com/world-in-progress/fastdb/actions/workflows/tests.yml)

`fastdb` is a C++ local database library designed as a fast, lightweight, and easy-to-use data communication layer for RPC and coupled modeling in scientific computing.

This repository now contains three closely related layers:

- **C++ core** — native storage engine, binary layout, and serialization primitives
- **`fastdb4py`** — Python bindings via SWIG, with NumPy-oriented columnar access and shared-memory IPC
- **`fastdb4ts`** — TypeScript bindings via WebAssembly/Embind, focused on browser-friendly typed data access and schema-compatible table access

**Core design goals:**
- **Zero-copy columnar access** — efficient field-oriented access for high-volume numerical workloads
- **Ref-graph support** — Features can reference other Features across tables, forming typed object graphs
- **Compact binary transport** — save/load databases as binary buffers or files; shared-memory deserialization for zero-copy IPC
- **Cross-binding consistency** — Python and TypeScript bindings share the same native storage model and schema semantics
- **Schema-driven codegen** — Python `@feature` classes can serve as the source of truth; the `fdb codegen` CLI generates equivalent TypeScript schemas automatically
- **Portable payload primitives** — `fastdb.schema.v1`, shared binary buffers, and the `fastdb4ts` runtime let external RPC systems use FastDB as a schema-aware payload layer while those systems keep their own routing and execution semantics

## Documentation map

- **Python binding (`fastdb4py`)**: see [`python/README.md`](python/README.md)
- **TypeScript binding (`fastdb4ts`)**: see [`ts/README.md`](ts/README.md)
- **C++ core (`fastcarto/fastdb`)**: see [`fastcarto/README.md`](fastcarto/README.md)
- **TypeScript/WASM analysis docs**: see [`ts/analysis/`](ts/analysis/)
- **Codegen CLI (`fdb codegen`)**: see [CLI tools](#cli-tools) below, or the full reference in [`python/README.md`](python/README.md)

## Changelog

See [CHANGELOG.md](CHANGELOG.md) for per-binding unreleased changes. For historical release notes, see the [GitHub Releases](https://github.com/world-in-progress/fastdb/releases) page.

## Installation

### Python binding (fastdb4py)

```bash
pip install fastdb4py
```

### TypeScript binding (fastdb4ts)

```bash
npm install fastdb4ts
```

## Quick start

For a minimal end-to-end example, start with:

- [`python/README.md`](python/README.md) for `fastdb4py`
- [`ts/README.md`](ts/README.md) for `fastdb4ts`

If you are working on native internals or storage layout, start with:

- [`fastcarto/README.md`](fastcarto/README.md)

## Python Backed View Lifetimes

`fastdb4py` distinguishes owned Python `@feature` objects from backed table views. Owned objects keep normal `__dict__` read/write behavior. Backed rows, checked numeric columns, `StringColumn`, and `BytesColumn` can be tied to a `FdbViewOwner`; after `fdb.invalidate(owner_or_view)`, later checked reads or writes raise `FdbViewInvalidatedError`. Standalone FastDB tables remain trusted and return raw NumPy numeric columns by default, while integrations with reusable memory leases should pass `FdbViewOwner(checked=True, ...)` and use `fdb.materialize(...)` or `value.to_owned()` before retaining data beyond the lease. See [`python/README.md#backed-view-lifetimes`](python/README.md#backed-view-lifetimes) for the Python API details.

For safety-sensitive integrations, pass `writeable=False` to expose read-only backed rows and checked numeric columns. This blocks row field writes and column writes even when the owner itself is an unchecked trusted owner.
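
The owner/view relationship described above can be illustrated with a small pure-Python model. This is only a conceptual sketch, not the `fastdb4py` implementation: `ToyOwner`, `ToyCheckedView`, and `ToyViewInvalidatedError` are hypothetical names, and using `PermissionError` for read-only writes is an assumption made for the example.

```python
class ToyViewInvalidatedError(RuntimeError):
    """Raised when a checked view is used after its owner was invalidated."""


class ToyOwner:
    """Tracks whether the memory behind its views is still valid."""

    def __init__(self):
        self.valid = True

    def invalidate(self):
        self.valid = False


class ToyCheckedView:
    """Wraps a buffer and checks the owner before every read or write."""

    def __init__(self, owner, buffer, writeable=True):
        self._owner = owner
        self._buffer = buffer
        self._writeable = writeable

    def _check(self):
        if not self._owner.valid:
            raise ToyViewInvalidatedError("owner was invalidated")

    def __getitem__(self, index):
        self._check()
        return self._buffer[index]

    def __setitem__(self, index, value):
        self._check()
        if not self._writeable:
            # Placeholder error type; the real library may raise something else.
            raise PermissionError("view is read-only")
        self._buffer[index] = value

    def to_owned(self):
        """Copy the data out so it can outlive the owner."""
        self._check()
        return list(self._buffer)


owner = ToyOwner()
view = ToyCheckedView(owner, [1.0, 2.0, 3.0])
kept = view.to_owned()   # copy taken while the owner is still valid
owner.invalidate()
try:
    view[0]
except ToyViewInvalidatedError:
    print("checked read rejected after invalidation")
print(kept)  # [1.0, 2.0, 3.0]
```

The point of the model is the ordering: data that must survive the lease is copied out before invalidation, and every later access through the view fails loudly instead of reading reused memory.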

## Python `ColumnEngine.truncate()` with `STR`

`fastdb4py` `ColumnEngine.truncate()` now supports UTF-8 `STR` fields in two usage tiers:

- **Default high-level path** — `tbl.fill(..., name=[...])` now routes raw strings through the native batch string-column API
- **Advanced prepacked path** — `pack_utf8_column([...]) + tbl.column.name.fill_utf8(...)`

For fixed tables, the high-level `Table.fill(...)` path batches numeric columns and `STR` payloads together:

- Raw string inputs are packed inside the native batch API.
- Scalar `BOOL` columns go through the same explicit bool parser as mutable engine writes before bulk numeric storage.
- Ordinary `U8` columns remain numeric casts.
- Numeric columns remain NumPy-backed after publication.
- String columns are exposed as `StringColumn` wrappers via `table.column.<name>`.

If your input already starts as Python `str` objects, prefer this default raw path; use the prepacked path only when an upstream stage has already produced UTF-8 offsets/data buffers.

```python
import numpy as np
from fastdb4py import ColumnEngine, Layout, F64, STR, feature, pack_utf8_column

@feature
class Point:
    x: F64
    y: F64
    name: STR

orm = ColumnEngine.truncate([Layout(Point, 3)])
tbl = orm.table(Point)

tbl.fill(
    x=np.array([1.0, 2.0, 3.0], dtype=np.float64),
    y=np.array([4.0, 5.0, 6.0], dtype=np.float64),
    name=["a", "bb", "ccc"],
)

# If you already own pre-encoded UTF-8 buffers, use the advanced path directly:
offsets_u32, utf8_bytes_u8 = pack_utf8_column(["a", "bb", "ccc"])
tbl.column.name.fill_utf8(offsets_u32, utf8_bytes_u8)
```
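
To make the idea of "UTF-8 offsets/data buffers" concrete, here is a pure-Python sketch of one common packing scheme. The exact layout produced by `pack_utf8_column` is not specified in this README, so the `n + 1` offsets convention below is an assumption, and `pack_utf8_sketch` / `unpack_utf8_sketch` are hypothetical helper names.

```python
def pack_utf8_sketch(values):
    """Pack strings into (offsets, data); offsets has len(values) + 1 entries."""
    offsets = [0]
    data = bytearray()
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))  # end of this string == start of the next
    return offsets, bytes(data)


def unpack_utf8_sketch(offsets, data):
    """Recover the strings by slicing data between adjacent offsets."""
    return [
        data[start:end].decode("utf-8")
        for start, end in zip(offsets, offsets[1:])
    ]


offsets, data = pack_utf8_sketch(["a", "bb", "ccc"])
print(offsets)  # [0, 1, 3, 6]
print(data)     # b'abbccc'
```

Offsets count bytes, not characters, so multi-byte code points widen the gaps between adjacent offsets.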

## Python Call-DB Exact Export

For integrations that already own a generic call-db binding, `try_export_call_db(binding, value)` returns an existing buffer-protocol view when a value is already backed by an exact call-db-compatible single fixed `Batch[Feature]` table. Build such tables with the target table name up front, for example `ColumnEngine.truncate([Layout(Point, n, name="return_0")])`, then call `encode_call_db(...)` only when `try_export_call_db(...)` returns `None`. FastDB owns the exact-export decision; integrations such as C-Two should pass the generic binding and logical value rather than inspecting FastDB table internals.

## Experimental Final-Backing Builds

`build_call_db(binding, value, allocator, direct_required=True)` is the experimental final-backing path for generic call-db payloads. FastDB computes one final DB byte length, asks the supplied allocator for one writable allocation, and writes the final backing without first publishing through `WxMemoryStream().data().tobytes()`. Fixed numeric call-db values use a mapped final-backing path that writes the initial C++ layout directly into the caller backing and fills columns there; prepacked string feature columns still use the C++ final writer. The allocator may be a native `fdb.HeapFinalBackingResource`, which returns a committed `FinalBackingAllocation`, or a Python allocator object that returns an allocation with `.buffer`, `.commit(used_size)`, and `.rollback()`. Native final backing resources also work for fallback prepared plans, so callers can preserve fallback semantics when `direct_required=False`. `prepare_call_db(..., direct_required=True)` is stricter: it only accepts already-backed/importable layers and will not stage temporary call-db layers under a direct label.

Committed native `FinalBackingAllocation` objects can be passed directly to `decode_call_db(...)` or `view_call_db(...)`. `view_call_db(...)` keeps the allocation owner alive while checked FastDB views are active; uncommitted or rolled-back allocations cannot be read.
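
As a rough illustration of the Python allocation protocol mentioned above (`.buffer`, `.commit(used_size)`, `.rollback()`), the following sketch backs an allocation with a `bytearray`. The README does not name the method FastDB calls on the allocator, so `allocate(size)` is a hypothetical name, as are both class names and the `readable()` helper.

```python
class BytearrayAllocation:
    """One writable allocation with commit/rollback state."""

    def __init__(self, size):
        self.buffer = memoryview(bytearray(size))
        self.used_size = None      # set by commit()
        self.rolled_back = False

    def commit(self, used_size):
        if self.rolled_back:
            raise RuntimeError("allocation was rolled back")
        if not 0 <= used_size <= len(self.buffer):
            raise ValueError("used_size exceeds the allocation")
        self.used_size = used_size

    def rollback(self):
        self.used_size = None
        self.rolled_back = True

    def readable(self):
        """Return the committed bytes; refuse uncommitted or rolled-back state."""
        if self.rolled_back or self.used_size is None:
            raise RuntimeError("allocation is not committed")
        return bytes(self.buffer[: self.used_size])


class BytearrayAllocator:
    """Hands out one allocation per request; `allocate` is a hypothetical name."""

    def allocate(self, size):
        return BytearrayAllocation(size)


allocation = BytearrayAllocator().allocate(8)
allocation.buffer[:4] = b"fdb!"   # the writer fills the caller-owned backing
allocation.commit(4)
print(allocation.readable())  # b'fdb!'
```

The sketch mirrors the rule stated above: only a committed allocation can be read, and a rolled-back one stays unreadable.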

FastDB also exposes experimental `ScratchAllocator` / `HeapScratchAllocator` names as the separate build-time scratch role. V1 keeps this role FastDB-owned and heap-backed; C-Two-style integrations should provide final backing first and should not assume dynamic builder scratch is transport memory.

For resource functions that author fixed-size columnar outputs with `fdb.require(...)`, use `call_db_build_context(binding, allocator)` around the call. Inside the context, eligible fixed numeric `Batch`/`Array` slots are mapped directly over the caller allocation, and the later `build_call_db(..., direct_required=True)` commits that same allocation instead of allocating a second call-db buffer. The C++ fixed-layer builder writes the initial zero table section directly to the caller backing without retaining an equal-size table scratch vector. The allocator may be either a Python allocation protocol object or a native `fdb.HeapFinalBackingResource`; FastDB exposes only a context-scoped writable view for the native resource, while direct `resource.allocate(...)` remains hidden from Python.

```python
allocator = fdb.HeapFinalBackingResource()
with fdb.call_db_build_context(binding, allocator):
    cells, residual = fdb.require(
        fdb.batch(Cell, rows=n),
        fdb.array(fdb.F32, rows=n),
    )
    cells.fill(row_id=ids, x=xs, y=ys)
    residual.fill(rs)
    payload = fdb.build_call_db(binding, (cells, residual), allocator, direct_required=True)
```

V1 direct builds are intentionally narrow. The `build_call_db` final-writer path supports fixed columnar scalar payloads and backed `Batch[Feature]` values whose `STR` columns already have prepacked UTF-8 offsets/data. The `call_db_build_context` path is stricter and currently supports fixed numeric columnar slots only, because its final byte length must be known before user code fills the returned views. Object graph payloads, non-columnar `BatchRequirement` profiles, REF/list/bytes fields, dynamic push, scalar string arrays, and unknown-size string values use fallback, or raise `FastdbUnsupportedDirectBuildError` when `direct_required=True`.

## CLI tools

`fastdb4py` ships a CLI named `fdb` for cross-language tooling. Currently it provides the `codegen` subcommand.

### `fdb codegen` — Python → TypeScript schema generator

Generate TypeScript `Feature` classes from a directory of Python feature definitions:

```bash
fdb codegen --ts ./python_features/ ./ts_features/
```

This mirrors the input directory structure, generating one `.ts` file per `.py` file. Each Python `Feature` subclass becomes a TypeScript class with `defineSchema(...)` and `declare` fields.

Features:
- All scalar types (`U8`–`F64`, `STR`, `WSTR`, `BYTES`, `BOOL`) and native Python types (`int`, `float`, `str`, `bool`) are mapped automatically
- Feature references → `ref(ClassName)`, lists of Features → `listOf(ref(ClassName))`
- Circular/self-referential types → lazy refs `ref(() => ClassName)` detected automatically
- Cross-file dependencies → relative `import` statements in the generated TypeScript
- Topological ordering ensures dependency classes are emitted before dependents
- Same class name in different files is legal — each file is an independent module, all are generated
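
The topological ordering mentioned above can be sketched with the standard library's `graphlib`. This is not the generator's actual code; the dependency maps are hypothetical, and the example only shows why acyclic references can be emitted in order while circular ones need a lazy ref.

```python
from graphlib import CycleError, TopologicalSorter


def emit_order(dependencies):
    """Return class names so that each class follows the classes it references."""
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency map: each class lists the classes it references.
print(emit_order({
    "Point": set(),
    "Segment": {"Point"},
    "Polygon": {"Segment", "Point"},
}))  # ['Point', 'Segment', 'Polygon']

# Mutually referencing classes have no valid order, so one side must be lazy.
try:
    emit_order({"Parent": {"Child"}, "Child": {"Parent"}})
except CycleError:
    print("cycle detected: emit a lazy ref for one side")
```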

Example input (`geometry.py`):

```python
from fastdb4py import feature, F64, STR


@feature
class Point:
    x: F64
    y: F64
    label: STR
```

Generated output (`geometry.ts`):

```typescript
import { F64, Feature, STR, defineSchema } from 'fastdb4ts';

export class Point extends Feature {
  static schema = defineSchema({
    x: F64,
    y: F64,
    label: STR,
  });
  declare x: number;
  declare y: number;
  declare label: string;
}
```

## C-Two Integration Boundary

FastDB owns storage engines, schema export, binary database buffers, backed view lifetimes, and generic Python/TypeScript call-db runtime APIs. C-Two owns CRM method planning, call-db binding derivation from CRM annotations, TypeScript helper generation through `c3 contract codegen typescript --fastdb-schema`, route identity, relay behavior, scheduler policy, and memory lease semantics. The FastDB `fdb` CLI now only generates generic TypeScript feature schemas; use the C-Two repository for C-Two-specific contract and client helper generation.

## Performance Notes

| Pattern | Typical cost | Notes |
|---------|-----------|-------|
| `table.column.x[:]` columnar read/write | **~100 ns** for any N | Zero-copy NumPy view, 1 SWIG call |
| `Table.fill(**cols)` | **~2 µs** per column | 1 SWIG call + memcpy per written column |
| `feature.read_all_scalars()` | **~200 ns** for 3 fields | 1 SWIG call for all scalar fields |
| `table.iter_reuse()` row access | **~350 ns/row** | Reuses Feature wrapper, no allocation |
| `for feat in table` row access | **~1.2 µs/row** | Allocates Feature wrapper per row |
| `feat.x` single field read (db-mapped) | **~420 ns** | 1 SWIG call |
| `FastSerializer.dumps/loads` (Python, legacy) | **~70 µs** (complex graph) | Retained for compatibility; not the foundation for new external RPC integration work |
| `FastSerializer.dumps/loads` (TypeScript, legacy) | **~75 µs** (complex graph) | Retained for compatibility; not the foundation for new external RPC integration work |

**Recommended patterns by use case:**

- **Bulk read/write of one field across all rows** → `table.column.x` (columnar, zero-copy)
- **Bulk fill fixed-size tables** → `ColumnEngine.truncate` + `table.fill(...)`
- **Bulk fill pre-encoded UTF-8 buffers** → `table.column.name.fill_utf8(...)`
- **Iterate and process all fields per row** → `table.iter_reuse()` + `feat.read_all_scalars()`
- **Sparse random access** → `table[i].field`
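
To see what the per-row figures in the table above mean at scale, the following back-of-the-envelope calculation multiplies them out for one million rows. It is plain arithmetic on the quoted numbers, not a benchmark; real timings depend on hardware and schema.

```python
# Per-row costs in nanoseconds, copied from the table above.
COST_NS = {
    "for feat in table": 1200,
    "table.iter_reuse()": 350,
}


def estimated_seconds(pattern, rows):
    """Estimate total iteration time for a given row-access pattern."""
    return COST_NS[pattern] * rows / 1e9


rows = 1_000_000
for pattern in COST_NS:
    print(f"{pattern}: {estimated_seconds(pattern, rows):.2f} s")
# for feat in table: 1.20 s
# table.iter_reuse(): 0.35 s
```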

## Free-threaded Python (PEP 703)

`fastdb4py` includes preliminary support for Python 3.13+ free-threaded builds (`python3.13t`).

### Thread-safety guarantees

| Component | Thread-safe? | Notes |
|---|---|---|
| Module-level caches (`get_class_schema`, serializer schema) | ✅ Yes | Protected by `threading.Lock`; safe under both GIL and free-threaded builds |
| `ColumnAccessor` column cache (`table.column.x`) | ✅ Yes | Cold path (first access) is lock-protected; hot path (cache hit) is lock-free |
| `Table` row reads (`table[i]`, iteration, `iter_reuse()`, fallback string lookup) | ✅ Yes | Per-table row materialization uses a read lock around native `tryGetFeature(...)` calls |
| `Feature` instances | ❌ No | Instance-level `_cache` dict is not synchronized — use external locking or one instance per thread |
| `ColumnEngine` / `ObjectEngine` / `Table` mutation | ❌ No | Not designed for concurrent mutation — create separate engine instances per thread, or synchronize externally |
| SWIG C++ calls | ✅ Yes | Long-running pure C++ operations release the GIL via `%feature("threadallow")` |

### Recommended patterns for multi-threaded code

```python
import threading
import numpy as np
from fastdb4py import ColumnEngine, Layout, feature, F64


@feature
class Point:
    x: F64

# ✅ Good: each thread owns its own engine and table
def worker():
    orm = ColumnEngine.truncate([Layout(Point, 1000)])
    tbl = orm.table(Point)
    tbl.fill(x=np.arange(1000, dtype=np.float64))

# ✅ Good: shared truncate engine with read-only access after publication
N = 1000  # row count for the shared table
shared_orm = ColumnEngine.truncate([Layout(Point, N)])
shared_tbl = shared_orm.table(Point)
shared_tbl.fill(x=np.arange(N, dtype=np.float64))
# Multiple threads can now safely read shared_tbl.column.x concurrently

# ⚠️ Caution: sharing Feature instances across threads
lock = threading.Lock()
feat = Point()
feat.x = 1.0
with lock:           # external synchronization required
    feat.x = 2.0
```

### Build configuration

CI runs the test suite against Python 3.13t (free-threaded) in addition to standard 3.12. `setup.py` auto-detects `Py_GIL_DISABLED` and passes the flag to the C++ build.

## Development

This project uses a Dev Container for its development environment. See `.devcontainer/devcontainer.example.json` for configuration details. It requires Docker or Podman and the VS Code Dev Containers extension.

Common development commands from the repository root:

```bash
./py_utils.sh --clean   # remove C++ build artifacts and SWIG-generated bindings
./py_utils.sh --build   # build C++ core + Python bindings
./py_utils.sh --test    # run Python unit tests
uv run pytest tests/python -q  # run the Python test suite directly
uv build                # build the fastdb4py sdist + local wheel
bash ts/build-wasm.sh   # build the WebAssembly module for fastdb4ts
npm run test:ts         # run root TypeScript tests
fdb codegen --ts <input_dir> <output_dir>  # generate TypeScript schemas from Python features
```

Build requirements depend on the layer you are working on:

- **Python binding**: C++17 compiler, CMake >= 3.16, SWIG >= 4.0, NumPy
- **TypeScript/WASM binding**: Emscripten, Node.js, npm
- **Native core**: C++17 compiler and CMake

### Python release checklist

Before publishing `fastdb4py`, bump `[project].version` in `pyproject.toml`, refresh `uv.lock`, update the `fastdb4py` section in `CHANGELOG.md`, and verify that the release tag `py/v<version>` does not already exist. The PyPI workflow publishes only when `pyproject.toml` changes and the tag for that version is absent.
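
The publish condition stated above can be written as a small predicate. This is only a sketch of the rule as described here, not the workflow's actual code; `release_tag` and `should_publish` are hypothetical helper names.

```python
def release_tag(version):
    """Build the release tag name for a fastdb4py version, e.g. py/v0.1.22."""
    return f"py/v{version}"


def should_publish(version, existing_tags, pyproject_changed):
    """Publish only if pyproject.toml changed and the version tag is absent."""
    return pyproject_changed and release_tag(version) not in existing_tags


print(should_publish("0.1.23", {"py/v0.1.22"}, True))   # True
print(should_publish("0.1.22", {"py/v0.1.22"}, True))   # False
print(should_publish("0.1.23", {"py/v0.1.22"}, False))  # False
```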
