Metadata-Version: 2.4
Name: metrana-logging-engine
Version: 0.0.4
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Rust
Requires-Dist: maturin>=1.7,<2.0 ; extra == 'dev'
Requires-Dist: pytest>=7.0.0,<10.0.0 ; extra == 'dev'
Requires-Dist: numpy>=1.24.0,<3.0.0 ; extra == 'dev'
Provides-Extra: dev
Summary: Rust-backed logging engine for Metrana (PyO3 bindings for metrana-ingestion-logger).
Keywords: metrana,mlops,rlops,logging,rust
Author-email: Inephany <info@inephany.com>
License: Apache 2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# metrana-logging-engine

Rust-backed logging engine for Metrana — [PyO3](https://pyo3.rs) bindings for the
`metrana-ingestion-logger` crate (`libs/ingestion_logger`).

This package wraps the Rust logger's synchronous, gRPC-backed ingestion client
and exposes it to Python. It is intended to be imported by `client/metrana` as a
high-performance logging backend; it has no dependency on the rest of the
`metrana` Python package.

## Layout

```
client/metrana-logging-engine/
├── Cargo.toml              # standalone crate (not a root-workspace member)
├── pyproject.toml          # maturin build backend, abi3 wheel
├── src/
│   ├── lib.rs              # #[pymodule] _metrana_logging_engine
│   ├── config.rs           # Config
│   ├── enums.rs            # ResumeStrategy / Backpressure / ErrorStrategy / LogLevel
│   ├── points.rs           # column extraction (numpy/list) → SeriesFloatPoints
│   ├── logger.rs           # Logger
│   └── errors.rs           # exception hierarchy
└── python/metrana_logging_engine/   # pure-Python wrapper + type stubs
```

The compiled extension is `metrana_logging_engine._metrana_logging_engine`;
`__init__.py` re-exports its public symbols.

## Build

Requires a Rust toolchain and [maturin](https://www.maturin.rs).

```bash
make develop          # build + install into the active venv (editable-ish)
# or
maturin develop       # equivalent
maturin build --release   # produce an abi3 wheel under target/wheels/
```

The wheel is built `abi3-py310`, so a single wheel per platform covers CPython
3.10 and every later 3.x release, including 3.11 and 3.12.

## Tests

`make test` builds the extension and runs the pytest suite. It always runs the
smoke tests; `tests/test_e2e.py` additionally drives the full local pipeline
(logger -> ingestion -> kafka -> worker -> clickhouse -> query service) but is
skipped unless the E2E stack is up:

```bash
./development/e2e_up.sh    # from the repo root; see development/README.md
make test
```

## Benchmark

`benches/rl_logging_bench.py` measures the client-side burst cost of the
canonical RL telemetry shape against a live stack (defaults: 1024 envs x 16
metrics, 100 minor points per series per one-minute cycle, ~1.6M points per
burst from a single logger). It reports per-cycle enqueue wall time, drain
lag until the server acked, CPU and RSS; `--cycle-secs 0` runs bursts
back-to-back as a max-throughput mode.
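
The ~1.6M figure follows directly from the default shape. A quick check of the
arithmetic (variable names here are illustrative, not the benchmark script's
flags):

```python
# Default benchmark shape as described above; names are illustrative only.
envs = 1024
metrics_per_env = 16
points_per_series = 100  # minor points per series per cycle

series = envs * metrics_per_env
points_per_burst = series * points_per_series

print(series)            # 16384
print(points_per_burst)  # 1638400, i.e. ~1.6M
```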

The target workspace must already exist (workspaces are provisioned with
`metrana-admin init-workspace`, not through the ingestion API). The defaults
match the e2e stack, which pre-creates `e2e-tests` and disables API-key auth;
for any other endpoint pass `--workspace` and a real `--api-key`.

```bash
. .inephany-env/bin/activate && python benches/rl_logging_bench.py
```

## Usage

```python
import time

import numpy as np

from metrana_logging_engine import Config, Logger, ResumeStrategy

cfg = Config(
    api_key="...",
    workspace="my-workspace",
    project="my-project",
    run_name="run-001",
    endpoint="https://ingestion.metrana.ai",
    resume_strategy=ResumeStrategy.Allow,
)

# Placeholder inputs for the calls below.
losses = [0.5, 0.4, 0.3]
now_ms = int(time.time() * 1000)

with Logger(cfg) as logger:
    # Ordered scalar series; steps/timestamps auto-assigned when omitted.
    logger.log_std_floats("loss", "ML_STEP", [1.0, 0.8, 0.6], labels={"split": "train"})

    # NumPy arrays are copied in one memcpy (no per-element unboxing) — the
    # preferred path for packing many points per call. `first_step` gives a
    # contiguous step range and `timestamp_millis` a single uniform timestamp,
    # so you don't build full step/timestamp arrays yourself.
    logger.log_std_floats(
        "loss", "ML_STEP",
        values=np.asarray(losses, dtype=np.float64),
        first_step=1000,
        timestamp_millis=now_ms,
    )

    # RL series with a major step and optional episode.
    logger.log_rl_floats(
        "cartpole", "reward", "ENVIRONMENT_STEP",
        rl_step=42,
        values=[10.0], steps=[5],
        episode=3,
    )

    # Config / arbitrary attributes (nested dicts/lists are flattened).
    logger.log_config({"lr": 1e-3, "batch_size": 32})
# close() runs on __exit__: flushes pending events and joins the sender thread.
```
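
To make the `first_step` and `timestamp_millis` shorthands concrete, the sketch
below spells out the explicit arrays they stand in for, assuming steps increase
by one from `first_step`. `expand_shorthand` is a hypothetical helper written
for illustration; it is not part of the package.

```python
def expand_shorthand(n_values, first_step, timestamp_millis):
    """Return the explicit (steps, timestamps_millis) lists that the
    shorthand arguments are assumed to be equivalent to."""
    # Assumption: a "contiguous step range" means first_step, first_step + 1, ...
    steps = list(range(first_step, first_step + n_values))
    # One uniform timestamp repeated for every point.
    timestamps_millis = [timestamp_millis] * n_values
    return steps, timestamps_millis


steps, timestamps = expand_shorthand(
    3, first_step=1000, timestamp_millis=1_700_000_000_000
)
print(steps)       # [1000, 1001, 1002]
print(timestamps)  # [1700000000000, 1700000000000, 1700000000000]
```

Passing the shorthand instead of these lists avoids building two extra arrays
per call on the Python side.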

### API summary

| Python | Rust |
|---|---|
| `Logger(config)` | `Logger::new` |
| `log_std_floats(metric, scale, values, labels=None, *, steps=None, first_step=None, timestamps_millis=None, timestamp_millis=None)` | `log_std_floats` (ordered) |
| `log_distributed_floats(...)` | `log_distributed_floats` (unordered) |
| `log_rl_floats(env, metric, scale, rl_step, values, episode=None, labels=None, *, steps=…, first_step=…, timestamps_millis=…, timestamp_millis=…)` | `log_rl_floats` |
| `log_config(value)` / `log_attributes(prefix, value)` | `log_config` / `log_attributes` |
| `get_series_last_step(metric, scale, labels=None)` | `get_series_last_step` |
| `get_env_last_rl_step_and_episode(env)` | `get_env_last_rl_step_and_episode` |
| `extract_sender_errors()` | `extract_sender_errors` (as `list[str]`) |
| `close(timeout_secs=None)` | `close` |

`labels` accepts a `dict[str, str]` or a sequence of `(key, value)` pairs.
Durations, such as `timeout_secs` on `close`, are passed as seconds (`float`).
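
Both label forms carry the same information. As a rough illustration (this is
not the extension's actual code), a hypothetical `normalize_labels` helper could
reduce either form to a list of pairs:

```python
def normalize_labels(labels):
    """Hypothetical helper: accept None, a dict[str, str], or a sequence
    of (key, value) pairs and return a list of (key, value) tuples."""
    if labels is None:
        return []
    pairs = labels.items() if isinstance(labels, dict) else labels
    out = []
    for key, value in pairs:
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("label keys and values must be str")
        out.append((key, value))
    return out


print(normalize_labels({"split": "train"}))    # [('split', 'train')]
print(normalize_labels([("split", "train")]))  # [('split', 'train')]
```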

### Exceptions

```
MetranaLoggerError
├── LoggerInitError          # Logger() init/handshake failed
├── EnqueueError             # log_* failed
│   ├── LoggerClosedError
│   ├── QueueFullError       # Backpressure.Raise with a full queue
│   └── ValidationError      # invalid metric/scale/label/points/step/episode
├── SenderError              # background sender-thread errors surfaced
└── ClosingError             # close() failed
```
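
Because the subclasses nest, `except` clauses should run from most to least
specific. The sketch below uses stand-in classes that mirror part of the tree so
it runs without the extension installed; in real code you would import the
classes from `metrana_logging_engine` instead (assuming they are re-exported
under these names). `classify` is a hypothetical helper.

```python
# Stand-ins mirroring part of the hierarchy above (illustration only).
class MetranaLoggerError(Exception): ...
class EnqueueError(MetranaLoggerError): ...
class QueueFullError(EnqueueError): ...
class ValidationError(EnqueueError): ...


def classify(exc):
    """Hypothetical helper: map an exception to a coarse handling decision."""
    try:
        raise exc
    except QueueFullError:
        return "retry-later"      # queue is full; back off and retry
    except ValidationError:
        return "fix-input"        # bad metric/scale/label/points
    except EnqueueError:
        return "enqueue-failed"   # any other log_* failure
    except MetranaLoggerError:
        return "logger-error"     # init, sender, or close failures


print(classify(QueueFullError()))      # retry-later
print(classify(EnqueueError()))        # enqueue-failed
print(classify(MetranaLoggerError()))  # logger-error
```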

## Allocator

The extension uses jemalloc (`tikv-jemallocator`) as its global allocator,
governing all Rust allocations in the linked crate graph (prost/tonic/rustls
buffers, batching vectors, the Python→Rust point copy). This is the default
`jemalloc` feature.

It is built with jemalloc's **`disable_initial_exec_tls`** setting, which puts
jemalloc's thread-specific data in the global-dynamic TLS model (allocated
lazily via `__tls_get_addr`) instead of initial-exec. That keeps it out of the
static TLS image, so the `.so` `dlopen`s cleanly with **no `GLIBC_TUNABLES` env
var** — important for a library imported into arbitrary researcher processes.
The trade-off is a small indirection on jemalloc's TSD access; negligible in
practice.

To build without jemalloc (system allocator):

```bash
maturin develop --release --no-default-features
```

> Maintainer caution: keep `disable_initial_exec_tls` on for any jemalloc build.
> Without it, jemalloc's ~2.6 KB TSD uses initial-exec TLS and exceeds glibc's
> static-TLS surplus for `dlopen`ed libraries, so `import metrana_logging_engine` fails
> with `ImportError: cannot allocate memory in static TLS block`. If that ever
> regresses, the launcher-side escape hatches are
> `GLIBC_TUNABLES=glibc.rtld.optional_static_tls=2097152` (set before `python`
> starts) or `LD_PRELOAD`ing the `.so`.
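
If the `GLIBC_TUNABLES` escape hatch is ever needed, the variable has to be in
the environment before the interpreter process starts, so it cannot be set from
inside the script that does the import. A minimal launcher sketch follows;
`env_with_static_tls` and `launch` are hypothetical helpers, not part of this
package.

```python
import os
import subprocess
import sys

TUNABLE = "glibc.rtld.optional_static_tls=2097152"


def env_with_static_tls(base_env):
    """Hypothetical helper: return a copy of base_env with the tunable
    appended to GLIBC_TUNABLES (entries are colon-separated)."""
    env = dict(base_env)
    existing = env.get("GLIBC_TUNABLES", "")
    env["GLIBC_TUNABLES"] = f"{existing}:{TUNABLE}" if existing else TUNABLE
    return env


def launch(script_path):
    """Run script_path in a fresh interpreter so the tunable is set
    before that process's dynamic loader starts."""
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_static_tls(os.environ),
        check=False,
    )


print(env_with_static_tls({})["GLIBC_TUNABLES"])
# glibc.rtld.optional_static_tls=2097152
```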

## Not yet exposed

- `aggregation_rules` on `Config` — requires protobuf message bindings and is
  currently always empty.

