Metadata-Version: 2.4
Name: metrana
Version: 0.5.2
Summary: Inephany client library to use Metrana.
Author-email: Inephany <info@inephany.com>
License: Apache 2.0
Keywords: metrana,mlops,rlops,ml,metrics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<3.0.0,>=1.24.0
Requires-Dist: loguru<0.8.0,>=0.7.0
Requires-Dist: metrana-logging-engine<1.0.0,>=0.0.2
Provides-Extra: rendering
Requires-Dist: av<18.0.0,>=12.0.0; extra == "rendering"
Provides-Extra: dev
Requires-Dist: pytest<10.0.0,>=7.0.0; extra == "dev"
Requires-Dist: pytest-mock<4.0.0,>=3.10.0; extra == "dev"
Requires-Dist: bump-my-version==1.4.1; extra == "dev"
Requires-Dist: black==26.5.1; extra == "dev"
Requires-Dist: isort==8.0.1; extra == "dev"
Requires-Dist: flake8==7.3.0; extra == "dev"
Requires-Dist: pre-commit==4.6.0; extra == "dev"
Requires-Dist: mypy==2.1.0; extra == "dev"
Requires-Dist: typeguard==4.5.2; extra == "dev"
Dynamic: license-file

# Metrana Client Library

Metrana is a metrics tracking client for ML/RL training runs. It provides a simple API to log
metrics from training loops to the Metrana ingestion service. Logging is non-blocking: a
Rust-backed engine batches points, streams them over gRPC, applies backpressure, and retries
transient failures on a background thread, so your training loop is not slowed down.

## Installation

```bash
pip install metrana
```

Requires Python 3.10+.

**Supported platforms.** `metrana` depends on the native `metrana-logging-engine`, which ships
prebuilt wheels (there is no source distribution) for: Linux x86-64 and aarch64 (glibc `manylinux_2_28`
and musl `musllinux_1_2`), macOS x86-64 and Apple Silicon, and Windows x86-64. On any other platform
(e.g. Windows on ARM, or an older-than-`manylinux_2_28` Linux) `pip install` will fail to find a
compatible wheel — open an issue if you need a target added.

To log RL environment video with `metrana.log_rendering()`, install the optional `rendering`
extra (pulls in PyAV for client-side H.264 encoding):

```bash
pip install 'metrana[rendering]'
```

## Quick Start

### ML training run

```python
import metrana

metrana.init(
    api_key="your-api-key",
    workspace_name="my-workspace",
    project_name="my-project",
    run_name="run-001",
)

for step in range(num_steps):
    ...
    metrana.log("loss", loss)                      # one metric
    metrana.log({"accuracy": acc, "lr": lr})       # several at once

metrana.close()   # flush and shut down — always call this
```

### RL training run

```python
import metrana

metrana.init(api_key="...", workspace_name="ws", project_name="proj", run_name="rl-001")

for rl_step in range(num_updates):
    ...
    # one metric per episode
    metrana.log_rl_episode("episode_return", ep_return, rl_step=rl_step, episode=episode)

    # per-environment-step metrics for a batch of envs at once
    metrana.log_rl_environment_step(
        "reward", rewards, rl_step=rl_step, env_id=env_ids, episode=episodes
    )

metrana.close()
```

## Logging standard metrics

`metrana.log(metric_name, value, *, step=None, timestamp=None, scale=None, labels=None, evaluation=False)`
logs one or more points. Unless `scale` says otherwise, they go to the default ML-step scale.

**Values** may be a scalar or an array — a Python list or any NumPy / PyTorch / JAX / TensorFlow
tensor (any float dtype, on any device); arrays are converted to contiguous NumPy once before
crossing into the engine:

```python
metrana.log("loss", 0.5)                       # one point
metrana.log("loss", [0.51, 0.49, 0.48])        # three points (bulk)
metrana.log("grad_norm", grad_norm_tensor)     # torch/np/jax/tf tensor
```

**Multiple metrics at once** — pass a mapping (values may themselves be scalars or arrays):

```python
metrana.log({"loss": loss, "accuracy": acc})
```

### Steps

A series is identified by `(metric_name, scale, labels)`, and each series has its own step axis. The
`step` argument controls it:

| `step` value      | meaning                                                            |
|-------------------|-------------------------------------------------------------------|
| `None` (default)  | auto-increment from the series' last step                          |
| a single `int`    | the step of the first point; further points in the call continue from it |
| a sequence/array  | an explicit step per point (length must match `value`)            |

```python
metrana.log("loss", 0.5)                        # auto: next step
metrana.log("loss", [a, b, c], step=100)        # steps 100, 101, 102
metrana.log("loss", [a, b, c], step=[10, 20, 30])  # explicit steps
```

The `timestamp` argument (Unix **milliseconds**) takes the same three forms, with one difference in
the single-value case: `None` lets the server stamp points on arrival; a single int applies to every
point in the call; a sequence gives one timestamp per point.
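
Because `timestamp` is in milliseconds while Python's `time.time()` returns seconds, a small conversion helper avoids unit mistakes. The sketch below uses only the standard library; `now_ms` and `evenly_spaced_ms` are hypothetical helper names, not part of `metrana`.

```python
import time


def now_ms() -> int:
    """Current Unix time in integer milliseconds."""
    return time.time_ns() // 1_000_000


def evenly_spaced_ms(start_ms: int, n_points: int, interval_ms: int) -> list[int]:
    """One timestamp per point, `interval_ms` apart, starting at `start_ms`."""
    return [start_ms + i * interval_ms for i in range(n_points)]


stamps = evenly_spaced_ms(now_ms(), n_points=3, interval_ms=250)
# e.g. metrana.log("loss", [a, b, c], timestamp=stamps)
```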

### Scale, labels, and evaluation

These three arguments shape the series identity:

- **`scale`** — the step scale (a `StandardMetricScale` value: `"ML_STEP"`, `"EPISODE"`,
  `"ENVIRONMENT_STEP"`). `None` defaults to `ML_STEP`. Only `log` / `log_distributed` take it; the RL
  helpers fix their own scale.
- **`labels`** — a `dict[str, str]` that, together with the name and scale, identifies the series.
  Two points with the same name but different labels go to different series.
- **`evaluation`** — a shorthand that adds the label `{"evaluation": "true"}` (unless you already set
  the `evaluation` key in `labels`), so evaluation points form a series distinct from otherwise
  identically-identified training points.

```python
metrana.log("reward", train_reward)                       # training series
metrana.log("reward", eval_reward, evaluation=True)        # distinct eval series
metrana.log("reward", r, labels={"policy": "greedy"})      # distinct labelled series
```

### Retrieving the last step

`metrana.get_last_step(metric_name, scale=None, labels=None)` returns the last step logged for a
series, or `None` if nothing has been logged to it yet. Pass the same `scale` / `labels` you logged
the series with. The value is seeded from the server at `init()`, so after a restart or resume you
can continue from where the run left off. This is useful when you want explicit steps but need to
know the current position:

```python
last = metrana.get_last_step("loss")
next_step = 0 if last is None else last + 1
metrana.log("loss", loss, step=next_step)
```

### Closing

`metrana.close()` flushes queued points and shuts the background engine down. **Always call it**:
the engine runs on a daemon thread, so if the interpreter exits without `close()`, queued-but-unsent
points are lost (a warning is emitted at exit). Call it directly at the end of the script, or wrap
the training loop in `try/finally` so it also runs when the loop raises.

`close()` flushes for up to `close_timeout` seconds (default 15). To abandon a run immediately,
dropping anything not yet sent, pass `metrana.close(close_timeout=0)` — pair it with
`init(skip_drain_render_on_close=True)` if you also want queued rendering frames dropped rather than
encoded.
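
If you prefer a `with` block over a hand-written `try/finally`, a thin wrapper is enough. The sketch below is not part of `metrana`: `metrana_run` is a hypothetical helper that takes the imported module as `client` and relies only on the `init(...)` and `close(close_timeout=...)` calls described above.

```python
from contextlib import contextmanager


@contextmanager
def metrana_run(client, close_timeout=15, **init_kwargs):
    """Call client.init(...) on entry and client.close(...) on exit, even on error.

    `client` is expected to be the imported `metrana` module.
    """
    client.init(**init_kwargs)
    try:
        yield client
    finally:
        # Runs on normal exit and when the body raises.
        client.close(close_timeout=close_timeout)


# Usage (requires metrana to be installed):
#   import metrana
#   with metrana_run(metrana, api_key="...", workspace_name="ws",
#                    project_name="proj", run_name="run-001") as m:
#       m.log("loss", 0.5)
```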

## Logging RL metrics

RL metrics use a two-level step: a major `rl_step` (the training/update step, which must not
decrease) plus a minor step. Two helpers cover the common scales; the scale is implied by the
function, so you never pass it explicitly. Both helpers also accept `labels` and `evaluation`, which
behave exactly as for [standard metrics](#scale-labels-and-evaluation).

### Per-episode metrics

`metrana.log_rl_episode(metric_name, value, rl_step, episode=None, env_id=None, labels=None, evaluation=False)`
— the `episode` is the minor step. Omit it to auto-increment from the last logged episode:

```python
metrana.log_rl_episode("episode_return", ep_return, rl_step=rl_step, episode=episode)
```

### Per-environment-step metrics

`metrana.log_rl_environment_step(metric_name, value, rl_step, env_id=None, episode=None, labels=None, evaluation=False)`
logs within-episode steps. The env-step (minor) axis is assigned automatically and resets whenever
`episode` changes, so you supply the `episode` each point belongs to, not the env-step.

It is vectorized over environments. Pass either a single env id or a list of env ids together with a
matching value block:

- single env: `env_id="env0"`, `value` a scalar or 1D `[T]` array;
- many envs: `env_id=["env0", "env1", ...]` (length `M`), `value` a 1D `[M]` (one point each) or 2D
  `[M, T]` array. `episode` (and `timestamp`) broadcast: a scalar, 1D, or 2D matching the value.

```python
# 8 envs, one reward each at this rl_step
metrana.log_rl_environment_step("reward", rewards_8, rl_step=rl_step,
                                env_id=env_ids, episode=episodes)

# 8 envs x 128 timesteps in one call
metrana.log_rl_environment_step("reward", rewards_8x128, rl_step=rl_step,
                                env_id=env_ids, episode=episodes_8x128)
```
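
The automatic env-step axis can be pictured as a per-environment counter that restarts whenever the episode changes. The sketch below only illustrates that documented behavior: `EnvStepCounter` is a hypothetical class, and starting each episode at env-step 0 is an assumption.

```python
class EnvStepCounter:
    """Assign within-episode steps per env, resetting when the episode changes."""

    def __init__(self) -> None:
        # env_id -> (episode, last env-step assigned in that episode)
        self._state: dict[str, tuple[int, int]] = {}

    def next_step(self, env_id: str, episode: int) -> int:
        last = self._state.get(env_id)
        if last is None or last[0] != episode:
            step = 0              # new env or new episode: restart (assumed 0-based)
        else:
            step = last[1] + 1    # same episode: keep counting
        self._state[env_id] = (episode, step)
        return step


counter = EnvStepCounter()
episodes = [0, 0, 0, 1, 1]        # episode of each successive point for "env0"
print([counter.next_step("env0", ep) for ep in episodes])  # [0, 1, 2, 0, 1]
```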

`metrana.get_env_last_rl_step_and_episode(env_id)` returns `(last_rl_step, last_episode)` for an
environment (either may be `None`) — handy for computing explicit steps after a resume.

## Run configuration and attributes

```python
metrana.log_config({"optimizer": {"name": "adam", "lr": 3e-4}, "batch_size": 256})
metrana.set_tags(["baseline", "v2"])      # replace the tag set
metrana.add_tags(["ablation"])            # add without removing
metrana.remove_tags(["baseline"])         # remove
metrana.set_description("LR sweep, seed 0")
```

`config` passed to `init()` is logged the same way (nested dicts/lists flatten under `config/`).
These run-level attributes — along with the git commit SHA and any `tags`/`description` given to
`init()` — are applied **only by the process that creates the run**, so distributed siblings that
resume it never clobber them.
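
To get a feel for what flattening under `config/` produces, here is a minimal sketch. `flatten_config` is a hypothetical helper, and the exact path format (in particular, addressing list items by index) is an assumption rather than the library's documented output.

```python
def flatten_config(value, prefix: str = "config") -> dict[str, object]:
    """Flatten nested dicts/lists into `/`-delimited paths (illustrative only)."""
    flat: dict[str, object] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten_config(child, f"{prefix}/{key}"))
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            # Assumption: list items are addressed by their index.
            flat.update(flatten_config(child, f"{prefix}/{index}"))
    else:
        flat[prefix] = value
    return flat


print(flatten_config({"optimizer": {"name": "adam", "lr": 3e-4}, "batch_size": 256}))
# {'config/optimizer/name': 'adam', 'config/optimizer/lr': 0.0003, 'config/batch_size': 256}
```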

For arbitrary run attributes use `metrana.log_attributes(prefix_path, value)`. For
**per-environment** RL attributes (a distinct, env-scoped kind) use
`metrana.log_env_attributes(env_id, value, episode=None)`.

## Environment renderings

`metrana.log_rendering(frame, rl_step, episode, env_id=None)` appends a frame to a per-`(env_id,
episode)` H.264 `.mp4`, encoded on a dedicated background thread (never blocks the training loop).

- `frame`: a `uint8` NumPy array, `(H, W, 3)` RGB or `(H, W)` / `(H, W, 1)` grayscale. Width and
  height must be even (libx264 `yuv420p`).
- When the `(env_id, episode)` pair changes, the open encoder for that env is closed and a new one
  opened for the next episode.

Configure via `init()`: `rendering_output_dir`, `rendering_fps`, `rendering_max_concurrent_encoders`,
`rendering_queue_max_size`, `skip_drain_render_on_close`, `rendering_close_timeout`. Requires the
`rendering` extra.
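
Since width and height must be even, frames from environments with odd dimensions need cropping (or padding) first. The following sketch computes the largest even-sized shape that fits a frame; `even_frame_shape` is a hypothetical helper that works on shape tuples, so it runs without NumPy.

```python
def even_frame_shape(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Return the largest even-sized shape that fits inside `shape`.

    Accepts (H, W), (H, W, 1) or (H, W, 3); raises ValueError otherwise.
    """
    if len(shape) == 2:
        channels: tuple[int, ...] = ()
    elif len(shape) == 3 and shape[2] in (1, 3):
        channels = (shape[2],)
    else:
        raise ValueError(f"unsupported frame shape {shape}")
    # Drop at most one row and one column to make both dimensions even.
    height, width = shape[0] - shape[0] % 2, shape[1] - shape[1] % 2
    if height == 0 or width == 0:
        raise ValueError(f"frame too small: {shape}")
    return (height, width, *channels)


h, w = even_frame_shape((481, 640, 3))[:2]
print(h, w)  # 480 640
# With a NumPy frame: frame = frame[:h, :w] before calling metrana.log_rendering(...)
```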

## Naming rules

- **Metric names** identify a series together with the scale and labels. Use `/`-delimited prefixes
  to group related series (e.g. `train/loss`, `eval/loss` are distinct series); labels and the
  `evaluation` shorthand are an alternative way to split a name into distinct series.
- **Environment ids** appear in URLs, so they must be URL-safe segments.
- **Config / attribute paths** are `/`-delimited; keys must be non-empty and contain only
  `[a-zA-Z0-9._:/-]`.
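
A client-side pre-check can catch bad names before they reach the server. The sketch below is one reading of the rules above, not the library's own validation: `is_valid_path` treats `/` as the key delimiter, and `is_url_safe_segment` interprets "URL-safe" as "unchanged by percent-encoding". Both helper names are hypothetical.

```python
import re
from urllib.parse import quote

# Allowed characters within a single key (the `/` delimiter is handled by split).
_KEY_RE = re.compile(r"[a-zA-Z0-9._:-]+")


def is_valid_path(path: str) -> bool:
    """True if every `/`-separated key is non-empty and uses only allowed characters."""
    return all(_KEY_RE.fullmatch(key) for key in path.split("/"))


def is_url_safe_segment(env_id: str) -> bool:
    """True if `env_id` is non-empty and survives percent-encoding unchanged."""
    return bool(env_id) and quote(env_id, safe="") == env_id


print(is_valid_path("optimizer/lr"), is_valid_path("optimizer//lr"))  # True False
print(is_url_safe_segment("env0"), is_url_safe_segment("env 0/a"))    # True False
```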

## Distributed logging

When several processes (e.g. distributed-training ranks) log into one run, three pieces matter:

**1. They must agree on the run.** Every process that should share a run needs the same
`orchestration_id`. If you don't pass one, it is resolved automatically from
`METRANA_ORCHESTRATION_ID`, then the framework job ids `TORCHELASTIC_RUN_ID` / `SLURM_JOB_ID` /
`RAY_JOB_ID`, then a random token (which only descendants that inherit the environment will match).
The resolved value is published back to `METRANA_ORCHESTRATION_ID` so forked/spawned children
inherit it. With `resume_strategy="never"` (the default), the first process creates the run and the
rest resume it by matching this identifier; a genuinely different job hitting the same run name errors
instead of corrupting it.

```python
# torchrun / Slurm / Ray: nothing to do — the framework job id is picked up automatically.
metrana.init(api_key="...", workspace_name="ws", project_name="proj", run_name="run")

# Custom launcher: pass a token shared by all workers of the job.
metrana.init(..., orchestration_id="job-2025-06-23-abc")
```
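
The resolution order described above can be sketched in a few lines. This is an illustration of the documented lookup, not the library's code: `resolve_orchestration_id` is a hypothetical function, and the random-token format (`secrets.token_hex`) is an assumption.

```python
import os
import secrets

_FALLBACK_VARS = ("METRANA_ORCHESTRATION_ID", "TORCHELASTIC_RUN_ID",
                  "SLURM_JOB_ID", "RAY_JOB_ID")


def resolve_orchestration_id(explicit=None, env=os.environ) -> str:
    """Pick the orchestration id in the documented order and publish it to `env`."""
    resolved = explicit
    if resolved is None:
        # First non-empty variable wins; otherwise fall back to a random token.
        resolved = next((env[name] for name in _FALLBACK_VARS if env.get(name)),
                        None) or secrets.token_hex(8)
    # Publishing it back lets forked/spawned children inherit the same id.
    env["METRANA_ORCHESTRATION_ID"] = resolved
    return resolved


fake_env = {"SLURM_JOB_ID": "12345"}
print(resolve_orchestration_id(env=fake_env))   # 12345
print(fake_env["METRANA_ORCHESTRATION_ID"])     # 12345
```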

**2. Choose the right log function for shared series.** Use `metrana.log_distributed(...)` (instead
of `metrana.log(...)`) when **multiple processes write to the same series** — for example all ranks
logging a global `loss`. It uses unordered merge semantics so concurrent writers don't conflict.
Provide an explicit `step` (the global training step) so points from different ranks align on the
same axis:

```python
metrana.log_distributed("loss", loss, step=global_step)
```

Use plain `metrana.log(...)` for series owned by a single writer (it is ordered and can
auto-increment). Pin `logger_id` (e.g. one per rank) if you want the backend to distinguish a
restarted writer from a genuinely new concurrent one.

**3. RL metrics need an exclusive owner per environment.** The RL functions
(`metrana.log_rl_episode(...)` and `metrana.log_rl_environment_step(...)`) are **ordered per
`env_id` series**, so a given environment must be logged by exactly one process. When you shard
environments across ranks (e.g. a vectorized env split over workers), make sure each process writes
only its own subset of `env_id`s — two processes logging the same environment race and silently lose
steps/points. Plain `metrana.log(...)` / `metrana.log_distributed(...)` float series have no such
restriction (`log_distributed` is explicitly built for many writers on one series).
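
A simple way to guarantee a single owner per environment is to shard the env ids deterministically by rank. The sketch below uses a round-robin split; `envs_for_rank` is a hypothetical helper, and how you obtain `rank` and `world_size` depends on your launcher.

```python
def envs_for_rank(env_ids: list[str], rank: int, world_size: int) -> list[str]:
    """Round-robin shard: each env id is owned by exactly one rank."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return env_ids[rank::world_size]


all_envs = [f"env{i}" for i in range(8)]
print(envs_for_rank(all_envs, rank=1, world_size=3))  # ['env1', 'env4', 'env7']
```

Each process then passes only its own shard as `env_id` to the RL helpers.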

## Guarantees and retries

By default Metrana favors **never blocking your training loop** over guaranteeing delivery. Know the
trade-offs:

- **Backpressure** (`backpressure_strategy`, default `"drop_new"`): when the in-process queue is
  full, an enqueue waits up to `enqueue_timeout_secs` (default `0.1`) for room, then:
  - `"drop_new"` — drops the new points (default; protects the loop, can lose data under sustained pressure);
  - `"block"` — waits indefinitely (no loss from queue pressure, but can stall the loop);
  - `"raise"` — raises `MetranaEventQueueFullError`.
- **Retries** (`max_send_retries`, default `60`): failed sends are retried with exponential backoff
  (`send_retry_initial_backoff_secs` → `send_retry_max_backoff_secs`). After the limit the batch is
  dropped. Set `max_send_retries=None` to retry **indefinitely** (no loss from transient outages, at
  the cost of unbounded buffering).
- **Errors** (`error_strategy`, default `"warn"`): how background errors surface — `"silent"`,
  `"warn"`, `"raise_on_log"` (raised on the next log call), or `"raise_on_close"`. Drain the engine's
  sender errors yourself with `metrana.check_sender_errors()`. (Rendering/encoding errors follow the
  same strategy but surface on `log_rendering()` / `close()`.)

**Points can be lost only when:** backpressure is `drop_new` and the queue stays full past the
timeout; or `max_send_retries` is finite and a failure persists past it; or the process exits
without `metrana.close()` (the daemon engine thread is killed with points still queued).
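
To estimate how long an outage a finite `max_send_retries` can ride out, it helps to add up the backoff delays. The sketch below assumes plain doubling with a cap and no jitter; the engine's actual growth factor is not stated here, and the 0.5 s / 30 s backoff values are hypothetical, so treat the numbers as an order-of-magnitude guide.

```python
def backoff_schedule(max_retries: int, initial_secs: float, max_secs: float,
                     factor: float = 2.0) -> list[float]:
    """Delay before each retry: grows by `factor`, capped at `max_secs`."""
    return [min(initial_secs * factor ** attempt, max_secs)
            for attempt in range(max_retries)]


# Hypothetical settings: 60 retries, 0.5 s initial backoff, 30 s cap.
delays = backoff_schedule(max_retries=60, initial_secs=0.5, max_secs=30.0)
print(delays[:8])   # [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
print(sum(delays))  # 1651.5
```

Under these assumptions a batch would be dropped after roughly 27 minutes of continuous failure.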

**To prioritize delivery** over loop latency:

```python
metrana.init(
    ...,
    backpressure_strategy="block",   # never drop on queue pressure
    max_send_retries=None,           # retry transient failures forever
    queue_capacity=100_000,          # more headroom before backpressure kicks in
)
# ... and always call metrana.close() (give it a generous close_timeout).
```

## Tuning and observability

- `max_pending_requests` (default `30`): in-flight streaming requests — raise it to push more
  throughput when the backend is the bottleneck.
- `queue_capacity` (default `10_000`): in-process point buffer depth.
- `batch_max_age_secs` (default `1.0`): how long points wait to coalesce into a batch before sending.
- `max_msg_size`: max serialized request size in bytes.

`metrana.get_metrics()` returns a point-in-time snapshot of the engine's self-metrics. For each data
kind (`float_points`, `rl_float_points`, `attribute_updates`, `env_attribute_updates`) it reports how
much was *added* (attempted), *enqueued*, *sent* (server-acked), and *dropped* (shed under
backpressure or after exhausting retries), plus transport/health counters (`connection_attempts`,
`requests_sent`, `send_errors`, `errors_reported`, `errors_evicted`).

Comparing *added* vs *sent* tells you whether anything was lost:

```python
m = metrana.get_metrics()
print(m.float_points_added, m.float_points_sent, m.float_points_dropped)
```

The counters are monotonic, so diff two snapshots over a window to get rates (e.g. for a periodic
health log):

```python
import time

prev = metrana.get_metrics()
time.sleep(10)
now = metrana.get_metrics()
sent_per_sec = (now.float_points_sent - prev.float_points_sent) / 10
backlog = now.float_points_added - now.float_points_sent      # added but not yet acked
if now.float_points_dropped > prev.float_points_dropped:
    print("data loss in the last window — raise queue_capacity / max_pending_requests or retries")
```
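
The same diffing idea can be wrapped in a reusable function for a periodic health log. This sketch relies only on the `float_points_*` counters named above; `window_report` is a hypothetical helper, and the stand-in snapshots use `SimpleNamespace` so the example runs without a live engine. It assumes dropped points are counted in *added* and so excludes them from the backlog.

```python
from types import SimpleNamespace


def window_report(prev, now, seconds: float) -> dict[str, float]:
    """Summarize float-point throughput and loss between two snapshots."""
    added = now.float_points_added - prev.float_points_added
    dropped = now.float_points_dropped - prev.float_points_dropped
    return {
        "sent_per_sec": (now.float_points_sent - prev.float_points_sent) / seconds,
        # Points still in flight: added, not yet acked, and not dropped.
        "backlog": now.float_points_added - now.float_points_sent - now.float_points_dropped,
        "drop_fraction": dropped / added if added else 0.0,
    }


# Stand-in snapshots; in practice pass two results of metrana.get_metrics().
prev = SimpleNamespace(float_points_added=1000, float_points_sent=900, float_points_dropped=0)
now = SimpleNamespace(float_points_added=3000, float_points_sent=2700, float_points_dropped=100)
print(window_report(prev, now, seconds=10.0))
# {'sent_per_sec': 180.0, 'backlog': 200, 'drop_fraction': 0.05}
```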

## Environment Variables

| Variable                          | Equivalent `init()` argument        |
|-----------------------------------|-------------------------------------|
| `METRANA_API_KEY`                 | `api_key`                           |
| `METRANA_ORCHESTRATION_ID`        | `orchestration_id`                  |
| `METRANA_BACKPRESSURE_STRATEGY`   | `backpressure_strategy`             |
| `METRANA_ERROR_MODES`             | `error_strategy`                    |
| `METRANA_RESUME_STRATEGY`         | `resume_strategy`                   |
| `METRANA_LOG_LEVEL`               | `log_level`                         |
| `METRANA_EVENT_QUEUE_MAX_SIZE`    | `queue_capacity`                    |
| `METRANA_SKIP_DRAIN_RENDER_ON_CLOSE` | `skip_drain_render_on_close`     |
| `METRANA_RENDERING_CLOSE_TIMEOUT` | `rendering_close_timeout`           |
