Metadata-Version: 2.3
Name: syvain-metrics-collector
Version: 0.0.66
Summary: Syvain metrics collection SDK
Requires-Dist: niquests>=3.16.0
Requires-Dist: syvain-metrics-api-client>=0.0.62
Requires-Dist: typing-extensions>=4.15.0
Requires-Dist: pytest>=8.0.0 ; extra == 'dev'
Requires-Dist: ruff>=0.15.12 ; extra == 'dev'
Requires-Dist: ty>=0.0.34 ; extra == 'dev'
Requires-Python: >=3.10
Provides-Extra: dev
Description-Content-Type: text/markdown

# Syvain Metrics Collector

Python SDK for sending experiment metrics and annotations to
[Syvain Metrics](https://metrics.syvain.com/).

Use this package in training, evaluation, and analysis jobs that need one
searchable experiment record with numeric metric series, run metadata, and
human-readable notes.

## Install

```bash
uv add syvain-metrics-collector
```

## Basic Usage

```python
from syvain_metrics_collector import Collector

collector = Collector("ak_org_...")

experiment = collector.experiment(
    "mamba-run-001",
    description="Baseline mamba training run",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "seed": 7,
    },
)

with experiment.run():
    for step in range(1_000):
        loss = 1.0 / (step + 1)
        experiment.metric("loss", loss, step=step, metadata={"split": "train"})

    experiment.annotation(
        "saved checkpoint",
        metadata={"path": "checkpoints/mamba-run-001/step-999.pt"},
    )

experiment.flush_or_raise()
```

The normal shape is:

- create a `Collector` with a metrics API key
- create one `experiment` per run
- put stable run-level facts in `meta`
- send numeric values with `experiment.metric(...)`
- send notable events with `experiment.annotation(...)`
- rely on `experiment.run()` for a best-effort flush when the context exits
- call `flush_or_raise()` before process exit when delivery failure should fail
  the caller

`Collector` defaults to `https://metrics.syvain.com`, so most jobs only need an
API key.

## Experiment Metadata

Use `meta` for facts that apply to the whole run:

```python
experiment = collector.experiment(
    "mamba-run-001",
    meta={
        "model": "mamba",
        "dataset": "internal-v1",
        "git_sha": "abc123",
        "config": {"batch_size": 32, "learning_rate": 0.0003},
    },
)
```

Good experiment metadata includes model name, dataset, seed, git SHA, machine
type, and config values. Do not put per-step values in `meta`; put those on
metrics.

## Metrics

Metric values must be finite numbers. The Python method requires an explicit
`step` argument; pass `step=None` only for values that genuinely have no step.

```python
experiment.metric("validation_loss", 0.182, step=500)
```
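
Because values must be finite, a loop that can produce NaN or infinity (a
diverging loss, for example) may want to check values before sending them. The
helper below is a hypothetical caller-side guard, not part of the SDK, and
excluding booleans is this sketch's choice rather than a documented SDK rule:

```python
import math


def is_finite_number(value: object) -> bool:
    """Return True for int/float values that are neither NaN nor infinite."""
    # bool is a subclass of int, so reject it explicitly (sketch's choice).
    if isinstance(value, bool):
        return False
    # Python ints are always finite, and very large ints cannot be passed
    # to math.isfinite without an OverflowError.
    if isinstance(value, int):
        return True
    if isinstance(value, float):
        return math.isfinite(value)
    return False


candidates = {"loss": 0.182, "grad_norm": float("inf"), "kl": float("nan")}
sendable = {name: v for name, v in candidates.items() if is_finite_number(v)}
print(sendable)  # {'loss': 0.182}
```

Only the entries in `sendable` would then be passed to `experiment.metric(...)`.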

Use the same metric name for the same measured quantity:

```python
experiment.metric("loss", train_loss, step=step, metadata={"split": "train"})
experiment.metric("loss", val_loss, step=step, metadata={"split": "validation"})
```

Use separate metric names when the quantity or unit is different:

```python
experiment.metric("loss", 0.42, step=step, metadata={"split": "train"})
experiment.metric("accuracy", 0.91, step=step, metadata={"split": "validation"})
experiment.metric("tokens_per_second", 1820.0, step=step)
```

## Metric Metadata

Metric metadata is how the dashboard separates related lines inside one metric.
Keep it low-cardinality and easy to group:

```python
experiment.metric(
    "gpu_utilization",
    78.0,
    step=step,
    metadata={"device": "gpu:0"},
)
experiment.metric(
    "gpu_utilization",
    74.0,
    step=step,
    metadata={"device": "gpu:1"},
)
```

Useful metadata keys include `split`, `device`, `rank`, `phase`, and
`prompt_set`.

Avoid request IDs, timestamps, constantly changing file paths, and large nested
payloads on metrics. Put one-off details in annotations instead.
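
One way to spot high-cardinality metadata before it reaches the dashboard is to
count the distinct metadata combinations per metric name in a sample of points.
This is a minimal local sketch; `distinct_series` is a hypothetical helper, not
an SDK function, and it assumes metadata values are hashable:

```python
from collections import defaultdict


def distinct_series(points):
    """Count distinct metadata combinations per metric name."""
    seen = defaultdict(set)
    for name, metadata in points:
        # Sort items so key order does not create spurious combinations.
        seen[name].add(tuple(sorted(metadata.items())))
    return {name: len(combos) for name, combos in seen.items()}


points = [
    ("gpu_utilization", {"device": "gpu:0"}),
    ("gpu_utilization", {"device": "gpu:1"}),
    ("gpu_utilization", {"device": "gpu:0"}),
    ("latency", {"request_id": "r-1"}),
    ("latency", {"request_id": "r-2"}),
    ("latency", {"request_id": "r-3"}),
]
print(distinct_series(points))  # {'gpu_utilization': 2, 'latency': 3}
```

A count that grows with the number of points, as with `request_id` here,
suggests the key belongs in an annotation rather than in metric metadata.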

## Annotations

Use annotations for text events that explain the run:

```python
experiment.annotation(
    "evaluation started",
    metadata={"split": "validation"},
)
```

Common annotations include checkpoints, phase changes, incidents, artifact
paths, dashboard links, and manual operator notes.

## Folders

If you know the folder ID, pass it directly:

```python
experiment = collector.experiment(
    "mamba-run-001",
    folder_id="00000000-0000-0000-0000-000000000000",
)
```

If you only know the dashboard path, pass `folder_path`:

```python
experiment = collector.experiment(
    "mamba-run-001",
    folder_path="/mamba-run-001",
)
```

Do not pass both. `folder_path` makes an extra API request to resolve the path
to a folder ID and raises if the path is missing or ambiguous.

## Flushing and Errors

Metric and annotation calls enqueue data locally and return quickly. The SDK
flushes batches in the background after a short delay.

HTTP transport is delegated to `syvain-metrics-api-client`. Write requests use
the API client's idempotency keys and built-in retry policy, and metric payloads
receive a stable `client_event_id` before they enter the local queue so retried
flushes keep the same event identity.
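
The idea of assigning an event ID before queuing can be illustrated with a
small standalone sketch. It is not the SDK's implementation: only the
`client_event_id` field name comes from the description above, and the queue,
`enqueue`, `flush`, and `flaky_send` are hypothetical stand-ins.

```python
import uuid
from collections import deque

queue = deque()


def enqueue(name, value, step):
    # The ID is assigned once, at enqueue time, not at send time.
    event = {
        "client_event_id": str(uuid.uuid4()),
        "name": name,
        "value": value,
        "step": step,
    }
    queue.append(event)
    return event


def flush(send):
    """Try to send every queued event; keep failed events for the next flush."""
    remaining = deque()
    while queue:
        event = queue.popleft()
        if not send(event):
            remaining.append(event)
    queue.extend(remaining)


seen_ids = []


def flaky_send(event):
    # Fails on the first call and succeeds afterwards.
    seen_ids.append(event["client_event_id"])
    return len(seen_ids) > 1


enqueue("loss", 0.5, 0)
flush(flaky_send)  # first attempt fails, so the event stays queued
flush(flaky_send)  # retry sends the same event with the same ID
print(seen_ids[0] == seen_ids[1])  # True
```

Because the ID travels with the queued event, a receiver can recognize the
retry as the same event instead of a new one.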

The `experiment.run()` context manager calls `done()` and then attempts a
best-effort flush when the context exits. It retries three times by default and
logs a warning if delivery is still incomplete. It does not raise on flush
failure, drop queued data, or consume the retry budget used by later explicit
flush calls. Exceptions from the training block still propagate.

Use `flush()` when you want one non-raising flush attempt and a status object:

```python
result = experiment.flush()
if not result.ok:
    print(result.pending_metrics, result.retryable_failures)
```

Use `flush_or_raise()` when the caller should fail if any metric, annotation, or
status update is still pending or failed. `retries` is the number of retry
attempts after the first flush attempt:

```python
experiment.flush_or_raise()
experiment.flush_or_raise(retries=5)
```
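
Since `retries` counts attempts after the first one, `retries=5` allows up to
six flush attempts in total. The arithmetic can be sketched with a generic
loop; `flush_with_retries` and `always_fails` are hypothetical names used only
to illustrate the counting, not the SDK's code:

```python
def flush_with_retries(attempt_flush, retries):
    """Call attempt_flush once, then up to `retries` more times until it succeeds."""
    total_attempts = 0
    for _ in range(1 + retries):
        total_attempts += 1
        if attempt_flush():
            return total_attempts
    raise RuntimeError(f"delivery incomplete after {total_attempts} attempts")


calls = []


def always_fails():
    calls.append(1)
    return False


try:
    flush_with_retries(always_fails, retries=5)
except RuntimeError as exc:
    print(exc)  # delivery incomplete after 6 attempts
```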

Experiment creation must succeed before anything else can be sent, so it raises
on failure. After an experiment exists, metric and annotation delivery is best
effort unless you call `flush_or_raise()`. Explicit `start()` and `done()`
lifecycle calls do not flush automatically; call `flush()` or
`flush_or_raise()` after `done()`.

## Manual Lifecycle

The context manager is enough for most jobs:

```python
with experiment.run():
    experiment.metric("loss", 0.5, step=0)
```

Use explicit lifecycle calls when the run does not fit a single `with` block:

```python
experiment.start()

for step in range(1_000):
    experiment.metric("loss", 1.0 / (step + 1), step=step)

experiment.done()
experiment.flush_or_raise()
```

If another supervisor owns process exit and exception handling, disable the
SDK's process hooks:

```python
with experiment.run(install_hooks=False):
    experiment.metric("loss", 0.5, step=0)
```

## Local and Test Collectors

Use `JsonlCollector` when you want the same API shape but local JSONL output:

```python
from pathlib import Path

from syvain_metrics_collector import JsonlCollector

collector = JsonlCollector(path=Path("artifacts/metrics/run-001.jsonl"))
experiment = collector.experiment("run-001", meta={"model": "mamba"})

with experiment.run():
    experiment.metric("loss", 0.42, step=1, metadata={"split": "train"})
    experiment.annotation("local checkpoint written", metadata={"path": "ckpt.pt"})

experiment.flush_or_raise()
```

Use `NoopCollector` in tests or dry runs that should accept metric and
annotation calls without network or file IO:

```python
from syvain_metrics_collector import NoopCollector

collector = NoopCollector()
experiment = collector.experiment("unit-test-run")

with experiment.run():
    experiment.metric("loss", 0.42, step=1)
```

## Constructor Options

```python
collector = Collector(
    "ak_org_...",
    host="https://metrics.syvain.com",
    timeout=10.0,
    ingest_timeout=60.0,
    flush_delay_seconds=0.25,
    max_queue_items=100_000,
    max_batch_items=500,
    max_retries=None,
)
```

- `timeout`: experiment creation and status update timeout
- `ingest_timeout`: metric and annotation batch timeout
- `flush_delay_seconds`: background batching delay
- `max_queue_items`: maximum queued metrics plus annotations per experiment
- `max_batch_items`: maximum items in one ingest request
- `max_retries`: retry limit for retryable delivery failures; `None` retries
  indefinitely while respecting the queue limit
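
To make the effect of `max_batch_items` concrete, the sketch below splits a
queue of items into request-sized batches. How the SDK actually forms batches
is not described here, so treat `split_into_batches` as a hypothetical
illustration of the limit rather than the real batching logic:

```python
def split_into_batches(items, max_batch_items):
    """Split items into consecutive lists of at most max_batch_items each."""
    if max_batch_items < 1:
        raise ValueError("max_batch_items must be at least 1")
    return [
        items[start : start + max_batch_items]
        for start in range(0, len(items), max_batch_items)
    ]


queued = list(range(1_200))
batches = split_into_batches(queued, 500)
print([len(batch) for batch in batches])  # [500, 500, 200]
```

With the default of 500, a backlog of 1,200 items would need at least three
ingest requests under this model.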

`metric(...)` also accepts a `timestamp`, given in either seconds or
milliseconds. Values below `10_000_000_000` are interpreted as seconds and
converted to milliseconds; larger values are treated as milliseconds.
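
The seconds-versus-milliseconds rule can be sketched as a small function. The
threshold comes from the description above; the name `to_milliseconds` and the
truncation to an integer are assumptions made for this illustration, and the
SDK may round differently:

```python
SECONDS_THRESHOLD = 10_000_000_000


def to_milliseconds(timestamp):
    """Normalize a timestamp given in seconds or milliseconds to milliseconds."""
    if timestamp < SECONDS_THRESHOLD:
        # Small values are treated as seconds.
        return int(timestamp * 1000)
    # Large values are assumed to be milliseconds already.
    return int(timestamp)


print(to_milliseconds(1_700_000_000))      # 1700000000000
print(to_milliseconds(1_700_000_000_000))  # 1700000000000
```

Both calls describe the same instant, so either unit can be passed without
changing the recorded time.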
