Metadata-Version: 2.4
Name: gen-worker
Version: 0.8.3
Summary: A library used to build custom functions in Cozy Creator's serverless function platform.
Project-URL: Homepage, https://github.com/cozy-creator/python-gen-worker
Project-URL: Repository, https://github.com/cozy-creator/python-gen-worker
Project-URL: Issues, https://github.com/cozy-creator/python-gen-worker/issues
Author-email: Paul Fidika <paul@fidika.com>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,cozy,inference,ml,serverless
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: backoff>=2.2.1
Requires-Dist: blake3>=1.0.0
Requires-Dist: boto3>=1.41.0
Requires-Dist: grpcio>=1.80.0
Requires-Dist: huggingface-hub>=0.26.0
Requires-Dist: msgspec>=0.18.6
Requires-Dist: numpy>=1.24.0
Requires-Dist: pillow>=9.0.0
Requires-Dist: protobuf>=6.30.0
Requires-Dist: psutil>=7.0.0
Requires-Dist: pyjwt[crypto]>=2.8.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: requests>=2.32.0
Requires-Dist: tomli-w>=1.2.0
Provides-Extra: audio
Requires-Dist: numpy>=1.24; extra == 'audio'
Requires-Dist: soundfile>=0.12; extra == 'audio'
Provides-Extra: dev
Requires-Dist: grpcio-tools>=1.80.0; extra == 'dev'
Requires-Dist: mypy>=1.10.0; extra == 'dev'
Requires-Dist: pytest>=9.0.0; extra == 'dev'
Requires-Dist: types-pyyaml>=6.0.12.20250915; extra == 'dev'
Requires-Dist: types-requests>=2.32.4.20250913; extra == 'dev'
Provides-Extra: images
Requires-Dist: pillow>=10.0; extra == 'images'
Provides-Extra: torch
Requires-Dist: flashpack>=0.2.1; extra == 'torch'
Requires-Dist: safetensors>=0.7.0; extra == 'torch'
Requires-Dist: torch>=2.11.0; extra == 'torch'
Provides-Extra: trainer
Requires-Dist: pyarrow>=17.0.0; extra == 'trainer'
Provides-Extra: vision
Requires-Dist: torchvision>=0.26.0; extra == 'vision'
Description-Content-Type: text/markdown

# gen-worker

Python SDK for writing **endpoints** that run on Cozy's worker pool. You write a
decorated function; the SDK handles discovery, scheduling, model loading,
cancellation, file I/O, streaming, and reporting back to the control plane.

Three endpoint kinds:

- **Inference** — request/response, optionally streaming.
- **Training** — long-running, stateful, periodic checkpoints.
- **Conversion** — produces weight artifacts on a destination repo.

## Install

```bash
pip install "gen-worker[torch]"   # for PyTorch inference/training
pip install "gen-worker[vision]"  # add torchvision for image/video models
pip install gen-worker            # plain Python (e.g. API-proxy endpoints)
```

Optional extras: `[images]` for `gw.io.read_image / write_image`,
`[audio]` for `gw.io.read_audio`, `[trainer]` for trainer-class endpoints.

## Minimum viable endpoint

A minimal endpoint needs two files when deploying through Tensorhub's
generated-Dockerfile path. Tensorhub generates the Dockerfile when
`endpoint.toml` has build hints, installs your dependencies, runs discovery,
and wires the runtime entrypoint.

**`endpoint.toml`**:

```toml
schema_version = 1
main = "myendpoint.main"

[[build.profiles]]
name = "default"
accelerator = "none"
python = "3.12"
dependencies = ["gen-worker>=0.7.5", "msgspec"]
```

**`main.py`**:

```python
import msgspec
from gen_worker import RequestContext, inference_function

class Input(msgspec.Struct):
    prompt: str

class Output(msgspec.Struct):
    text: str

@inference_function
def run(ctx: RequestContext, payload: Input) -> Output:
    return Output(text=f"got: {payload.prompt}")
```

That's it. `cozyctl endpoint deploy` (or the platform UI) takes it from here.
For custom base images, multi-stage builds, or non-pip setup, add a Dockerfile;
Tensorhub will use it instead of generating one.

## Adding a model

Declare model dependencies on the decorator's `models={...}` kwarg. The worker
loads and caches each binding; your function receives the live instance.

```python
import msgspec
from diffusers import StableDiffusionXLPipeline
from gen_worker import ImageAsset, Repo, RequestContext, Resources, inference_function
from gen_worker import io as gw_io

class Input(msgspec.Struct):
    prompt: str

class Output(msgspec.Struct):
    image: ImageAsset  # assumes gw_io.write_image returns an ImageAsset

sdxl = Repo("base_model", "stabilityai/stable-diffusion-xl-base-1.0")

@inference_function(
    resources=Resources(requires_gpu=True, min_vram_gb=12.0),
    models={"pipe": sdxl.flavor("bf16")},
)
def generate(ctx: RequestContext, pipe: StableDiffusionXLPipeline, payload: Input) -> Output:
    images = pipe(payload.prompt).images
    return Output(image=gw_io.write_image(ctx, "out", images[0]))
```

`Resources` is the per-function hardware envelope plus dynamic cost shape (used
by the orchestrator for placement and admission). `Repo(name, default_ref)` is
the binding. The `name` is the stable model-slot config key Tensorhub can update
after publish; `default_ref` is only the initial/default repo ref. The old
`Repo(ref)` / `HFRepo(ref)` / `CivitaiRepo(ref)` shape still works for existing
endpoints and uses the model parameter name as the slot key when discovered.

## Three binding shapes

**Fixed pick** — function pins one specific `(repo, flavor?, tag?)`:

```python
models={"pipe": Repo("base_model", "acme/flux").flavor("bf16")}
```

**Dispatch pick** — payload-driven, keyed by a `Literal[...]`-typed field:

```python
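from typing import Literal

import msgspec
from gen_worker import Repo, Resources, dispatch, inference_function

flux = Repo("base_model", "acme/flux")

class Input(msgspec.Struct):
    variant: Literal["nf4", "int8"]  # one table entry per allowed value
    prompt: str

@inference_function(
    resources=Resources(requires_gpu=True, min_vram_gb=14.0),
    models={"pipe": dispatch(
        field="variant",
        table={
            "nf4":  flux.flavor("nf4"),
            "int8": flux.flavor("int8"),
        },
    )},
)
# Output is the response struct defined in the earlier examples.
def generate(ctx, pipe, payload: Input) -> Output: ...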
```
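
Conceptually, a dispatch pick behaves like a dictionary lookup keyed by the
payload field. The following stdlib-only sketch illustrates that idea and
shows one way to check that the table covers every `Literal` value. It is not
the SDK's implementation: `pick_binding` and `TABLE` are hypothetical
stand-ins, and the bindings are plain tuples rather than SDK objects.

```python
from typing import Literal, get_args

Variant = Literal["nf4", "int8"]

# Stand-ins for the bindings in the table above: (repo, flavor) tuples,
# not real SDK binding objects.
TABLE = {
    "nf4": ("acme/flux", "nf4"),
    "int8": ("acme/flux", "int8"),
}


def pick_binding(payload: dict, field: str, table: dict):
    """Select the table entry named by payload[field] (illustrative only)."""
    key = payload[field]
    if key not in table:
        raise ValueError(f"{field}={key!r} has no entry in the dispatch table")
    return table[key]


# The table keys should match the Literal's allowed values exactly.
assert set(get_args(Variant)) == set(TABLE)

print(pick_binding({"variant": "int8", "prompt": "hello"}, "variant", TABLE))
```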

**Override-allowed** — caller may substitute the default, subject to a
pipeline-class allowlist the tenant declares:

```python
models={"pipe": flux.flavor("bf16").allow_override(StableDiffusionXLPipeline)}
```

To substitute, the caller sends
`{"prompt": "...", "_models": {"pipe": "acme/my-finetune:prod#bf16"}}`.
If the substitute's pipeline class is not on the allowlist, the request is
rejected before dispatch.
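
As a rough illustration of the `_models` envelope, the sketch below builds
such a request body and splits the override ref into its parts. The
`repo:tag#flavor` layout is an assumption inferred from the single example
ref above, and `parse_model_ref` is a hypothetical helper, not part of
`gen_worker`.

```python
import json


def parse_model_ref(ref: str) -> dict:
    """Split 'repo[:tag][#flavor]' into parts.

    Assumed layout, inferred from 'acme/my-finetune:prod#bf16'.
    """
    rest, _, flavor = ref.partition("#")
    repo, _, tag = rest.partition(":")
    return {"repo": repo, "tag": tag or None, "flavor": flavor or None}


# Request body: the normal payload fields plus the "_models" envelope,
# keyed by the model parameter name ("pipe").
request_body = {
    "prompt": "a red fox",
    "_models": {"pipe": "acme/my-finetune:prod#bf16"},
}

print(json.dumps(request_body))
print(parse_model_ref(request_body["_models"]["pipe"]))
```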

## Public surface

Top-level `gen_worker` exports only what endpoint authors need:

- Decorators + bindings: `inference_function`, `Resources`, `Repo`, `Dispatch`, `dispatch`
- Context types: `RequestContext`, `ConversionContext`, `DatasetContext`, `TrainingContext`
- Value types: `Asset`, `ImageAsset`, `VideoAsset`, `AudioAsset`, `MediaAsset`, `Tensors`, `Compute`
- Errors: `ValidationError`, `RetryableError`, `FatalError`, `ResourceError`,
  `AuthError`, `CanceledError`, `OutputTooLargeError`, `InputTooLargeError`,
  `WorkerError`
- Helpers: `Clamp`, `iter_transformers_text_deltas`, `load_loras`,
  `apply_low_vram_config`, `with_oom_retry`
- I/O codecs: `gen_worker.io` (`read_image`, `read_audio`, `write_image`,
  `read_bytes`, `open`, `exists`)

Training and conversion live in their own submodules: `gen_worker.trainer`,
`gen_worker.conversion`, `gen_worker.clone`.

## Local development

`gen-worker run` executes one endpoint method in the local Python
interpreter against a JSON payload — no docker-compose, no orchestrator.

```bash
pip install -e .
gen-worker run --payload '{"prompt": "hello"}'
```

Results go to stdout and events to stderr. Exit codes are 0 (success),
1 (user exception), 2 (usage error), 3 (model-resolution failure), and
130 (SIGINT). The full two-input model, the three CLI shapes (`run` /
`serve` + `invoke` / `repl`), ergonomic `field=value` args, the `--offline`
story, SIGINT semantics, and worked examples are covered in
[docs/local-dev.md](docs/local-dev.md). The machine-readable
host-integration contract (versioning, `describe --json`, the NDJSON
protocol, the serve sidecar) lives in
[docs/host-integration.md](docs/host-integration.md).
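
For scripting around the CLI, a thin wrapper along these lines may be
useful. It shells out to `gen-worker run --payload ...` and maps the exit
code to the meanings listed above. `describe_exit` and `run_local` are
hypothetical names for this sketch, and it assumes `gen-worker` is on `PATH`
and is invoked from the endpoint's project directory.

```python
import json
import subprocess

# Exit-code meanings as listed in this section.
EXIT_MEANINGS = {
    0: "success",
    1: "user-exception",
    2: "usage",
    3: "model-resolution",
    130: "SIGINT",
}


def describe_exit(code: int) -> str:
    """Translate a gen-worker exit code into a short label."""
    return EXIT_MEANINGS.get(code, f"unknown ({code})")


def run_local(payload: dict) -> tuple[str, str, str]:
    """Run the endpoint once; return (status label, stdout, stderr).

    Assumes the `gen-worker` executable is installed and on PATH.
    """
    proc = subprocess.run(
        ["gen-worker", "run", "--payload", json.dumps(payload)],
        capture_output=True,
        text=True,
        check=False,  # non-zero exits are reported via the label, not raised
    )
    return describe_exit(proc.returncode), proc.stdout, proc.stderr


print(describe_exit(130))
```

Calling `run_local({"prompt": "hello"})` would then return the status label
together with the results (stdout) and the events (stderr).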

### Running tests

`pytest` lives in the `dev` optional-dependency extra, so the supported
command is:

```bash
uv run --extra dev pytest
```

Plain `uv run pytest` falls through to a global launcher, so always pass
`--extra dev`. **Never `pip install` gen-worker globally:** a stale
`~/.local` install silently shadows the working tree (`tests/conftest.py`
hard-fails if `gen_worker` resolves outside `src/`).
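
To see where a package resolves from without importing it, a check along the
following lines can help when debugging a shadowed install. This is only a
sketch of the idea; `resolves_under` is a hypothetical helper, and the real
guard in `tests/conftest.py` may work differently.

```python
import importlib.util
from pathlib import Path


def resolves_under(module_name: str, root: Path) -> bool:
    """Return True if module_name would be loaded from a file under root."""
    spec = importlib.util.find_spec(module_name)
    # Missing modules and built-in/frozen modules have no file location.
    if spec is None or not spec.has_location or spec.origin is None:
        return False
    return Path(spec.origin).resolve().is_relative_to(root.resolve())


# False means gen_worker is either not installed or resolves outside src/.
print(resolves_under("gen_worker", Path("src")))
```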

## Documentation

- [docs/endpoint-authoring.md](docs/endpoint-authoring.md) — full reference: the
  three layers, `Resources`, bindings, `dispatch`, `allow_override`,
  multi-param injection, the `_models` envelope, atomic substitution.
- [docs/local-dev.md](docs/local-dev.md) — `gen-worker run` CLI: two-input
  invocation model, `--offline` story, SIGINT semantics, exit codes,
  worked examples.
- [docs/endpoint-toml.md](docs/endpoint-toml.md) — `endpoint.toml` reference:
  build modes, placement fields, build hints, `BASE_IMAGE` injection.
- [docs/dockerfile.md](docs/dockerfile.md) — when to provide your own
  Dockerfile, the three Dockerfile contract points, when `ARG BASE_IMAGE`
  matters, multi-profile builds.
- [docs/scaling-hints.md](docs/scaling-hints.md) — `Resources` cost-shape
  fields used by the orchestrator for admission and scheduling.
- [docs/endpoint-envs.md](docs/endpoint-envs.md) — tenant-defined envs/secrets
  attached to a deployed endpoint at runtime.

## Examples

Working endpoints to copy from in `examples/`:

- `marco-polo/` — minimal inference endpoint
- `training-smoke/` — minimal trainer
- `from-scratch/` — boilerplate template
