Metadata-Version: 2.4
Name: tcm-client
Version: 2.0.0
Summary: Official Python WebSocket client for Triton Client Manager.
Author-email: Adrián Morillas Pérez <adrianmorillasperez@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/adrirubim/triton_client_manager
Project-URL: Repository, https://github.com/adrirubim/triton_client_manager
Project-URL: Issues, https://github.com/adrirubim/triton_client_manager/issues
Keywords: triton,websocket,orchestrator,client
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: websockets<17,>=16
Requires-Dist: numpy>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: onnx>=1.21.0

# `tcm-client` SDK (WebSocket) — v2.0.0-GOLDEN ready

This SDK is the supported client for the Triton Client Manager WebSocket API.

It provides a typed, ergonomic interface for:
- authentication (`auth`)
- operational queries (`info.*`)
- management jobs (`management.*`)
- inference requests (`inference`)

## Connection

- WebSocket endpoint: `ws://<host>:<port>/ws`
- Health endpoints:
  - `GET /health` (liveness)
  - `GET /ready` (readiness)

`GET /ready` may return `503` with a sanitized payload if core dependencies are not healthy (or if the
probe itself fails). In that case, use `error_id` to correlate server logs:

```json
{
  "status": "not_ready",
  "reason": "readiness_probe_failed",
  "detail": "internal_error",
  "error_id": "..."
}
```
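
Clients can gate reconnection on readiness. A minimal polling helper might look like this (the `base_url`, timeout, and interval are illustrative; the sanitized 503 body is parsed only to log `error_id`):

```python
import json
import time
import urllib.error
import urllib.request


def wait_until_ready(base_url: str, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll GET /ready until the manager reports readiness or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/ready", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except urllib.error.HTTPError as exc:
            # 503 carries the sanitized not_ready payload; log error_id for correlation.
            body = json.loads(exc.read() or b"{}")
            print("not ready:", body.get("reason"), "error_id:", body.get("error_id"))
        except OSError:
            pass  # connection refused / DNS failure: keep polling until the deadline
        time.sleep(interval_s)
    return False
```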

## Message envelope (wire contract)

All messages share the same top-level envelope:

```json
{
  "uuid": "client-uuid",
  "type": "auth|info|management|inference",
  "payload": {}
}
```

The server may also emit:

- `type="error"` for system-level conditions (including shutdown)

## Inference request payload (wire contract)

For `type="inference"`, the payload must include:

- `vm_ip` (string) — required (routing target)
- `container_id` (string) — required (routing target)
- `model_name` (string) — required
- `request.inputs` (list) — required
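
Assembling the envelope with the required routing fields can be sketched as a plain dict builder (missing any of these fields produces a string contract error in `payload.data`, as described under error handling):

```python
def build_inference_message(
    client_uuid: str,
    vm_ip: str,
    container_id: str,
    model_name: str,
    inputs: list[dict],
) -> dict:
    """Assemble a `type="inference"` envelope per the wire contract above."""
    return {
        "uuid": client_uuid,
        "type": "inference",
        "payload": {
            "vm_ip": vm_ip,          # routing target (required)
            "container_id": container_id,  # routing target (required)
            "model_name": model_name,
            "request": {"inputs": inputs},
        },
    }
```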

### Tensor inputs (JSON path)

The manager accepts two equivalent shapes in `payload.request.inputs[*]`:

- **SDK-friendly**: `{name, shape, datatype, data}`
- **Manager/internal**: `{name, dims, type, value}`
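
Since the two shapes are equivalent, converting between them is a pure key rename; a small helper could look like this (sketch only, not part of the SDK surface):

```python
def to_manager_shape(item: dict) -> dict:
    """Rename an SDK-friendly input item to the manager/internal key names.

    Both shapes are accepted on the wire; this only changes key names.
    """
    if {"dims", "type", "value"} <= item.keys():
        return item  # already in manager/internal form
    return {
        "name": item["name"],
        "dims": item["shape"],
        "type": item["datatype"],
        "value": item["data"],
    }
```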

### SHM inputs (zero‑copy metadata)

For SHM, each input item is an `SHMReference` dict:

- `{name, shm_key, offset, byte_size, shape, dtype}`

Notes:
- SHM is currently supported only for HTTP inference.
- SHM is rejected for gRPC streaming requests.

## Error handling model (v2.0.0-GOLDEN)

The manager has two main error shapes you must handle:

### A) System-level errors (`type="error"`)
These represent conditions where the manager cannot or will not process work.

#### `SYSTEM_SHUTDOWN`
During shutdown draining (SIGTERM / deployment restarts), the manager explicitly NACKs queued/in-flight work:

```json
{
  "type": "error",
  "payload": {
    "code": "SYSTEM_SHUTDOWN",
    "message": "Manager is shutting down"
  }
}
```

**Client guidance**
- Treat as a stop-the-world signal: do not retry immediately.
- Close the socket and reconnect with backoff.
- Resume work only after `GET /ready` returns ready again.

Operational detail:
- The manager enforces a **hard 2.0s SIGTERM deadline** for draining (best-effort). Plan for NACKs under deploy restarts.
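
Detecting this condition in a receive loop is a simple envelope check; a sketch matching the `SYSTEM_SHUTDOWN` shape shown above:

```python
def should_reconnect(message: dict) -> bool:
    """Return True for system-level errors that require close + reconnect with backoff."""
    return (
        message.get("type") == "error"
        and (message.get("payload") or {}).get("code") == "SYSTEM_SHUTDOWN"
    )
```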

### B) Inference job failures (`type="inference"`, `payload.status="FAILED"`)
Inference responses always come back as:

```json
{
  "type": "inference",
  "uuid": "client-uuid",
  "payload": {
    "status": "COMPLETED|FAILED",
    "model_name": "my-model",
    "data": {}
  }
}
```

When `status="FAILED"`, `payload.data` may be either:

1) A **typed Triton-facing error object** (recommended contract):

```json
{
  "code": "TRITON_TIMEOUT",
  "message": "[TritonThread] TRITON_TIMEOUT: model='my-model' retriable=True reason=Timeout",
  "retriable": true,
  "retry_after_seconds": 2
}
```

2) A **string** for validation/contract errors (missing fields, unknown container, etc.):

```json
"Missing required field 'vm_ip'"
```

**Client guidance**
- If `payload.data` is an object:
  - Use `code` + `retriable` to implement retry policy (do not parse `message`).
  - `TRITON_TIMEOUT` is retriable: retry with exponential backoff + jitter.
  - If `retry_after_seconds` is present, respect it.
- If `payload.data` is a string:
  - Treat as a client-side contract error (fix request formation).

## Admission Control (413 Payload Too Large)

If the manager is configured with a payload budget (e.g. `TCM_MAX_REQUEST_PAYLOAD_MB>0`),
requests whose estimated decoded payload size exceeds the limit fail fast with an error reason containing:

`413 Payload Too Large`

Example failure reason:

```json
{
  "code": "TRITON_INFERENCE_FAILED",
  "message": "[TritonThread] TRITON_INFERENCE_FAILED: model='my-model' retriable=False reason=413 Payload Too Large: estimated_bytes=... limit_bytes=...",
  "retriable": false
}
```

**Client guidance**
- This is not retriable as-is. Reduce tensor dimensions / datatype size.
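
Since the budget is applied to the estimated decoded size, a client-side pre-check can avoid the round trip. The server-side estimate may differ, and the datatype size table below is an assumption of this sketch:

```python
DTYPE_SIZE_BYTES = {  # common Triton datatypes; extend as needed (assumption)
    "FP16": 2, "FP32": 4, "FP64": 8,
    "INT8": 1, "INT16": 2, "INT32": 4, "INT64": 8,
    "UINT8": 1, "BOOL": 1,
}


def estimated_tensor_bytes(shape: list[int], datatype: str) -> int:
    """Estimate decoded tensor size, for pre-checking against the payload budget."""
    n = 1
    for dim in shape:
        n *= dim
    return n * DTYPE_SIZE_BYTES[datatype]
```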

---

## Zero‑Copy Shared Memory (POSIX System SHM)

For large tensors, the recommended v2.0.0-GOLDEN path is to avoid sending tensor bytes over WebSocket JSON and instead:

- Write the tensor into POSIX shared memory (e.g. `/dev/shm`)
- Send an `SHMReference` object as the inference input payload (metadata only)

### Capability negotiation

During `auth`, clients may request SHM support:

```json
{
  "uuid": "client-uuid",
  "type": "auth",
  "payload": {
    "capability": ["json", "shm"]
  }
}
```

- If the environment supports SHM, the manager replies with `auth.ok` and `payload.capability` including `"shm"`.
- If the client does not send `capability`, the manager replies with the legacy shape `{"type":"auth.ok"}` (no `payload`) to avoid breaking older clients.
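
Because of the legacy shape, clients should treat `payload` as optional when checking the granted capabilities; a sketch:

```python
def shm_supported(auth_reply: dict) -> bool:
    """Return True if the manager granted the "shm" capability.

    Handles both the new `auth.ok` shape with `payload.capability` and the
    legacy `{"type": "auth.ok"}` shape without a payload.
    """
    if auth_reply.get("type") != "auth.ok":
        return False
    payload = auth_reply.get("payload") or {}
    return "shm" in payload.get("capability", [])
```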

### `SHMReference` shape

Send SHM inputs in `payload.request.inputs`:

```json
{
  "name": "INPUT__0",
  "shm_key": "/tcm_demo_input0",
  "offset": 0,
  "byte_size": 602112,
  "shape": [1, 3, 224, 224],
  "dtype": "FP32"
}
```
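
A write path for an SHM input can be sketched with the standard library's `multiprocessing.shared_memory`, which backs segments with `/dev/shm` on Linux. The `shm_key = "/" + name` convention and the numpy-to-Triton dtype table are assumptions of this sketch, not part of the SDK surface:

```python
from multiprocessing import shared_memory

import numpy as np

_NP_TO_TRITON = {"float32": "FP32", "float16": "FP16", "int64": "INT64"}  # partial (assumption)


def write_shm_input(name: str, shm_name: str, array: np.ndarray):
    """Copy a tensor into POSIX shared memory and build its SHMReference dict."""
    shm = shared_memory.SharedMemory(name=shm_name, create=True, size=array.nbytes)
    view = np.ndarray(array.shape, dtype=array.dtype, buffer=shm.buf)
    view[:] = array  # copy tensor bytes into the segment
    ref = {
        "name": name,
        "shm_key": "/" + shm_name,  # POSIX key for /dev/shm/<shm_name>
        "offset": 0,
        "byte_size": array.nbytes,
        "shape": list(array.shape),
        "dtype": _NP_TO_TRITON[array.dtype.name],
    }
    return shm, ref  # caller must close()/unlink() the segment after inference
```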

### SHM error codes

- `TRITON_SHM_UNAVAILABLE` (fatal): SHM not supported or shm key missing/inaccessible.
- `TRITON_SHM_REGISTRATION_FAILED` (fatal): SHM registration failed on the Triton side.

## Recommended retry policy (high-level)

- **System errors**
  - `SYSTEM_SHUTDOWN`: reconnect with backoff; wait for readiness
- **Retriable Triton errors** (`retriable=true`)
  - `TRITON_TIMEOUT`, `TRITON_NETWORK`, `TRITON_OVERLOADED`, `TRITON_CIRCUIT_OPEN`
  - retry with exponential backoff + jitter; cap max attempts
- **Fatal Triton errors** (`retriable=false`)
  - do not retry; fix request or intervene operationally (model/shape/config)
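
The retriable branch can use full-jitter exponential backoff; a small delay generator (base, cap, and attempt count are illustrative):

```python
import random


def backoff_delays(base_s: float = 0.5, cap_s: float = 30.0, max_attempts: int = 6):
    """Yield capped exponential backoff delays with full jitter for retriable errors."""
    for attempt in range(max_attempts):
        yield random.uniform(0.0, min(cap_s, base_s * (2 ** attempt)))
```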

## Install

```bash
python -m pip install --upgrade pip
python -m pip install tcm-client
```

## Minimal usage example (Python)

```python
import asyncio

from tcm_client import AuthContext, TcmWebSocketClient


async def main() -> None:
    uri = "ws://127.0.0.1:8000/ws"

    ctx = AuthContext(
        uuid="client-1",
        token="opaque-or-jwt-token",
        sub="user-123",
        tenant_id="tenant-abc",
        roles=["inference"],
    )

    async with TcmWebSocketClient(uri, ctx) as client:
        await client.auth()

        # Example: call your inference helper (depends on SDK surface)
        # resp = await client.infer_http(...)
        # if resp.status == "FAILED": handle as described above


if __name__ == "__main__":
    asyncio.run(main())
```

## CLI

```bash
tcm-client-cli --uri "ws://127.0.0.1:8000/ws" queue-stats
```

