Metadata-Version: 2.4
Name: sunwaee
Version: 1.7.8
Summary: SUNWÆE gen — multi-provider LLM engine library.
Author: David NAISSE
Maintainer: David NAISSE
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27.0
Provides-Extra: files
Requires-Dist: pypdf>=4.0.0; extra == "files"
Requires-Dist: python-docx>=1.1.0; extra == "files"
Requires-Dist: openpyxl>=3.1.0; extra == "files"
Requires-Dist: python-pptx>=1.0.0; extra == "files"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=1.0.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0.0; extra == "dev"
Requires-Dist: setuptools_scm>=8; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Dynamic: license-file

![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen) ![Python](https://img.shields.io/badge/python-3.11%2B-blue) ![PyPI](https://img.shields.io/pypi/v/sunwaee) ![License](https://img.shields.io/badge/license-MIT-blue)

All LLMs, one response format, one dependency (`httpx`). Supports switching providers mid-conversation.

Handles streaming, tool calls, file attachments, prompt caching, per-model reasoning effort, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.

---

## Install

```bash
pip install sunwaee
pip install "sunwaee[files]"   # adds pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development
```

---

## Quick start

```python
import asyncio
from sunwaee.modules.gen.engine.factory import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

engine = get_engine("anthropic", "claude-sonnet-4-6")
# or with explicit reasoning effort:
engine = get_engine("anthropic", "claude-sonnet-4-6", reasoning_effort="high")

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    # non-streaming
    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    # streaming
    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())
```

---

## Providers

| Provider  | `provider=`   | Env var             |
| --------- | ------------- | ------------------- |
| Anthropic | `"anthropic"` | `ANTHROPIC_API_KEY` |
| OpenAI    | `"openai"`    | `OPENAI_API_KEY`    |
| Google    | `"google"`    | `GOOGLE_API_KEY`    |
| DeepSeek  | `"deepseek"`  | `DEEPSEEK_API_KEY`  |
| xAI       | `"xai"`       | `XAI_API_KEY`       |
| Moonshot  | `"moonshot"`  | `MOONSHOT_API_KEY`  |

If `api_key=` is not passed, the key is read from the matching env var above.

---

## Directory structure

```
sunwaee/
├── utils/
│   └── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
└── modules/gen/
    └── engine/
        ├── base.py               # BaseEngine ABC — chat() + stream()
        ├── factory.py            # get_engine(), close_all_clients(), connection pool
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, Tool, ToolCall, Usage, Cost, Performance, FileAttachment
        ├── errors.py             # EngineError hierarchy
        ├── models/               # per-provider model registries
        │   ├── __init__.py       # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── completions.py    # CompletionsEngine  (/v1/chat/completions)
            ├── responses.py      # ResponsesEngine    (/v1/responses)
            └── google.py         # GoogleEngine

tests/gen/
└── engine/
    ├── test_types.py / test_factory.py / test_model.py / test_errors.py
    ├── providers/
    │   └── test_anthropic.py / test_completions.py / test_responses.py / test_google.py
    └── live/                     # real API calls, excluded from CI (-m live)
        ├── _shared.py            # engines, fixtures, system prompt shared across files
        ├── test_scenarios.py     # all providers x scenarios x chat + stream
        ├── test_tool_call_result.py
        ├── test_attachments.py
        ├── test_chain.py         # three-provider conversation chain with shared history
        ├── test_caching.py
        └── test_reasoning.py
```

---

## Core types

All types are defined in `engine/types.py`. Key ones:

**`Message`** — one turn in a conversation. `role` is a `Role` enum (`SYSTEM`, `USER`, `ASSISTANT`, `TOOL`, `CONTEXT`). `attachments` only applies to `Role.USER`. `reasoning_content` / `reasoning_signature` are provider-opaque — echo them back verbatim.

**`Response`** — what `chat()` returns and what `stream()` yields per chunk. Text arrives in `content`; reasoning in `reasoning_content`. The final streaming chunk carries `stop_reason`, `usage`, `cost`, and `performance`. Chunks with `synthetic=True` are engine-generated stubs (e.g. silent-reasoning placeholder) — never treat them as real model output.
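
A minimal consumption loop built from these fields (a sketch; `engine` and `messages` as in the quick start):

```python
final = None
async for chunk in engine.stream(messages):
    if chunk.synthetic:
        continue  # engine-generated stub, e.g. the silent-reasoning placeholder
    if chunk.content:
        print(chunk.content, end="", flush=True)
    final = chunk

# stop_reason, usage, cost, and performance ride on the last chunk only
if final is not None:
    print(final.stop_reason, final.usage, final.cost.total)
```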

**`Tool`** — a function the model can call. `name`, `description`, and `parameters` (JSON Schema object) are sent to the provider. The optional `fn` field is not used by the engine itself.

**`FileAttachment`** — wraps `bytes` + `filename`. Supported types: `text/*`, `application/json`, images (`jpeg/png/gif/webp`), and documents (`pdf/docx/xlsx/pptx`, requires `[files]` extra). Size caps enforced at construction: 10 MB for images, 20 MB for documents. See `types.py` for the full list of accepted MIME types.

**`Usage` / `Cost` / `Performance`** — token counts, dollar cost, and timing (latency, throughput, reasoning vs content split). Field names are in `types.py`.

---

## `get_engine()`

```python
from sunwaee.modules.gen.engine.factory import get_engine, close_all_clients

engine = get_engine(
    provider,           # "anthropic" | "openai" | "google" | "deepseek" | "xai" | "moonshot"
    model,              # model name string
    api_key=None,       # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    reasoning_effort=None,  # None | "off" | "auto" | any value in model.reasoning_efforts
)

# call once on graceful shutdown to close all pooled clients
await close_all_clients()
```

`get_engine()` reuses a single `httpx.AsyncClient` per `(event_loop, base_url)` (WeakKeyDictionary — dead loops drop their clients automatically). See `factory.py` for timeout and pool limits.

### Resolution order

1. **Effort coercion** — on a dynamic model whose `reasoning_efforts` include `"off"`, `reasoning_effort=None` is coerced to `"off"` (e.g. kimi-k2.5); the coercion also merges `reasoning_disabled_payload` to disable thinking. Models that use `"none"` as the disable wire value (gpt-5.x) are not coerced: `None` leaves the reasoning block absent, which lets the model use its default effort.
2. **Wire-model swap** — for `reasoning_mode="dynamic"` models: `effort in (None, "off")` swaps to `non_reasoning_id`; any other effort swaps to `reasoning_id`. No swap occurs when the target variant is not defined.
3. **Validation** — non-null effort must appear in `model.reasoning_efforts` (raises `ValueError`).
4. **Routing** — OpenAI-compat: `"responses" in model.api_type` -> `ResponsesEngine`, else `CompletionsEngine`. Anthropic -> `AnthropicEngine`. Google -> `GoogleEngine`.
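
The same steps seen from the call site (a sketch; `gpt-5.1` stands in for any gpt-5.x name, and effort values other than `"off"` / `"none"` are illustrative):

```python
from sunwaee.modules.gen.engine.factory import get_engine

# step 1: kimi-k2.5 is dynamic and lists "off", so None is coerced to "off"
# (merging reasoning_disabled_payload); step 2 then swaps to non_reasoning_id
engine = get_engine("moonshot", "kimi-k2.5")

# step 2: any non-"off" effort on a dynamic model swaps to reasoning_id
engine = get_engine("moonshot", "kimi-k2.5", reasoning_effort="high")

# step 3: gpt-5.x lists "none" rather than "off", so this fails validation
try:
    get_engine("openai", "gpt-5.1", reasoning_effort="off")
except ValueError:
    pass
```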

---

## `Model` dataclass

Defined in `engine/model.py`. Reasoning-relevant fields:

| Field                               | Meaning                                                                                                                                                                           |
| ----------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `reasoning_mode`                    | `"always"` / `"dynamic"` / `None`                                                                                                                                                 |
| `reasoning_efforts`                 | valid effort strings; `"always"` models have no `"off"`; `"dynamic"` models that disable via model swap start with `"off"`; OpenAI gpt-5.x use `"none"` as the wire disable value |
| `reasoning_uses_budget`             | `True` = factory maps effort strings to integer token budgets (Anthropic 4.5, Gemini 2.5 flash)                                                                                   |
| `reasoning_tokens_type`             | `"raw"` / `"summary"` / `None` (silent -- engine emits a synthetic stub)                                                                                                          |
| `reasoning_disabled_payload`        | merged into request when reasoning is explicitly disabled                                                                                                                         |
| `reasoning_id` / `non_reasoning_id` | paired variant names for model swapping                                                                                                                                           |
| `api_type`                          | `["responses"]` / `["completions"]` / both -- routing hint for OpenAI-compat providers                                                                                            |

Pricing fields and the full field list are in `engine/model.py`.

---

## Usage

### Tool calls

Construct `Tool` objects with a JSON Schema `parameters` dict and pass them to `chat()` / `stream()`:

```python
from sunwaee.modules.gen.engine.types import Tool

weather_tool = Tool(
    name="get_weather",
    description="Return current weather for a location.",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
        },
        "required": ["location"],
    },
)

response = await engine.chat(messages, tools=[weather_tool])
if response.tool_calls:
    for tc in response.tool_calls:
        print(tc.name, tc.arguments)
```
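
To close the loop (the flow exercised by `test_tool_call_result.py`), execute the call and return the result as a `Role.TOOL` turn. A sketch only: how the tool result is correlated with the assistant turn that requested it (IDs, extra `Message` fields) is defined in `types.py` and not assumed here.

```python
import json

from sunwaee.modules.gen.engine.types import Message, Role

def get_weather(location: str) -> dict:
    # stand-in implementation for the declared tool
    return {"location": location, "temp_c": 21}

if response.tool_calls:
    followup = list(messages)
    # NOTE: a real loop also appends the assistant turn that issued the
    # calls; see types.py for how Response maps back onto Message.
    for tc in response.tool_calls:
        result = get_weather(**tc.arguments)  # assumes arguments is a parsed dict
        followup.append(Message(role=Role.TOOL, content=json.dumps(result)))
    final = await engine.chat(followup, tools=[weather_tool])
```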

### File attachments

```python
from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([
    Message(role=Role.USER, content="Summarise this.", attachments=[att])
])
```

### Error handling

All provider errors subclass `EngineError(RuntimeError)`. Import from `engine/errors.py`:

```python
from sunwaee.modules.gen.engine.errors import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError:   # 429
    ...
except AuthError:        # 401 / 403
    ...
except TransientError:   # 5xx
    ...
except EngineError as e:
    print(e.status_code)
```
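
A common pattern on top of this hierarchy (a sketch, not part of the library):

```python
import asyncio

from sunwaee.modules.gen.engine.errors import RateLimitError, TransientError

async def chat_with_retry(engine, messages, attempts: int = 3):
    for attempt in range(attempts):
        try:
            return await engine.chat(messages)
        except (RateLimitError, TransientError):
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # 1s, 2s, 4s, ... backoff
```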

### Listing models

```python
from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None
```

---

## Logging

Set `SUNWAEE_LOG_LEVEL=debug` (or `info` / `warning` / `error`) to enable logs. All engine logs are at `DEBUG` -- request start/completion, model resolution decisions. See `utils/logger.py`.

---

## Testing

```bash
venv/bin/pytest                 # unit tests (no API keys needed)
venv/bin/pytest -m live         # live tests (real API calls)
venv/bin/pytest -m "not live"   # explicit unit-only
```

Unit test conventions: mock `httpx.AsyncClient`, never make real HTTP calls. Assert `usage`, `cost`, and `performance` are populated on the final streaming chunk.

Live test files:

| File                       | What it covers                                        |
| -------------------------- | ----------------------------------------------------- |
| `test_scenarios.py`        | all providers x scenarios x `chat()` + `stream()`     |
| `test_tool_call_result.py` | full tool call -> execute -> reply loop               |
| `test_attachments.py`      | image attachments, vision-capable providers           |
| `test_chain.py`            | three-provider conversation chain with shared history |
| `test_caching.py`          | prompt-cache hit on turn 2                            |
| `test_reasoning.py`        | `reasoning_effort` on/off per model category          |

---

## How to add a model

Add a `Model(...)` entry to `engine/models/<provider>.py` and make sure it is included in that file's `MODELS` list (imported by `engine/models/__init__.py`). Field reference is in `engine/model.py`; a sketched entry follows the rules below. Then run `psql/scripts/sync_models.py` to mirror the change to the database.

Key rules:

- `reasoning_mode="dynamic"` models that disable reasoning by swapping to a non-reasoning variant list `"off"` first in `reasoning_efforts` (e.g. kimi-k2.5). OpenAI gpt-5.x models that disable reasoning via `{"reasoning": {"effort": "none"}}` on the same model list `"none"` first instead — do NOT use `"off"` for these.
- `reasoning_uses_budget=True` only for Anthropic 4.5 series and Gemini 2.5 flash/flash-lite.
- `api_type=["responses", "completions"]` for OpenAI models that support both endpoints; `["completions"]` for OpenAI-compat providers (xAI, DeepSeek, Moonshot). Omit for Anthropic and Google.
- Pricing tiers: base required; `_200k` when context > 200k tokens; `_128k` for xAI; `_272k` for OpenAI. Thresholds are strict `>`.
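
Putting the rules together, a hypothetical dynamic-reasoning entry. Everything except the `"off"`-first ordering and the field names from the table above is illustrative; pricing fields are omitted (see `engine/model.py`):

```python
from sunwaee.modules.gen.engine.model import Model

MODELS = [
    Model(
        name="kimi-k2.5",                                    # assumed field name
        reasoning_mode="dynamic",
        reasoning_efforts=["off", "low", "medium", "high"],  # "off" listed first
        reasoning_id="kimi-k2.5-thinking",                   # hypothetical variant pair
        non_reasoning_id="kimi-k2.5",
        reasoning_disabled_payload={"thinking": False},      # illustrative payload
        api_type=["completions"],                            # OpenAI-compat provider
    ),
]
```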

---

## How to add an OpenAI-compatible provider

1. `engine/models/<provider>.py` -- `MODELS` list.
2. `engine/models/__init__.py` -- import and add to `_ALL`.
3. `engine/factory.py` -- add to `_OPENAI_COMPATIBLE` dict (`"provider": "https://base-url/v1"`). The env var is derived automatically as `PROVIDER_API_KEY`.
4. `tests/gen/engine/live/_shared.py` -- add `("provider", "model-name")` to `ENGINES`.
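
Step 3 in code (the provider name and base URL here are hypothetical):

```python
# engine/factory.py
_OPENAI_COMPATIBLE = {
    # ... existing providers ...
    "acme": "https://api.acme.ai/v1",  # key is read from ACME_API_KEY automatically
}
```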

---

## How to add a provider with a custom API

1. `engine/models/<provider>.py` + register in `__init__.py`.
2. `engine/providers/<provider>.py` -- implement `BaseEngine` (a skeleton is sketched after this list):
   - `async def chat(messages, tools=None) -> Response`
   - `async def stream(messages, tools=None) -> AsyncIterator[Response]`
   - Accept `client: httpx.AsyncClient | None = None` -- `factory.py` injects a pooled client.
   - Call `resolve_tokens()` before `compute_cost()` -- some providers exclude reasoning tokens from `output_tokens`.
   - Strip `reasoning_content` / `reasoning_signature` from all but the last assistant turn.
   - Promote system-only input to `Role.USER` if the provider rejects system-only requests.
   - On 4xx/5xx during streaming: read the full body before raising.
   - Buffer tool call JSON across SSE chunks; parse only on stop.
3. `engine/factory.py` -- wire into `get_engine()`.
4. Tests: unit (`providers/test_<provider>.py`) + live entry in `_shared.py`.
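
A skeleton for step 2, stubbed to the signatures listed above. The constructor shape is an assumption, not the actual engines' code:

```python
# engine/providers/acme.py (sketch)
from collections.abc import AsyncIterator

import httpx

from sunwaee.modules.gen.engine.base import BaseEngine
from sunwaee.modules.gen.engine.types import Message, Response, Tool


class AcmeEngine(BaseEngine):
    def __init__(self, model, api_key: str, client: httpx.AsyncClient | None = None):
        self.model = model
        self.api_key = api_key
        self.client = client or httpx.AsyncClient()  # factory injects a pooled client

    async def chat(self, messages: list[Message], tools: list[Tool] | None = None) -> Response:
        # checklist: strip reasoning_content / reasoning_signature from all but
        # the last assistant turn; promote system-only input to Role.USER;
        # call resolve_tokens() before compute_cost()
        raise NotImplementedError

    async def stream(self, messages: list[Message], tools: list[Tool] | None = None) -> AsyncIterator[Response]:
        # checklist: on 4xx/5xx read the full body before raising; buffer
        # tool-call JSON across SSE chunks and parse only on stop; the final
        # chunk must carry usage, cost, and performance
        raise NotImplementedError
        yield  # unreachable; marks this as an async generator
```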

---

## Provider-specific notes

- **`resolve_tokens()` before `compute_cost()`** -- xAI and Google exclude reasoning tokens from `output_tokens`; `resolve_tokens` back-calculates from `total_tokens`.
- **Strip reasoning from all but the last assistant turn** -- stale `reasoning_signature` breaks APIs.
- **OpenAI uses `max_completion_tokens`**, not `max_tokens` (CompletionsEngine); `max_output_tokens` for ResponsesEngine.
- **Silent-reasoning models** (`grok-4`, `grok-4-1-fast`, `grok-4-fast`, `grok-3-mini`) -- stream is silent during thinking; engines yield a synthetic `Response(reasoning_content="Reasoning in progress...", synthetic=True)` immediately.
- **Google: no tool call IDs** -- function name used as correlation ID. `thoughtSignature` on `functionCall` parts must be echoed back on every subsequent assistant turn.
- **Google streaming** -- `?alt=sse` required on `streamGenerateContent`.
- **Anthropic reasoning: two paths** -- newer models (Opus 4.7/4.6, Sonnet 4.6) use `output_config: {effort}` + `thinking: {type: "adaptive"}`; older budget models use `thinking: {type: "enabled", budget_tokens: N}`. Selected via `model.reasoning_uses_budget`.
- **Anthropic top-level `cache_control`** -- `payload["cache_control"] = {"type": "ephemeral"}` at request root enables auto-caching. Do not remove.
- **Foreign `reasoning_signature` detection** -- Anthropic and Google drop signatures that start with `[` (ResponsesEngine JSON list format). Echoing them causes base64 decode failures.
- **OpenAI ResponsesEngine caching** -- the Responses API provides no automatic prefix caching unless requests carry a server-side routing hint. `ResponsesEngine` computes a `prompt_cache_key` as SHA-256[:32] of the system prompt content (sketched after this list) and sends it with every request. This pins all requests sharing the same system prompt to the same cache server, enabling prefix-cache hits without `previous_response_id` or `store=True`.
- **DeepSeek cache tokens** -- DeepSeek exposes `prompt_cache_hit_tokens` at the top level of the `usage` object instead of `prompt_tokens_details.cached_tokens`. `CompletionsEngine` reads both fields, preferring the standard OpenAI field.
- **OpenAI gpt-5.x reasoning effort** -- `"none"` is the wire value to disable reasoning (not `"off"`). Sending no `reasoning` block defaults to the model's built-in default effort. `"off"` is rejected by these models' `reasoning_efforts` list and raises `ValueError` at the factory.
- **OpenAI reasoning effort: `xhigh`** -- gpt-5.x and gpt-5.4.x support `"none" | "low" | "medium" | "high" | "xhigh"`. The effort is forwarded verbatim in `{"reasoning": {"effort": value}}` by ResponsesEngine.
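
The `prompt_cache_key` derivation from the ResponsesEngine note above, assuming the `[:32]` truncation applies to the hex digest:

```python
import hashlib

def prompt_cache_key(system_prompt: str) -> str:
    # SHA-256 of the system prompt content, truncated to 32 hex chars
    return hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()[:32]
```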
