Metadata-Version: 2.4
Name: sunwaee
Version: 1.5.1
Summary: SUNWÆE gen — multi-provider LLM engine library.
Author: David NAISSE
Maintainer: David NAISSE
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27.0
Provides-Extra: files
Requires-Dist: pypdf>=4.0.0; extra == "files"
Requires-Dist: python-docx>=1.1.0; extra == "files"
Requires-Dist: openpyxl>=3.1.0; extra == "files"
Requires-Dist: python-pptx>=1.0.0; extra == "files"
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=1.0.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0.0; extra == "dev"
Requires-Dist: setuptools_scm>=8; extra == "dev"
Requires-Dist: twine; extra == "dev"
Requires-Dist: build; extra == "dev"
Dynamic: license-file

![Coverage](https://img.shields.io/badge/coverage-100%25-brightgreen) ![Python](https://img.shields.io/badge/python-3.11%2B-blue) ![PyPI](https://img.shields.io/pypi/v/sunwaee) ![License](https://img.shields.io/badge/license-MIT-blue)

All LLMs, one response format, one dependency (httpx). Supports switching models mid-conversation (e.g. draft with GPT, refine with Claude).

Handles streaming, tool calls, file attachments, prompt caching, reasoning on/off, and cost tracking across Anthropic, OpenAI, Google, DeepSeek, xAI, and Moonshot.

---

## Install

```bash
pip install sunwaee
pip install "sunwaee[files]"   # pdf, docx, xlsx, pptx extraction
pip install -e ".[dev,files]"  # development
```

---

## Quick start

```python
import asyncio
from sunwaee.modules.gen.engine import get_engine
from sunwaee.modules.gen.engine.types import Message, Role

# enable_reasoning=False (default) — reasoning disabled / non-reasoning variant used
engine = get_engine("anthropic", "claude-sonnet-4-6")

# enable_reasoning=True — activates thinking for all providers
engine_reasoning = get_engine("anthropic", "claude-sonnet-4-6", enable_reasoning=True)

async def main():
    messages = [Message(role=Role.USER, content="Hello")]

    response = await engine.chat(messages)
    print(response.content, response.cost.total)

    async for chunk in engine.stream(messages):
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(main())
```

---

## Providers

| Provider  | `provider=`   | Env var             |
| --------- | ------------- | ------------------- |
| Anthropic | `"anthropic"` | `ANTHROPIC_API_KEY` |
| OpenAI    | `"openai"`    | `OPENAI_API_KEY`    |
| Google    | `"google"`    | `GOOGLE_API_KEY`    |
| DeepSeek  | `"deepseek"`  | `DEEPSEEK_API_KEY`  |
| xAI       | `"xai"`       | `XAI_API_KEY`       |
| Moonshot  | `"moonshot"`  | `MOONSHOT_API_KEY`  |

---

## Directory structure

```
sunwaee/
├── core/
│   ├── logger.py                 # get_logger(name) — scoped under "sunwaee.*"
│   └── tools.py                  # @tool decorator, ok(), err()
└── modules/gen/
    ├── __init__.py               # public re-exports (get_engine, run, stream_run, …)
    ├── agent.py                  # ReAct loop — run() + stream_run()
    ├── tools.py                  # TOOLS list
    └── engine/
        ├── __init__.py           # get_engine, Message, Response, Tool, …
        ├── base.py               # BaseEngine ABC
        ├── factory.py            # get_engine() — provider routing + connection pooling
        ├── model.py              # Model dataclass + compute_cost()
        ├── types.py              # Message, Response, ToolCall, Usage, Cost, Performance, …
        ├── models/               # model registry per provider
        │   ├── __init__.py       # get_model(), list_models()
        │   └── anthropic.py / openai.py / google.py / deepseek.py / xai.py / moonshot.py
        └── providers/
            ├── anthropic.py      # AnthropicEngine
            ├── openai.py         # OpenAIEngine (also used by DeepSeek, xAI, Moonshot)
            └── google.py         # GoogleEngine

tests/gen/
├── test_agent.py / test_stream_agent.py / test_tools.py
└── engine/
    ├── test_types.py / test_factory.py / test_model.py
    ├── providers/
    │   └── test_anthropic.py / test_openai.py / test_google.py
    └── live/
        ├── _shared.py            # shared config, data, helpers for all live tests
        ├── test_scenarios.py     # all providers × all scenarios × chat + stream
        ├── test_tool_call_result.py  # TOOL_CALL → execute → reply, all providers
        ├── test_attachments.py   # image attachments, vision-capable providers
        ├── test_chain.py         # three-provider conversation chain
        ├── test_caching.py       # prompt-cache hit on turn 2
        ├── test_reasoning.py     # reasoning ON / OFF per model category
        └── run/                  # JSON snapshots (gitignored)
```

---

## Core types (`engine/types.py`)

```python
class Role(Enum):       SYSTEM, USER, ASSISTANT, TOOL, CONTEXT
class StopReason(Enum): END_TURN, TOOL_USE, MAX_TOKENS

@dataclass class Message:
    role: Role
    content: str | None
    reasoning_content: str | None       # thinking for models that support it
    reasoning_signature: str | None     # opaque blob — echo back verbatim
    tool_call_id: str | None            # set on Role.TOOL messages
    tool_calls: list[ToolCall] | None
    attachments: list[FileAttachment] | None   # Role.USER only

@dataclass class Response:
    provider: str; model: str; streaming: bool; synthetic: bool
    content: str | None; reasoning_content: str | None; reasoning_signature: str | None
    tool_calls: list[ToolCall] | None; stop_reason: StopReason | None; error: Error | None
    usage: Usage | None; cost: Cost | None; performance: Performance | None

@dataclass class ToolCall:
    id: str; name: str; arguments: dict
    thought_signature: str | None    # Google only — echo back every subsequent turn
    error: str | None; duration: float; results: list[dict]

@dataclass class Usage:
    input_tokens: int; output_tokens: int; total_tokens: int
    cache_read_tokens: int; cache_write_tokens: int

@dataclass class Cost:
    input: float; output: float; cache_read: float; cache_write: float; total: float

@dataclass class Performance:
    latency: float            # seconds to first chunk
    reasoning_duration: float; content_duration: float; total_duration: float
    throughput: int           # output tokens / second

@dataclass class FileAttachment:
    data: bytes; filename: str; media_type: str = ""
    # text/* → <file name="…">…</file> block
    # image/jpeg|png|gif|webp → base64 inline
    # application/pdf|json + OOXML (docx/xlsx/pptx) → extracted text
```
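
A quick illustration of how these types compose into a tool round-trip (a sketch — it assumes the optional fields above default to `None`, and the ids/values are made up):

```python
from sunwaee.modules.gen.engine.types import Message, Role, ToolCall

# assistant turn that requested a tool call
assistant_turn = Message(
    role=Role.ASSISTANT,
    content=None,
    tool_calls=[ToolCall(id="call_1", name="get_time", arguments={})],
)

# matching Role.TOOL turn — tool_call_id echoes the assistant's call id
tool_turn = Message(
    role=Role.TOOL,
    tool_call_id="call_1",
    content='{"ok": true, "data": {"time": "2025-01-01T00:00:00+00:00"}}',
)
```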

---

## `get_engine()` — reasoning control

```python
engine = get_engine(
    provider,
    model,
    api_key=None,          # falls back to <PROVIDER>_API_KEY env var
    max_tokens=8192,
    enable_reasoning=False, # True activates thinking/reasoning for all providers
)
```

### Connection pool

`get_engine()` reuses a single `httpx.AsyncClient` per `(event_loop, base_url)`. The pool is a `WeakKeyDictionary` keyed by the loop object so that dead loops (common in tests) drop their clients automatically — this avoids "Event loop is closed" errors when Python reuses an integer `id()` for a freshly created loop. Clients are configured with `Timeout(connect=5s, read=300s, write=30s)` and `Limits(max_connections=50)`. On graceful shutdown, call:

```python
from sunwaee.modules.gen.engine import close_all_clients

await close_all_clients()
```
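
The pattern itself is small; a minimal sketch of per-loop pooling (names hypothetical — the real implementation lives in `factory.py`):

```python
import asyncio
import weakref

import httpx

# one client per (event loop, base_url); keying by the loop *object* lets a
# dead loop's entry fall out of the WeakKeyDictionary automatically
_POOLS: weakref.WeakKeyDictionary = weakref.WeakKeyDictionary()


def _get_client(base_url: str) -> httpx.AsyncClient:
    loop = asyncio.get_running_loop()
    clients = _POOLS.setdefault(loop, {})
    if base_url not in clients:
        clients[base_url] = httpx.AsyncClient(
            base_url=base_url,
            # pool timeout is not specified above — 5s here is an assumption
            timeout=httpx.Timeout(connect=5.0, read=300.0, write=30.0, pool=5.0),
            limits=httpx.Limits(max_connections=50),
        )
    return clients[base_url]
```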

`enable_reasoning` resolves all provider complexity automatically:

| Model `reasoning_mode`                 | `enable_reasoning=True`               | `enable_reasoning=False`              |
| -------------------------------------- | ------------------------------------- | ------------------------------------- |
| `"dynamic"`                            | Sends provider thinking config        | No thinking config sent               |
| `"always"` + `non_reasoning_id`        | Uses the model as-is                  | Swaps to `non_reasoning_id` variant   |
| `"always"` + no swap                   | Uses the model as-is (always reasons) | Uses the model as-is (cannot disable) |
| `None` (no reasoning) + `reasoning_id` | Swaps to `reasoning_id` variant       | Uses the model as-is                  |

Default reasoning configs when `enable_reasoning=True`:

| Provider / series                           | Mechanism                                               | Value                          |
| ------------------------------------------- | ------------------------------------------------------- | ------------------------------ |
| Anthropic (Opus 4.7/4.6, Sonnet 4.6)        | `output_config.effort` + `thinking: {type: "adaptive"}` | `effort="high"`                |
| Anthropic (Opus 4.5, Haiku 4.5, Sonnet 4.5) | `thinking: {type: "enabled", budget_tokens: N}`         | `max(1024, max_tokens - 1024)` |
| Google Gemini 3 (has `reasoning_efforts`)   | `thinkingConfig.thinkingLevel`                          | `"high"`                       |
| Google Gemini 2.5 (no `reasoning_efforts`)  | `thinkingConfig.thinkingBudget`                         | `-1` (dynamic)                 |
| OpenAI-compat (has `reasoning_efforts`)     | `reasoning_effort`                                      | `"high"`                       |

When `enable_reasoning=False` for dynamic models: Anthropic effort models send `effort="low"`; Gemini 3 sends the lowest listed effort (`reasoning_efforts[0]`); Gemini 2.5 sends `thinkingBudget=0`.
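
Read together, the two tables reduce to a small resolution step; here is a hypothetical sketch of the variant-swap half (not the library's actual code):

```python
from sunwaee.modules.gen.engine.models import get_model


def resolve_model_name(name: str, enable_reasoning: bool) -> str:
    """Variant swap per the first table; "dynamic" models are toggled via payload instead."""
    model = get_model(name)
    if model is None:
        return name
    if enable_reasoning and model.reasoning_mode is None and model.reasoning_id:
        return model.reasoning_id        # swap up to the reasoning counterpart
    if not enable_reasoning and model.reasoning_mode == "always" and model.non_reasoning_id:
        return model.non_reasoning_id    # swap down to the non-reasoning counterpart
    return name
```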

---

## `Model` dataclass (`engine/model.py`)

```python
@dataclass class Model:
    name: str; provider: str; display_name: str; description: str | None

    # specs
    context_window: int; max_output_tokens: int | None

    # features
    supports_vision: bool
    supports_tools: bool
    supports_reasoning: bool           # True if model has any reasoning capability

    # reasoning config
    reasoning_mode: str | None         # "always" | "dynamic" | None
    reasoning_efforts: list[str] | None  # valid effort levels (e.g. ["low","medium","high"])
    reasoning_tokens_type: str | None  # "raw" | "summary" | None
    reasoning_disabled_payload: dict | None  # merged into request when reasoning is disabled
    reasoning_id: str | None           # for non-reasoning variants: name of reasoning counterpart
    non_reasoning_id: str | None       # for reasoning models: name of non-reasoning counterpart

    cache_min_tokens: int | None

    # pricing (per million tokens)
    input_price_per_mtok: float | None; output_price_per_mtok: float | None
    cache_read_price_per_mtok: float | None; cache_write_price_per_mtok: float | None
    input_price_per_mtok_128k: float | None   # xAI only
    output_price_per_mtok_128k: float | None
    input_price_per_mtok_200k: float | None   # most providers
    output_price_per_mtok_200k: float | None; ...
    input_price_per_mtok_272k: float | None   # OpenAI only
    output_price_per_mtok_272k: float | None; ...

    release_date: str | None; deprecated_at: str | None; sunset_at: str | None
```

**`reasoning_mode`**:

- `"always"` — model always reasons; disable by swapping to `non_reasoning_id` (xAI models, DeepSeek Reasoner)
- `"dynamic"` — reasoning can be toggled on/off via provider-specific config (Anthropic, Google, OpenAI, Moonshot)
- `None` — no reasoning capability

**`reasoning_tokens_type`**:

- `"raw"` — full reasoning content returned in `response.reasoning_content` (Anthropic, DeepSeek, grok-4.20, kimi-k2.5)
- `"summary"` — summarised thought returned (Google, grok-4.20 on `/v1/responses` endpoint)
- `None` — reasoning tokens tracked internally only; content not exposed (OpenAI, most xAI)

**`reasoning_efforts`**:

- List of valid named effort levels for the model's reasoning API parameter (see the inspection example after this list).
- Anthropic: `["low","medium","high","max"]` or `["low","medium","high","xhigh","max"]` for Opus 4.7.
- Google Gemini 3: `["minimal","low","medium","high"]` or `["low","medium","high"]` depending on model.
- OpenAI: `["none","low","medium","high","xhigh"]` or similar per-model.
- `None` for models that use integer budgets (Gemini 2.5) or have no effort control.
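
These fields can be inspected at runtime before choosing settings (printed values illustrative):

```python
from sunwaee.modules.gen.engine.models import get_model

model = get_model("claude-sonnet-4-6")
if model and model.supports_reasoning:
    print(model.reasoning_mode)         # e.g. "dynamic"
    print(model.reasoning_efforts)      # e.g. ["low", "medium", "high", "max"] — None for budget-based models
    print(model.reasoning_tokens_type)  # "raw", "summary", or None
```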

---

## Usage

### With tools

```python
from sunwaee.core.tools import tool, ok, err
from sunwaee.modules.gen.engine.types import Tool

@tool("Return the current UTC time.")
async def get_time() -> str:
    from datetime import datetime, timezone
    return ok({"time": datetime.now(timezone.utc).isoformat()})

response = await engine.chat(messages, tools=[get_time._tool])
```

Tools must be `async def`. The agent refuses to run synchronous tools because `asyncio.wait_for` cannot cancel a thread once it has entered the sync body — a hung sync tool would otherwise leak threads from the default executor.

### File and image attachments

```python
from sunwaee.modules.gen.engine.types import FileAttachment, Message, Role

with open("report.pdf", "rb") as f:
    att = FileAttachment(data=f.read(), filename="report.pdf")

response = await engine.chat([Message(role=Role.USER, content="Summarise.", attachments=[att])])
```

Supported: `text/*`, `application/json`, `image/jpeg|png|gif|webp`, `application/pdf`, `.docx`, `.xlsx`, `.pptx`

Size caps enforced at construction: **10 MB** for images, **20 MB** for every other supported type. Oversized payloads raise `ValueError` before any extraction or base64 encoding runs.
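
For example, exceeding the image cap should fail fast (a sketch of the documented behaviour; the exact error message is not specified):

```python
from sunwaee.modules.gen.engine.types import FileAttachment

try:
    # one byte over the 10 MB image cap — raises before any base64 encoding runs
    FileAttachment(data=b"\x00" * (10 * 1024 * 1024 + 1), filename="big.png", media_type="image/png")
except ValueError as exc:
    print(f"rejected: {exc}")
```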

### ReAct agent loop

```python
from sunwaee.modules.gen.agent import stream_run

new_messages = []
async for chunk in stream_run(
    messages,
    tools,
    engine,
    new_messages=new_messages,
    tool_timeout=60.0,          # seconds per individual tool call (default: 60)
    max_concurrent_tools=8,     # max tools running simultaneously (default: 8)
):
    if chunk.content:
        print(chunk.content, end="", flush=True)
# new_messages has all assistant + tool turns appended during the run
```

Up to 10 iterations by default. Tool calls run concurrently via `asyncio.gather`, bounded by `max_concurrent_tools`. Tools must be `async def` — sync callables are rejected for the reason described under "With tools" above. Unknown keyword arguments supplied by the model are silently filtered out before the tool function is called.
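
A minimal sketch of that bounded-concurrency pattern (hypothetical helper, not the agent's actual code):

```python
import asyncio


async def run_tool_calls(tool_calls, tools_by_name, tool_timeout=60.0, max_concurrent_tools=8):
    """Run tool calls concurrently, each under its own timeout, at most N at once."""
    semaphore = asyncio.Semaphore(max_concurrent_tools)

    async def run_one(call):
        async with semaphore:
            fn = tools_by_name[call.name]
            # wait_for can cancel a coroutine at the timeout — the reason sync tools are rejected
            return await asyncio.wait_for(fn(**call.arguments), timeout=tool_timeout)

    return await asyncio.gather(*(run_one(c) for c in tool_calls), return_exceptions=True)
```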

### Error types

All provider errors subclass `EngineError(RuntimeError)`, so existing `except RuntimeError` handlers continue to work. Import subclasses to handle specific cases:

```python
from sunwaee.modules.gen.engine import EngineError, RateLimitError, AuthError, TransientError

try:
    response = await engine.chat(messages)
except RateLimitError as e:   # 429 — back off and retry
    ...
except AuthError as e:        # 401 / 403 — invalid key
    ...
except TransientError as e:   # 5xx — server-side; may be retried
    ...
except EngineError as e:      # other 4xx
    print(e.status_code)
```
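
These classes make a simple retry wrapper straightforward (illustrative only — backoff values are arbitrary):

```python
import asyncio

from sunwaee.modules.gen.engine import RateLimitError, TransientError


async def chat_with_retry(engine, messages, attempts: int = 3):
    """Exponential backoff on retryable errors; auth and other 4xx errors propagate."""
    for attempt in range(attempts):
        try:
            return await engine.chat(messages)
        except (RateLimitError, TransientError):
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)
```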

### Listing models

```python
from sunwaee.modules.gen.engine.models import list_models, get_model

all_models = list_models()              # list[Model]
model = get_model("claude-sonnet-4-6")  # Model | None
```

---

## Testing

```bash
pytest tests/gen/ -m "not live"                                        # unit (no keys needed)
pytest tests/gen/ -m live                                              # live (real API calls)
pytest tests/gen/ -m "not live" --cov=sunwaee --cov-report=term-missing

# run a single live test file
pytest -m live tests/gen/engine/live/test_caching.py
pytest -m live tests/gen/engine/live/test_reasoning.py
```

Unit test conventions:

- Mock `httpx.AsyncClient` — never make real HTTP calls
- Assert `response.cost`, `response.usage`, `response.performance` populated on final chunk
- For streaming, use an async generator as mock transport (sketch below)
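
A sketch of that last convention — in real tests the generator feeds the engine's transport layer; here it stands in for `engine.stream` directly for brevity:

```python
from unittest.mock import MagicMock

import pytest


@pytest.mark.asyncio
async def test_stream_mock_transport():
    async def fake_stream(messages, tools=None):
        # async generator standing in for the real SSE transport — no HTTP at all
        yield MagicMock(content="Hel", usage=None)
        yield MagicMock(content="lo", usage=MagicMock(), cost=MagicMock(), performance=MagicMock())

    engine = MagicMock()
    engine.stream = fake_stream  # in real tests, patch the httpx layer instead

    chunks = [c async for c in engine.stream([])]
    assert "".join(c.content for c in chunks) == "Hello"
    assert chunks[-1].usage is not None  # final chunk carries usage/cost/performance
```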

Live test files and what they cover:

| File                       | What it tests                                         |
| -------------------------- | ----------------------------------------------------- |
| `test_scenarios.py`        | 7 scenarios × 6 providers × chat + stream (84 tests)  |
| `test_tool_call_result.py` | Full TOOL_CALL → execute → reply loop, all providers  |
| `test_attachments.py`      | PNG image attachment, vision-capable providers        |
| `test_chain.py`            | Three-provider conversation chain with shared history |
| `test_caching.py`          | Prompt-cache hit on turn 2, static system prompt      |
| `test_reasoning.py`        | `enable_reasoning` ON / OFF per model category        |

Live scenarios:

| Scenario           | What it tests                                   |
| ------------------ | ----------------------------------------------- |
| `ONLY_SYSTEM`      | System-only input edge case; lenient assertions |
| `ONLY_USER`        | Single user message                             |
| `SYSTEM_AND_USER`  | System prompt respected in response             |
| `TOOL_CALL`        | Model must issue at least one tool call         |
| `TOOL_CALL_RESULT` | Full multi-turn with real tool IDs/signatures   |
| `FILE_ATTACHMENT`  | Text file attached; asserts content populated   |
| `CONTEXT_ROLE`     | `Role.CONTEXT` message handled without errors   |

All live tests default to `enable_reasoning=False`. `test_reasoning.py` is the only file that explicitly passes `enable_reasoning=True`.

---

## How to add a model

**File:** `sunwaee/modules/gen/engine/models/<provider>.py`

```python
Model(
    name="provider-model-name",
    display_name="Human Readable Name",
    provider="anthropic",
    context_window=200_000,
    max_output_tokens=64_000,
    input_price_per_mtok=3.0,
    output_price_per_mtok=15.0,
    cache_read_price_per_mtok=0.3,
    cache_write_price_per_mtok=3.75,
    input_price_per_mtok_200k=6.0,       # omit if no >200k tier
    output_price_per_mtok_200k=22.5,
    supports_vision=True,
    supports_tools=True,
    supports_reasoning=True,
    reasoning_mode="dynamic",             # "always" | "dynamic" | None
    reasoning_efforts=["low", "medium", "high", "max"],  # omit if not applicable
    reasoning_tokens_type="raw",          # "raw" | "summary" | None
    non_reasoning_id="model-non-reasoning",  # omit if no non-reasoning variant
    cache_min_tokens=1_024,              # omit (None) if caching is undocumented
    release_date="2025-01-01",
)
```

For a non-reasoning variant that pairs with a reasoning model:

```python
Model(
    name="model-non-reasoning",
    ...same pricing...,
    supports_reasoning=False,
    reasoning_id="model",                 # points to the reasoning counterpart
)
```

**Pricing tiers** (`engine/model.py`): base required; `_128k` when `input_tokens > 128_000` (xAI only); `_200k` when `> 200_000`; `_272k` when `> 272_000` (OpenAI only). Thresholds are strict `>` — exactly at the boundary uses the lower tier.
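
A worked example of the boundary rule (a sketch — it assumes the selected tier's rate applies to the entire request, and uses the Anthropic-style prices from the snippet above):

```python
def input_cost(tokens: int, base_per_mtok: float, per_mtok_200k: float | None) -> float:
    # strict '>' — exactly 200_000 tokens stays on the base tier
    rate = per_mtok_200k if (per_mtok_200k is not None and tokens > 200_000) else base_per_mtok
    return tokens / 1_000_000 * rate


print(input_cost(200_000, 3.0, 6.0))  # 0.6  — boundary, base tier
print(input_cost(200_001, 3.0, 6.0))  # ~1.2 — one token over, whole request at the 200k rate
```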

**`cache_min_tokens`** — minimum tokens required at a cache breakpoint for prompt caching to activate. `None` = no caching. `0` = no minimum (caches everything). Known values:

| Provider  | Minimum | Models                                  |
| --------- | ------- | --------------------------------------- |
| Anthropic | 4,096   | Opus 4.7, Opus 4.6, Opus 4.5, Haiku 4.5 |
| Anthropic | 2,048   | Sonnet 4.6                              |
| Anthropic | 1,024   | Sonnet 4.5                              |
| OpenAI    | 1,024   | All models (automatic prefix caching)   |
| Google    | 1,024   | All models (explicit context caching)   |
| xAI       | 0       | All models (automatic, no minimum)      |
| DeepSeek  | 64      | All models (automatic prefix caching)   |
| Moonshot  | 0       | All models (automatic, no minimum)      |
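
A guard using these values (sketch; token counting is provider-specific and crudely approximated here):

```python
from sunwaee.modules.gen.engine.models import get_model

system_prompt = "…your long static system prompt…"
approx_tokens = len(system_prompt) // 4  # crude 4-chars-per-token heuristic

model = get_model("claude-sonnet-4-6")
if model and model.cache_min_tokens is not None and approx_tokens >= model.cache_min_tokens:
    print("prompt long enough for a cache breakpoint")
```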

---

## How to add an OpenAI-compatible provider

1. `engine/models/<provider>.py` — `MODELS` list
2. `engine/models/__init__.py` — import + add to `_ALL`
3. `engine/factory.py` — add to `_OPENAI_COMPATIBLE: dict[str, str]` (env var auto-derived as `PROVIDER_API_KEY`; see the sketch below)
4. `tests/gen/engine/live/_shared.py` — add `("provider", "cheapest-model")` to `ENGINES`

---

## How to add a provider with a custom API

1. `engine/models/<provider>.py` + register in `__init__.py`
2. `engine/providers/<provider>.py` — implement `BaseEngine` (skeleton sketched after this list):
   - `async def chat(self, messages, tools=None) -> Response`
   - `async def stream(self, messages, tools=None) -> AsyncIterator[Response]`
   - Accept `client: httpx.AsyncClient | None = None`
   - Call `resolve_tokens(usage)` before `compute_cost`
   - Strip `reasoning_content`/`reasoning_signature` from all but the last assistant turn
   - Handle system-only input: promote to `Role.USER`
   - On 4xx/5xx in streaming: read full body before raising
   - Buffer tool call JSON across SSE chunks; parse only on stop
3. `engine/factory.py` — wire into `get_engine()`, handle `enable_reasoning` for the new provider
4. Tests: unit (`providers/test_<provider>.py`) + live entry in `_shared.py`
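
A skeleton for step 2 (method bodies elided; the `chat`/`stream` signatures follow the list above, the constructor shape is an assumption):

```python
# engine/providers/newprovider.py
from typing import AsyncIterator

import httpx

from sunwaee.modules.gen.engine.base import BaseEngine
from sunwaee.modules.gen.engine.types import Message, Response


class NewProviderEngine(BaseEngine):
    def __init__(self, model, api_key: str, max_tokens: int = 8192,
                 client: httpx.AsyncClient | None = None):
        ...

    async def chat(self, messages: list[Message], tools=None) -> Response:
        # build payload → POST → resolve_tokens(usage) → compute_cost → Response
        ...

    async def stream(self, messages: list[Message], tools=None) -> AsyncIterator[Response]:
        # on 4xx/5xx: read the full body before raising;
        # buffer tool-call JSON across SSE chunks, parse only on stop
        raise NotImplementedError
        yield  # marks this method as an async generator
```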

---

## How to add a tool to the agent

```python
from typing import Annotated
from sunwaee.core.tools import tool, ok, err

@tool("Search the web for current information.")
async def web_search(
    query: Annotated[str, "The search query"],
    num_results: Annotated[int, "Number of results"] = 5,
) -> str:
    try:
        return ok(_do_search(query, num_results))
    except Exception as e:
        return err(str(e))
```

Register: add `web_search._tool` to `TOOLS` in `sunwaee/modules/gen/tools.py`.

Tests: `tests/gen/test_<tool_name>.py` — call directly, assert JSON output shape, test error path. Never call real external APIs.
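
A sketch of such a test for the `web_search` example (module paths and the `_do_search` stub are hypothetical, and it assumes the decorated function remains directly callable):

```python
# tests/gen/test_web_search.py
import json

import pytest

from sunwaee.modules.gen import tools as gen_tools


@pytest.mark.asyncio
async def test_web_search_ok(monkeypatch):
    # stub the helper — never call real external APIs in unit tests
    monkeypatch.setattr(gen_tools, "_do_search", lambda q, n: [{"title": "result"}])
    payload = json.loads(await gen_tools.web_search(query="python", num_results=1))
    assert payload["ok"] is True


@pytest.mark.asyncio
async def test_web_search_error(monkeypatch):
    def boom(q, n):
        raise RuntimeError("network down")

    monkeypatch.setattr(gen_tools, "_do_search", boom)
    payload = json.loads(await gen_tools.web_search(query="python", num_results=1))
    assert payload["ok"] is False
```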

---

## `@tool` decorator

Introspects signature to build JSON Schema `parameters` automatically.

Supports: `str`, `int`, `float`, `bool`, `list[T]`, `Literal[...]`, `Optional[T]`, `Annotated[T, "description"]`

- Parameters with defaults → not `required`
- Both sync and async functions are supported by the decorator (the agent loop additionally requires async — see above)
- Must return JSON string: `ok(data)` / `err(message)` / `json.dumps(...)`

```python
ok({"id": "123"})   # '{"ok": true, "data": {"id": "123"}}'
err("Not found")    # '{"ok": false, "error": "Not found"}'
```
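
For the `web_search` tool above, the generated schema would look roughly like this (shape assumed from standard JSON Schema tool conventions; the `parameters` attribute name follows the description above):

```python
web_search._tool.parameters
# {
#     "type": "object",
#     "properties": {
#         "query": {"type": "string", "description": "The search query"},
#         "num_results": {"type": "integer", "description": "Number of results"},
#     },
#     "required": ["query"],  # num_results has a default, so it is omitted
# }
```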

---

## Provider-specific quirks

| #   | Rule                                                                                                                                                                                                                                                                                                                                                                  |
| --- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1   | **`resolve_tokens()` before `compute_cost()`** — xAI/Google exclude reasoning tokens from `output_tokens`; `resolve_tokens` back-calculates from `total_tokens`. Call it unconditionally — it's a no-op when counts already match. |
| 2   | **Strip reasoning from all but last assistant turn** — stale `reasoning_signature` breaks APIs and blocks mid-session provider switches.                                                                                                                                                                                                                              |
| 3   | **OpenAI uses `max_completion_tokens`**, not `max_tokens`.                                                                                                                                                                                                                                                                                                            |
| 4   | **OpenAI reasoning models: yield synthetic chunk immediately** — stream is silent during thinking; `Response(reasoning_content="Reasoning in progress…", synthetic=True)`. Never treat `synthetic=True` as real content.                                                                                                                                              |
| 5   | **Google: `thoughtSignature` on `functionCall` part** → `ToolCall.thought_signature`; echo every subsequent turn.                                                                                                                                                                                                                                                     |
| 6   | **Google: no tool call IDs** — use function name as correlation ID.                                                                                                                                                                                                                                                                                                   |
| 7   | **Google streaming: `?alt=sse` required** on `streamGenerateContent`.                                                                                                                                                                                                                                                                                                 |
| 8   | **System-only input** — promote system message to `Role.USER` (Anthropic + Google).                                                                                                                                                                                                                                                                                   |
| 9   | **Anthropic reasoning: two paths.** Newer models (Opus 4.7/4.6, Sonnet 4.6) use `output_config: {effort: X}` + `thinking: {type: "adaptive"}`. Older models (Opus 4.5, Haiku 4.5, Sonnet 4.5) use `thinking: {type: "enabled", budget_tokens: N}` with `1024 ≤ budget < max_tokens`. The factory selects the path based on whether the model has `reasoning_efforts`. |
| 10  | **Connection pooling**: one `httpx.AsyncClient` per `(event_loop, base_url)` in `factory.py` — keyed by the loop object itself, not its `id()` (see *Connection pool* above). |
| 11  | **`Role.CONTEXT` mapping**: all providers wrap content in `<context>` tags automatically — Anthropic → `{"role":"user","content":"<context>…</context>"}`; OpenAI → `{"role":"system","content":"<context>…</context>"}`; Google → `{"role":"user","parts":[{"text":"<context>…</context>"}]}`.                                                                       |
| 12  | **Google Gemini 3 uses `thinkingLevel`** (string: `"minimal"/"low"/"medium"/"high"`); Gemini 2.5 uses `thinkingBudget` (int: `-1` = dynamic, `0` = off, `N` = fixed). The engine selects based on whether the model has `reasoning_efforts`. Gemini 3.1 Pro and 2.5 Pro cannot disable thinking (`reasoning_mode="always"`).                                          |
| 13  | **kimi-k2.5 (Moonshot) reasons by default** — disabling thinking requires an explicit payload `{"thinking": {"type": "disabled"}}`. Set via `Model.reasoning_disabled_payload`; the OpenAI engine merges it when `reasoning_effort` is None.                                                                                                                          |
| 14  | **xAI always-reasoning models** (`grok-4.20`, `grok-4-1-fast`, `grok-4-fast`) route to a non-reasoning variant on `enable_reasoning=False` via `non_reasoning_id`. Models without a `non_reasoning_id` (`grok-4`, `grok-3-mini`, `grok-code-fast-1`) cannot have reasoning disabled.                                                                                  |
| 15  | **grok-4.20 returns `reasoning_content` on `chat/completions`** — `reasoning_tokens_type="summary"` refers to the `/v1/responses` endpoint only; on `chat/completions` the field carries full raw reasoning text.                                                                                                                                                     |
