Metadata-Version: 2.4
Name: llmskit
Version: 0.2.1
Summary: Client and Tools for LLMs
Author-email: hongbo liu <bananabo@foxmail.com>
License: Apache-2.0
Project-URL: Homepage, https://gitee.com/maxbanana
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.28.1
Requires-Dist: openai>=2.31.0
Requires-Dist: tenacity>=9.1.4
Requires-Dist: anthropic>=0.92.0
Requires-Dist: google-genai>=1.71.0
Dynamic: license-file

# llmskit

`llmskit` provides a unified Python interface for chat, embeddings, and reranking across multiple LLM providers.

The current codebase exposes:

- Unified sync and async chat wrappers
- OpenAI-style streaming and completion responses
- Provider adapters for `openai`, `gemini`, and `claude`
- Canonical multimodal message parts and tool definitions
- OpenAI-compatible embeddings helpers
- Generic reranker clients

## Installation

```bash
pip install llmskit
```

## Public API

```python
from llmskit import (
    AsyncChatLLM,
    AsyncOpenAIEmbeddings,
    AsyncReranker,
    ChatLLM,
    OpenAIEmbeddings,
    Reranker,
)
```

## Chat Quick Start

### Synchronous chat

```python
from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",  # replace with your OpenAI-compatible endpoint
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

response = chat.complete(messages=messages)
message = response["choices"][0]["message"]

print(message["content"])
print(message["reasoning_content"])
print(response["usage"])
```

`ChatLLM` is intended for blocking, synchronous code paths and uses native sync provider clients under the hood. In Jupyter notebooks, async web frameworks, or inside `async def` code, prefer `AsyncChatLLM` so you do not block the active event loop.

### Asynchronous chat

```python
import asyncio

from llmskit import AsyncChatLLM


async def main() -> None:
    chat = AsyncChatLLM.from_gemini(
        model="gemini-2.5-flash",
        api_key="YOUR_API_KEY",
    )

    response = await chat.complete(
        messages=[
            {"role": "system", "content": "Answer briefly."},
            {"role": "user", "content": "What is llmskit?"},
        ]
    )

    print(response["choices"][0]["message"]["content"])


asyncio.run(main())
```

If your runtime already has a running event loop, as in Jupyter notebooks or FastAPI / Starlette request handlers, prefer `AsyncChatLLM` to keep that loop non-blocking.

## Provider Factories

Use explicit factory methods when you already know the backend:

```python
from llmskit import ChatLLM

openai_chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

gemini_chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

claude_chat = ChatLLM.from_claude(
    model="claude-sonnet-4-20250514",
    api_key="YOUR_API_KEY",
)
```

Or choose the provider dynamically:

```python
from llmskit import ChatLLM

chat = ChatLLM.create(
    provider="openai",
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)
```

Supported provider names for `create(...)`:

- `openai`
- `gemini`
- `claude`

Deprecated aliases still exist in code, but new code should prefer:

- `from_openai(...)` instead of `from_gpt(...)` or `from_local(...)`
- `from_claude(...)` instead of `from_anthropic(...)`

Factory methods are for construction-time options such as `base_url`,
`client_logger`, and `retry_config`. Request options such as `temperature`,
`max_tokens`, or provider `response_format` belong on `complete(...)` / `stream(...)`.
Use `result_format` when you want llmskit itself to return the legacy compatibility object.
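
For example, keeping the two kinds of options on the right call (a minimal sketch; the option values are placeholders):

```python
from llmskit import ChatLLM

# Construction-time options live on the factory method.
chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

# Request options live on complete(...) / stream(...).
response = chat.complete(
    messages=[{"role": "user", "content": "Say hi."}],
    temperature=0.2,
    max_tokens=128,
)
```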

You can also register custom chat providers without editing `llmskit.chat`:

```python
from typing import Any, AsyncIterator

from llmskit import AsyncChatLLM
from llmskit.clients import AsyncLLMClient
from llmskit.core import register_chat_provider
from llmskit.types import Message, ProviderEvent, ToolDefinition


class MyChatClient(AsyncLLMClient):
    provider = "my-provider"
    model = "demo-model"
    capabilities = {
        "tool_calling": False,
        "reasoning": False,
        "streaming": True,
        "vision": False,
        "audio_input": False,
        "audio_output": False,
        "document_input": False,
        "video_input": False,
        "native_multimodal_output": False,
    }

    async def events(
        self,
        messages: list[Message],
        *,
        tools: list[ToolDefinition] | None = None,
        **kwargs: Any,
    ) -> AsyncIterator[ProviderEvent]:
        del messages, tools, kwargs  # unused in this no-op stub
        if False:  # pragma: no cover - unreachable, but marks this method as an async generator
            yield ProviderEvent()


register_chat_provider(name="my-provider", async_client_factory=MyChatClient, replace=True)
chat = AsyncChatLLM.create("my-provider", model="demo-model")
```

If you also want `ChatLLM.create("my-provider", ...)` support, register a
native sync client with `sync_client_factory=...` as well.
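
A sketch of that registration; `MySyncChatClient` stands in for a hypothetical native sync counterpart of `MyChatClient`, whose base class is not shown in this README:

```python
from llmskit import ChatLLM

register_chat_provider(
    name="my-provider",
    async_client_factory=MyChatClient,
    sync_client_factory=MySyncChatClient,  # hypothetical native sync client class
    replace=True,
)

sync_chat = ChatLLM.create("my-provider", model="demo-model")
```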

If your custom provider needs per-model capability differences, declare
`provider_capability_defaults` and `model_capability_catalog` on the client
class. For OpenAI-compatible private models, you can also override the shared
model capability snapshot via `from_openai(..., capability_overrides={...})`.
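
For example (a sketch; the override keys are assumed to mirror the boolean capability flags shown in the `capabilities` dict above, and the model name and endpoint are placeholders):

```python
from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="my-private-model",                    # hypothetical self-hosted model
    api_key="YOUR_API_KEY",
    base_url="https://llm.internal.example/v1",  # hypothetical endpoint
    capability_overrides={
        "vision": True,           # this deployment accepts images
        "document_input": False,  # but not documents
    },
)
```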

## Response Formats

`ChatLLM.complete(...)` and `AsyncChatLLM.complete(...)` return an OpenAI-style response by default.

```python
response = chat.complete(messages=messages)

print(response["object"])  # chat.completion
print(response["choices"][0]["message"]["content"])
print(response["choices"][0]["message"]["tool_calls"])
print(response["usage"])
print(response["provider_extensions"])
```

If you still need the old compatibility object, request `result_format="legacy"`:

```python
legacy_response = chat.complete(
    messages=messages,
    result_format="legacy",
)

print(legacy_response.content)
print(legacy_response.reasoning_content)
print(legacy_response.tool_calls)
```

Provider request formatting still uses `response_format`, for example:

```python
response = chat.complete(
    messages=messages,
    response_format={"type": "json_object"},
)
```

Provider-native request options should go inside `provider_options`, for example:

```python
chat.complete(
    messages=messages,
    provider_options={"reasoning_effort": "high"},  # OpenAI native
)

chat.complete(
    messages=messages,
    provider_options={"thinking": {"type": "enabled", "budget_tokens": 1024}},  # Claude native
)

chat.complete(
    messages=messages,
    provider_options={"candidate_count": 2},  # Gemini native
)
```

Keep shared llmskit options such as `temperature`, `max_tokens`, `modalities`,
`audio`, and `response_format` at the top level. Unknown top-level provider kwargs
raise a validation error instead of being silently ignored, and
`provider_options` cannot override llmskit-managed keys such as `model`,
`messages`, or `stream`.
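
Combining both levels in one call looks like this (values are placeholders):

```python
response = chat.complete(
    messages=messages,
    temperature=0.3,                               # shared llmskit option, top level
    max_tokens=512,                                # shared llmskit option, top level
    response_format={"type": "json_object"},       # shared llmskit option, top level
    provider_options={"reasoning_effort": "low"},  # OpenAI-native option
)
```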

`response_format="legacy"` still works as a deprecated compatibility alias for
older code, but new code should prefer `result_format="legacy"`.

## Streaming

`stream(...)` yields OpenAI-style chat completion chunks.

```python
from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

for chunk in chat.stream(
    messages=[{"role": "user", "content": "Count from 1 to 3."}]
):
    choice = chunk["choices"][0]
    delta = choice["delta"]

    if delta.get("role"):
        print("role:", delta["role"])
    if delta.get("content"):
        print(delta["content"], end="")
    if delta.get("reasoning_content"):
        print("\nreasoning:", delta["reasoning_content"])
    if delta.get("tool_calls"):
        print("\ntool_calls:", delta["tool_calls"])
    if choice.get("finish_reason"):
        print("\nfinish_reason:", choice["finish_reason"])
```

## Tool Calling

Tool definitions use one canonical schema across providers:

```python
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
            },
            "required": ["city"],
        },
    }
]
```

Pass them to `complete(...)` or `stream(...)`:

```python
response = chat.complete(
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
)

tool_calls = response["choices"][0]["message"]["tool_calls"]
print(tool_calls)
```

Returned tool calls are normalized to an OpenAI-style structure:

```python
[
    {
        "id": "call_123",
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\":\"Beijing\"}",
        },
    }
]
```
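
A typical round trip parses the arguments, runs your tool, and sends the result back before asking for a final answer. Since llmskit normalizes to OpenAI-style structures, the sketch below assumes it also accepts OpenAI-style `tool` role messages; this README does not state that explicitly, so verify against your adapter. `get_weather` is a hypothetical local function:

```python
import json

# `chat` and `tools` are the objects from the snippets above.

def get_weather(city: str) -> str:
    # Hypothetical local implementation of the declared tool.
    return f"It is sunny in {city}."

messages = [{"role": "user", "content": "What is the weather in Beijing?"}]
response = chat.complete(messages=messages, tools=tools)
assistant_message = response["choices"][0]["message"]

messages.append(assistant_message)
for call in assistant_message["tool_calls"]:
    args = json.loads(call["function"]["arguments"])
    messages.append(
        {
            "role": "tool",             # assumed OpenAI-style tool result message
            "tool_call_id": call["id"],
            "content": get_weather(**args),
        }
    )

final = chat.complete(messages=messages, tools=tools)
print(final["choices"][0]["message"]["content"])
```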

## Multimodal Messages

Message `content` can be either a plain string or a list of structured content parts.

Supported canonical content part types:

- `text`
- `image_url`
- `input_audio`
- `file`
- `video_url`

### Vision example

```python
from llmskit import ChatLLM

chat = ChatLLM.from_gemini(
    model="gemini-2.5-flash",
    api_key="YOUR_API_KEY",
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/cat.png",
                    "format": "image/png",
                },
            },
        ],
    }
]

response = chat.complete(messages=messages)
print(response["choices"][0]["message"]["content"])
```

### Other content part shapes

```python
audio_part = {
    "type": "input_audio",
    "input_audio": {
        "data": "<base64-audio-data>",
        "format": "wav",
    },
}

file_part = {
    "type": "file",
    "file": {
        "file_id": "gs://bucket/report.pdf",
        "format": "application/pdf",
    },
}

video_part = {
    "type": "video_url",
    "video_url": {
        "url": "gs://bucket/demo.mp4",
        "format": "video/mp4",
    },
}
```

The wrapper validates unsupported modalities early, so provider/model mismatches fail fast instead of being silently forwarded.
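
For example, sending the `file_part` above to the OpenAI-compatible backend (which, per the capability table below, does not support document input) fails before any network call. The exact exception type is not named in this README, so the broad catch here is for illustration only:

```python
from llmskit import ChatLLM

chat = ChatLLM.from_openai(
    model="gpt-4o-mini",
    api_key="YOUR_API_KEY",
    base_url="https://api.openai.com/v1",
)

try:
    chat.complete(messages=[{"role": "user", "content": [file_part]}])
except Exception as exc:  # actual validation error type not documented here
    print(f"rejected before any request was sent: {exc}")
```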

You can inspect capabilities at runtime:

```python
print(chat.capabilities)
print(chat.capability_snapshot())
print(chat.refresh_capabilities())
print(chat.supports_vision())
print(chat.supports_audio_input())
print(chat.supports_audio_output())
print(chat.supports_document_input())
print(chat.supports_video_input())
```

Where:

- `chat.capabilities` is the backward-compatible boolean view.
- `chat.capability_snapshot()` returns a model-level snapshot with `state` /
  `source` metadata.
- `chat.refresh_capabilities()` re-resolves the shared snapshot for the current
  `provider + model + base_url` tuple and preserves runtime-learned corrections
  by default.

## Provider Capability Overview

The table below describes default model-family capabilities for built-in
providers. At runtime, the authoritative behavior is the model-level capability
snapshot, not class-level static constants:

| Provider | Tool calling | Reasoning | Vision | Audio input | Audio output | Document input | Video input |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenAI-compatible | Yes | Yes | Yes | Yes | Yes | No | No |
| Claude | Yes | Yes | Yes | No | No | Yes | No |
| Gemini | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

## Embeddings

`OpenAIEmbeddings` and `AsyncOpenAIEmbeddings` target OpenAI-compatible embedding endpoints.

### Synchronous embeddings

```python
from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

query_vector = embeddings.embed_query("What is llmskit?")
document_vectors = embeddings.embed_documents(
    [
        "llmskit wraps multiple chat providers.",
        "It also includes embeddings and reranking helpers.",
    ]
)

print(len(query_vector))
print(len(document_vectors))
print(embeddings.get_embedding_dimension())
```

### Asynchronous embeddings

```python
import asyncio

from llmskit import AsyncOpenAIEmbeddings


async def main() -> None:
    embeddings = AsyncOpenAIEmbeddings(
        base_url="https://api.openai.com/v1",
        model="text-embedding-3-small",
        api_key="YOUR_API_KEY",
    )

    vector = await embeddings.embed_query("hello")
    print(len(vector))


asyncio.run(main())
```

Embedding helpers include:

- batching
- retry with exponential backoff
- max input length truncation
- cached dimension detection
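
As a sketch of the batching behavior (assuming each batch maps to one backend request, which the `batch_size` constructor argument above suggests but the README does not state outright):

```python
from llmskit import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    base_url="https://api.openai.com/v1",
    model="text-embedding-3-small",
    api_key="YOUR_API_KEY",
    batch_size=16,
)

docs = [f"document {i}" for i in range(100)]
vectors = embeddings.embed_documents(docs)  # ~ceil(100 / 16) = 7 backend requests
assert len(vectors) == len(docs)
```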

## Reranker

`Reranker` and `AsyncReranker` call any rerank service that exposes a `/rerank` endpoint.

### Synchronous reranking

```python
from llmskit import Reranker

reranker = Reranker(
    base_url="https://your-reranker-service",
    model="bge-reranker-v2-m3",
    api_key="YOUR_API_KEY",
)

result = reranker.rerank(
    query="python async http client",
    documents=[
        "httpx supports both sync and async clients",
        "Redis is an in-memory database",
        "Python generators can yield values lazily",
    ],
    top_n=2,
    threshold=0.0,
)

print(result.results)
print(result.usage)
```

### Asynchronous reranking

```python
import asyncio

from llmskit import AsyncReranker


async def main() -> None:
    reranker = AsyncReranker(
        base_url="https://your-reranker-service",
        model="bge-reranker-v2-m3",
        api_key="YOUR_API_KEY",
    )

    result = await reranker.rerank(
        query="python async http client",
        documents=[
            "httpx supports both sync and async clients",
            "Redis is an in-memory database",
        ],
        top_n=1,
    )
    print(result.results)


asyncio.run(main())
```

## Notes

- `ChatLLM` and `AsyncChatLLM` normalize provider responses into OpenAI-style chunks and completion payloads.
- The default non-streaming response is the OpenAI-style dictionary, not the legacy dataclass.
- `OpenAIEmbeddings` works with OpenAI-compatible services, including self-hosted endpoints that implement the embeddings API.
- Retries are built in for transient network and server-side failures.
- For local development and CI, run `python -m pytest -q` from the repository root.
- The examples above use placeholder model names and endpoints; replace them with the values supported by your provider.
