Metadata-Version: 2.4
Name: genai-calling
Version: 0.1.6
Summary: Single-endpoint GenAI SDK (multi-provider, multimodal)
License-Expression: MIT
Project-URL: Homepage, https://github.com/gravtice/genai-calling
Project-URL: Issues, https://github.com/gravtice/genai-calling/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.25.0
Requires-Dist: typing-extensions>=4.9.0
Requires-Dist: uvicorn>=0.30.0
Dynamic: license-file

# genai-calling

![CI](https://github.com/gravtice/genai-calling/actions/workflows/ci.yml/badge.svg)
![Python](https://img.shields.io/badge/python-≥3.10-blue)
![License](https://img.shields.io/badge/license-MIT-green)

Chinese documentation: `README_ZH.md`

One interface for calling multimodal models, with four ways to use it: Skill, MCP, CLI, or SDK.

## Features

- **Multi-provider**: OpenAI, Google (Gemini), Anthropic (Claude), Aliyun (DashScope/Bailian), Volcengine (Doubao/Ark), Tuzi
- **Multimodal**: text/image/audio/video input and output (model-dependent)
- **Unified API**: a single `Client.generate()` for all providers
- **Streaming**: `generate_stream()` for incremental output
- **Tool calling**: function tools (model/provider-dependent)
- **JSON Schema output**: structured output (model/provider-dependent)
- **MCP Server**: Streamable HTTP and SSE transport
- **Security**: SSRF protection, DNS pinning, download limits, Bearer token auth (MCP)

## Installation

```bash
pip install genai-calling
```

The PyPI distribution is `genai-calling`; the Python import package is `gravtice`.

For development:

```bash
pip install -e .
# or (recommended)
uv sync --group dev
```

## Skill (External Repository)

The standalone skill is no longer bundled in this repository.

Preferred install name:

```bash
npx skills add gravtice/nous-skills -s genai-calling
```

Legacy catalogs may still expose the old entry name:

```bash
npx skills add gravtice/nous-skills -s nous-genai
```

Skill repository:
https://github.com/gravtice/nous-skills

## Configuration (Env Vars, Zero-parameter)

Configuration is managed via environment variables.

You can set env vars in two ways:

1. Runtime env vars (inline or exported in shell)
2. Env files (`.env.local`, `.env.production`, `.env.development`, `.env.test`) and the global fallback `~/.genai-calling/.env`

Runtime example (inline):

```bash
GENAI_CALLING_OPENAI_API_KEY=... uv run genai --model openai:gpt-4o-mini --prompt "Hello"
```

When env files are used, the SDK, CLI, and MCP server load them automatically with the following priority (high -> low):

`.env.local > .env.production > .env.development > .env.test > ~/.genai-calling/.env`

Process env vars override both project and global env files (the loader uses `os.environ.setdefault()`, so variables already set in the process are never overwritten).

Use `~/.genai-calling/.env` for user-wide shared defaults such as API keys. Keep worktree-specific settings such as ports in project-local `.env.local`.
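The precedence rules above can be sketched in a few lines of stdlib Python (a minimal illustration of the `setdefault()` idea, not the library's actual loader; file names and order follow the list above):

```python
import os

# Priority order, highest first. Because setdefault() never overwrites,
# an earlier (higher-priority) file wins, and variables already set in
# the process environment always win over any file.
ENV_FILES = [
    ".env.local",
    ".env.production",
    ".env.development",
    ".env.test",
    os.path.expanduser("~/.genai-calling/.env"),
]

def load_env_files(paths=ENV_FILES):
    for path in paths:
        if not os.path.exists(path):
            continue
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                # Skip blanks, comments, and malformed lines.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, _, value = line.partition("=")
                # setdefault: only set the key if it is not already present.
                os.environ.setdefault(key.strip(), value.strip())
```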

Minimal `.env.local` (OpenAI only):

```bash
GENAI_CALLING_OPENAI_API_KEY=...
GENAI_CALLING_TIMEOUT_MS=120000
```

See `docs/CONFIGURATION.md` for all options, or copy `.env.example` to `.env.local`.

## Quickstart

### CLI (fastest, unified API, agent-friendly)

```bash
# List available models by capabilities (out=text/image/audio/video/embedding)
uv run genai model available --all

# Text generation
uv run genai --model openai:gpt-4o-mini --prompt "Hello"

# Image understanding (image -> text)
uv run genai --model openai:gpt-4o-mini --prompt "Describe this image" --image-path ./examples/demo_image.png

# Image generation (text -> image file)
uv run genai --model openai:gpt-image-1 --prompt "A red cube on white background, minimal" --output-path ./out.png

# Speech-to-text (audio -> text)
uv run genai --model openai:whisper-1 --audio-path ./examples/demo_tts.mp3

# Text-to-speech (text -> audio file)
uv run genai --model openai:tts-1 --prompt "Hello from genai-calling" --output-path ./out.mp3

# Video generation (text -> video; async style)
uv run genai --model openai:sora-2 --prompt "A paper boat sailing on a rain puddle, cinematic" --no-wait
# ...later
uv run genai --model openai:sora-2 --job-id "<job_id>" --output-path ./out.mp4 --timeout-ms 600000
```
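The `--no-wait` / `--job-id` flow above is a plain submit-then-poll pattern. In application code it might look like the sketch below (the `fetch_status` callable and the job-dict shape are hypothetical, purely to illustrate the pattern; the CLI handles all of this for you):

```python
import time

def wait_for_job(fetch_status, job_id, timeout_ms=600_000, poll_s=2.0):
    """Poll fetch_status(job_id) until the job reports a terminal status,
    or raise TimeoutError once timeout_ms elapses.

    fetch_status is any callable returning a dict with at least a
    'status' key (hypothetical shape for illustration).
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while time.monotonic() < deadline:
        job = fetch_status(job_id)
        if job["status"] in ("succeeded", "failed"):
            return job
        time.sleep(poll_s)  # back off between polls
    raise TimeoutError(f"job {job_id} did not finish within {timeout_ms} ms")
```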

### SDK: Text generation

```python
from gravtice import Client, GenerateRequest, Message, OutputSpec, Part

client = Client()
resp = client.generate(
    GenerateRequest(
        model="openai:gpt-4o-mini",
        input=[Message(role="user", content=[Part.from_text("Hello!")])],
        output=OutputSpec(modalities=["text"]),
    )
)
print(resp.output[0].content[0].text)
```

### SDK: Streaming

```python
import sys
from gravtice import Client, GenerateRequest, Message, OutputSpec, Part

client = Client()
req = GenerateRequest(
    model="openai:gpt-4o-mini",
    input=[Message(role="user", content=[Part.from_text("Tell me a joke")])],
    output=OutputSpec(modalities=["text"]),
)
for ev in client.generate_stream(req):
    if ev.type == "output.text.delta":
        sys.stdout.write(str(ev.data.get("delta", "")))
        sys.stdout.flush()
print()
```

### SDK: Image understanding

```python
from gravtice import Client, GenerateRequest, Message, OutputSpec, Part, PartSourcePath
from gravtice import detect_mime_type

path = "./cat.png"
mime = detect_mime_type(path) or "application/octet-stream"

client = Client()
resp = client.generate(
    GenerateRequest(
        model="openai:gpt-4o-mini",
        input=[
            Message(
                role="user",
                content=[
                    Part.from_text("Describe this image"),
                    Part(type="image", mime_type=mime, source=PartSourcePath(path=path)),
                ],
            )
        ],
        output=OutputSpec(modalities=["text"]),
    )
)
print(resp.output[0].content[0].text)
```

### SDK: List available models

```python
from gravtice import Client

client = Client()
print(client.list_all_available_models())
```

## Providers

| Provider | Notes |
|----------|------|
| `openai` | GPT-4, DALL·E, Whisper, TTS |
| `google` | Gemini, Imagen, Veo |
| `anthropic` | Claude |
| `aliyun` | DashScope / Bailian (OpenAI-compatible + AIGC) |
| `volcengine` | Ark / Doubao (OpenAI-compatible) |
| `tuzi-web` / `tuzi-openai` / `tuzi-google` / `tuzi-anthropic` | Tuzi adapters |
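Model identifiers throughout this README follow a `provider:model` scheme. A minimal parse looks like this (illustrative only, not the SDK's internal routine):

```python
def parse_model_id(model_id: str) -> tuple[str, str]:
    """Split 'provider:model' into its two parts.

    Split only on the first colon, in case a model name contains one.
    """
    provider, sep, model = model_id.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {model_id!r}")
    return provider, model
```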

## Binary output

Binary `Part.source` is a tagged union:

- **Input**: `bytes/path/base64/url/ref` (MCP forbids `bytes/path`)
- **Output**: `url/base64/ref` (SDK does not auto-download to disk)

If you need to write output to a file, see `examples/demo.py` (`_write_binary()`), or reuse `Client.download_to_file()`, the built-in safe downloader.
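For the `base64` case, writing the payload to disk is a few lines of stdlib code (a sketch assuming you already have the part's base64 string in hand; see `examples/demo.py` for the library's real helper):

```python
import base64
from pathlib import Path

def write_base64_part(b64_data: str, out_path: str) -> int:
    """Decode a base64 payload and write the raw bytes to out_path.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data, validate=True)
    Path(out_path).write_bytes(raw)
    return len(raw)
```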

## CLI & MCP Server

```bash
# CLI
uv run genai --model openai:gpt-4o-mini --prompt "Hello"
uv run genai model available --all

# Tuzi Chirp music
uv run genai --model tuzi-web:chirp-v3-5 --prompt "Lo-fi hiphop beat, 30s" --no-wait
# ...later
uv run genai --model tuzi-web:chirp-v3-5 --job-id "<job_id>" --output-path demo_suno.mp3 --timeout-ms 600000

# MCP Server
uv run genai-mcp-server                    # Streamable HTTP: /mcp, SSE: /sse
uv run genai-mcp-cli tools                 # Debug CLI
```

## Security

- **SSRF protection**: rejects private/loopback URLs by default (`GENAI_CALLING_ALLOW_PRIVATE_URLS=1` to allow)
- **DNS pinning**: mitigates DNS rebinding
- **Download limit**: 128MiB per URL by default (`GENAI_CALLING_URL_DOWNLOAD_MAX_BYTES`)
- **Bearer token auth**: for MCP server
- **Token rules**: fine-grained access control
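The private/loopback rejection can be illustrated with the stdlib `ipaddress` module (a simplified sketch of the idea only; the library's actual check additionally does DNS pinning to defeat rebinding):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_private(url: str) -> bool:
    """Return True if the URL's host resolves to a private, loopback,
    link-local, or otherwise non-global address (i.e. should be rejected)."""
    host = urlparse(url).hostname
    if host is None:
        return True  # treat unparseable URLs as unsafe
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable hosts are refused
    for _, _, _, _, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if not ip.is_global:
            return True  # any non-global resolution is rejected
    return False
```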

## Testing

```bash
uv run python -m pytest tests/ -v
```

## Docs

- [Configuration](docs/CONFIGURATION.md)
- [Contributing](CONTRIBUTING.md)
- [Changelog](CHANGELOG.md)

## License

[MIT](LICENSE)
