Metadata-Version: 2.4
Name: babyapi
Version: 0.3.2
Summary: BabyAPI client (OpenAI-compatible chat/completions/embeddings/rerank plus /docling document conversion).
Project-URL: Homepage, https://babyapi.org
Project-URL: Repository, https://github.com/babyapi-org/babyapi-python
Project-URL: Issues, https://github.com/babyapi-org/babyapi-python/issues
Author-email: BabyAPI <support@babyapi.org>
License: MIT
Keywords: babyapi,chat-completions,completions,docling,document-conversion,embeddings,llm,openai,rerank,vllm
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.9
Requires-Dist: httpx>=0.27.0
Description-Content-Type: text/markdown

![BabyAPI banner](https://api.babyapi.org/images/banner.png)

# BabyAPI (Python SDK)

A tiny Python client for **BabyAPI** — an **OpenAI-compatible** API for hosted open-weight models.

> Minimal surface area. Calm defaults. You bring an API key — we handle the GPUs.

**Endpoints**
- OpenAI-compatible:
  - `POST /v1/chat/completions`
  - `POST /v1/completions`
  - `POST /v1/embeddings`
  - `POST /v1/rerank`
- BabyAPI convenience:
  - `POST /infer` (simple text-in, text-out)
- Document conversion (Docling):
  - `POST /docling/v1/convert/source` &nbsp;·&nbsp; `POST /docling/v1/convert/file`
  - Async variants + chunking (hybrid / hierarchical)

---

## Install

```bash
pip install babyapi
```

---

## Quick start (the easy path): `client.baby.infer(...)`

If you just want **text in → text out**, start here.

```py
import os
from babyapi import BabyAPI

client = BabyAPI(
    api_key=os.getenv("BABYAPI_API_KEY"),
    default_model="mistral",  # so you can call baby.infer("...") without specifying model
)

out = client.baby.infer(
    {
        "prompt": "Write a 1-line release note title for BabyAPI.",
        "maxTokens": 40,
        "temperature": 0.5,
    }
)

print(out["output"])
print(out.get("usage"))
```

You can also pass a raw string:

```py
out = client.baby.infer("Explain BabyAPI in one sentence.")
print(out["output"])
```

### Supported options (aliases accepted)

You can pass options directly **or** inside `"options": {...}`:

- `max_tokens` / `maxTokens`
- `temperature`
- `top_p` / `topP`
- `top_k` / `topK`
- `stop`
- `presence_penalty` / `presencePenalty`
- `frequency_penalty` / `frequencyPenalty`

Example with aliases + nested `options`:

```py
out = client.baby.infer(
    {
        "model": "mistral",
        "prompt": "Give 3 calm API principles.",
        "options": {"topP": 0.9, "max_tokens": 80},
    }
)
print(out["output"])
```
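
To make the alias rules concrete, here is a minimal sketch of how such a payload could be flattened into snake_case options. It is illustrative only and not the SDK's internal code; `normalize_options` is a hypothetical helper, and letting top-level keys win over nested `options` is an assumption.

```python
# Illustrative only: NOT the SDK's implementation.
ALIASES = {
    "maxTokens": "max_tokens",
    "topP": "top_p",
    "topK": "top_k",
    "presencePenalty": "presence_penalty",
    "frequencyPenalty": "frequency_penalty",
}
OPTION_KEYS = {
    "max_tokens", "temperature", "top_p", "top_k",
    "stop", "presence_penalty", "frequency_penalty",
}


def normalize_options(payload):
    """Collect generation options from payload["options"] and the top level."""
    merged = {}
    # Nested options are read first, so top-level keys override them (assumption).
    for source in (payload.get("options") or {}, payload):
        for key, value in source.items():
            name = ALIASES.get(key, key)
            if name in OPTION_KEYS:
                merged[name] = value
    return merged


print(normalize_options(
    {"prompt": "Hi", "maxTokens": 40, "options": {"topP": 0.9, "max_tokens": 80}}
))
```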

---

## One method for both OpenAI endpoints: `client.infer(...)`

`client.infer(...)` picks the endpoint from the shape of your OpenAI-style payload:

- If you pass `messages` → routes to **chat completions**
- If you pass `prompt` → routes to **completions**

```py
chat_res = client.infer(
    {
        "model": "mistral",
        "messages": [{"role": "user", "content": "One-line slogan for BabyAPI?"}],
    }
)
print(chat_res["choices"][0]["message"]["content"])

comp_res = client.infer(
    model="mistral",
    prompt="Give 3 product names for a tiny LLM SDK.",
    max_tokens=60,
)
print(comp_res["choices"][0]["text"])
```
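
The routing rule can be sketched as a small function. This is a hypothetical illustration, not the SDK's source; in particular, preferring `messages` when both keys are present is an assumption.

```python
def pick_endpoint(payload):
    """Return the path an OpenAI-style payload would be routed to."""
    if "messages" in payload:
        return "/v1/chat/completions"
    if "prompt" in payload:
        return "/v1/completions"
    raise ValueError("payload needs either 'messages' or 'prompt'")


print(pick_endpoint({"model": "mistral", "messages": []}))
print(pick_endpoint({"model": "mistral", "prompt": "Hi"}))
```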

---

## OpenAI-compatible: Chat Completions

```py
res = client.chat.completions.create(
    model="mixtral",
    messages=[
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Give me 3 tagline ideas for a tiny LLM API."},
    ],
    temperature=0.7,
)

print(res["choices"][0]["message"]["content"])
```

---

## OpenAI-compatible: Completions

```py
res = client.completions.create(
    model="mistral",
    prompt="Write a friendly release note opener for BabyAPI.",
    max_tokens=120,
    temperature=0.7,
)

print(res["choices"][0]["text"])
```

---

## OpenAI-compatible: Embeddings

```py
res = client.embeddings.create(
    model="qwen3-embedding",
    input="BabyAPI makes LLMs easy.",
)

print(res["data"][0]["embedding"][:5])  # first 5 dimensions
print(res["usage"])
```

You can also embed multiple texts at once:

```py
res = client.embeddings.create(
    model="qwen3-embedding",
    input=[
        "First document to embed.",
        "Second document to embed.",
    ],
)

for item in res["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")
```
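
A common next step is comparing two embeddings. The sketch below computes cosine similarity in plain Python on a stand-in response with the same shape as the examples above; the vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


# Stand-in for a real embeddings response (made-up values).
res = {
    "data": [
        {"index": 0, "embedding": [1.0, 0.0, 1.0]},
        {"index": 1, "embedding": [1.0, 0.0, 0.0]},
    ]
}
# Sort by index so vectors line up with the order of the inputs.
vectors = [d["embedding"] for d in sorted(res["data"], key=lambda d: d["index"])]
print(round(cosine_similarity(vectors[0], vectors[1]), 4))
```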

### Supported parameters

| Parameter | Type | Description |
|---|---|---|
| `model` | `str` | **Required.** The embedding model to use. |
| `input` | `str \| list[str]` | **Required.** Text(s) to embed. |
| `encoding_format` | `str` | Optional. `"float"` (default) or `"base64"`. |
| `dimensions` | `int` | Optional. Truncate embeddings to this many dimensions. |
| `truncate_prompt_tokens` | `int` | Optional. Max tokens to keep (vLLM-specific). |
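
If you request `encoding_format="base64"`, each embedding arrives as a string instead of a list of floats. The sketch below decodes it assuming the OpenAI convention of packed little-endian float32 values; that layout is an assumption here, so check it against a `"float"` response for your model.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of packed little-endian float32 values (assumed layout)."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


# Round trip with made-up values to show the layout.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.0)).decode("ascii")
print(decode_base64_embedding(sample))
```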

---

## Reranking

```py
res = client.rerank.create(
    model="qwen3-reranker",
    query="What is BabyAPI?",
    documents=[
        "BabyAPI is a tiny hosted LLM API.",
        "The weather is nice today.",
        "BabyAPI supports OpenAI-compatible endpoints.",
    ],
)

for result in res["results"]:
    print(f"Index {result['index']}: relevance_score={result['relevance_score']:.4f}")
```
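
To turn scores back into text, map each result's `index` onto your input list. The sketch below assumes `index` is the position of the document in the `documents` you sent, and it sorts explicitly rather than relying on response order; `top_documents` is a hypothetical helper and the scores are made up.

```python
def top_documents(res, documents, limit=None):
    """Return (document, score) pairs, best first."""
    ranked = sorted(res["results"], key=lambda r: r["relevance_score"], reverse=True)
    if limit is not None:
        ranked = ranked[:limit]
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


documents = [
    "BabyAPI is a tiny hosted LLM API.",
    "The weather is nice today.",
    "BabyAPI supports OpenAI-compatible endpoints.",
]
# Stand-in response with made-up scores.
res = {
    "results": [
        {"index": 1, "relevance_score": 0.02},
        {"index": 0, "relevance_score": 0.97},
        {"index": 2, "relevance_score": 0.88},
    ]
}
for text, score in top_documents(res, documents, limit=2):
    print(f"{score:.2f}  {text}")
```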

### Supported parameters

| Parameter | Type | Description |
|---|---|---|
| `model` | `str` | **Required.** The reranker model to use. |
| `query` | `str` | **Required.** The query to rank documents against. |
| `documents` | `list[str]` | **Required.** Documents to rerank. |
| `top_n` | `int` | Optional. Return only the top N results. |
| `return_documents` | `bool` | Optional. Include document text in results. |
| `truncate_prompt_tokens` | `int` | Optional. Max tokens to keep (vLLM-specific). |

---

## Streaming (SSE)

`.stream(...)` yields `SSEEvent` objects:
- `event.done` → `True` when the stream is finished (`[DONE]`)
- `event.data` → parsed JSON when possible (otherwise `None`)
- `event.raw` → raw `data:` payload string
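
The three fields map onto a single SSE `data:` line roughly as follows. This is a sketch for intuition only; `ParsedEvent` and `parse_sse_data_line` are hypothetical stand-ins, not the SDK's `SSEEvent` or its parser.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ParsedEvent:  # hypothetical stand-in for SSEEvent
    done: bool
    data: Optional[Any]
    raw: str


def parse_sse_data_line(line):
    """Return a ParsedEvent for a 'data:' line, or None for any other line."""
    if not line.startswith("data:"):
        return None
    raw = line[len("data:"):].strip()
    if raw == "[DONE]":
        return ParsedEvent(done=True, data=None, raw=raw)
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None  # keep the raw payload, leave data empty
    return ParsedEvent(done=False, data=data, raw=raw)


print(parse_sse_data_line('data: {"choices": [{"delta": {"content": "Hi"}}]}'))
print(parse_sse_data_line("data: [DONE]"))
```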

### Streaming: chat

```py
import os
from babyapi import BabyAPI

client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))

for event in client.chat.completions.stream(
    model="mistral",
    messages=[{"role": "user", "content": "Write a short poem about servers."}],
):
    if event.done:
        break

    delta = (event.data or {}).get("choices", [{}])[0].get("delta", {})
    chunk = delta.get("content")
    if chunk:
        print(chunk, end="", flush=True)

print()
```
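
If you need the full text rather than live output, the same loop can be wrapped in a helper. The sketch below works on any iterable of objects with `.done` and `.data`, so it is shown with fake events instead of a live stream; `collect_chat_text` is a hypothetical name. It also tolerates chunks whose `choices` list is empty, which some OpenAI-compatible servers emit.

```python
from types import SimpleNamespace


def collect_chat_text(events):
    """Join the delta.content pieces of a chat stream into one string."""
    parts = []
    for event in events:
        if event.done:
            break
        # "or [{}]" also covers an empty choices list.
        choices = (event.data or {}).get("choices") or [{}]
        chunk = choices[0].get("delta", {}).get("content")
        if chunk:
            parts.append(chunk)
    return "".join(parts)


# Fake events standing in for client.chat.completions.stream(...).
fake_events = [
    SimpleNamespace(done=False, data={"choices": [{"delta": {"content": "Hello"}}]}),
    SimpleNamespace(done=False, data=None),
    SimpleNamespace(done=False, data={"choices": []}),
    SimpleNamespace(done=False, data={"choices": [{"delta": {"content": ", world"}}]}),
    SimpleNamespace(done=True, data=None),
]
print(collect_chat_text(fake_events))
```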

### Streaming: completions

```py
for event in client.completions.stream(
    model="mistral",
    prompt="List 5 calm API-building tips.",
):
    if event.done:
        break

    text = (event.data or {}).get("choices", [{}])[0].get("text")
    if text:
        print(text, end="", flush=True)

print()
```

> Note: as in many SDKs, streaming requests are not retried. If you want retries for streams, wrap the call at the application level.
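
One way to wrap a stream is sketched below. It retries only when the failure happens before any event has been delivered, because retrying later would replay output the caller has already seen. `stream_with_retry` is a hypothetical helper, and catching bare `Exception` is a simplification; in real code you would narrow it to the errors you consider transient.

```python
import time


def stream_with_retry(start_stream, max_retries=2, base_delay_s=0.25, sleep=time.sleep):
    """Yield events from start_stream(), restarting only if nothing was yielded yet."""
    attempt = 0
    while True:
        yielded = False
        try:
            for event in start_stream():
                yielded = True
                yield event
            return
        except Exception:
            # After output has reached the caller, a retry would duplicate it.
            if yielded or attempt >= max_retries:
                raise
            sleep(base_delay_s * (2 ** attempt))
            attempt += 1
```

Pass a zero-argument callable so each attempt opens a fresh stream, for example `stream_with_retry(lambda: client.completions.stream(model="mistral", prompt="Hi"))`.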

---

## Multimodal (vision) examples (OpenAI-style)

If the model you select supports vision, you can send images using OpenAI-style message content.

### Vision: non-streaming

```py
res = client.chat.completions.create(
    model="pixtral",  # or another vision-capable model you expose
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the image in 2 sentences. Then list 3 objects you see."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://api.babyapi.org/images/banner.png"},
                },
            ],
        }
    ],
)

print(res["choices"][0]["message"]["content"])
```

### Vision: streaming

```py
for event in client.chat.completions.stream(
    model="pixtral",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is this image trying to communicate?"},
                {"type": "image_url", "image_url": {"url": "https://api.babyapi.org/images/banner.png"}},
            ],
        }
    ],
):
    if event.done:
        break

    delta = (event.data or {}).get("choices", [{}])[0].get("delta", {})
    chunk = delta.get("content")
    if chunk:
        print(chunk, end="", flush=True)

print()
```

> Image support depends on the model you choose. If the model is text-only, the API may reject image inputs.
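
For local images, OpenAI-style APIs commonly accept a base64 `data:` URL in place of an `https` URL. Whether BabyAPI accepts data URLs for a given model is an assumption to verify; the sketch below only shows how to build one, and `image_to_data_url` is a hypothetical helper.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a data: URL."""
    mime, _ = mimetypes.guess_type(str(path))
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The result would go where the examples above use the banner URL: `{"type": "image_url", "image_url": {"url": image_to_data_url("./photo.png")}}`.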

---

## Configuration

```py
import os
from babyapi import BabyAPI

client = BabyAPI(
    api_key=os.getenv("BABYAPI_API_KEY"),          # required (or BABY_API_KEY)
    base_url=os.getenv("BABYAPI_BASE_URL"),        # optional (default: https://api.babyapi.org)
    timeout_s=60.0,                                # applies to non-streaming (JSON) requests
    max_retries=2,                                 # retry transient failures
    retry_base_delay_s=0.25,                       # exponential backoff base
    default_model="mistral",                       # used by client.baby.infer when model is omitted
    default_headers={"x-app": "my-sideproject"},   # extra headers for every request
)
```
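
As a worked example of what "exponential backoff base" implies, the sketch below lists the waits between attempts. It assumes the delay doubles on each retry with no jitter or cap; the SDK's exact schedule is not documented here and may differ.

```python
def backoff_delays(max_retries=2, base_delay_s=0.25):
    """Delay before each retry, assuming base * 2**attempt (assumption)."""
    return [base_delay_s * (2 ** attempt) for attempt in range(max_retries)]


print(backoff_delays())  # [0.25, 0.5]
```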

Environment variables supported:
- `BABYAPI_API_KEY` (or `BABY_API_KEY`)
- `BABYAPI_BASE_URL`
- `BABYAPI_DEFAULT_MODEL`
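
Because `os.getenv(...)` silently returns `None` when a variable is unset, it can help to resolve the key yourself and fail early. In this sketch, `resolve_api_key` is a hypothetical helper, and checking `BABYAPI_API_KEY` before `BABY_API_KEY` is an assumption about precedence.

```python
import os


def resolve_api_key(explicit=None, env=None):
    """Return the first non-empty key: explicit argument, then the two env vars."""
    env = os.environ if env is None else env
    key = explicit or env.get("BABYAPI_API_KEY") or env.get("BABY_API_KEY")
    if not key:
        raise RuntimeError("Set BABYAPI_API_KEY (or BABY_API_KEY), or pass api_key=...")
    return key
```

You would then construct the client with `BabyAPI(api_key=resolve_api_key())`.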

---

## Per-call overrides (RequestOptions)

Every `.create(...)` / `.stream(...)` accepts `request_options`.

```py
import os
from babyapi import BabyAPI, RequestOptions

client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))

res = client.chat.completions.create(
    request_options=RequestOptions(
        timeout_s=30.0,
        max_retries=0,
        headers={"x-trace": "abc123"},
    ),
    model="mistral",
    messages=[{"role": "user", "content": "Hello."}],
)
```

You can also pass a plain dict:

```py
res = client.chat.completions.create(
    request_options={"timeout_s": 10.0, "headers": {"x-app": "demo"}},
    model="mistral",
    messages=[{"role": "user", "content": "Hi again."}],
)
```

---

## Timeouts & cancellation

- JSON requests use `timeout_s` (default: 60s).
- Streaming requests default to **no timeout** (infinite), matching common SSE usage.
  - If you want a stream timeout, pass `request_options={"timeout_s": 30.0}`.
- To stop a stream early, `break` your loop.
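
If you want a cap on total stream duration, you can build one on top of the "break your loop" rule. The sketch below stops after a wall-clock budget; `with_deadline` is a hypothetical helper. It only checks between events, so it cannot interrupt a read that is blocked waiting for the next event.

```python
import time


def with_deadline(events, max_seconds, clock=time.monotonic):
    """Yield events until max_seconds have elapsed, then stop iterating."""
    deadline = clock() + max_seconds
    for event in events:
        if clock() > deadline:
            break  # budget used up: stop consuming the stream
        yield event
```

Wrap any stream with it, for example `for event in with_deadline(client.chat.completions.stream(...), 30.0): ...`.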

---

## Docling: document conversion & chunking

The `client.docling.*` namespace wraps BabyAPI's [Docling](https://github.com/DS4SD/docling-serve) proxy.
Use it to convert PDFs / DOCX / PPTX / images into Markdown, JSON, HTML, text, or doctags,
and to chunk documents for downstream RAG pipelines.

All calls authenticate with your standard `BABYAPI_API_KEY`.

### Health / version

```py
client.docling.health()    # {"status": "ok"}
client.docling.ready()
client.docling.version()
```

### Convert from a URL (synchronous)

```py
res = client.docling.convert_source({
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
    "options": {"to_formats": ["md"], "do_ocr": False, "page_range": [1, 10]},
})

print(res["document"]["md_content"])
```

### Convert a local file (synchronous)

`files` accepts a path, raw bytes, a file-like object, or a structured dict; pass a list to convert several files at once:

```py
# path
client.docling.convert_file(files="./invoice.pdf")

# multiple files + options
client.docling.convert_file(
    files=["./a.pdf", "./b.docx"],
    options={"to_formats": ["md", "json"]},
)

# structured dict wrapping raw bytes
with open("./report.pdf", "rb") as fp:
    client.docling.convert_file(
        files={"filename": "report.pdf", "content": fp.read(), "content_type": "application/pdf"},
    )
```
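
To avoid hand-writing the structured dict, you can build it from a path. The keys below match the example above; `file_part` is a hypothetical helper, and falling back to `application/octet-stream` for unknown extensions is an assumption.

```python
import mimetypes
from pathlib import Path


def file_part(path):
    """Build the {"filename", "content", "content_type"} dict for one file."""
    p = Path(path)
    content_type, _ = mimetypes.guess_type(p.name)
    return {
        "filename": p.name,
        "content": p.read_bytes(),
        "content_type": content_type or "application/octet-stream",
    }
```

Usage would look like `client.docling.convert_file(files=file_part("./report.pdf"))`.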

### Convert asynchronously (recommended for large docs)

```py
submitted = client.docling.convert_file_async(files="./big.pdf")
task_id = submitted["task_id"]

# Easiest: poll + fetch in one call.
result = client.docling.wait_for_result(
    task_id,
    interval_s=2.0,
    timeout_s=600.0,
    on_poll=lambda s: print("status:", s.get("status")),
)

print(result["document"]["md_content"])
```

Low-level alternative:

```py
import time

submitted = client.docling.convert_source_async({
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
})
task_id = submitted["task_id"]

while True:
    status = client.docling.poll_status(task_id)
    if status["status"] in ("success", "failure", "error"):
        break
    time.sleep(2)

if status["status"] == "success":
    print(client.docling.get_result(task_id))
else:
    print("conversion did not succeed:", status)
```
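
The low-level loop above runs indefinitely if the task never reaches a terminal state. A sketch of the same loop with a deadline follows; `poll_until_done` is a hypothetical helper that takes the polling function as an argument, and the terminal status names are copied from the example above.

```python
import time

TERMINAL = ("success", "failure", "error")


def poll_until_done(poll_status, task_id, interval_s=2.0, timeout_s=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until a terminal status, or raise TimeoutError when time runs out."""
    deadline = clock() + timeout_s
    while True:
        status = poll_status(task_id)
        if status["status"] in TERMINAL:
            return status
        # Give up if the next sleep would overshoot the deadline.
        if clock() + interval_s > deadline:
            raise TimeoutError(f"task {task_id} still {status['status']!r} after {timeout_s}s")
        sleep(interval_s)
```

Call it as `poll_until_done(client.docling.poll_status, task_id)`, then fetch the output with `client.docling.get_result(task_id)` if the returned status is `"success"`.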

### Chunking

The chunking endpoints take the same file/source shapes as conversion and return a list of chunks suitable for embedding.

```py
# Hybrid chunking from a URL
client.docling.chunk.hybrid_source({
    "sources": [{"kind": "http", "url": "https://arxiv.org/pdf/2408.09869"}],
})

# Hierarchical chunking from a file
client.docling.chunk.hierarchical_file(files="./handbook.pdf")
```
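
When a document yields many chunks, you may want to embed them in batches. The chunk response shape is not shown here, so this sketch starts from a plain list of chunk texts that you have already extracted; `batched` is a hypothetical helper and the batch size is arbitrary.

```python
def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


chunk_texts = ["chunk one", "chunk two", "chunk three"]  # placeholder texts
for batch in batched(chunk_texts, 2):
    print(batch)
```

Each batch could then be passed as `input=batch` to `client.embeddings.create(...)`.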

### Conversion options

Pass any docling-serve options via `options`. Common ones:

| Option | Default | Description |
|---|---|---|
| `to_formats` | `["md"]` | `md`, `json`, `html`, `text`, `doctags` |
| `do_ocr` | `True` | Run OCR on images/scanned pages |
| `force_ocr` | `False` | Force OCR even on text-layer PDFs |
| `do_table_structure` | `True` | Detect and extract table structure |
| `table_mode` | `"accurate"` | `accurate` or `fast` |
| `page_range` | full | e.g. `[1, 10]` |
| `image_export_mode` | `"embedded"` | `embedded` or `referenced` |
| `do_formula_enrichment` | `False` | Run formula OCR and return LaTeX |
| `do_picture_classification` | `False` | Classify pictures found in the document |

See the [Docling endpoints reference](https://api.babyapi.org/docs/docling-endpoints) for the full list.

---

## Errors

When a request fails, the SDK raises `BabyAPIError` where possible.

```py
import os
from babyapi import BabyAPI, BabyAPIError

client = BabyAPI(api_key=os.getenv("BABYAPI_API_KEY"))

try:
    client.chat.completions.create(model="mistral", messages=[])
except BabyAPIError as err:
    print(
        {
            "message": err.message,
            "status": err.status,
            "code": err.code,
            "type": err.type,
            "request_id": err.request_id,
        }
    )
```
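
Once the SDK's own retries are exhausted, your application may still want to decide whether to try again later. The sketch below classifies an error by its `status`; the set of transient codes is a common convention rather than something BabyAPI documents, and treating a missing status as a network-level failure is an assumption.

```python
def is_transient(status):
    """Guess whether an HTTP status is worth retrying later."""
    if status is None:
        return True  # assumed: no HTTP response, e.g. a connection failure
    return status in (408, 429) or 500 <= status <= 599


for code in (None, 400, 401, 429, 503):
    print(code, is_transient(code))
```

In an `except BabyAPIError as err:` block you would call `is_transient(err.status)`.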

---

## Context manager / cleanup

The client holds an `httpx.Client` internally. Use `BabyAPI` as a context manager so its connections are closed on exit:

```py
import os
from babyapi import BabyAPI

with BabyAPI(api_key=os.getenv("BABYAPI_API_KEY")) as client:
    res = client.completions.create(model="mistral", prompt="Ping")
    print(res["choices"][0]["text"])
```

---

## License

MIT.
