Metadata-Version: 2.4
Name: llamaindex-doubleword
Version: 0.2.1
Summary: An integration package connecting Doubleword and LlamaIndex.
Project-URL: Homepage, https://doubleword.ai
Project-URL: Documentation, https://docs.doubleword.ai
Project-URL: Repository, https://github.com/doublewordai/llamaindex-doubleword
Project-URL: Issues, https://github.com/doublewordai/llamaindex-doubleword/issues
Author-email: Doubleword <info@doubleword.ai>
License: MIT
Keywords: batch,doubleword,embeddings,llamaindex,llm,openai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: autobatcher>=0.4.0
Requires-Dist: llama-index-core>=0.12.0
Requires-Dist: llama-index-embeddings-openai>=0.3.0
Requires-Dist: llama-index-llms-openai-like>=0.3.0
Requires-Dist: openai<3,>=2.0.0
Description-Content-Type: text/markdown

# llamaindex-doubleword

A LlamaIndex integration package for [Doubleword](https://doubleword.ai).

This package wires Doubleword's OpenAI-compatible inference API
(`https://api.doubleword.ai/v1`) into LlamaIndex as both **real-time**
LLM / embedding models and **transparently-batched** variants powered by
[`autobatcher`](https://pypi.org/project/autobatcher/).

The batched variants are required to access models that Doubleword exposes
**only via the batch API**, and they cut cost on workloads that fan out
many concurrent calls — typically the case in agentic workflows.

## Installation

```bash
pip install llamaindex-doubleword
```

## Authentication

The API key is resolved from three sources, in precedence order:

1. **Explicit constructor argument**:
   ```python
   DoublewordLLM(model="...", api_key="sk-...")
   ```
2. **Environment variable**:
   ```bash
   export DOUBLEWORD_API_KEY=sk-...
   ```
3. **`~/.dw/credentials.toml`** — the same file written by Doubleword's CLI
   tooling. The active account is selected by `~/.dw/config.toml`'s
   `active_account` field, and `inference_key` from that account is used.

   ```toml
   # ~/.dw/config.toml
   active_account = "work"
   ```
   ```toml
   # ~/.dw/credentials.toml
   [accounts.work]
   inference_key = "sk-..."
   ```

   To use a non-active account from your credentials file, set
   `DOUBLEWORD_API_KEY` directly to that account's `inference_key` — there
   is no `account=` selector on the model itself.

## LLMs

### `DoublewordLLM` (real-time)

Drop-in LLM for any LlamaIndex workflow that expects an `LLM`.

```python
from llamaindex_doubleword import DoublewordLLM

llm = DoublewordLLM(model="your-model-name")

response = llm.complete("Explain bismuth in three sentences.")
print(response.text)
```

Tool calling is supported — use with LlamaIndex's agent framework:

```python
import asyncio

from llama_index.core.agent.workflow import AgentWorkflow
from llama_index.core.tools import FunctionTool
from llamaindex_doubleword import DoublewordLLM

def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression."""
    # Demo only: eval() is unsafe on untrusted input, even with builtins removed.
    return str(eval(expression, {"__builtins__": {}}, {}))

llm = DoublewordLLM(model="your-model-name")
agent = AgentWorkflow.from_tools_or_functions(
    [FunctionTool.from_defaults(fn=calculator)],
    llm=llm,
)

async def main():
    # AgentWorkflow.run() returns a handler that must be awaited.
    response = await agent.run(user_msg="What is 137 * 49?")
    print(response)

asyncio.run(main())
```
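
Because the model chooses the tool's input, a tool built on `eval()` can be abused even with builtins removed. One possible hardening, sketched here with only the standard library, is to walk the parsed expression and allow nothing but numeric literals and basic arithmetic. `safe_calculator` is a hypothetical name for this example and is not part of this package.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node):
    # Numeric literals only (bool is excluded by the exact type check).
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        return _BINARY_OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return str(_evaluate(tree.body))
```

`safe_calculator("137 * 49")` returns `"6713"`, while names, attribute access, and function calls raise `ValueError`. It can be passed to `FunctionTool.from_defaults(fn=...)` in the same way as `calculator`.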

### `DoublewordLLMBatch` (transparently batched)

Same interface, but concurrent `.acomplete()` / `.achat()` calls are
collected by `autobatcher` and submitted via Doubleword's batch endpoint.
**Async-only**: synchronous calls raise an error.

Use this when:

- The model you want is **batch-only** (some Doubleword-hosted models do not
  expose a real-time chat endpoint).
- You're running an agentic workflow with parallel branches and want
  ~50% cost savings via batch pricing.

```python
import asyncio
from llamaindex_doubleword import DoublewordLLMBatch

llm = DoublewordLLMBatch(model="batch-only-model")

async def main():
    # Concurrent calls collected into a single batch under the hood.
    results = await asyncio.gather(*[
        llm.acomplete(f"Summarize chapter {i}") for i in range(50)
    ])
    for r in results:
        print(r.text)

asyncio.run(main())
```

#### Tuning autobatcher

Four `autobatcher.BatchOpenAI` knobs are exposed as constructor arguments:

| Argument                | Default | Purpose                                                              |
|-------------------------|---------|----------------------------------------------------------------------|
| `batch_size`            | `1000`  | Submit a batch when this many requests are queued.                   |
| `batch_window_seconds`  | `10.0`  | Submit a batch after this many seconds even if the size cap is not reached. |
| `poll_interval_seconds` | `5.0`   | How often autobatcher polls for batch completion.                    |
| `completion_window`     | `"24h"` | Doubleword batch completion window. `"1h"` is more expensive but faster. |

```python
llm = DoublewordLLMBatch(
    model="your-model",
    batch_size=250,           # smaller batches for fast-turnaround nodes
    batch_window_seconds=2.5, # don't make latency-sensitive calls wait 10s
    completion_window="1h",   # pay more, finish quicker
)
```

The same arguments are available on `DoublewordEmbeddingBatch`.
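
To make the `batch_size` and `batch_window_seconds` triggers concrete, here is a toy size-or-window batcher written with plain `asyncio`. It is a conceptual sketch only and not autobatcher's code: `MicroBatcher`, `submit`, and `fake_batch_endpoint` are hypothetical names, and the polling that `poll_interval_seconds` controls is left out.

```python
import asyncio


class MicroBatcher:
    """Toy illustration of size-or-window batching (not autobatcher's code)."""

    def __init__(self, handler, batch_size=1000, batch_window_seconds=10.0):
        self._handler = handler      # async callable: list of inputs -> list of outputs
        self._batch_size = batch_size
        self._window = batch_window_seconds
        self._pending = []           # (input, future) pairs waiting to be submitted
        self._timer = None           # task that flushes when the window expires
        self._tasks = set()          # strong references to in-flight batch tasks

    async def submit(self, item):
        future = asyncio.get_running_loop().create_future()
        self._pending.append((item, future))
        if len(self._pending) >= self._batch_size:
            self._flush()            # size cap reached: submit immediately
        elif self._timer is None:
            self._timer = asyncio.create_task(self._flush_after_window())
        return await future

    async def _flush_after_window(self):
        await asyncio.sleep(self._window)
        self._timer = None           # this task is the timer, so clear it first
        self._flush()

    def _flush(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None
        batch, self._pending = self._pending, []
        if batch:
            task = asyncio.create_task(self._run(batch))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def _run(self, batch):
        try:
            outputs = await self._handler([item for item, _ in batch])
            for (_, future), output in zip(batch, outputs):
                future.set_result(output)
        except Exception as exc:
            # Propagate a batch failure to every caller still waiting on it.
            for _, future in batch:
                if not future.done():
                    future.set_exception(exc)


async def demo():
    batch_sizes = []

    async def fake_batch_endpoint(prompts):
        # Stand-in for a real batch submission.
        batch_sizes.append(len(prompts))
        return [p.upper() for p in prompts]

    batcher = MicroBatcher(fake_batch_endpoint, batch_size=3, batch_window_seconds=0.05)
    results = await asyncio.gather(*[batcher.submit(f"q{i}") for i in range(7)])
    return results, batch_sizes


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo, seven concurrent calls with `batch_size=3` produce two full batches from the size trigger and one batch of a single request from the window timer. A smaller window lowers latency for stragglers at the cost of smaller batches.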

### `DoublewordLLMAsync` (1-hour flex tier)

A thin subclass of `DoublewordLLMBatch` pinned to Doubleword's **flex
(1-hour)** completion window. Backed by `autobatcher.AsyncOpenAI` rather
than `BatchOpenAI`. Use this when 24-hour batch turnaround is too slow but
real-time cost is too high, which is typical for fan-out workflows that need
results within minutes to an hour.

```python
import asyncio
from llamaindex_doubleword import DoublewordLLMAsync

llm = DoublewordLLMAsync(model="your-model")  # completion_window="1h" by default

async def main():
    results = await asyncio.gather(*[
        llm.acomplete(f"Summarize chapter {i}") for i in range(50)
    ])
    for r in results:
        print(r.text)

asyncio.run(main())
```

All the autobatcher tuning knobs above apply unchanged. Apart from the
backing client, the only difference from `DoublewordLLMBatch` is the default
`completion_window` (`"1h"` vs `"24h"`). An equivalent
`DoublewordEmbeddingAsync` exists on the embeddings side.

## Embeddings

```python
from llamaindex_doubleword import (
    DoublewordEmbedding,
    DoublewordEmbeddingAsync,
    DoublewordEmbeddingBatch,
)

embed = DoublewordEmbedding(model_name="your-embedding-model")
vec = embed.get_text_embedding("hello world")

# Or, transparently batched (24h tier):
batch_embed = DoublewordEmbeddingBatch(model_name="your-embedding-model")
# vecs = await batch_embed.aget_text_embedding_batch([...])

# Or on the 1h flex tier:
async_embed = DoublewordEmbeddingAsync(model_name="your-embedding-model")
# vecs = await async_embed.aget_text_embedding_batch([...])
```

## Use with LlamaIndex

`DoublewordLLM` and `DoublewordEmbedding` work with LlamaIndex's global
`Settings`:

```python
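from llama_index.core import Document, Settings, VectorStoreIndex
from llamaindex_doubleword import DoublewordEmbedding, DoublewordLLM

Settings.llm = DoublewordLLM(model="your-model")
Settings.embed_model = DoublewordEmbedding(model_name="your-embedding-model")

# Placeholder corpus; replace with your own documents.
documents = [Document(text="LlamaIndex connects LLMs to your own data.")]

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("What is this about?")
print(response)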
```

## Configuration

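| Argument                                   | Env var               | Default                                                  |
|--------------------------------------------|-----------------------|----------------------------------------------------------|
| `api_key`                                  | `DOUBLEWORD_API_KEY`  | _required_ unless resolved from `~/.dw/credentials.toml` |
| `api_base`                                 | `DOUBLEWORD_API_BASE` | `https://api.doubleword.ai/v1`                           |
| `model` (LLMs) / `model_name` (embeddings) | —                     | _required_                                               |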

All other arguments accepted by `llama_index.llms.openai_like.OpenAILike` are
forwarded unchanged (`temperature`, `max_tokens`, `timeout`, etc.).

## License

MIT
