Metadata-Version: 2.4
Name: llama-index-embeddings-forge
Version: 0.1.0
Summary: LlamaIndex embeddings for Forge — Voxell's text-embedding API (turbo/pro/ultra; ultra = Qwen3-Embedding-8B, ~75+ avg MTEB, #4 English).
Project-URL: Homepage, https://voxell.ai/forge
Project-URL: Repository, https://github.com/VoxellInc/llama-index-embeddings-forge
Project-URL: Issues, https://github.com/VoxellInc/llama-index-embeddings-forge/issues
Author: Voxell, Inc.
License: MIT
License-File: LICENSE
Keywords: embeddings,forge,llama-index,llamaindex,qwen3,rag,semantic-search,voxell
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Requires-Dist: httpx>=0.27.0
Requires-Dist: llama-index-core>=0.11.0
Description-Content-Type: text/markdown

# llama-index-embeddings-forge

LlamaIndex embeddings for [**Forge**](https://voxell.ai/forge) — Voxell's hosted text-embedding API.

## Why Forge

One API, three tiers — pick your point on the quality/cost curve:

| Model | Dim | Notes |
| ----- | --- | ----- |
| `turbo` | 1024 | fast, low cost |
| `pro` | 2560 | |
| `ultra` | 4096 | Qwen3-Embedding-8B; average MTEB task score of about 75, #4 on the MTEB (English) leaderboard at the time of writing. The highest-ranked model available for production use (the three ranked above it are research-only) |

Matryoshka (MRL) truncation is supported: setting `dimensions` below a model's full size returns the
leading components of the full vector, re-normalized to unit length. The result is a smaller index
with minimal quality loss.

Forge logs request metadata only (model, tokens, latency), never your text or vectors.
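
The truncate-then-re-normalize step described above can be sketched in a few lines. This is an illustration of the general Matryoshka technique, not Forge's server-side code; the helper name `truncate_and_renormalize` and the toy vector are assumptions made for the example.

```python
import math
from typing import List, Sequence


def truncate_and_renormalize(vector: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm.

    Hypothetical helper: illustrates the idea only, not Forge's implementation.
    """
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    prefix = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # an all-zero prefix cannot be normalized
    return [x / norm for x in prefix]


full = [0.5, 0.5, 0.5, 0.5]  # toy unit-norm "full" vector
short = truncate_and_renormalize(full, 2)
print(short, sum(x * x for x in short))
```

Because the shortened vector is rescaled to unit length, dot product and cosine similarity stay interchangeable at any truncated size.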

## Install

```bash
pip install llama-index-embeddings-forge
```

## Usage

```python
from llama_index.embeddings.forge import ForgeEmbedding

# FORGE_API_KEY is read from the environment; or pass api_key=...
embed_model = ForgeEmbedding(model="turbo")

vector = embed_model.get_text_embedding("the quick brown fox")
query = embed_model.get_query_embedding("fast animal")
batch = embed_model.get_text_embedding_batch(["doc one", "doc two"])
```
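
Once you have a query vector and a batch of document vectors, a typical next step is ranking the documents by cosine similarity. The sketch below uses toy vectors in place of `query` and `batch` so it runs without an API key; `cosine_similarity` and `rank_documents` are hypothetical helpers, not part of this package.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for `query` and `batch` from the snippet above.
toy_query = [1.0, 0.0]
toy_batch = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0]]
ranking = rank_documents(toy_query, toy_batch)
print([index for index, _ in ranking])
```

In practice a vector store such as `VectorStoreIndex` does this ranking for you; the helper is mainly useful for quick sanity checks.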

### As the global embed model

```python
from llama_index.core import Settings, VectorStoreIndex, Document
from llama_index.embeddings.forge import ForgeEmbedding

Settings.embed_model = ForgeEmbedding(model="pro")
index = VectorStoreIndex.from_documents([Document(text="hello world")])
```

### Async

```python
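# Run inside an async function (or a notebook with top-level await);
# `embed_model` is the ForgeEmbedding instance created above.
vec = await embed_model.aget_text_embedding("doc")
q = await embed_model.aget_query_embedding("a search query")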
```
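
If you want to embed many texts concurrently with a cap on in-flight requests, one possible pattern is a semaphore around the async call. This is a sketch under assumptions: `embed_concurrently` is a hypothetical helper, and `fake_embed` stands in for `embed_model.aget_text_embedding` so the example runs offline. The built-in batch methods may already cover your use case.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

EmbedFn = Callable[[str], Awaitable[List[float]]]


async def embed_concurrently(
    embed_fn: EmbedFn, texts: Sequence[str], max_concurrency: int = 8
) -> List[List[float]]:
    """Embed all texts, allowing at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> List[float]:
        async with semaphore:
            return await embed_fn(text)

    # gather() returns results in the same order as the input texts.
    return await asyncio.gather(*(embed_one(t) for t in texts))


async def fake_embed(text: str) -> List[float]:
    """Offline stand-in for embed_model.aget_text_embedding."""
    await asyncio.sleep(0)
    return [float(len(text))]


vectors = asyncio.run(embed_concurrently(fake_embed, ["a", "bb", "ccc"]))
print(vectors)
```

With a real model, pass `embed_model.aget_text_embedding` as `embed_fn`.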

### Matryoshka (shorter vectors)

```python
embed_model = ForgeEmbedding(model="turbo", dimensions=256)  # re-normalized 256-d vectors
```
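
To see what a shorter vector buys you, here is a rough estimate of raw vector storage. It assumes 4-byte float32 values and ignores any index or metadata overhead; `index_size_bytes` is a hypothetical helper for the arithmetic.

```python
def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for `num_vectors` vectors of `dim` values each (float32 assumed)."""
    return num_vectors * dim * bytes_per_value


full_size = index_size_bytes(1_000_000, 1024)  # turbo at full size: 4_096_000_000 bytes
short_size = index_size_bytes(1_000_000, 256)  # turbo at 256 dims: 1_024_000_000 bytes
print(full_size / short_size)  # 4.0
```

For one million documents, dropping `turbo` from 1024 to 256 dimensions cuts raw vector storage from about 4.1 GB to about 1.0 GB.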

## Configuration

| Arg | Default | Notes |
| --- | ------- | ----- |
| `model` | `"turbo"` | `turbo` \| `pro` \| `ultra` (stored as `model_name`) |
| `api_key` | `FORGE_API_KEY` env | get one at [dash.voxell.ai](https://dash.voxell.ai) |
| `base_url` | `https://api.voxell.ai` | |
| `dimensions` | `None` | Matryoshka truncation, e.g. `256` |
| `timeout` | `30.0` | seconds |

## License

MIT © Voxell, Inc.
