Metadata-Version: 2.3
Name: tei-serving
Version: 1.2.0
Summary: Config-driven launcher for Hugging Face Text Embeddings Inference services
Author: jalal
Author-email: jalal <jalalkhaldi3@gmail.com>
Requires-Dist: pydantic>=2.12.4,<3.0.0
Requires-Dist: pyyaml>=6.0.3,<7.0.0
Requires-Python: >=3.12, <3.13
Description-Content-Type: text/markdown

# tei-serving

`tei-serving` is a config-driven launcher for Hugging Face Text Embeddings Inference (TEI). It wraps the TEI router with a small Python entrypoint that reads YAML, validates settings, and starts TEI with the generated CLI arguments.

## What it does

Current capabilities:

1. Load an embedder or reranker YAML config.
2. Convert typed settings into TEI command-line arguments.
3. Start the TEI router process, passing the configured model to it unchanged.

Core files:

- [`src/tei_serving/__init__.py`](src/tei_serving/__init__.py): runner implementation.
- [`src/tei_serving/settings.py`](src/tei_serving/settings.py): config models and CLI serialization.
- [`src/tei_serving/main.py`](src/tei_serving/main.py): CLI entrypoint.
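
The conversion step listed above can be pictured with a short sketch. This is not the code in `settings.py`; it is a stdlib-only illustration that assumes each key inside a config section maps one-to-one to a TEI flag of the same name, and that top-level scalars such as `kind` are not forwarded. The function name `to_cli_args` and the boolean handling are hypothetical.

```python
from typing import Any


def to_cli_args(config: dict[str, Any]) -> list[str]:
    """Flatten sectioned settings into ``--flag value`` pairs (illustrative only)."""
    args: list[str] = []
    for values in config.values():
        if not isinstance(values, dict):
            # Top-level scalars such as ``kind`` are assumed to select the
            # config type only, so they are skipped here.
            continue
        for key, value in values.items():
            if value is None:
                # Unset options are left to the router's own defaults.
                continue
            if isinstance(value, bool):
                # Assumption: booleans become bare switches when true.
                if value:
                    args.append(f"--{key}")
                continue
            args.extend([f"--{key}", str(value)])
    return args


config = {
    "kind": "embedder",
    "model": {"model-id": "/models/testb", "dtype": "float32", "pooling": "cls"},
    "server": {"hostname": "0.0.0.0", "port": 8082},
}
print(to_cli_args(config))
```

The section names (`model`, `server`) only group options in this sketch; the flags themselves come from the keys.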

## Requirements

- Python `>=3.12,<3.13` for local development.
- Docker for the production image.
- A TEI base image containing `text-embeddings-router`.
- GPU runtime when serving CUDA TEI images.

## Installation

Production dependencies:

```bash
make install
```

Development environment:

```bash
make dev-install
```

Equivalent `uv` command for the development environment:

```bash
uv sync --group dev --all-extras
```

## Configuration

Embedder example:

```yaml
kind: embedder

model:
  model-id: /models/testb
  dtype: float32
  pooling: cls

server:
  hostname: 0.0.0.0
  port: 8082

batching:
  max-batch-tokens: 16384
  max-client-batch-size: 32
```

Reranker example:

```yaml
kind: reranker

model:
  model-id: BAAI/bge-reranker-base
  dtype: float16

server:
  hostname: 0.0.0.0
  port: 8081

batching:
  max-batch-tokens: 16384
  max-client-batch-size: 32
```

## Running

Local CLI, assuming the TEI binary (`text-embeddings-router`) is available on `PATH`:

```bash
tei-serving --config configs/embedder.yaml
```

Equivalent module invocation:

```bash
python -m tei_serving.main --config configs/embedder.yaml
```
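
Because the local commands depend on the router binary being on `PATH`, a launcher can fail early with a clear message when it is missing. The following is a minimal sketch of such a check, not the package's actual runner; `build_command` is a hypothetical helper, and the binary name is taken from the Requirements section.

```python
import shutil

TEI_BINARY = "text-embeddings-router"


def build_command(cli_args: list[str], binary: str = TEI_BINARY) -> list[str]:
    """Resolve the router binary on PATH and prepend it to the arguments."""
    path = shutil.which(binary)
    if path is None:
        raise FileNotFoundError(f"{binary!r} was not found on PATH")
    return [path, *cli_args]


try:
    command = build_command(["--model-id", "/models/testb", "--port", "8082"])
    print(command)
except FileNotFoundError as exc:
    # Expected on machines without a local TEI install.
    print(f"cannot start TEI: {exc}")
```

The resulting list could then be handed to `subprocess.run(command, check=True)` or a similar call.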

Docker build:

```bash
docker build -t tei-serving:local .
```

Docker run with a mounted config:

```bash
docker run --gpus all --rm \
  -v /absolute/path/to/config.yaml:/config/config.yaml:ro \
  -v /absolute/path/to/models:/models:ro \
  tei-serving:local \
  --config /config/config.yaml
```

## Model Serving

The runner passes the configured `model-id` directly to TEI for both `embedder` and `reranker` configs. It does not copy local model directories, download Hugging Face snapshots, or rewrite SentenceTransformers metadata before startup.

## Development

```bash
make format
make lint
make type-check
make security
make test
make test-cov
make ci
```

## Repository Layout

```text
src/tei_serving/
  __init__.py      # runner implementation
  exceptions.py    # package exceptions
  main.py          # CLI entrypoint
  settings.py      # Pydantic settings and CLI serialization
configs/
  embedder.yaml    # example embedder config
  reranker.yaml    # example reranker config
tests/
  unit/            # unit tests for settings and runner behavior
```
