Metadata-Version: 2.4
Name: chapek-embedder
Version: 0.1.0
Summary: Local bge-small-en-v1.5 embeddings, bundled offline, no Hugging Face runtime dependency.
Project-URL: Homepage, https://chapek.ai
Project-URL: Source, https://github.com/chapek-ai/chapek-embedder
Author: Chapek Project
License: MIT
License-File: LICENSE
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.10
Requires-Dist: numpy>=1.25
Requires-Dist: onnxruntime>=1.15
Requires-Dist: tokenizers>=0.13
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == 'test'
Description-Content-Type: text/markdown

# chapek-embedder

Local `bge-small-en-v1.5` embeddings, bundled offline, no Hugging Face runtime dependency.

## Install

```bash
pip install chapek-embedder
```

## Usage

```python
from chapek_embedder import embed, MODEL_VERSION

vectors = embed(["text one", "text two"])
print(vectors.shape)  # (2, 384)
print(MODEL_VERSION)  # bge-small-en-v1.5-cls
```

## Output

- `numpy.ndarray` of shape `(n, 384)`
- `dtype` is `float32`
- CLS pooling: each vector is the `[CLS]` token's last hidden state
- L2-normalized per vector
- Inputs longer than 512 tokens are truncated to 512 tokens
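
The pooling and normalization steps listed above can be sketched in NumPy. This illustrates CLS pooling and L2 normalization in general and is not the package's actual implementation; `cls_pool_and_normalize` is a hypothetical helper, and the random array is only a stand-in for the hidden states a model would return.

```python
import numpy as np


def cls_pool_and_normalize(last_hidden_state: np.ndarray) -> np.ndarray:
    """Hypothetical helper: CLS pooling followed by L2 normalization.

    last_hidden_state has shape (batch, seq_len, hidden).
    """
    # CLS pooling: keep only the first token's hidden state for each input.
    cls = last_hidden_state[:, 0, :].astype(np.float32)
    # L2-normalize each row; clip the norm to avoid division by zero.
    norms = np.linalg.norm(cls, axis=1, keepdims=True)
    return cls / np.clip(norms, 1e-12, None)


# Stand-in for model output: 2 inputs, 5 tokens, 384 hidden dimensions.
rng = np.random.default_rng(0)
fake_hidden = rng.standard_normal((2, 5, 384))
pooled = cls_pool_and_normalize(fake_hidden)
print(pooled.shape, pooled.dtype)  # (2, 384) float32
```

Because every row has unit length, the dot product of two rows is their cosine similarity, so no further normalization is needed when comparing vectors.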

## Compatibility

This package is designed to match Cloudflare Workers AI's `@cf/baai/bge-small-en-v1.5` with pooling set to `"cls"`.
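
One way to check this kind of agreement is to compare local vectors against reference vectors produced by the hosted model for the same texts. The sketch below assumes you have already collected those reference vectors yourself (fetching them is not shown); `min_cosine_agreement` is a hypothetical helper, not part of this package, and the data is synthetic.

```python
import numpy as np


def min_cosine_agreement(local: np.ndarray, reference: np.ndarray) -> float:
    """Return the lowest row-wise cosine similarity between two embedding sets."""
    if local.shape != reference.shape:
        raise ValueError(f"shape mismatch: {local.shape} vs {reference.shape}")
    # Normalize both sides defensively so the dot product is a true cosine.
    a = local / np.linalg.norm(local, axis=1, keepdims=True)
    b = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    return float(np.min(np.sum(a * b, axis=1)))


# Illustrative data only: the "reference" is the local vectors plus tiny noise.
rng = np.random.default_rng(0)
local = rng.standard_normal((2, 384)).astype(np.float32)
reference = local + 1e-4 * rng.standard_normal((2, 384)).astype(np.float32)
score = min_cosine_agreement(local, reference)
print(score > 0.999)  # True
```

A score close to 1.0 for every text suggests the two pipelines agree; where to set the threshold is a judgment call.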

## Development

Before building a wheel, download the model resources that are bundled into it:

```bash
python scripts/download_model.py
```

Then install for development:

```bash
pip install -e .
```

Run tests with:

```bash
pytest
```

## Model version and license

- Model version: `bge-small-en-v1.5-cls`
- License: MIT, matching the license of the BAAI model weights

## Project

This package is part of the Chapek project (`chapek.ai`) and is published separately so it can be reused in constrained environments.
