Metadata-Version: 2.4
Name: colbert-server
Version: 0.3.1
Summary: Run ColBERT Wikipedia server
Keywords: colbert,retrieval,search,wikipedia,ir
Author: Niels van Galen Last
Author-email: Niels van Galen Last <nvangalenlast@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Requires-Dist: colbert-ai>=0.2.22
Requires-Dist: faiss-cpu>=1.12.0
Requires-Dist: flask>=3.1.2
Requires-Dist: huggingface-hub>=0.36.0
Requires-Dist: packaging>=24.1
Requires-Dist: torch>=2.9.0
Requires-Python: >=3.13
Project-URL: Dataset, https://huggingface.co/datasets/nielsgl/colbert-wiki2017
Project-URL: Homepage, https://github.com/nielsgl/colbert-server
Project-URL: Issues, https://github.com/nielsgl/colbert-server/issues
Project-URL: Repository, https://github.com/nielsgl/colbert-server
Description-Content-Type: text/markdown

# colbert-server

[![uv tool](https://img.shields.io/badge/uv-tool-3b82f6?logo=uv&logoColor=white)](https://docs.astral.sh/uv/)
[![python 3.13+](https://img.shields.io/badge/python-3.13+-3776AB?logo=python&logoColor=white)](https://www.python.org/)
[![license MIT](https://img.shields.io/badge/license-MIT-7c3aed.svg)](LICENSE)
[![PyPI](https://img.shields.io/pypi/v/colbert-server.svg?color=3776AB&label=pypi)](https://pypi.org/project/colbert-server/)
[![Downloads](https://img.shields.io/pypi/dm/colbert-server.svg?color=8b5cf6&label=downloads)](https://pypi.org/project/colbert-server/)
[![CI](https://github.com/nielsgl/colbert-server/actions/workflows/ci.yml/badge.svg)](https://github.com/nielsgl/colbert-server/actions/workflows/ci.yml)
[![dataset](https://img.shields.io/badge/dataset-huggingface-ff9a00?logo=huggingface&logoColor=white)](https://huggingface.co/datasets/nielsgl/colbert-wiki2017)
[![API Flask](https://img.shields.io/badge/api-flask-000000?logo=flask&logoColor=white)](https://flask.palletsprojects.com/)

CLI tooling to fetch the ColBERT Wikipedia 2017 dataset and run a lightweight Flask API on top of the ColBERT v2 searcher.

```text
$> I wrote this because the ColBERT server is down and I couldn't try one of the tutorial from DSPy.
$> I only tested this on my macbook, please open an issue if you have problems or feature requests.
```

## Features

- One-command install and execution via `uv tool`.
- Automatically downloads either ready-to-serve indexes/collection or the original archives.
- Optional archive extraction flow for offline usage.
- Caches ColBERT queries for fast, repeated lookups.
- Exposes a simple `/api/search` endpoint for programmatic access.
- Ships with a `doctor` command that validates Torch/FAISS installs before the 13 GB download.

## Installation

```bash
uv tool install colbert-server
```

This registers a `colbert-server` executable in your `uv` toolchain.

Check the installed version at any time:

```bash
colbert-server --version
```

Run the built-in diagnostic command to verify torch/faiss before downloading the index:

```bash
colbert-server doctor
```

Or if you just want to run it:

```bash
uvx run colbert-server --help
```

## Running the server

### Use data from the Hugging Face cache (recommended quick start)

```bash
colbert-server serve --from-cache
```

This downloads only the `collection/` and `indexes/` folders from
[`nielsgl/colbert-wiki2017`](https://huggingface.co/datasets/nielsgl/colbert-wiki2017),
resolves the on-disk paths from the Hugging Face cache, and starts the server.

### Provide existing local assets

```bash
colbert-server serve \
  --index-root /path/to/indexes \
  --index-name wiki17.nbits.local \
  --collection-path /path/to/collection/wiki.abstracts.2017/collection.tsv
```

Use this mode when you already have ColBERT indexes and a collection TSV locally.

### Download archives first, then serve

```bash
colbert-server serve \
  --download-archives /tmp/wiki-assets \
  --extract \
  --port 8894
```

This fetches the archive files into `/tmp/wiki-assets/archives`, extracts them into
`/tmp/wiki-assets`, auto-detects the resulting layout (e.g. `wiki17.nbits.local`),
and starts the Flask server on port `8894`.

## API usage

Once running, the server listens on the host/port provided (defaults to `0.0.0.0:8893`)
and serves ColBERT search results via:

```
GET /api/search?query=<text>&k=<top-k>
```

Example request:

```
http://127.0.0.1:8893/api/search?query=halloween+movie&k=3
```

The JSON response includes the ranked passages, their scores, and normalized probabilities.

## Managing dataset archives only

If you just want the raw archive bundles in a local directory:

```bash
colbert-server download-archives ./downloads --extract
```

Add `--extract-to /desired/path` to unpack into a different directory. You can later reuse
the extracted paths with the `serve` command’s `--index-root` and `--collection-path` flags.

## Alternative / Manual Method

In case you don't want to use the script / `uv` tool you can set it up as follows:

1. Add the dependencies to your project: `uv add colbert-ai flask faiss-cpu` and
   `uv add torch --index pytorch-cpu` (the repo already provides the PyTorch CPU
   index in `pyproject.toml`).
2. Download the files (both the index and the collection) from the `archives` directory from the HuggingFace dataset and unzip them.
3. Copy the `standalone.py` script from this repository and edit the `INDEX_ROOT` and `COLLECTION_PATH` variables.
4. Run the server with `uv run standalone.py` and <tada.wav>

## Development tips

- Requires Python 3.13+ (or adjust the `pyproject.toml` requirement to match your interpreter).
- Run `colbert-server --help` or `colbert-server serve --help` to inspect available options.
- The dataset helpers live under `colbert_server/data.py`; server configuration sits in `colbert_server/server.py`.
- GitHub Actions runs lint/tests on every push across Ubuntu, macOS, and Windows; see `.github/workflows/ci.yml` for details.
- Publishing is handled via the `Manual Release` workflow. Trigger it from the Actions tab, choose the desired bump (patch/minor/major), and it will bump the version, run tests/builds, push the commit, and create the `vX.Y.Z` tag. Create a `RELEASE_TOKEN` secret with `repo` scope so the workflow can push back to `main`.
- The CLI pings PyPI at most once per day and nudges you if a newer version exists. Set `COLBERT_SERVER_DISABLE_UPDATE_CHECK=1` to disable this behaviour.
- The tag automatically kicks off the `Publish` workflow, which uploads the artifacts to PyPI via trusted publishing. After that completes the `Verify Release` workflow installs the new version on Ubuntu, macOS, and Windows and runs `colbert-server --version`/`doctor` to ensure the artifact works cross-platform.

Happy searching! 🧠📚
