Metadata-Version: 2.4
Name: memkv-sglang
Version: 1.0.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: pytest >=7 ; extra == 'test'
Requires-Dist: torch >=2.1 ; extra == 'test'
Provides-Extra: test
Summary: sglang HiCacheStorage backend for the MemKV context memory store
Author: MinIO, Inc.
License: LicenseRef-Proprietary
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/miniohq/memkv
Project-URL: Documentation, https://docs.min.io/memkv/

# memkv-sglang

sglang `HiCacheStorage` backend that persists prefix KV pages in a
remote MemKV cluster. It is loaded as a vendor plugin through sglang's
built-in `dynamic` storage backend dispatch, so sglang's source tree
needs no patches.

## Build

```bash
cd sglang-plugin
pip install maturin
maturin develop --release      # local dev install
# or
maturin build --release        # wheel under target/wheels/
pip install target/wheels/memkv_sglang-*.whl
```

The wheel bundles a native PyO3 extension built from the same
`memkv-client` crate the NIXL plugin uses, so RDMA/TCP transport
selection works the same way.

## Configure the MemKV connection

The plugin reads the standard MemKV config chain: the YAML file named
by `MEMKV_CONFIG` first, then `MEMKV_*` environment variables:

```bash
export MEMKV_SERVERS="10.0.0.10:9900,10.0.0.11:9900"
export MEMKV_RDMA_DEVICES="mlx5_0,mlx5_1"
export MEMKV_AUTH_KEY="<64-hex>"
# optional:
# export MEMKV_TRANSPORT=auto       # rdma | tcp | auto (default)
# export MEMKV_CONFIG=/etc/memkv.yaml
```
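If you want to sanity-check these variables before launching sglang, the following is a minimal sketch of how their documented formats can be parsed and validated: comma-separated `host:port` servers, comma-separated RDMA device names, a 64-hex-character auth key, and a transport of `rdma`, `tcp`, or `auto`. The helper `parse_memkv_env` and the `MemKVEnvConfig` dataclass are hypothetical names, not the plugin's own loader. The sketch also skips the `MEMKV_CONFIG` YAML step entirely and assumes the servers come from the environment.

```python
import os
import re
from dataclasses import dataclass, field
from typing import List, Mapping, Optional, Tuple

# Hypothetical validator; the real MemKV config loader may differ.
_HEX64 = re.compile(r"[0-9a-fA-F]{64}")
_TRANSPORTS = ("rdma", "tcp", "auto")


@dataclass
class MemKVEnvConfig:
    servers: List[Tuple[str, int]]
    rdma_devices: List[str] = field(default_factory=list)
    auth_key: Optional[str] = None
    transport: str = "auto"


def _split_csv(value: str) -> List[str]:
    # "a, b,,c" -> ["a", "b", "c"]
    return [part.strip() for part in value.split(",") if part.strip()]


def parse_memkv_env(env: Mapping[str, str] = os.environ) -> MemKVEnvConfig:
    servers = []
    for entry in _split_csv(env.get("MEMKV_SERVERS", "")):
        # Split on the last ':' so "10.0.0.10:9900" -> ("10.0.0.10", 9900).
        host, sep, port = entry.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"MEMKV_SERVERS entry {entry!r} is not host:port")
        servers.append((host, int(port)))
    if not servers:
        raise ValueError("MEMKV_SERVERS is empty")

    auth_key = env.get("MEMKV_AUTH_KEY")
    if auth_key is not None and not _HEX64.fullmatch(auth_key):
        raise ValueError("MEMKV_AUTH_KEY must be 64 hex characters")

    # Defaults to "auto" when MEMKV_TRANSPORT is unset.
    transport = env.get("MEMKV_TRANSPORT", "auto").lower()
    if transport not in _TRANSPORTS:
        raise ValueError(f"MEMKV_TRANSPORT must be one of {_TRANSPORTS}")

    return MemKVEnvConfig(
        servers=servers,
        rdma_devices=_split_csv(env.get("MEMKV_RDMA_DEVICES", "")),
        auth_key=auth_key,
        transport=transport,
    )
```

With the exports above, `parse_memkv_env()` would return two `(host, 9900)` tuples, the two `mlx5_*` devices, and `transport="auto"`.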

## Launch sglang against MemKV

Use sglang's `dynamic` storage backend, pointing it at the class in
this package:

```bash
python -m sglang.launch_server \
    --model-path meta-llama/Meta-Llama-3-8B \
    --enable-hierarchical-cache \
    --hicache-storage-backend dynamic \
    --hicache-storage-backend-extra-config '{
      "backend_name": "memkv",
      "module_path": "memkv_sglang.backend",
      "class_name": "MemKVHiCacheStorage"
    }'
```

sglang's `StorageBackendFactory._create_dynamic_backend` imports the
class and constructs it as `MemKVHiCacheStorage(storage_config, kwargs)`.
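To make the dispatch concrete, here is a simplified sketch of the mechanism based only on the description above: parse the extra-config JSON, import `module_path`, look up `class_name`, and call it positionally with `(storage_config, kwargs)`. The function names `resolve_backend_class` and `create_dynamic_backend` are hypothetical. Treating all three keys as required is an assumption, and sglang's real `_create_dynamic_backend` may validate, parse, or report errors differently.

```python
import importlib
import json
from typing import Any, Dict

# Keys shown in the launch example above; treating all as required is an assumption.
_REQUIRED_KEYS = ("backend_name", "module_path", "class_name")


def resolve_backend_class(extra_config: str) -> type:
    """Parse the extra-config JSON and import module_path.class_name."""
    cfg: Dict[str, Any] = json.loads(extra_config)
    missing = [k for k in _REQUIRED_KEYS if k not in cfg]
    if missing:
        raise ValueError(f"extra config is missing keys: {missing}")
    module = importlib.import_module(cfg["module_path"])
    return getattr(module, cfg["class_name"])


def create_dynamic_backend(
    extra_config: str, storage_config: Any, kwargs: Dict[str, Any]
) -> Any:
    backend_cls = resolve_backend_class(extra_config)
    # Same positional call shape this README documents:
    # MemKVHiCacheStorage(storage_config, kwargs)
    return backend_cls(storage_config, kwargs)
```

Because the class is resolved by import path at runtime, any installed package that exposes a compatible class can be plugged in without changing sglang itself.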

## What's implemented

| Method                    | Status                                                                                                           |
| ------------------------- | ---------------------------------------------------------------------------------------------------------------- |
| `get` / `batch_get`       | yes; RDMA zero-copy direct into the target tensor when eligible (Linux + CPU + contiguous), bytes path otherwise |
| `set` / `batch_set`       | yes                                                                                                              |
| `exists` / `batch_exists` | yes (router-aware, one batched RPC per server)                                                                   |
| `batch_exists_v2`         | yes (per-pool hit policies)                                                                                      |
| `batch_get_v2`            | yes; RDMA zero-copy into the dummy flat page when eligible                                                       |
| `batch_set_v2`            | yes                                                                                                              |
| `clear`                   | no-op (server manages retention)                                                                                 |
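The "router-aware, one batched RPC per server" note on `batch_exists` can be illustrated with a small sketch: group the keys by the server that owns them, ask each server once, then count how many keys starting from the first are present. That count matches our understanding of HiCache's `batch_exists` contract, since prefix pages are only useful as a contiguous run. The router here (SHA-256 of the key modulo server count) and the `rpc_exists(server, keys)` callback are hypothetical stand-ins, not MemKV's actual routing or wire protocol.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, List, Sequence

# Hypothetical RPC: rpc_exists(server, keys) -> one bool per key, same order.
ExistsRpc = Callable[[str, List[str]], List[bool]]


def route(key: str, servers: Sequence[str]) -> str:
    """Hypothetical router: stable SHA-256 hash of the key, modulo server count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def batch_exists(keys: Sequence[str], servers: Sequence[str], rpc_exists: ExistsRpc) -> int:
    """Return how many leading keys are present, issuing one RPC per server."""
    # Group key indices by owning server so each server is asked exactly once.
    by_server: Dict[str, List[int]] = defaultdict(list)
    for i, key in enumerate(keys):
        by_server[route(key, servers)].append(i)

    present = [False] * len(keys)
    for server, indices in by_server.items():
        answers = rpc_exists(server, [keys[i] for i in indices])
        for i, ok in zip(indices, answers):
            present[i] = bool(ok)

    # Stop at the first miss: later hits cannot extend the cached prefix.
    hits = 0
    for ok in present:
        if not ok:
            break
        hits += 1
    return hits
```

Batching per server keeps the round-trip count bounded by the number of servers rather than the number of pages, which matters when a long prompt maps to many KV pages.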

## Layout

```
sglang-plugin/
├── Cargo.toml                   # cdylib + pyo3 + memkv-client
├── pyproject.toml               # maturin
├── src/lib.rs                   # PyO3 wrapper around memkv-client::Engine
└── python/memkv_sglang/
    ├── __init__.py              # re-exports Client
    └── backend.py               # MemKVHiCacheStorage(HiCacheStorage)
```

