Metadata-Version: 2.4
Name: redhare
Version: 0.4.1rc0
Summary: Stable, high-performance KVCache for LLM inference, with decentralized coordination, memory-pool storage, and multi-tier caching across hosts.
Author: Redhare contributors
License: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM

# redhare (Python connector)

In-process Python binding around `RedhareClient`. Ships two API layers
that share the same embedded Rust client:

- **`KvStore`** — low-level SDK (this README).
- **`redhare.vllm_connector.RedhareConnector`** — vLLM v1 native KV
  connector. Full schema and mode-picking guide:
  [`docs/08-vllm-connector.md`](../../docs/08-vllm-connector.md).

The low-level SDK below exposes batch gather/scatter methods for custom
connector implementations:

```python
from redhare import KvStore

store = KvStore(
    redis_url="redis://127.0.0.1:6379",
    client_id="rank0",
    data_plane_addr="127.0.0.1:7000",
    capacity="64MB",              # int bytes also accepted
    transport="nixl",            # "nixl" (default) | "tcp" | "inmemory"
    shared_fs_root=None,          # optional cold-tier path
    shared_fs_backend="posix",    # "posix" (default) | "gds"
    block_size_bytes=None,        # optional fixed per-key block size
    hot_cache_fraction=0.20,
    hot_cache_min_shared_fs_reads=2,
    cache_node_rpc_addr=None,     # optional: enable cache-node mode
    enable_remote_dram=False,     # originator: place into discovered cache-nodes
)

# Pre-register the KV cache region once; later reads DMA straight into it
# without re-registering per call.
store.register_buffer(kv_cache.data_ptr(), kv_cache.numel() * kv_cache.element_size())

# Save: each key's payload is gathered from N (addr, size) pairs (one memcpy).
rc = store.batch_put_from_multi_buffers(keys, addrs_per_key, sizes_per_key)
# Load: NIXL scatters the payload across the same N (addr, size) pairs.
# True zero-copy when destinations live inside a register_buffer region.
rc = store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)

# Experimental local-only load: enqueue the scatter copy on Redhare's CUDA
# copy stream and poll or wait for completion. This raises if any key is not
# local, so callers should fall back to batch_get_into_multi_buffers.
handle = store.batch_get_into_multi_buffers_submit_local(
    keys, addrs_per_key, sizes_per_key
)
# Either poll handle.is_done() from your scheduler loop, or block explicitly.
handle.wait()

# Existence check (1 = exists, 0 = missing, -1 = error per key)
rc = store.batch_is_exist(keys)

store.close()
```
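
The shape of `addrs_per_key` and `sizes_per_key` can be sketched as follows. This is a minimal illustration, not the connector's actual layout: it assumes each argument is a list with one inner list per key, and that a key owns one equally sized slice in every layer of a single contiguous buffer. The helper name `build_scatter_lists` and the `ctypes` buffer standing in for a KV cache tensor are hypothetical.

```python
import ctypes


def build_scatter_lists(base_addr, num_keys, num_layers, layer_stride, block_bytes):
    """Hypothetical helper: key k owns one block_bytes slice in each layer."""
    addrs_per_key, sizes_per_key = [], []
    for k in range(num_keys):
        # One (addr, size) pair per layer for this key.
        addrs = [
            base_addr + layer * layer_stride + k * block_bytes
            for layer in range(num_layers)
        ]
        addrs_per_key.append(addrs)
        sizes_per_key.append([block_bytes] * num_layers)
    return addrs_per_key, sizes_per_key


# Stand-in for a KV cache tensor: 2 layers x 4 blocks x 256 bytes.
NUM_LAYERS, NUM_BLOCKS, BLOCK_BYTES = 2, 4, 256
buf = ctypes.create_string_buffer(NUM_LAYERS * NUM_BLOCKS * BLOCK_BYTES)
base = ctypes.addressof(buf)

addrs_per_key, sizes_per_key = build_scatter_lists(
    base,
    num_keys=NUM_BLOCKS,
    num_layers=NUM_LAYERS,
    layer_stride=NUM_BLOCKS * BLOCK_BYTES,
    block_bytes=BLOCK_BYTES,
)
```

With a real tensor, `base` would come from `kv_cache.data_ptr()`, and the same region would be passed to `register_buffer` once up front.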

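Batch put/get methods return a per-key list of `int`: `0` means success and a
negative value means failure. `batch_is_exist` is the exception: it returns `1`
for an existing key, `0` for a missing key, and `-1` on error.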
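
One way to consume the per-key result of a put or get is to split the keys by status. The sketch below assumes only what is stated above (one `int` per key, `0` for success, negative for failure); the helper name `split_by_status` is hypothetical and not part of the SDK. It is not meant for `batch_is_exist`, whose codes follow the existence convention shown in the example.

```python
def split_by_status(keys, rc):
    """Hypothetical helper: partition keys into succeeded and failed."""
    if len(keys) != len(rc):
        raise ValueError("expected one return code per key")
    ok = [k for k, code in zip(keys, rc) if code == 0]
    failed = {k: code for k, code in zip(keys, rc) if code < 0}
    return ok, failed


# Example with a hand-written result list in place of a real store call.
ok, failed = split_by_status(["a", "b", "c"], [0, -1, 0])
```

A caller could then retry or recompute only the keys in `failed` instead of treating the whole batch as lost.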

## Build

```bash
pip install maturin                          # once
cd crates/redhare-py
export CARGO_TARGET_DIR=/tmp/redhare-target   # NFS-safe: keeps build artifacts off NFS
maturin develop --release                    # installs into the active env
```

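To build a wheel instead, run `maturin build --release` and install it with
`pip install target/wheels/redhare-*.whl`. If `CARGO_TARGET_DIR` is set as
above, look for the wheel under `$CARGO_TARGET_DIR/wheels/` instead.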

## Notes

- `transport="nixl"` is the only transport with true zero-copy loads. `tcp`
  and `inmemory` work but fall back to read-then-memcpy for scatters.
- `batch_get_into_multi_buffers_submit_local()` is a local-only optimization
  for custom connectors. It does not read from remote peers or cold storage;
  use `batch_get_into_multi_buffers()` as the fallback path.
- `data_plane_addr` is the address peers connect to (NIXL control channel or
  TCP). It must be reachable from other clients in the cluster.
- Remote DRAM uses `data_plane_addr` for payload movement and
  `cache_node_rpc_addr` only for control. With `transport="nixl"`, remote
  writes use NIXL Write into the cache-node arena after an RPC reservation.
- One connector `key` maps to one Redhare object: the per-key `(addrs, sizes)`
  lists are concatenated on save (one memcpy) and scattered on load (zero-copy
  only with `transport="nixl"` into a registered buffer).
- With `block_size_bytes=...`, each key is forced to one fixed-size block:
  shorter saves are zero-padded, larger saves fail.
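
The local-first pattern described in the notes above can be sketched as a small wrapper. This is an illustration under stated assumptions, not the connector's own logic: the exception type raised for non-local keys is not documented here, so the sketch catches `Exception` broadly, and `load_local_first`, `_StubStore`, and `_StubHandle` are hypothetical names used so the example runs without a cluster.

```python
def load_local_first(store, keys, addrs_per_key, sizes_per_key):
    """Try the local-only submit path; fall back to the general load.

    Returns ("local", None) or ("fallback", rc).
    """
    try:
        handle = store.batch_get_into_multi_buffers_submit_local(
            keys, addrs_per_key, sizes_per_key
        )
    except Exception:
        # Assumption: the concrete exception type is unknown, so catch broadly.
        # Narrow this once the real type is confirmed.
        rc = store.batch_get_into_multi_buffers(keys, addrs_per_key, sizes_per_key)
        return "fallback", rc
    handle.wait()  # alternatively, poll handle.is_done() from a scheduler loop
    return "local", None


class _StubHandle:
    """Stand-in for the handle returned by the submit call."""

    def is_done(self):
        return True

    def wait(self):
        pass


class _StubStore:
    """Stand-in for KvStore: only keys in local_keys count as local."""

    def __init__(self, local_keys):
        self.local_keys = set(local_keys)

    def batch_get_into_multi_buffers_submit_local(self, keys, addrs, sizes):
        if any(k not in self.local_keys for k in keys):
            raise RuntimeError("non-local key")
        return _StubHandle()

    def batch_get_into_multi_buffers(self, keys, addrs, sizes):
        return [0] * len(keys)


stub = _StubStore(local_keys=["a"])
local_result = load_local_first(stub, ["a"], [[0]], [[0]])
mixed_result = load_local_first(stub, ["a", "b"], [[0], [0]], [[0], [0]])
```

Because the local path raises if any key is not local, a batch with even one remote key goes through the general load in full.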

