Metadata-Version: 2.4
Name: irohds
Version: 0.3.11
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Topic :: System :: Distributed Computing
Requires-Dist: blake3>=0.3
Requires-Dist: coren>=0.1
Summary: Decentralized function memoization over iroh P2P
Keywords: memoization,p2p,iroh,decentralized,cache
License-Expression: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Repository, https://codeberg.org/cab-lab/irohds

# irohds

A drop-in Python decorator that caches function results and shares them
automatically across every machine running the same code. No servers to
manage, no configuration, no accounts.

If someone at another institution already computed `train_model("cifar10",
epochs=50)`, your machine downloads the result instead of spending hours
recomputing it. If nobody has computed it yet, your machine does the work
and makes the result available to everyone else.

```python
import irohds

@irohds.memo
def train_model(dataset, epochs=10):
    ...  # hours of GPU time
    return model

result = train_model("cifar10", epochs=50)
# First run: computes (hours). Every subsequent run, on any peer: instant.
```

## Who is this for

Research groups and institutions that repeatedly run expensive
computations across many machines. If your lab has 20 people who all
run the same preprocessing pipeline on the same datasets, irohds means
only the first person waits. Everyone else gets the result in seconds.

irohds works across institutions, continents, and networks. Peers
find each other through the BitTorrent mainline DHT (16M+ nodes). No
central server, no coordinator, no shared filesystem required.

## What it is not for

Functions that finish in under about 15 seconds. The network overhead
of sharing results only pays off for genuinely expensive computations.
For fast functions, use `functools.cache`, `joblib.Memory`, or
`diskcache`; irohds will warn you when a decorated function is too
cheap to benefit from network sharing.
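
For cheap functions, plain in-process memoization is the right fit. A minimal stdlib example (not part of irohds; `tokenize` is an illustrative name):

```python
import functools

@functools.cache  # in-process only: no daemon, no network, no disk
def tokenize(text: str) -> tuple:
    """Cheap enough that network sharing would cost more than recomputing."""
    return tuple(text.split())

tokenize("a b c")  # computed once
tokenize("a b c")  # served from the in-process cache
```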

## Install

```sh
uv add irohds
```

This installs the Python package, the Rust daemon binary, and the
[coren](https://codeberg.org/chaxor/coren) machine capability library.
The daemon starts automatically on first use and installs itself as a
system service (starts at boot, runs in a sandbox).

## Usage

```python
import irohds

# Basic: share results with all peers globally
@irohds.memo
def expensive_etl(dataset_path):
    ...
    return processed_data

# Namespaced: only share with peers using the same namespace
@irohds.memo(ns="my-lab")
def train(config):
    ...

# Large file outputs
@irohds.memo
def generate_embeddings(corpus):
    ...
    torch.save(embeddings, irohds.resolve("embeddings.pt"))
    return irohds.FileRef("embeddings.pt")

ref = generate_embeddings("pubmed-2024")
embeddings = torch.load(ref.path)  # file is on disk, ready to use

# Selective eviction
irohds.evict("mymodule.train")  # clear cached results for one function

# Pre-warm peer discovery (optional, reduces first-call latency)
irohds.join("my-lab")
```

## How it works

**On the first call:** irohds hashes the function's AST and arguments
into a cache key, executes the function, stores the result in a local
content-addressed blob store, and announces it to peers via gossip.
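
The exact key derivation is internal to irohds (it declares blake3 as a dependency); a minimal sketch of the idea, using stdlib `hashlib.blake2b` as a stand-in for blake3 and illustrative names throughout:

```python
import ast
import hashlib
import pickle

def cache_key(src: str, args: tuple, kwargs: dict) -> str:
    """Illustrative key derivation: hash normalized source plus arguments."""
    # Dumping the parsed AST normalizes whitespace and comments, so
    # cosmetic edits to the function don't invalidate cached results.
    normalized = ast.dump(ast.parse(src))
    h = hashlib.blake2b(digest_size=32)
    h.update(normalized.encode())
    h.update(pickle.dumps((args, tuple(sorted(kwargs.items())))))
    return h.hexdigest()

key = cache_key("def f(x, y=1):\n    return x + y\n", ("cifar10",), {"epochs": 50})
```

Because the key covers both code and arguments, changing either produces a different key, while reformatting the function body does not.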

**On subsequent calls (same machine):** the result is returned from an
in-process dict (~0.1 µs) or from the local blob store via IPC (~0.2 ms).
No network involved.

**On a different machine:** irohds checks whether any peer has the
result. If yes, it uses [coren](https://codeberg.org/chaxor/coren) to
decide whether downloading is faster than recomputing locally (based on
the function's compute cost and this machine's capabilities). Then it
either fetches the result or recomputes, whichever is faster.
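
At its core, that decision compares estimated transfer time against estimated compute time. coren's actual interface is not shown here; a sketch with hypothetical inputs:

```python
def should_fetch(result_bytes: int, bandwidth_bytes_per_s: float,
                 est_compute_s: float) -> bool:
    """Fetch from a peer only if the transfer is expected to finish first."""
    transfer_s = result_bytes / bandwidth_bytes_per_s
    return transfer_s < est_compute_s

# 1 GB result, 100 MB/s link, 1 h to recompute: fetch (10 s < 3600 s).
should_fetch(10**9, 100e6, 3600)
# 1 GB result, 1 MB/s link, 5 s to recompute: recompute (1000 s > 5 s).
should_fetch(10**9, 1e6, 5)
```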

**Peer discovery** is automatic via three mechanisms:
- Mainline DHT (global, zero config, 16M+ nodes)
- mDNS (automatic on LAN)
- Bootstrap peers (fallback for networks that block DHT)

**The daemon** (`irohds-daemon`) is a sandboxed Rust process that owns
the blob store and handles gossip/P2P networking. It installs as a
system service on first use. Python communicates with it over a Unix
socket. The sandbox ensures iroh network traffic cannot access the host
filesystem beyond the irohds data directory.
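
The daemon's wire protocol is internal; as a generic illustration of request/response over a local socket (newline-delimited JSON is an assumption here, and the demo uses a `socketpair` standing in for the daemon's Unix socket):

```python
import json
import socket
import threading

def request(sock: socket.socket, payload: dict) -> dict:
    """Send one request and read one newline-terminated JSON response."""
    sock.sendall((json.dumps(payload) + "\n").encode())
    buf = b""
    while not buf.endswith(b"\n"):
        buf += sock.recv(4096)
    return json.loads(buf)

client, server = socket.socketpair()

def fake_daemon(s: socket.socket) -> None:
    # Stand-in for irohds-daemon: echo the op back with an ok flag.
    data = b""
    while not data.endswith(b"\n"):
        data += s.recv(4096)
    req = json.loads(data)
    s.sendall((json.dumps({"ok": True, "op": req["op"]}) + "\n").encode())

t = threading.Thread(target=fake_daemon, args=(server,))
t.start()
resp = request(client, {"op": "get", "key": "abc123"})
t.join()
```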

## Restricted networks

If mainline DHT is blocked (some universities, corporate networks),
add known peers to `~/.local/share/irohds/config.toml`:

```toml
bootstrap_peers = ["<hex-encoded-node-id>"]
```

Get a peer's node ID with `irohds-daemon info`.

## Performance

| Scenario | Latency |
|---|---|
| Repeated call, same process | ~0.1 µs (in-process dict) |
| First call after process start, data local | ~0.2 ms (one IPC round-trip) |
| First call after daemon restart, data local | ~1 ms (load index + IPC) |
| Result available from remote peer | seconds (network transfer) |
| Full miss, compute locally | depends on function |

## Developing

```sh
cargo build --manifest-path daemon/Cargo.toml  # build the daemon
make test                                       # Rust + Python tests
make test-vm                                    # NixOS QEMU P2P integration test
```

## License

MIT

