Metadata-Version: 2.4
Name: ramjetio
Version: 0.11.6
Summary: Distributed peer cache for ML training data — works with any framework on any S3-compatible store
Author-email: RAMJET <support@ramjet.io>
License-Expression: LicenseRef-PolyForm-Noncommercial-1.0.0
Project-URL: Homepage, https://ramjet.io
Project-URL: Documentation, https://docs.ramjet.io
Project-URL: Repository, https://github.com/jogrms/ramjetio
Project-URL: Issues, https://github.com/jogrms/ramjetio/issues
Project-URL: Changelog, https://github.com/jogrms/ramjetio/blob/main/CHANGELOG.md
Keywords: distributed,cache,peer-to-peer,pytorch,tensorflow,huggingface,s3,minio,deep-learning,machine-learning,training,dataset
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Requires-Dist: numpy>=1.19.0
Requires-Dist: grpcio>=1.50.0
Requires-Dist: grpcio-tools>=1.50.0
Requires-Dist: protobuf>=4.0.0
Requires-Dist: aiohttp>=3.8.0
Requires-Dist: msgpack>=1.0.0
Requires-Dist: pyyaml>=5.4.0
Requires-Dist: requests>=2.25.0
Requires-Dist: httpx[http2]>=0.27.0
Requires-Dist: mmh3>=3.0.0
Requires-Dist: diskcache>=5.4.0
Requires-Dist: psutil>=5.8.0
Requires-Dist: boto3>=1.26.0
Requires-Dist: pynvml>=11.0.0
Requires-Dist: lmdb>=1.4.0
Provides-Extra: gpu
Provides-Extra: ultralytics
Requires-Dist: ultralytics<8.5,>=8.4; extra == "ultralytics"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "test"
Requires-Dist: hypothesis>=6.0; extra == "test"
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-sdk>=1.20.0; extra == "otel"
Requires-Dist: opentelemetry-exporter-otlp-proto-grpc>=1.20.0; extra == "otel"
Dynamic: license-file

# RAMJET — Distributed peer cache for ML training data

[![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PyPI](https://img.shields.io/pypi/v/ramjetio.svg)](https://pypi.org/project/ramjetio/)
[![License](https://img.shields.io/badge/License-PolyForm%20NC-green.svg)](LICENSE)

**RAMJET** is a peer-to-peer cache between any ML training job and any S3-compatible object store. The first run pulls data from S3 once and seeds the cluster; subsequent runs across the team serve every sample from a local or peer cache, with zero S3 calls.

Works with any framework that loads data from S3/HTTP/local paths — PyTorch, TensorFlow, JAX, HuggingFace Datasets, Ultralytics, custom loaders — and any DDP launcher (`torchrun`, DeepSpeed, Accelerate, SLURM).

Measured on a real 2× A5000 cluster (5315 samples, 5 epochs): **6.5× faster export when cache is warm, zero S3 requests after the first run.**

## Why RAMJET?

| Problem | Solution |
|---------|----------|
| Repeated S3 pulls across team experiments | One pull seeds the cluster; the rest serve peer-to-peer |
| Network bottleneck from shared object storage | Local SSD cache on every training node |
| No visibility into data-loading bottlenecks | Live dashboard with per-node bytes-by-source split (local / peer / S3) and Grafana-native Prometheus metrics |
| Multi-node DDP coordination | Auto-detect rank/world size from `torchrun`/SLURM env |

## Quick Start

### 1. Install

```bash
pip install ramjetio
```

### 2. Add to Your Training Script

```python
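import ramjetio
from torch.utils.data import DataLoader

# Picks up RAMJET_API_KEY from the environment and starts the cache
ramjetio.init()

# your_dataset is a placeholder for the dataset you already train on
dataset = ramjetio.CachedDataset(your_dataset)
loader = DataLoader(dataset, batch_size=32)

for batch in loader:
    train_step(batch)  # placeholder for your existing training step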
```

### 3. Run

Get your API key from [app.ramjet.io](https://app.ramjet.io) (create a cluster → copy key).

```bash
export RAMJET_API_KEY="your_api_key_here"
python train.py
```

Multi-GPU: `torchrun --nproc_per_node=N train.py`

That's it! Your nodes will appear in the dashboard within seconds.

## How It Works

```
   ┌──────────┐   ┌──────────┐   ┌──────────┐
   │  Node 0  │   │  Node 1  │   │  Node 2  │
   │  train   │   │  train   │   │  train   │
   │    │     │   │    │     │   │    │     │
   │    ▼     │   │    ▼     │   │    ▼     │
   │ ramjetio │◄─►│ ramjetio │◄─►│ ramjetio │
   │  cache   │   │  cache   │   │  cache   │
   │ NVMe SSD │   │ NVMe SSD │   │ NVMe SSD │
   └─────┬────┘   └─────┬────┘   └─────┬────┘
         └──────────────┼──────────────┘
                        ▼
                ┌───────────────┐
                │ S3 / MinIO /  │
                │ R2 / GCS / …  │
                └───────────────┘

   Hits stay local or hop to a peer (sub-ms over LAN).
   Only the first miss in the cluster ever touches object storage.
```
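
The lookup order in the diagram (local cache, then a peer, then object storage) can be sketched as a read-through function. This is an illustration only, not RAMJET's implementation: `TieredCache`, `fetch_from_peer`, and `fetch_from_store` are hypothetical names, and plain dictionaries stand in for the SSD cache, the peer, and the bucket.

```python
from typing import Callable, Dict, Optional


class TieredCache:
    """Read-through lookup: local -> peer -> object store (illustrative only)."""

    def __init__(
        self,
        fetch_from_peer: Callable[[str], Optional[bytes]],
        fetch_from_store: Callable[[str], bytes],
    ) -> None:
        self.local: Dict[str, bytes] = {}  # stands in for the on-disk cache
        self.fetch_from_peer = fetch_from_peer
        self.fetch_from_store = fetch_from_store
        self.source_counts = {"local": 0, "peer": 0, "store": 0}

    def get(self, key: str) -> bytes:
        # 1. Local hit: no network traffic at all.
        if key in self.local:
            self.source_counts["local"] += 1
            return self.local[key]
        # 2. Ask a peer; None means the peer does not hold the key.
        value = self.fetch_from_peer(key)
        source = "peer"
        # 3. Fall back to object storage only when no peer has it.
        if value is None:
            value = self.fetch_from_store(key)
            source = "store"
        self.source_counts[source] += 1
        self.local[key] = value  # keep a local copy for the next read
        return value


bucket = {"img/0001.jpg": b"raw-bytes"}  # hypothetical object store contents
peer = {}                                # a peer that holds nothing yet
cache = TieredCache(peer.get, bucket.__getitem__)
cache.get("img/0001.jpg")  # miss everywhere, so it is read from the store
cache.get("img/0001.jpg")  # now a local hit
print(cache.source_counts)  # {'local': 1, 'peer': 0, 'store': 1}
```

The per-source counters mirror the local / peer / S3 split that the dashboard reports.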

## Features

- 🚀 **Zero-config caching** — `ramjetio.init()` handles everything
- 📊 **Real-time dashboard** — monitor cache hits, throughput, GPU utilization
- 🔄 **Consistent hashing** — data distributed evenly across nodes
- 💾 **Disk-backed cache** — survives restarts, uses NVMe SSDs efficiently
- 🔌 **Works with any setup** — torchrun, DeepSpeed, Accelerate, custom launchers
- ☁️ **S3/MinIO integration** — configure data source in dashboard, not in code

## Integration Examples

Runnable scripts in [`examples/`](examples/):

- [`simple.py`](examples/simple.py) — minimal end-to-end starting point
- [`pytorch_imagenet.py`](examples/pytorch_imagenet.py) — ImageFolder + ResNet18 + `CachedDataset`
- [`huggingface_datasets.py`](examples/huggingface_datasets.py) — `datasets.load_dataset(...)` wrapped with cache
- [`yolov8.py`](examples/yolov8.py) — Ultralytics YOLOv8 via `UniversalDataset(format='yolo')`
- [`torchrun_ddp.py`](examples/torchrun_ddp.py) — multi-node DDP under `torchrun`
- [`accelerate_example.py`](examples/accelerate_example.py) — HuggingFace Accelerate
- [`deepspeed_example.py`](examples/deepspeed_example.py) — DeepSpeed launcher
- [`ddp_torchrun.sh`](examples/ddp_torchrun.sh) — one-liner `torchrun` wrapper (auto-detects GPUs)

See [docs/INTEGRATION.md](docs/INTEGRATION.md) for deeper walkthroughs.

## Configuration

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `RAMJET_API_KEY` | Your API key (required) | — |
| `RAMJET_CACHE_PATH` | Local cache directory | `/tmp/ramjet_cache` |
| `RAMJET_CACHE_SIZE` | Max cache size | `100GB` |
| `RAMJET_PORT` | Cache server port | `9000` |
| `RAMJET_STORE` | Local store backend: `diskcache` (default) or `lmdb` (opt-in, ~15× faster GET p50 on warm reads; see [docs/STORE.md](docs/STORE.md)) | `diskcache` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint. When set (and `ramjetio[otel]` is installed), enables distributed tracing across SDK → peer → backend. See [docs/TRACING.md](docs/TRACING.md). | (unset, tracing disabled) |
| `RAMJET_TRACE_SAMPLE` | Trace sampling ratio (`0.0`–`1.0`). Only meaningful when tracing is enabled. | `1.0` |
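
To make the defaults in the table concrete, here is a minimal sketch of how such variables could be read and validated. It is not RAMJET's own loader: `Settings`, `load_settings`, and `parse_size` are hypothetical helpers, and treating `GB` as decimal (10^9 bytes) is an assumption.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping

# Assumption: decimal units. The real parser may use binary units instead.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text: str) -> int:
    """Convert a string such as '100GB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


@dataclass(frozen=True)
class Settings:
    api_key: str
    cache_path: str
    cache_size_bytes: int
    port: int
    store: str
    trace_sample: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the defaults from the table."""
    api_key = env.get("RAMJET_API_KEY")
    if not api_key:
        raise RuntimeError("RAMJET_API_KEY is required")
    store = env.get("RAMJET_STORE", "diskcache")
    if store not in ("diskcache", "lmdb"):
        raise ValueError(f"RAMJET_STORE must be 'diskcache' or 'lmdb', got {store!r}")
    trace_sample = float(env.get("RAMJET_TRACE_SAMPLE", "1.0"))
    if not 0.0 <= trace_sample <= 1.0:
        raise ValueError("RAMJET_TRACE_SAMPLE must be between 0.0 and 1.0")
    return Settings(
        api_key=api_key,
        cache_path=env.get("RAMJET_CACHE_PATH", "/tmp/ramjet_cache"),
        cache_size_bytes=parse_size(env.get("RAMJET_CACHE_SIZE", "100GB")),
        port=int(env.get("RAMJET_PORT", "9000")),
        store=store,
        trace_sample=trace_sample,
    )


print(load_settings({"RAMJET_API_KEY": "demo", "RAMJET_STORE": "lmdb"}))
```

Passing the environment in as a mapping keeps the function easy to test without touching `os.environ`.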

### Dashboard Settings

Configure in the web dashboard (no code changes needed):
- **Data Source**: S3/MinIO endpoint, bucket, credentials
- **Cache Settings**: TTL, replication factor, eviction policy

## Distributed Training (DDP)

RAMJET automatically detects `torchrun` and other DDP launch environments.

### Single Machine, Multiple GPUs (torchrun)

```bash
# 4 GPUs on one machine
torchrun --nproc_per_node=4 train.py
```

```python
import ramjetio

# Only LOCAL_RANK=0 starts the cache server; the other ranks wait and share it
ramjetio.init()

# All ranks use the same cache (your_dataset is a placeholder for your dataset)
dataset = ramjetio.CachedDataset(your_dataset)
```
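
The "one rank starts the server, the others wait" pattern described in the comment above could look roughly like this. It is a sketch under assumptions rather than what `ramjetio.init()` actually does: `is_node_leader` and `wait_for_port` are hypothetical helpers, and a plain listening socket stands in for the cache server.

```python
import os
import socket
import time
from typing import Mapping


def is_node_leader(env: Mapping[str, str] = os.environ) -> bool:
    """True for the process that should start the node's cache server."""
    return int(env.get("LOCAL_RANK", "0")) == 0


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until something accepts TCP connections on host:port, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.1)  # server not up yet; retry shortly
    return False


# Demo: a listening socket stands in for the server started by the leader.
server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen()
port = server.getsockname()[1]
print(is_node_leader({"LOCAL_RANK": "0"}), wait_for_port("127.0.0.1", port, timeout=2.0))
server.close()
```

In a real launch, the non-leader ranks would call something like `wait_for_port` against the node's cache port before reading data.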

### Multi-Node Training

RAMJET auto-detects the launcher or cluster manager, so no manual configuration is needed:

| Environment | How to launch | RAMJET detects it? |
|-------------|--------------|--------------------|
| **SLURM** | `srun python train.py` | ✅ Automatic |
| **Kubernetes** (PyTorchJob) | Managed by operator | ✅ Automatic |
| **DeepSpeed** | `deepspeed --hostfile hosts train.py` | ✅ Automatic |
| **Accelerate** | `accelerate launch train.py` | ✅ Automatic |
| **torchrun** | `torchrun --nnodes=M --nproc_per_node=N --rdzv_backend=c10d --rdzv_endpoint=HOST:PORT train.py` | ✅ Automatic |
| **SageMaker** | Configured in SageMaker console | ✅ Automatic |

Each node runs one cache server (on `LOCAL_RANK=0`), and all nodes share data via consistent hashing.
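RAMJET reads `LOCAL_RANK`, `RANK`, and `WORLD_SIZE` from the environment; `torchrun`, DeepSpeed, and Accelerate set these automatically. Under `srun`, rank and world size are detected from the SLURM environment instead.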
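
For reference, environment-based rank detection can be sketched as follows. This is not RAMJET's detection code: `RankInfo` and `detect_rank` are hypothetical names, and the fallback order (torchrun-style variables first, then SLURM's `SLURM_PROCID`, `SLURM_LOCALID`, and `SLURM_NTASKS`) is an assumption.

```python
import os
from typing import Mapping, NamedTuple


class RankInfo(NamedTuple):
    rank: int
    local_rank: int
    world_size: int


def detect_rank(env: Mapping[str, str] = os.environ) -> RankInfo:
    """Resolve rank info from torchrun-style variables, then SLURM, then defaults."""
    if "RANK" in env and "WORLD_SIZE" in env:
        return RankInfo(
            int(env["RANK"]),
            int(env.get("LOCAL_RANK", "0")),
            int(env["WORLD_SIZE"]),
        )
    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        return RankInfo(
            int(env["SLURM_PROCID"]),
            int(env.get("SLURM_LOCALID", "0")),
            int(env["SLURM_NTASKS"]),
        )
    # No launcher detected: behave as a single process.
    return RankInfo(0, 0, 1)


print(detect_rank({"RANK": "5", "LOCAL_RANK": "1", "WORLD_SIZE": "8"}))
print(detect_rank({"SLURM_PROCID": "3", "SLURM_LOCALID": "1", "SLURM_NTASKS": "4"}))
```

A process whose `local_rank` is 0 would be the one that starts the node's cache server.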

## CLI Tools

```bash
# Start cache server manually (usually not needed — ramjetio.init() does this)
ramjetio-server --port 9000 --capacity 100GB

# Check cache status
ramjetio-client stats

# Clear cache
ramjetio-client clear
```

## Requirements

- Python 3.8+
- PyTorch 1.9+
- Linux (recommended for production)
- SSD storage for cache (recommended)

## Documentation

- [Integration Guide](docs/INTEGRATION.md) — detailed examples for all frameworks
- [API Reference](docs/API.md) — full API documentation
- [Troubleshooting](docs/TROUBLESHOOTING.md) — common issues and solutions

## License

PolyForm Noncommercial License 1.0.0 — free for personal and non-commercial use.
For commercial licensing, contact licensing@ramjet.dev. See [LICENSE](LICENSE) for details.

## Support

- 📧 Email: support@ramjet.io
- 💬 Discord: [discord.gg/ramjet](https://discord.gg/ramjet)
- 📖 Docs: [docs.ramjet.io](https://docs.ramjet.io)
