Metadata-Version: 2.4
Name: modelpulse
Version: 0.1.1
Summary: End-to-end partial-weight transfer pipeline.
Author-email: Mohammad Sufiyan <moahmmadsufiyan152@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/MdSufiyan005/ModelPulse
Project-URL: Source, https://github.com/MdSufiyan005/ModelPulse
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27
Requires-Dist: typer>=0.12
Requires-Dist: fastapi>=0.111
Requires-Dist: uvicorn[standard]>=0.30
Requires-Dist: rich>=13.7
Requires-Dist: llama-cpp-python>=0.2.90
Requires-Dist: psutil>=5.9
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"

# ModelPulse 🚀

**End-to-end partial-weight transfer pipeline.**

Device A serves model shards → Device B reassembles the GGUF in RAM and runs inference, with **no persistent GGUF file ever written to disk**.

```ascii
Device A                                     Device B
──────────────────────────────               ──────────────────────────────────────────────────
modelpulse server run ./shards               modelpulse bridge run http://100.101.102.103:8000
  │                                          │
  ├── GET /manifest  ──────────────────────► │  1. fetch manifest
  ├── GET /shards/*  ──────────────────────► │  2. pull all shards (streaming)
  │                                          │  3. assemble GGUF in RAM → /dev/shm
  │                                          │  4. llama.cpp loads from /dev/shm
  │                                          │  5. run inference, stream tokens
  └── POST /metrics  ◄────────────────────── │  6. send collected metrics
```
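
Conceptually, the bridge's pull phase is just the three HTTP calls shown in the diagram. Below is a minimal sketch using `httpx`; the endpoint paths come from the diagram, while the manifest layout (a `"shards"` list of names) is an assumption for illustration, not the exact `ShardManifest` schema:

```python
# Illustrative sketch of the pull phase; not the packaged shard_client.py.
# Assumes the manifest lists shard names under a "shards" key.
import httpx

def pull_shards(host: str) -> dict[str, bytes]:
    shards: dict[str, bytes] = {}
    with httpx.Client(base_url=host, timeout=None) as client:
        manifest = client.get("/manifest").json()            # 1. fetch manifest
        for name in manifest["shards"]:                       # 2. pull each shard, streaming
            with client.stream("GET", f"/shards/{name}") as resp:
                resp.raise_for_status()
                shards[name] = b"".join(resp.iter_bytes())
    return shards
```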

---

## 📸 Screenshots

**Server (Device A)**
![Modelpulse-Server](Images/server.png)

**Bridge (Device B)**
![Modelpulse-Bridge](Images/bridge.png)

**Inference in Progress**
![Running Inference](Images/running_bridge.png)

**Metrics Sent Back**
![Sending Benchmark data](Images/sending_metrics.png)

---

## 📦 Install
Install `ModelPulse` as a Python package directly from GitHub:
```bash
pip install git+https://github.com/MdSufiyan005/Model-Pulse.git
```


## 🔄 Workflow

### 1 — Prepare shards on Device A

Use `gguf_to_shards.py` from the bundled `tools/` directory to split your GGUF model into shards:

```bash
python tools/gguf_to_shards.py convert model.gguf ./shards/
```
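
Conceptually, the converter splits the GGUF byte stream into fixed-size pieces and writes a manifest recording their order, so the bridge can reassemble the file byte-for-byte. A rough sketch of that idea (the real `tools/gguf_to_shards.py` and `ShardManifest` schema may differ; the 64 MiB shard size and JSON layout here are assumptions):

```python
# Rough sketch of shard conversion; not the actual tools/gguf_to_shards.py.
import json
from pathlib import Path

SHARD_SIZE = 64 * 1024 * 1024  # assumed 64 MiB per shard

def split_gguf(model: Path, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    names: list[str] = []
    with model.open("rb") as f:
        index = 0
        while chunk := f.read(SHARD_SIZE):
            name = f"shard_{index:05d}.bin"
            (out_dir / name).write_bytes(chunk)
            names.append(name)
            index += 1
    # The manifest lets the bridge restore the shards in order.
    (out_dir / "manifest.json").write_text(
        json.dumps({"source": model.name, "shards": names}, indent=2)
    )
```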

### 2 — Start the server on Device A

```bash
modelpulse server run ./shards --host 0.0.0.0 --port 8000
```

### 3.0 — Get the Tailscale IP of the server (Device A)

```bash
curl -fsSL https://tailscale.com/install.sh | sh

sudo tailscale up   # sign in on the page it opens

tailscale ip        # note the address, e.g. 100.101.102.103
```


### 3.1 — Run inference on the edge device (Device B)

```bash
modelpulse bridge run http://100.101.102.103:8000
```

---

## 📋 Commands

### Device A (Server)

```bash
modelpulse server run <shards_dir> [options]
```

| Option           | Default         | Description              |
| ---------------- | --------------- | ------------------------ |
| `--port`         | `8000`          | Server port              |
| `--host`         | `0.0.0.0`       | Bind address             |
| `--metrics-log`  | `metrics.jsonl` | Metrics log file         |
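
The metrics log is JSON Lines: each report a bridge sends back is appended as one JSON object per line, so it is easy to inspect with a few lines of Python (the field names come from `InferenceMetrics`, so none are hard-coded in this sketch):

```python
# Quick look at the metrics log (JSON Lines: one record per line).
import json

with open("metrics.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(f"{len(records)} inference runs reported")
if records:
    print(records[-1])  # most recent report, whatever fields it carries
```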

### Device B (Client)

```bash
modelpulse bridge run <host> [options]
modelpulse bridge status <host> [--all]
```

| Command  | Description                          |
| -------- | ------------------------------------ |
| `run`    | Full pipeline: pull → infer → report |
| `status` | Display latest metrics from Device A |

#### Bridge `run` options

| Flag            | Default         | Description          |
| --------------- | --------------- | -------------------- |
| `--prompt / -p` | *(interactive)* | Prompt string        |
| `--max-tokens`  | `256`           | Tokens to generate   |
| `--temp / -t`   | `0.7`           | Sampling temperature |
| `--ctx`         | `2048`          | Context window       |
| `--no-report`   | `false`         | Skip sending metrics |

---

## 💾 Zero-Disk Strategy

```text
shard_data  ─── assemble_gguf_bytes() ──► gguf_bytes (RAM)
                                              │
                                       write_bytes()
                                              │
                                        /dev/shm/sb_<pid>.gguf   ← tmpfs, never touches physical disk
                                              │
                                       del gguf_bytes            ← Python bytes freed
                                              │
                                    Llama(model_path=...)        ← mmap from tmpfs
                                              │
                                       cleanup() → unlink()
```

This keeps the model file temporarily in RAM while still satisfying llama.cpp’s file-path requirement.

The system prioritizes /dev/shm and /run/shm by checking for existence and write access, falling back to $TMPDIR or /tmp if no RAM-backed filesystem is available.
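
The same flow, written out as a hedged Python sketch; the real `bridge.py` differs in detail, and the `pick_tmpfs` helper, `n_ctx` default, and completion call here are illustrative only:

```python
# Illustrative sketch of the zero-disk load path; not the packaged bridge.py.
import os
from llama_cpp import Llama

def pick_tmpfs() -> str:
    # Prefer RAM-backed filesystems, fall back to a normal temp dir.
    for path in ("/dev/shm", "/run/shm"):
        if os.path.isdir(path) and os.access(path, os.W_OK):
            return path
    return os.environ.get("TMPDIR", "/tmp")

def run_from_ram(gguf_bytes: bytes, prompt: str, ctx: int = 2048) -> str:
    model_path = os.path.join(pick_tmpfs(), f"sb_{os.getpid()}.gguf")
    try:
        with open(model_path, "wb") as f:
            f.write(gguf_bytes)              # write_bytes(): tmpfs, so still in RAM
        del gguf_bytes                       # drop this reference to the in-RAM copy
        llm = Llama(model_path=model_path, n_ctx=ctx)   # mmap from tmpfs
        out = llm(prompt, max_tokens=256)    # run inference
        return out["choices"][0]["text"]
    finally:
        if os.path.exists(model_path):
            os.unlink(model_path)            # cleanup(): remove the tmpfs file
```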

---

## 📁 Project Layout

```text
modelpulse/
├── pyproject.toml           # Packaging config
├── README.md                # This doc
├── Images/                  # Screenshots (server.png, bridge.png, etc.)
├── modelpulse/              # Core package
│   ├── __init__.py
│   ├── main.py              # Unified CLI: modelpulse bridge, modelpulse server
│   ├── shared/              # Shared models
│   │   ├── __init__.py
│   │   └── models.py        # ShardManifest, InferenceMetrics
│   ├── server/              # Server side
│   │   ├── __init__.py
│   │   └── server.py        # FastAPI server
│   └── client/              # Client side
│       ├── __init__.py
│       ├── cli.py           # Bridge CLI
│       ├── bridge.py        # RAM GGUF assembly + llama.cpp
│       └── shard_client.py  # Async HTTP client
└── tools/                   # Utilities
    ├── gguf_parser.py
    └── gguf_to_shards.py    # GGUF → shard converter
```


