Metadata-Version: 2.4
Name: multilobe
Version: 0.1.0
Summary: Modular expertise framework for transformer models — add independent LoRA-based expert lobes with automatic hybrid routing.
Author: MultiLobe Contributors
License: MIT
Project-URL: Homepage, https://github.com/Uunan/multilobe
Project-URL: Documentation, https://github.com/Uunan/multilobe#readme
Project-URL: Repository, https://github.com/Uunan/multilobe
Project-URL: Issues, https://github.com/Uunan/multilobe/issues
Keywords: transformers,lora,peft,mixture-of-experts,routing,catastrophic-forgetting,modular-ai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.35.0
Requires-Dist: peft>=0.7.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: datasets>=2.14.0
Requires-Dist: accelerate>=0.24.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0; extra == "dev"
Dynamic: license-file

# 🧠 MultiLobe

**Modular expertise framework for transformer models.**

MultiLobe lets you attach independent LoRA-based *expertise lobes* to any frozen HuggingFace causal language model. Each lobe is a lightweight specialist trained on domain-specific data, and a hybrid two-stage router automatically picks the right expert at inference time.

> **Why?** Traditional fine-tuning suffers from *catastrophic forgetting* — teaching a model law makes it forget medicine. MultiLobe solves this architecturally: the base model is always frozen, and each domain lives in its own isolated LoRA adapter.

---

## ✨ Features

- 🔒 **Frozen base model** — the original weights are never modified
- 🧩 **Isolated lobes** — each expertise area is a separate LoRA adapter; training one never affects another
- 🚀 **Hybrid routing** — fast embedding-based first pass + log-probability fallback
- 💾 **Save & load** — persist the full multi-expert setup and restore it later
- 🎯 **Manual override** — force a specific lobe when you know which expert to use
- 📊 **Transparent decisions** — get routing metadata (which lobe, confidence, scores)
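
The frozen-base and isolated-lobe properties can be illustrated with a toy linear layer. This is a minimal sketch in plain Python, not MultiLobe's implementation: `ToyLobedLayer`, `add_adapter`, and `forward` are hypothetical names, and the update rule is assumed to be the standard LoRA form `W x + (alpha / r) * B (A x)`.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Vector = Sequence[float]


def matvec(m: Matrix, v: Vector) -> List[float]:
    """Multiply a matrix (sequence of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ToyLobedLayer:
    """Hypothetical stand-in: one frozen weight plus named LoRA adapters."""

    def __init__(self, base_weight: Matrix) -> None:
        # Tuples make the base weight immutable, mimicking a frozen model.
        self.base: Tuple[Tuple[float, ...], ...] = tuple(tuple(r) for r in base_weight)
        self.adapters: Dict[str, Tuple[Matrix, Matrix, float]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix, alpha: float) -> None:
        rank = len(a)  # A has shape (rank, d_in), B has shape (d_out, rank)
        self.adapters[name] = (a, b, alpha / rank)

    def forward(self, x: Vector, lobe: Optional[str] = None) -> List[float]:
        y = matvec(self.base, x)
        if lobe is None:
            return y  # base model only, no adapter active
        a, b, scale = self.adapters[lobe]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [yi + scale * di for yi, di in zip(y, delta)]


layer = ToyLobedLayer([[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("medical", a=[[1.0, 0.0]], b=[[0.5], [0.0]], alpha=2.0)
layer.add_adapter("legal", a=[[0.0, 1.0]], b=[[0.0], [1.0]], alpha=1.0)

x = [1.0, 2.0]
print(layer.forward(x))             # [1.0, 2.0]
print(layer.forward(x, "medical"))  # [2.0, 2.0]
print(layer.forward(x, "legal"))    # [1.0, 4.0]
```

Each adapter only adds its own low-rank term on top of the shared base output, so registering or changing the `legal` adapter leaves both the base weight and the `medical` output untouched.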

---

## 📦 Installation

```bash
pip install multilobe
```

Or install from source:

```bash
git clone https://github.com/Uunan/multilobe.git
cd multilobe
pip install -e .
```

### Requirements

- Python ≥ 3.9
- PyTorch ≥ 2.0
- A CUDA GPU is strongly recommended for training

---

## 🚀 Quick Start

```python
from multilobe import MultiLobeModel

# 1. Load a base model (frozen automatically)
model = MultiLobeModel.from_pretrained("google/gemma-2b")

# 2. Add expertise lobes
model.add_lobe(
    name="medical",
    dataset="path/to/medical.jsonl",
    epochs=3,
    lora_r=16,
    lora_alpha=32,
)

model.add_lobe(
    name="legal",
    dataset="path/to/legal.jsonl",
    epochs=3,
)

# 3. Save the full setup
model.save("./my_multilobe_model")

# 4. Load it back later
model = MultiLobeModel.load("./my_multilobe_model")

# 5. Generate — routing is automatic
output = model.generate("What do these symptoms mean?")
print(output)
```

---

## 🎯 API Reference

### `MultiLobeModel.from_pretrained(model_name, **kwargs)`

Downloads a HuggingFace causal-LM, freezes all parameters, and initialises the routing system.

| Parameter | Default | Description |
|---|---|---|
| `model_name` | *(required)* | HuggingFace model ID (e.g. `"google/gemma-2b"`) |
| `device` | auto | `torch.device` override |
| `embedding_model` | `"all-MiniLM-L6-v2"` | Sentence-transformer for embedding router |
| `embedding_threshold` | `0.75` | Cosine similarity threshold |
| `logprob_max_tokens` | `30` | Tokens evaluated by log-prob fallback |

---

### `model.add_lobe(name, dataset, epochs=3, **kwargs)`

Creates, trains, and registers a new LoRA-based expertise lobe.

| Parameter | Default | Description |
|---|---|---|
| `name` | *(required)* | Unique lobe identifier |
| `dataset` | *(required)* | Path to JSONL file (`{"input": ..., "output": ...}`) |
| `epochs` | `3` | Training epochs |
| `lora_r` | `16` | LoRA rank |
| `lora_alpha` | `32` | LoRA alpha scaling |
| `batch_size` | `4` | Training batch size |
| `learning_rate` | `2e-4` | AdamW peak learning rate |

---

### `model.generate(input_text, **kwargs)`

Generate a response with automatic or manual lobe selection.

```python
# Automatic routing
output = model.generate("What are the symptoms of diabetes?")

# Manual lobe selection
output = model.generate("...", lobe="medical")

# With routing metadata
output, meta = model.generate("...", return_metadata=True)
print(meta["selected_lobe"])      # "medical"
print(meta["router_confidence"])  # 0.89
print(meta["router_type"])        # "embedding" or "logprob"
```

---

### `model.save(path)` / `MultiLobeModel.load(path)`

`save` persists the full multi-lobe setup and `load` restores it. Only the LoRA weights and metadata are written to `path`, so the base model is fetched from the Hub again on load.

---

## 🔄 How Routing Works

MultiLobe uses a **two-stage hybrid router**:

```
Input Query
    │
    ▼
┌──────────────────────┐
│  Stage 1: Embedding  │  Fast cosine similarity
│  (all-MiniLM-L6-v2)  │  against lobe centroids
└──────────┬───────────┘
           │
    confidence ≥ 0.75?
    ┌──────┴──────┐
    │ YES         │ NO
    ▼             ▼
 Use lobe    ┌───────────────────┐
             │  Stage 2: LogProb │  Evaluate input
             │  (fallback)       │  under each adapter
             └────────┬──────────┘
                      │
                      ▼
                  Use best lobe
```

1. **Embedding Router**: encodes the query with a sentence-transformer and computes its cosine similarity with each lobe's representation vector (the mean embedding of that lobe's training data). If the best score reaches `embedding_threshold` (default `0.75`), the query is routed to that lobe immediately.

2. **Log-Prob Router**: if no lobe reaches the threshold, each lobe's adapter is activated in turn and the mean log-probability of the input tokens (limited by `logprob_max_tokens`, default `30`) is computed under it. The lobe with the highest mean log-probability is selected.
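
The two stages above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's router: `RoutingDecision`, `mean_vector`, `cosine`, and `route` are hypothetical names, embeddings are passed in as plain lists instead of coming from a sentence-transformer, and `token_logprobs` is a stand-in callable that would return the per-token log-probabilities of the input under a given lobe's adapter. Truncating to the first `max_tokens` values is also an assumption about how `logprob_max_tokens` is applied.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


@dataclass
class RoutingDecision:  # hypothetical; not the library's RoutingResult
    lobe: str
    score: float
    router_type: str  # "embedding" or "logprob"


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Centroid of a lobe: the element-wise mean of its training embeddings."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(
    query_vec: Vector,
    centroids: Dict[str, Vector],
    token_logprobs: Callable[[str], Sequence[float]],
    threshold: float = 0.75,
    max_tokens: int = 30,
) -> RoutingDecision:
    # Stage 1: cosine similarity against each lobe centroid.
    sims = {name: cosine(query_vec, c) for name, c in centroids.items()}
    best = max(sims, key=sims.__getitem__)
    if sims[best] >= threshold:
        return RoutingDecision(best, sims[best], "embedding")

    # Stage 2: mean log-probability of the input under each lobe.
    means = {}
    for name in centroids:
        lps = list(token_logprobs(name))[:max_tokens]
        means[name] = sum(lps) / len(lps) if lps else float("-inf")
    best = max(means, key=means.__getitem__)
    return RoutingDecision(best, means[best], "logprob")


centroids = {
    "medical": mean_vector([[1.0, 0.2], [1.0, -0.2]]),
    "legal": mean_vector([[0.2, 1.0], [-0.2, 1.0]]),
}
fake_logprobs = {"medical": [-3.0, -3.4], "legal": [-1.5, -1.9]}  # made-up values

clear = route([0.9, 0.1], centroids, fake_logprobs.__getitem__)
unsure = route([1.0, 1.0], centroids, fake_logprobs.__getitem__)
print(clear.lobe, clear.router_type)    # medical embedding
print(unsure.lobe, unsure.router_type)  # legal logprob
```

The second query sits at equal distance from both centroids (cosine similarity of about 0.71), so the embedding stage declines to decide and the log-probability stage breaks the tie.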

---

## 📁 Dataset Format

Each JSONL file should contain one JSON object per line:

```json
{"input": "What are the symptoms of hypertension?", "output": "Hypertension symptoms include headaches, shortness of breath, and nosebleeds..."}
{"input": "How is diabetes diagnosed?", "output": "Diabetes is typically diagnosed through blood tests such as HbA1c..."}
```
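
Before starting a training run it can help to check that a file really follows this format. The sketch below is a standalone validator that uses only the standard library. `parse_jsonl_records` is a hypothetical helper, not part of MultiLobe, and skipping blank lines is an assumption rather than documented behaviour.

```python
import json
from typing import Dict, Iterable, List


def parse_jsonl_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse JSONL lines and check that each has string 'input' and 'output' fields."""
    records: List[Dict[str, str]] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are tolerated
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        for key in ("input", "output"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"line {line_no}: missing or non-string {key!r}")
        records.append({"input": record["input"], "output": record["output"]})
    return records


sample = [
    '{"input": "How is diabetes diagnosed?", "output": "Through blood tests such as HbA1c."}',
    "",
    '{"input": "What is a tort?", "output": "A civil wrong that causes harm or loss."}',
]
records = parse_jsonl_records(sample)
print(len(records))  # 2
```

To validate a file on disk, pass the open file object directly, for example `parse_jsonl_records(open("path/to/medical.jsonl", encoding="utf-8"))`.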

---

## 🏗️ Architecture

```
multilobe/
├── __init__.py          # Public API exports
├── model.py             # MultiLobeModel — main orchestrator
├── lobe.py              # Lobe & LobeMetadata classes
├── trainer.py           # LoRA training loop
├── utils.py             # Dataset loading, tokenisation helpers
└── router/
    ├── __init__.py
    ├── base.py          # BaseRouter ABC & RoutingResult
    ├── embedding.py     # Stage 1 — cosine similarity router
    └── logprob.py       # Stage 2 — log-probability fallback
```

---

## 📜 License

MIT

---

## 🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.
