Metadata-Version: 2.4
Name: neuralbrok
Version: 0.9.0
Summary: Local-first LLM routing gateway — use model='neuralbroker' and it routes intelligently between local Ollama, discovered subscriptions (Claude Pro, Codex), and paid API fallbacks.
Project-URL: Homepage, https://github.com/khan-sha/neuralbroker
Project-URL: Repository, https://github.com/khan-sha/neuralbroker
Project-URL: Documentation, https://github.com/khan-sha/neuralbroker/tree/main/docs
Project-URL: Bug Tracker, https://github.com/khan-sha/neuralbroker/issues
License: MIT License
        
        Copyright (c) 2026 NeuralBroker contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: claude,llm,local-ai,mcp,neuralbroker,ollama,openai-compatible,routing,subscription,vram
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: click>=8.1.0
Requires-Dist: fastapi>=0.110.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: prometheus-client>=0.20.0
Requires-Dist: psutil>=6.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: uvicorn[standard]>=0.29.0
Provides-Extra: cloud
Requires-Dist: boto3>=1.34.0; extra == 'cloud'
Requires-Dist: google-cloud-aiplatform>=1.50.0; extra == 'cloud'
Provides-Extra: dev
Requires-Dist: httpx>=0.27.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Provides-Extra: gpu
Requires-Dist: nvidia-ml-py>=12.0.0; extra == 'gpu'
Description-Content-Type: text/markdown

<div align="center">

<img src="logo.svg" alt="NeuralBroker" width="80"/>

# NeuralBroker

**The intelligent LLM gateway that makes your $20/mo subscriptions work everywhere.**

[![PyPI version](https://badge.fury.io/py/neuralbrok.svg)](https://badge.fury.io/py/neuralbrok)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

</div>

---

NeuralBroker is a **local-first LLM routing daemon** that sits between your AI tools (Claude Code, Cursor, Codex, Cline) and your models. It exposes a single OpenAI-compatible endpoint and a virtual model called `neuralbroker` — tools talk to that name, and NeuralBroker silently picks the best backend for every request.

**The idea is simple:** why pay per token on the API when you already pay $20/month for Claude Pro? NeuralBroker discovers your existing subscriptions, uses them for hard tasks, and sends easy tasks to your local GPU at no cost.

---

## How it works

```
Your IDE / Tool
       │  model: "neuralbroker"
       ▼
 ┌─────────────────────────────────┐
 │         NeuralBroker            │
 │                                 │
 │  1. Score prompt (15 dims, <1ms)│
 │  2. Classify → SIMPLE/MEDIUM/   │
 │                COMPLEX/REASONING│
 │  3. Pick backend:               │
 │     SIMPLE/MEDIUM → Local Ollama│
 │     COMPLEX/REASONING →         │
 │       ① Discovered subscription │  ← Claude Pro / Codex / ChatGPT
 │       ② Paid API key fallback   │  ← Groq / OpenAI / Anthropic
 └─────────────────────────────────┘
       │
       ▼
 Best model for the job
 (you never choose manually again)
```

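### The 3-Tier Cost Strategy

Each request is served from one of three cost tiers: your local GPU (free), a subscription you already pay for, or a per-token API key as the last resort.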

| Task Tier | Example | Backend | Your Cost |
|-----------|---------|---------|-----------|
| `SIMPLE` | "What is the capital of France?" | Local Ollama (llama3.2:1b) | **$0.00** |
| `MEDIUM` | "Write a short cover letter" | Local Ollama (qwen2.5:7b) | **$0.00** |
| `COMPLEX` | "Refactor this 500-line module" | Claude Pro subscription | **$0.00** *(already paying)* |
| `REASONING` | "Prove this math theorem step by step" | Claude Pro subscription | **$0.00** *(already paying)* |
| *Fallback* | No local + no subscription | Groq/OpenAI API | ~$0.002 |
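The tier-to-backend order in the diagram and table can be sketched as a small decision function. This is a minimal illustration, not NeuralBroker's actual router. `Tier`, `pick_backend`, and its parameters are hypothetical names. The final fallback to local for hard tasks when no subscription or API key exists is also an assumption, because the table does not cover that case.

```python
from enum import Enum


class Tier(Enum):
    SIMPLE = "SIMPLE"
    MEDIUM = "MEDIUM"
    COMPLEX = "COMPLEX"
    REASONING = "REASONING"


def pick_backend(tier: Tier, local_ok: bool, subscriptions: list[str], api_keys: list[str]) -> str:
    """Hypothetical router mirroring the diagram: local, then subscription, then paid API."""
    # SIMPLE/MEDIUM prompts stay on local Ollama whenever it is usable.
    if tier in (Tier.SIMPLE, Tier.MEDIUM) and local_ok:
        return "local:ollama"
    # Discovered subscriptions have zero marginal cost, so they come before API keys.
    if subscriptions:
        return f"subscription:{subscriptions[0]}"
    # Paid per-token API key: the documented fallback.
    if api_keys:
        return f"api:{api_keys[0]}"
    # Assumption: with no cloud option at all, a hard task still runs locally.
    if local_ok:
        return "local:ollama"
    raise RuntimeError("no backend available")
```

For example, `pick_backend(Tier.COMPLEX, True, ["claude-pro"], ["groq"])` prefers the subscription even though a paid key is present.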

---

## Quick Start

```bash
pip install neuralbrok
neuralbrok setup    # Detect your GPU and generate config
neuralbrok start    # Start the gateway on http://localhost:8000
```

Point any OpenAI-compatible tool to `http://localhost:8000/v1` with `model=neuralbroker` and you're done.
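If a tool has no built-in OpenAI settings, you can still send a request using only the Python standard library. This is a minimal sketch that assumes the default `http://localhost:8000/v1` base URL from Quick Start. `build_chat_request` is a hypothetical helper name. The request body follows the OpenAI chat completions format shown in the API Reference.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default address used by `neuralbrok start`


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request aimed at NeuralBroker."""
    body = {
        "model": "neuralbroker",  # the virtual model name that activates smart routing
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the gateway running, `urllib.request.urlopen(build_chat_request("Hello!"))` returns an OpenAI-style JSON body. The reply text is at `choices[0]["message"]["content"]`.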

---

## Features

### 🧠 Intelligent Routing (No Config Required)
- **15-dimension prompt scoring** classifies every request in under 1ms — no external LLM needed for routing decisions
- **NeuralFit hardware scoring** picks the best local model for your specific GPU and VRAM capacity
- **Virtual model name** — set `model=neuralbroker` once, never touch it again
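To make prompt scoring concrete, here is a toy classifier that combines a few cheap text signals into the four tiers. It is an illustration only. The real scorer uses 15 dimensions that this README does not list, so every signal, keyword list, and threshold here is an assumption. They were chosen only so that the example prompts from the cost table land in their documented tiers.

```python
import re

# Hypothetical signals standing in for the real 15 dimensions.
REASONING_HINTS = ("prove", "step by step", "derive", "theorem")
CODE_HINTS = ("refactor", "def ", "class ", "stack trace")
GENERATION_HINTS = ("write", "draft", "summarize", "explain")


def score_prompt(prompt: str) -> dict[str, float]:
    """Return a few 0..1 signals; a real scorer would use many more."""
    text = prompt.lower()
    steps = re.findall(r"\b(?:then|after that|finally)\b", text)
    return {
        "length": min(len(text.split()) / 200, 1.0),  # saturates at 200 words
        "code": 1.0 if any(h in text for h in CODE_HINTS) else 0.0,
        "generation": 0.5 if any(h in text for h in GENERATION_HINTS) else 0.0,
        "multi_step": min(len(steps) / 3, 1.0),
        "reasoning": 1.0 if any(h in text for h in REASONING_HINTS) else 0.0,
    }


def classify(prompt: str) -> str:
    s = score_prompt(prompt)
    if s["reasoning"]:
        return "REASONING"
    total = s["length"] + s["code"] + s["generation"] + s["multi_step"]
    if total >= 1.0:
        return "COMPLEX"
    if total >= 0.3:
        return "MEDIUM"
    return "SIMPLE"
```

Keyword matching and arithmetic like this stay far below a millisecond, which is why heuristic scoring can avoid calling an LLM to make the routing decision.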

### 💸 Subscription Inheritance
- Auto-discovers Claude Code OAuth sessions, Codex auth, and env-based API keys on startup
- Inherited subscriptions are treated as **zero marginal cost** — they're preferred over paid API keys for high-tier tasks
- Works with: Claude Pro/Max, GitHub Copilot (Codex), ChatGPT Plus

### 🖥️ Local-First
- Ollama and llama.cpp supported out of the box
- VRAM-aware: automatically avoids routing to local when VRAM is critically low
- Models are ranked by NeuralFit composite score (quality, speed, context fit, hardware fit)
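VRAM-aware ranking can be pictured as a filter followed by a weighted sort. In this sketch, the `LocalModel` fields mirror the four NeuralFit factors named above. The weights, the 1 GB safety reserve, and the VRAM figures are all hypothetical. An empty result represents the "VRAM critically low" case, where the request would be routed away from local.

```python
from dataclasses import dataclass

# Hypothetical weights; the actual NeuralFit weighting is not documented here.
WEIGHTS = {"quality": 0.4, "speed": 0.2, "context_fit": 0.2, "hardware_fit": 0.2}


@dataclass
class LocalModel:
    tag: str            # Ollama tag, e.g. "qwen2.5:7b"
    vram_gb: float      # approximate VRAM needed to load the model
    quality: float      # all scores normalized to 0..1
    speed: float
    context_fit: float
    hardware_fit: float


def composite(m: LocalModel) -> float:
    """Weighted sum of the four factors."""
    return sum(w * getattr(m, name) for name, w in WEIGHTS.items())


def rank_models(models: list[LocalModel], free_vram_gb: float, reserve_gb: float = 1.0) -> list[LocalModel]:
    """Drop models that would not fit in free VRAM minus a reserve, then sort best-first."""
    fits = [m for m in models if m.vram_gb <= free_vram_gb - reserve_gb]
    return sorted(fits, key=composite, reverse=True)
```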

### 🔌 One-Command IDE Integration
```bash
neuralbrok setup claude-code   # Wires NeuralBroker into Claude Code
neuralbrok setup cursor        # Wires NeuralBroker into Cursor
neuralbrok setup codex         # Wires NeuralBroker into Codex CLI
neuralbrok setup cline         # Wires NeuralBroker into Cline (VS Code)
```
Supports 20+ tools: Claude Code, Cursor, Cline, GitHub Copilot, Gemini CLI, OpenCode, Warp, Codex, Amp, Kimi Code, Firebender, Windsurf, and more.

### 📡 MCP Server
NeuralBroker ships with an MCP server that exposes routing intelligence directly to Claude Code and Cursor:
```bash
neuralbrok mcp   # Start MCP server on stdio
```
Available MCP tools:
- `nb_route_preview` — Preview routing tier for any prompt
- `nb_get_active_auth` — See which subscriptions are currently discovered

---

## Configuration

NeuralBroker auto-detects your hardware and generates a config on first run. The config lives at `~/.neuralbrok/config.yaml`.

```yaml
local_nodes:
  - name: local
    runtime: ollama
    host: localhost:11434

routing:
  default_mode: smart   # smart | cost | speed | fallback

# Optional: Specify which models are allowed for smart mode
# allowed_models:
#   - qwen2.5:7b
#   - llama3.2:3b

# Optional: Cloud fallback models (Ollama pull tags)
# ollama_cloud_models:
#   - claude-sonnet-4-5
```

### Routing Modes

| Mode | Behavior |
|------|----------|
| `smart` | 15-dim scoring decides local vs cloud per-request *(default)* |
| `cost` | Always prefer cheapest backend |
| `speed` | Always prefer lowest-latency backend |
| `fallback` | Try local first; spill to cloud only on failure |
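The four modes differ mainly in which backend attribute they minimize. This is a minimal sketch under stated assumptions. The `Backend` fields (`cost`, `latency_ms`, `healthy`) are hypothetical. An unhealthy backend stands in for "failure" in `fallback` mode, and `smart_pick` stands in for the per-request decision the 15-dimension scorer would make.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backend:
    name: str
    is_local: bool
    cost: float          # hypothetical relative cost; 0.0 for local and subscriptions
    latency_ms: float    # hypothetical recent average latency
    healthy: bool = True


def choose(mode: str, backends: list[Backend], smart_pick: Optional[str] = None) -> Backend:
    live = [b for b in backends if b.healthy]
    if not live:
        raise RuntimeError("no healthy backend")
    if mode == "cost":
        return min(live, key=lambda b: b.cost)
    if mode == "speed":
        return min(live, key=lambda b: b.latency_ms)
    if mode == "fallback":
        # Local first; spill to the first healthy cloud backend only if local is down.
        return next((b for b in live if b.is_local), live[0])
    if mode == "smart":
        # Stand-in for the per-request tier decision.
        return next((b for b in live if b.name == smart_pick), live[0])
    raise ValueError(f"unknown mode: {mode!r}")
```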

---

## Subscription Discovery

NeuralBroker automatically scans for existing credentials on startup. To see what it found:
```bash
curl http://localhost:8000/nb/discovered
```

To disable auto-discovery:
```bash
NB_DISABLE_AUTO_DISCOVERY=1 neuralbrok start
```
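Discovery of env-based API keys can be sketched as a lookup of well-known variable names. The official Anthropic, OpenAI, and Groq SDKs read the variable names below. Whether NeuralBroker checks exactly these is an assumption, as is how it treats values of `NB_DISABLE_AUTO_DISCOVERY` other than `1`. This sketch does not cover OAuth sessions from Claude Code or Codex.

```python
import os
from collections.abc import Mapping

# Variable names read by the official provider SDKs (assumed to be what gets scanned).
KNOWN_KEYS = {
    "ANTHROPIC_API_KEY": "anthropic",
    "OPENAI_API_KEY": "openai",
    "GROQ_API_KEY": "groq",
}


def discover_env_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return provider names whose API key variable is set and non-empty."""
    if env.get("NB_DISABLE_AUTO_DISCOVERY") == "1":
        return []
    return sorted(provider for var, provider in KNOWN_KEYS.items() if env.get(var))
```

Passing `env` as a parameter keeps the function testable. Calling it with no arguments reads the real process environment.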

---

## API Reference

NeuralBroker exposes an OpenAI-compatible API under `/v1`, plus its own management endpoints under `/nb`.

```bash
# Chat completions — use "neuralbroker" to activate smart routing
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "neuralbroker",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# List available models
curl http://localhost:8000/v1/models

# Check routing stats
curl http://localhost:8000/nb/stats

# Last 500 routing decisions
curl http://localhost:8000/nb/routing-log

# Live hardware info
curl http://localhost:8000/nb/hardware

# Change routing mode at runtime
curl -X POST http://localhost:8000/nb/mode \
  -H "Content-Type: application/json" \
  -d '{"mode": "speed"}'
```
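The mode switch can also be scripted. This sketch builds the same `POST /nb/mode` request as the curl example and rejects unknown modes before sending. Checking against the four modes from the Routing Modes table is a client-side assumption, and `build_mode_request` is a hypothetical name.

```python
import json
import urllib.request

# The four modes listed in the Routing Modes table.
MODES = {"smart", "cost", "speed", "fallback"}


def build_mode_request(mode: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the runtime mode-change request, validating the mode locally first."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}, got {mode!r}")
    return urllib.request.Request(
        f"{base_url}/nb/mode",
        data=json.dumps({"mode": mode}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

While the gateway is running, send the request with `urllib.request.urlopen(build_mode_request("speed"))`.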

---

## Supported Providers

| Local | Cloud (API Key) | Cloud (Subscription Auto-Discovered) |
|-------|----------------|--------------------------------------|
| Ollama | Groq | Claude Pro / Max (Claude Code) |
| llama.cpp | Together AI | GitHub Copilot (Codex) |
| LM Studio | OpenAI | ChatGPT Plus |
| | Anthropic API | |
| | Gemini | |
| | Mistral | |
| | Perplexity | |
| | DeepSeek | |
| | + 15 more | |

---

## Observability

- **Dashboard:** `http://localhost:8000/dashboard` — Live routing log, VRAM gauge, per-provider stats
- **Prometheus:** `http://localhost:8000/metrics`
- **Grafana:** Pre-built dashboards in `grafana/`
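Scraping `/metrics` returns data in the Prometheus text exposition format. Since `prometheus-client` is already a dependency, `prometheus_client.parser.text_string_to_metric_families` is the robust choice. The stdlib-only sketch below handles only the common `name{labels} value` line shape. This README does not document NeuralBroker's actual metric names, so the sample metric is hypothetical. Escape sequences inside label values are left as-is.

```python
import re

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple sample lines from Prometheus text exposition output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        m = _SAMPLE.match(line)
        if not m:
            continue
        name, _, labels, value = m.groups()
        samples.append((name, dict(_LABEL.findall(labels or "")), float(value)))
    return samples
```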

---

## Security Note

> NeuralBroker inherits auth tokens from tools **already installed and authenticated on your machine**. It never sends your credentials to third-party services: each token is used only against its own provider's API. You remain in full control.

---

## License

MIT © NeuralBroker contributors
