Metadata-Version: 2.4
Name: nvhive
Version: 0.30.1
Summary: NVHive — Multi-LLM orchestration platform with intelligent routing, hive consensus, and auto-agent generation
Author: NVHive Contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/thatcooperguy/nvHive
Project-URL: Repository, https://github.com/thatcooperguy/nvHive
Project-URL: Issues, https://github.com/thatcooperguy/nvHive/issues
Keywords: llm,ai,nvidia,gpu,orchestration,multi-model,agents,ollama,nemotron
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Environment :: GPU :: NVIDIA CUDA
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer[all]>=0.12
Requires-Dist: litellm>=1.55
Requires-Dist: pydantic>=2.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: sqlalchemy[asyncio]>=2.0
Requires-Dist: aiosqlite>=0.20
Requires-Dist: keyring>=25.0
Requires-Dist: rich>=13.0
Requires-Dist: httpx>=0.27
Requires-Dist: pyyaml>=6.0
Requires-Dist: tiktoken>=0.7
Requires-Dist: anyio>=4.0
Requires-Dist: zstandard>=0.20
Provides-Extra: serve
Requires-Dist: fastapi>=0.100; extra == "serve"
Requires-Dist: uvicorn[standard]>=0.20; extra == "serve"
Requires-Dist: passlib[bcrypt]>=1.7; extra == "serve"
Provides-Extra: nvidia
Requires-Dist: nvidia-ml-py3>=7.352.0; extra == "nvidia"
Provides-Extra: mcp
Requires-Dist: mcp[cli]>=1.0; extra == "mcp"
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: pytest-timeout>=2.3; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: fastapi>=0.100; extra == "dev"
Requires-Dist: uvicorn[standard]>=0.20; extra == "dev"
Requires-Dist: passlib[bcrypt]>=1.7; extra == "dev"
Requires-Dist: nvidia-ml-py3>=7.352.0; extra == "dev"
Provides-Extra: vision
Requires-Dist: pyautogui>=0.9.54; extra == "vision"
Requires-Dist: python-xlib>=0.33; sys_platform == "linux" and extra == "vision"
Requires-Dist: Pillow>=10.0; extra == "vision"
Provides-Extra: browser
Requires-Dist: playwright>=1.40; extra == "browser"
Provides-Extra: all
Requires-Dist: fastapi>=0.100; extra == "all"
Requires-Dist: uvicorn[standard]>=0.20; extra == "all"
Requires-Dist: passlib[bcrypt]>=1.7; extra == "all"
Requires-Dist: nvidia-ml-py3>=7.352.0; extra == "all"
Requires-Dist: pyautogui>=0.9.54; extra == "all"
Requires-Dist: python-xlib>=0.33; sys_platform == "linux" and extra == "all"
Requires-Dist: Pillow>=10.0; extra == "all"
Requires-Dist: playwright>=1.40; extra == "all"
Dynamic: license-file

# nvHive

**One command. Every AI model you have. Automatically assembled into the best team for each task.**

![version](https://img.shields.io/badge/version-0.30.1-blue) ![python](https://img.shields.io/badge/python-3.11%2B-blue) ![license](https://img.shields.io/badge/license-MIT-green) ![ci](https://img.shields.io/badge/CI-Linux%20%7C%20Windows%20%7C%20macOS-blue)

```bash
nvh "What is a binary search tree?"              # → answers (single best advisor)
nvh "Fix the timeout bug in council.py"          # → auto-detects coding task → agent mode
nvh "Should we use Redis or Postgres?"           # → auto-detects debate → council (3+ advisors)
nvh "take a screenshot and describe my desktop"  # → desktop agent (vision + tools)
nvh "setup comfyui"                              # → agent installs, configures, launches
```

<p align="center">
  <img src="docs/screenshots/terminal-demo-v2.gif" alt="nvHive CLI" width="640">
</p>

---

## Get Started

```bash
pip install nvhive
nvh                    # first-run setup auto-detects GPU, installs local AI, configures providers
nvh "your question"    # just ask — nvhive figures out the rest
```

```bash
# Optional extras
pip install "nvhive[vision]"      # desktop agent: screenshot, click, type, scroll
pip install "nvhive[browser]"     # headless browser automation (playwright)
pip install "nvhive[all]"         # everything
```

On first run, `nvh` launches a guided 3-step setup — GPU detection, provider keys, local model pulls. Works immediately with local models (no signup needed). Every step is skippable. Run `nvh setup` anytime to reconfigure.

<p align="center">
  <img src="docs/screenshots/setup-flow.svg" alt="nvHive 3-Step Setup Flow" width="900">
</p>

**GPU tier → model recommendations:**

| VRAM | Text Model | Vision Model | Behavior |
|------|-----------|-------------|----------|
| 0 GB (no GPU) | Cloud only | Cloud fallback | Free tiers first (Groq, LLM7, GitHub) |
| 4-8 GB | `nemotron-mini` | `moondream` | Basic local + desktop agent |
| 12-16 GB | `qwen2.5-coder:7b` | `minicpm-v` | Coding + vision local |
| 24 GB | `gemma2:27b` | `llama3.2-vision` | Strong text + best vision |
| 48 GB | `llama3.3:70b` | `llama3.2-vision` | Full power local |
| 96+ GB | Multiple 70B models | `llama3.2-vision` | Full local council, $0 |

Setup auto-detects your VRAM and recommends models that fit concurrently. No root/sudo needed — Ollama installs to `~/.nvh/`. [Full GPU guide](docs/GPU_DETECTION.md)
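
To make the table concrete, here is a minimal sketch of a tier lookup. The tier floors and model names are copied from the table; the function name `recommend` and the rule "pick the largest tier whose floor fits" are assumptions for illustration, not nvHive's actual detection code.

```python
# Illustrative only: floors and model names mirror the README table.
# The lookup rule (largest tier whose floor fits) is an assumption.
TIERS = [
    (96, "multiple 70B models", "llama3.2-vision"),
    (48, "llama3.3:70b", "llama3.2-vision"),
    (24, "gemma2:27b", "llama3.2-vision"),
    (12, "qwen2.5-coder:7b", "minicpm-v"),
    (4, "nemotron-mini", "moondream"),
]


def recommend(vram_gb: float) -> tuple[str, str]:
    """Return (text_model, vision_model) for the given VRAM in GB."""
    for floor, text_model, vision_model in TIERS:
        if vram_gb >= floor:
            return text_model, vision_model
    # Below the smallest tier: no local model, use cloud providers.
    return "cloud only", "cloud fallback"


print(recommend(16))  # ('qwen2.5-coder:7b', 'minicpm-v')
```

VRAM sizes that fall between the listed ranges (10 GB, say) drop to the next tier down in this sketch; the real setup may decide differently.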

---

## Why nvHive

**Council scored 68% higher than a single model (8.6 vs. 5.1 overall) at $0 cost.** Three free providers running in parallel outperformed a single model on accuracy, completeness, and coherence. [Benchmark details below.](#benchmark-results)

- **Smart team assembly.** nvHive generates expert agents for your question and matches each to the best LLM for their specialty — a "Security Engineer" agent routes to a security-strong provider, a "Database Expert" to one suited for database queries.
- **Automatic orchestration.** Coding tasks get a planner + coder + reviewer. Complex questions get a council. Simple questions get the fastest advisor. All automatic.
- **Scales with what you have.** 1 provider → single-model answers. 3+ providers → council on complex questions. Local GPU → free inference alongside cloud. DGX Spark → three 70B models in parallel, fully local.
- **4-layer safety guardrails.** Command blocklist, filesystem boundary enforcement, secrets redaction, and resource limits.
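
As a rough illustration of what three of the guardrail layers could look like, here is a sketch built on pattern matching. Every pattern and function name below is hypothetical; nvHive's real rules are not shown in this README, and resource limits are left out.

```python
import re
from pathlib import Path

# Hypothetical patterns for illustration only.
BLOCKED_COMMANDS = [
    re.compile(p) for p in (r"\brm\s+-rf\s+/", r"\bmkfs\b", r"\bdd\s+if=")
]
SECRET = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def is_blocked(command: str) -> bool:
    """Command blocklist: reject commands matching any dangerous pattern."""
    return any(p.search(command) for p in BLOCKED_COMMANDS)


def inside_boundary(root: str, target: str) -> bool:
    """Filesystem boundary: the resolved target must stay under root."""
    return Path(target).resolve().is_relative_to(Path(root).resolve())


def redact(text: str) -> str:
    """Secrets redaction: mask the value that follows a key-like name."""
    return SECRET.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
```

Heuristics like these can both over-block and miss cases, which is why Docker isolation via `--sandbox` remains the stronger option for sensitive environments.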

<p align="center">
  <img src="docs/screenshots/smart-router.svg" alt="nvHive Smart Router" width="900">
</p>

---

## Architecture

<p align="center">
  <img src="docs/screenshots/architecture.svg" alt="nvHive Full Stack Architecture" width="900">
</p>

9 layers from `pip install` to GPU inference — install, setup, 4 user interfaces, intent detection, 5 execution modes, smart routing, tool registry, 23+ AI providers, and the hardware stack. Local-first with cloud fallback. [Architecture docs](docs/ARCHITECTURE.md)

---

## Features

### Desktop Agent

AI that sees your screen, controls mouse/keyboard, installs software, and navigates browsers — powered by local vision models.

```bash
nvh "take a screenshot and describe my desktop"
nvh "setup comfyui"                    # agent: git clone → pip install → launch → verify
nvh "open firefox and go to github.com"
```

Vision pipeline: screenshot → local vision model (llama3.2-vision / minicpm-v) → coordinate estimation → action → verify. Falls back to cloud vision if no local model is available. Works on Linux (X11), macOS, and Windows. [Desktop agent docs](docs/LINUX_DESKTOP.md)

### Agentic Coding

Multi-model coding agent with dynamic expert referral, iterative QA, parallel execution, and vision/browser tools.

```bash
nvh agent "Fix the streaming timeout bug in council.py"
nvh agent "Add unit tests for auth" --dir ./myproject
nvh agent "Build the notification service" --sandbox     # Docker-isolated
nvh review                     # multi-model code review
nvh test-gen nvh/core/council.py     # AI test generation
```

Key capabilities: dynamic expert referral, iterative QA refinement, parallel pipeline, Docker sandbox, execution checkpoints with rollback, LLM drift detection, multi-repo workspaces, and VS Code extension. Scales from no-GPU (fully cloud) to DGX Spark (3 local 70B models). [Agentic coding docs](docs/TOOLS.md)

### Council Mode

Council mode runs the same query through multiple providers in parallel, then synthesizes the results. Expert personas are generated per query, and each is assigned to a different model. Responses are analyzed for agreement, then synthesized by a non-member provider with a confidence score.

```bash
nvh convene "Should we use Redis or Postgres for sessions?"   # 3 models → synthesis
nvh throwdown "Review this architecture for scalability"      # 3-pass deep analysis with critique
```

Different models have different blind spots — council surfaces all perspectives. Council with 3 free providers costs $0. [Council docs](docs/COUNCIL.md)
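
The fan-out and synthesis pattern described above can be sketched with `asyncio.gather`. This is a toy under stated assumptions: `ask` is a stub standing in for a real provider call, `run_council` is a hypothetical name, and agreement analysis and confidence scoring are omitted.

```python
import asyncio


async def ask(advisor: str, question: str) -> str:
    """Stand-in for a provider call; a real one would hit an LLM API."""
    await asyncio.sleep(0)  # yield control, as network I/O would
    return f"{advisor}: answer to {question!r}"


async def run_council(question: str, members: list[str], synthesizer: str) -> dict:
    if synthesizer in members:
        raise ValueError("synthesizer must not be a council member")
    # Fan out: all members answer concurrently.
    answers = await asyncio.gather(*(ask(m, question) for m in members))
    # Synthesis: a non-member sees every answer (stubbed here as a join).
    synthesis = await ask(synthesizer, " | ".join(answers))
    return {"answers": list(answers), "synthesis": synthesis}


result = asyncio.run(run_council("Redis or Postgres?", ["a", "b", "c"], "d"))
print(len(result["answers"]))  # 3
```

Keeping the synthesizer outside the member list means no model gets to grade its own answer, which matches the "non-member provider" wording above.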

### Smart Routing

Each request is scored across capability (40%), cost (30%), latency (20%), and health (10%), then routed to the highest-scoring provider. Scores start from static values and shift toward learned ones as usage accumulates; after 20 queries per provider, routing is fully data-driven.

```bash
nvh ask --escalate "Design a distributed lock manager"    # try free first, upgrade if uncertain
nvh ask --verify "Is eval() safe in Python?"              # cross-model verification
nvh routing-stats    # see learned vs static scores
nvh health           # provider resilience dashboard
```

Local-first with NVIDIA GPUs: simple queries route to your GPU via Ollama — no cloud, no cost, no data leaving your machine. `--prefer-nvidia` gives a 1.3x routing bonus to NVIDIA hardware. [Routing docs](docs/ARCHITECTURE.md)
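
A minimal sketch of the weighted scoring, using the weights and the 1.3x bonus stated in this section. The `Provider` fields, the 0..1 normalization (higher is better, so a cheap provider has a high `cost` score), and applying the bonus as a multiplier on the total are assumptions; the real router may differ.

```python
from dataclasses import dataclass

# Weights and bonus come from the README; everything else is assumed.
WEIGHTS = {"capability": 0.40, "cost": 0.30, "latency": 0.20, "health": 0.10}
NVIDIA_BONUS = 1.3


@dataclass
class Provider:
    name: str
    capability: float  # all four scores assumed normalized to 0..1,
    cost: float        # where higher is better (cheaper, faster, healthier)
    latency: float
    health: float
    is_nvidia: bool = False


def score(p: Provider, prefer_nvidia: bool = False) -> float:
    """Weighted sum of the four factors, with an optional NVIDIA multiplier."""
    total = sum(weight * getattr(p, key) for key, weight in WEIGHTS.items())
    if prefer_nvidia and p.is_nvidia:
        total *= NVIDIA_BONUS
    return total


def route(providers: list[Provider], prefer_nvidia: bool = False) -> Provider:
    """Pick the highest-scoring provider."""
    return max(providers, key=lambda p: score(p, prefer_nvidia))


local = Provider("local-gpu", 0.6, 1.0, 0.7, 0.9, is_nvidia=True)
cloud = Provider("cloud-a", 0.9, 0.6, 0.8, 1.0)
print(route([local, cloud]).name)                      # cloud-a
print(route([local, cloud], prefer_nvidia=True).name)  # local-gpu
```

With these made-up numbers the cloud provider wins on raw score (about 0.80 vs. 0.77), and the bonus flips the choice to the local GPU.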

---

## Providers

**23 providers. 63 models. 25 free — no credit card required.**

| Tier | Providers | Rate Limits |
|------|-----------|-------------|
| **Free (no signup)** | Ollama (local), LLM7 | Unlimited / 30 RPM |
| **Free (email signup)** | Groq, GitHub Models, Cerebras, SambaNova, Cohere, AI21, SiliconFlow, HuggingFace | 15-30 RPM |
| **Free (account)** | Google Gemini, Mistral, NVIDIA NIM | 15-1000 RPM |
| **Paid** | OpenAI, Anthropic, DeepSeek, Fireworks, Together, OpenRouter, Grok | Pay per token |

[Full provider guide](docs/PROVIDERS.md)

---

## Integrations

nvHive exposes a CLI (`nvh`), web dashboard (`nvh webui`), Python SDK (`import nvh`), MCP server for Claude Code, and OpenAI/Anthropic-compatible API proxies.

```python
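import asyncio
import nvh

async def main():  # `await` needs a coroutine when run as a script
    response = await nvh.complete([{"role": "user", "content": "Explain quicksort"}])
    result = await nvh.convene("Architecture review", cabinet="engineering")

asyncio.run(main())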
```

| Integration | Setup |
|-------------|-------|
| Anthropic SDK | `ANTHROPIC_BASE_URL=http://localhost:8000/v1/anthropic` |
| OpenAI SDK | `OPENAI_BASE_URL=http://localhost:8000/v1/proxy` |
| Claude Code | `claude mcp add nvhive -- python -m nvh.mcp_server` |
| NemoClaw | `nvh nemoclaw --start` — [NemoClaw docs](docs/NEMOCLAW.md) |

[SDK & API reference](docs/SDK_API.md) | [Claude Code integration](docs/CLAUDE_CODE_INTEGRATION.md) | [OpenClaw migration](docs/OPENCLAW_MIGRATION.md)

---

## Benchmark Results

Real data from NVIDIA DGX Spark (GB10, 120GB). 16 prompts across code generation, debugging, reasoning, math, creative writing, and Q&A. Judged by OpenAI with ground truth verification.

| Mode | Accuracy | Completeness | Coherence | **Overall** | Cost |
|------|----------|-------------|-----------|---------|------|
| Single Model (Nemotron Super) | 5.5 | 5.7 | 5.0 | **5.1** | $0.00 |
| **Council (Ollama + Groq + Google)** | **9.0** | **8.0** | **9.0** | **8.6** | **$0.00** |

```bash
nvh bench              # GPU speed (tokens/sec)
nvh bench -q           # speed + quality comparison
nvh health             # provider resilience
```

Results vary by hardware and workload — run `nvh bench` to measure on your setup.

---

## Core Commands

| Command | What It Does |
|---------|-------------|
| `nvh "question"` | Smart route to best available model |
| `nvh convene "question"` | Council consensus (3+ models) |
| `nvh throwdown "question"` | Three-pass deep analysis with critique |
| `nvh agent "task"` | Agentic coding with expert referral + QA |
| `nvh review` | Multi-model code review |
| `nvh test-gen file.py` | AI test generation with verification |
| `nvh safe "question"` | Local only — nothing leaves your machine |
| `nvh serve` | Start API server (OpenAI + Anthropic proxy) |
| `nvh webui` | Launch web dashboard |
| `nvh health` | Provider resilience dashboard |
| `nvh bench` | GPU speed test (tokens/sec) |
| `nvh setup` | Interactive provider setup |
| `nvh doctor` | Full diagnostic dump |

[Full command reference](docs/COMMANDS.md) (50+ commands)

---

## Documentation

| Guide | Description |
|-------|-------------|
| [Getting Started](docs/GETTING_STARTED.md) | First-time setup |
| [Commands](docs/COMMANDS.md) | Full CLI reference (50+ commands) |
| [Providers](docs/PROVIDERS.md) | 23 providers, rate limits, free tiers |
| [Council System](docs/COUNCIL.md) | Multi-LLM consensus with confidence scoring |
| [Architecture](docs/ARCHITECTURE.md) | System design and adaptive routing |
| [GPU Detection](docs/GPU_DETECTION.md) | Auto-detection, model selection, OOM protection |
| [SDK & API](docs/SDK_API.md) | Python SDK, REST API, proxies |
| [Agent Tools](docs/TOOLS.md) | Agent tools and capabilities |
| [Configuration](docs/CONFIGURATION.md) | Configuration reference |
| [Web UI](docs/WEBUI.md) | Web dashboard |
| [Deploy Without Root](docs/DEPLOY_NO_ROOT.md) | No-root install on servers |
| [Windows Troubleshooting](docs/TROUBLESHOOTING_WINDOWS.md) | Encoding, segfaults, port issues |
| [Releasing](docs/RELEASING.md) | Release runbook |

---

## Important Notes

- **Data Privacy**: Cloud providers transmit queries to third-party APIs subject to each provider's privacy policy. Use `nvh safe` to keep inference local; `--prefer-nvidia` biases routing toward local NVIDIA hardware.
- **AI Accuracy**: AI-generated outputs may contain errors. Review agent-modified files before committing to production.
- **Security**: Safety guardrails use pattern-matching heuristics. For sensitive environments, use `--sandbox` with Docker isolation.
- **Benchmarks**: Results measured on NVIDIA DGX Spark reference hardware. Results vary by hardware, provider, and workload.

## License

MIT License. See [LICENSE](LICENSE) for details.
