Metadata-Version: 2.4
Name: omniagent-fleet
Version: 0.2.1
Summary: AI Infrastructure OS — same engine, four facades (CLI, REST, web, MCP). Reduce 80-95% of AI spend by routing tasks to the right model.
Author-email: Sergio Garcia <sgarcia@ubicacuenca.com>
License: MIT
Project-URL: Homepage, https://github.com/landrover1984/omniagent
Project-URL: Source, https://github.com/landrover1984/omniagent
Project-URL: Issues, https://github.com/landrover1984/omniagent/issues
Project-URL: Changelog, https://github.com/landrover1984/omniagent/blob/main/CHANGELOG.md
Keywords: ai,llm,router,cost-reduction,cli,mcp,local-first,ollama,fleet,infrastructure
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Distributed Computing
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.109.0
Requires-Dist: uvicorn[standard]>=0.27.0
Requires-Dist: pydantic>=2.5.0
Requires-Dist: gitpython>=3.1.40
Requires-Dist: psutil>=5.9.8
Requires-Dist: py-cpuinfo>=9.0.0
Requires-Dist: httpx>=0.26.0
Requires-Dist: python-socketio>=5.11.0
Requires-Dist: websockets>=12.0
Requires-Dist: rich>=13.7.0
Requires-Dist: typer>=0.9.0
Requires-Dist: docker>=7.0.0
Requires-Dist: pyyaml>=6.0.1
Requires-Dist: paramiko>=3.4.0
Requires-Dist: boto3>=1.34.0
Requires-Dist: tomli>=2.0.1; python_version < "3.11"
Dynamic: license-file

<div align="center">

# OmniAgent

## You are overspending on AI.

**OmniAgent routes every AI task to the most efficient model automatically.**
Local first. Cloud only when it pays off. **80–95% savings** on your AI bill.

[Try the AI Cost Calculator](landing/index.html) · [See the 60-second demo](#the-60-second-demo) · [Star on GitHub](https://github.com/landrover1984/omniagent)

[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![Tests](https://img.shields.io/badge/tests-473%20passing-brightgreen.svg)](tests/)
[![Zero Telemetry](https://img.shields.io/badge/zero-telemetry-blueviolet.svg)]()
[![Local First](https://img.shields.io/badge/local-first-blue.svg)]()
[![MIT 100%](https://img.shields.io/badge/100%25-open%20source-orange.svg)]()

</div>

---

## The math most devs don't realize

> *"I just need AI to review my code, write docstrings, and rename things."*
> — every developer with a $500/month Cursor + Claude bill

Here's what you actually need (benchmarked on real hardware, Jun 2026):

| Task | All-Claude reality | OmniAgent | Savings |
|------|-------------------|-----------|---------|
| **Review a function for bugs** | Claude · **$0.30** | qwen2.5-coder:7b (local) · **$0.00** | **100%** |
| **Write a Google-style docstring** | Claude · **$0.28** | qwen2.5-coder:7b (local) · **$0.00** | **100%** |
| **Rename a variable** | Claude · **$0.15** | qwen2.5-coder:7b (local) · **$0.00** | **100%** |
| **Explain TCP vs UDP** | Claude · **$0.10** | qwen2.5-coder:7b (local) · **$0.00** | **100%** |
| **Classify a bug ticket** | Claude · **$0.08** | qwen2.5-coder:7b (local) · **$0.00** | **100%** |

**Fleet benchmark · MSI desktop (GTX 1650, 4GB VRAM, 8 threads) · 5 tasks · 506 tokens: $0.00 total cloud spend.**

OmniAgent uses Claude when Claude is the right tool. It just doesn't use Claude when a local Qwen model can do the same job at a fraction of the cost.
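
The table's arithmetic can be reproduced in a few lines. This is only a sketch: the prices are copied from the table above, and the dictionary names are illustrative rather than part of OmniAgent.

```python
# Per-task cost in USD, copied from the table above (illustrative structure).
all_claude_usd = {
    "review function": 0.30,
    "write docstring": 0.28,
    "rename variable": 0.15,
    "explain TCP vs UDP": 0.10,
    "classify ticket": 0.08,
}

# In the benchmark every task ran on the local model, so cloud spend is zero.
routed_usd = {task: 0.00 for task in all_claude_usd}

baseline = round(sum(all_claude_usd.values()), 2)
actual = round(sum(routed_usd.values()), 2)
savings_pct = (baseline - actual) / baseline * 100

print(f"all-Claude: ${baseline:.2f} | routed: ${actual:.2f} | savings: {savings_pct:.0f}%")
# all-Claude: $0.91 | routed: $0.00 | savings: 100%
```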

---

## The 60-second demo

Real benchmark run on an MSI desktop (GTX 1650, 4GB VRAM, 8 threads):

```
→ msi-node: qwen2.5-coder:7b | $0.00 | 30,123ms  (review function)
→ msi-node: qwen2.5-coder:7b | $0.00 | 50,475ms  (write docstring)
→ msi-node: qwen2.5-coder:7b | $0.00 | 10,181ms  (rename variable)
→ msi-node: qwen2.5-coder:7b | $0.00 | 15,321ms  (explain TCP/UDP)
→ msi-node: qwen2.5-coder:7b | $0.00 |  8,649ms  (classify ticket)

Total: 5 tasks · 506 tokens · $0.00 cloud spend · avg 22.95s/task
```

**Every task ran on the local GPU. Zero cloud cost. That's what "AI Infrastructure OS" means.**
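
To check the summary line yourself, the latencies from the run above can be averaged directly. The snippet below is a small sketch with the five timings hard-coded; it does not call OmniAgent.

```python
# Latencies in milliseconds, copied from the benchmark output above.
latencies_ms = {
    "review function": 30_123,
    "write docstring": 50_475,
    "rename variable": 10_181,
    "explain TCP/UDP": 15_321,
    "classify ticket": 8_649,
}

total_ms = sum(latencies_ms.values())
avg_s = total_ms / len(latencies_ms) / 1000
slowest = max(latencies_ms, key=latencies_ms.get)

print(f"{len(latencies_ms)} tasks | total {total_ms / 1000:.1f}s | "
      f"avg {avg_s:.2f}s/task | slowest: {slowest}")
# 5 tasks | total 114.7s | avg 22.95s/task | slowest: write docstring
```

The trade-off is visible in the numbers: zero cloud spend, paid for with tens of seconds of latency per task on a 4GB GPU.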

---

## Weekly AI Intelligence — `omniagent post-mortem`

> *"You don't need a budget. You need to see what you spent."*

v0.2.0 adds the killer first-run experience: a persistent cost ledger + a Weekly AI Intelligence report.

Every `omniagent agent-route` and `omniagent fleet route` call is now logged to `~/.omniagent/postmortem/ledger.db`. Then run:

```bash
omniagent post-mortem                  # last 7 days
omniagent post-mortem --period month   # last 30 days
omniagent post-mortem --period all     # all time
omniagent post-mortem --json | jq .    # pipe to tools
omniagent post-mortem -o weekly.md     # save the report
omniagent post-mortem --demo           # inject sample data and see the report
```

Sample output (with `--demo`):

```
# 🧠 Weekly AI Intelligence

_Generated 2026-06-03 · Period: last 7 days_

## 💰 Top-line numbers

| Metric | Value |
|---|---:|
| Tasks run | 10 |
| Tokens (in + out) | 23,770 |
| **Total cost** | **$0.3046** |
| ↳ Local | $0.0000 |
| ↳ Cloud | $0.3046 |
| All-Claude-Sonnet equivalent | $0.1777 |
| **Savings vs all-Claude** | **-71.4%** ⚠️ |

> 💡 You spent $0.3046 on cloud models.
> $0.2894 of that (≈95%) probably could have been local.

## ⚡ Top optimization opportunities

**Total potential savings: $0.0991**

### 1. claude-sonnet-4 → qwen2.5-coder:7b
- Calls: 6 · Tokens: 12,570
- Actual cost: $0.0936 · Could have been: $0.0000
- Savings: **$0.0936** · Risk: low

### 2. gpt-4o → qwen2.5-coder:7b
- Calls: 1 · Tokens: 1,000
- Actual cost: $0.0055 · Could have been: $0.0000
- Savings: **$0.0055** · Risk: low

### 3. claude-opus-4-5 → no alt
- Calls: 1 · Tokens: 5,300
- Actual cost: $0.2055 · Could have been: $0.2055
- Savings: $0.0000 · Risk: high (frontier reasoning)
```

**The "risk" field is honest.** Frontier models (Opus, o1) get `risk=high` and `local_alternative=null` — you really did need that model. Trivial and simple tasks get `risk=low` and a concrete local alternative. **No misleading savings claims.**
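
The figures in the sample report follow from simple arithmetic. The sketch below recomputes them from the three opportunities shown; the `Opportunity` class is hypothetical (only the `risk` and `local_alternative` field names come from the report), and it assumes a local alternative costs $0.00.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Opportunity:
    """Hypothetical record for one row of the report; not OmniAgent's own type."""
    model: str
    local_alternative: Optional[str]  # None mirrors local_alternative=null
    actual_usd: float
    risk: str

    @property
    def savings_usd(self) -> float:
        # Assumption: a local alternative costs $0.00; without one, nothing is saved.
        return self.actual_usd if self.local_alternative else 0.0


# Values copied from the sample report above.
opportunities = [
    Opportunity("claude-sonnet-4", "qwen2.5-coder:7b", 0.0936, "low"),
    Opportunity("gpt-4o", "qwen2.5-coder:7b", 0.0055, "low"),
    Opportunity("claude-opus-4-5", None, 0.2055, "high"),
]

total_cost = sum(o.actual_usd for o in opportunities)
potential = sum(o.savings_usd for o in opportunities)
baseline = 0.1777  # "All-Claude-Sonnet equivalent" from the report
savings_pct = (baseline - total_cost) / baseline * 100

print(f"total ${total_cost:.4f} | potential savings ${potential:.4f} | "
      f"vs all-Claude {savings_pct:.1f}%")
# total $0.3046 | potential savings $0.0991 | vs all-Claude -71.4%
```

The negative percentage is expected in the demo data: one Opus call costs more than the whole all-Sonnet baseline.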

The same data is available over HTTP from the web app: `GET /api/postmortem?period=week`.
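
A minimal way to call that endpoint from Python, using only the standard library. This sketch assumes the API is served on the same `http://localhost:8765` address as the dashboard and returns JSON; neither is confirmed here, and only `period=week` appears in this README. The function names are made up for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API shares the dashboard's default address.
BASE_URL = "http://localhost:8765"


def postmortem_url(period: str = "week", base_url: str = BASE_URL) -> str:
    """Build the report URL for the given period."""
    return f"{base_url}/api/postmortem?{urlencode({'period': period})}"


def fetch_postmortem(period: str = "week", timeout: float = 5.0) -> dict:
    """Fetch the report. Requires `omniagent web` to be running.

    Assumption: the response body is a JSON object.
    """
    with urlopen(postmortem_url(period), timeout=timeout) as resp:
        return json.load(resp)


print(postmortem_url("week"))
# http://localhost:8765/api/postmortem?period=week
```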

---

## Agent Generator — `omniagent generate`

> *"You don't need to write agents. You need your project to write them."*

v0.2.1 adds the **Agent Generator**: point it at any project, and in under a second you get three personalized YAML agents (`reviewer`, `tester`, `doc`) tuned to that project's exact stack — language, frameworks, test framework, lint tools, docstring style, line length.

```bash
omniagent generate --from .                          # scan current dir, write to ~/.omniagent/agents/
omniagent generate --from . --only reviewer          # just the code reviewer
omniagent generate --from . --output .omniagent/agents/   # repo-local agents
omniagent generate --from . --dry-run --json         # see the profile + agent previews
omniagent generate --from . --force                  # overwrite existing
```

Sample analysis:

```
+-------------------- Analyzed ---------------------+
| Project scan                                      |
| Path: C:\Users\MSI\repos\myapp                    |
| Primary language: python                          |
| Frameworks: FastAPI, Pydantic                     |
| Test framework: pytest                            |
| Lint tools: ruff                                  |
| Patterns: packaged, typed                         |
| Files: 47 | LOC: 8,392                            |
+---------------------------------------------------+
```

Sample generated `reviewer.yaml`:

```yaml
name: reviewer
version: "1.0.0"
category: review
model:
  primary: qwen2.5-coder:7b        # local, free
  fallback_chain:
    - deepseek-coder
    - claude-haiku-4
    - gpt-4o-mini
system_prompt: |
  You are a senior code reviewer for this project. Be specific. Cite file:line.
  ...
  Project context:
  - Primary language: python
  - Frameworks: FastAPI, Pydantic
  - Lint tools: ruff
  - Docstring style: google
  - Max line length: 100
  ...
cost_ceiling_usd: 0.05
max_tokens: 1500
temperature: 0.1
```
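
One plausible reading of `primary`, `fallback_chain`, and `cost_ceiling_usd` is "try models in order and take the first one that is available and fits under the ceiling". The sketch below illustrates that reading only; the actual selection logic in OmniAgent may differ, and the availability set and cost estimates are invented for the example.

```python
from typing import Callable, Optional

# Mirrors the relevant fields of the reviewer.yaml above as a plain dict.
agent = {
    "model": {
        "primary": "qwen2.5-coder:7b",
        "fallback_chain": ["deepseek-coder", "claude-haiku-4", "gpt-4o-mini"],
    },
    "cost_ceiling_usd": 0.05,
}


def candidate_models(agent: dict) -> list[str]:
    """Primary model first, then the fallback chain in declared order."""
    model = agent["model"]
    return [model["primary"], *model.get("fallback_chain", [])]


def pick_model(
    agent: dict,
    is_available: Callable[[str], bool],
    estimate_cost_usd: Callable[[str], float],
) -> Optional[str]:
    """Return the first available candidate under the ceiling, or None."""
    ceiling = agent["cost_ceiling_usd"]
    for name in candidate_models(agent):
        if is_available(name) and estimate_cost_usd(name) <= ceiling:
            return name
    return None


# Hypothetical situation: both local models are down. Estimates are made up.
available = {"claude-haiku-4", "gpt-4o-mini"}
estimates = {
    "qwen2.5-coder:7b": 0.0,
    "deepseek-coder": 0.0,
    "claude-haiku-4": 0.01,
    "gpt-4o-mini": 0.004,
}

chosen = pick_model(agent, available.__contains__, estimates.__getitem__)
print(chosen)  # claude-haiku-4
```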

**The generator does not call any LLM.** The output is a deterministic template parameterized by the `ProjectProfile`. Sub-second, offline, free — both for the analysis and for using the resulting agents (their primary model is local `qwen2.5-coder:7b`).
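
"Deterministic template parameterized by a profile" can be pictured with the standard library alone. The sketch below is not the generator's code: the `ProjectProfile` fields are guessed from the sample scan output, and the template is a trimmed version of the sample `reviewer.yaml`.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class ProjectProfile:
    """Hypothetical fields, named after the sample scan output above."""
    primary_language: str
    frameworks: tuple[str, ...]
    lint_tools: tuple[str, ...]
    docstring_style: str
    max_line_length: int


# Trimmed template; the real generated file has more fields.
REVIEWER_TEMPLATE = Template("""\
name: reviewer
model:
  primary: qwen2.5-coder:7b
system_prompt: |
  You are a senior code reviewer for this project. Be specific. Cite file:line.
  Project context:
  - Primary language: $language
  - Frameworks: $frameworks
  - Lint tools: $lint_tools
  - Docstring style: $docstring_style
  - Max line length: $max_line_length
""")


def render_reviewer(profile: ProjectProfile) -> str:
    """Pure function: the same profile always yields the same YAML text."""
    return REVIEWER_TEMPLATE.substitute(
        language=profile.primary_language,
        frameworks=", ".join(profile.frameworks),
        lint_tools=", ".join(profile.lint_tools),
        docstring_style=profile.docstring_style,
        max_line_length=profile.max_line_length,
    )


profile = ProjectProfile("python", ("FastAPI", "Pydantic"), ("ruff",), "google", 100)
rendered = render_reviewer(profile)
print(rendered)
```

Because rendering is a pure string substitution, there is no model call, no network access, and no randomness, which is what makes the output reproducible in CI.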

Use cases:
- **First-run onboarding** — a new project has agents without writing a single YAML by hand
- **CI/CD** — run `omniagent generate --from . --only reviewer --force` on every commit so the agent stays in sync with the project
- **Multi-repo fleet** — scan each repo's profile, install per-repo agents, swap based on stack
- **PR review** — the generated reviewer knows the project's exact style and lint config

The same functionality is available over HTTP: `GET /api/generate/profile?project_dir=PATH` and `POST /api/generate`.

---

## Why OmniAgent exists

The AI industry is in an efficiency crisis:

- **73% of prompts** sent to frontier models could be handled by smaller local models
- Developers burn **$500–$1000/month** on Cursor + Claude + GPT with **no visibility** into what each line costs
- Agents **hallucinate APIs, break production code, leak secrets, forget to commit** — and you find out at 2 AM
- Massive energy waste: frontier-scale inference is spent on tasks a small local model could handle
- Lock-in: one IDE, one provider, one pricing tier
- No coordination between local hardware, cloud APIs, VPS nodes, and the billions of idle GPUs sitting in garages and offices worldwide

**The models will keep changing. The hardware will keep evolving.**
**The only permanent problem is: how do you orchestrate all this intelligence efficiently, securely, and cheaply?**

That's what OmniAgent solves.

---

## What it is (and what it isn't)

OmniAgent is **not** a model. **Not** an agent. **Not** a chatbot.

OmniAgent is the **operating system that coordinates the entire AI ecosystem** — models, agents, hardware, costs, and security — so you stop wasting compute, money, and trust.

Think of it as:

- **Linux** doesn't create every app, but everything runs on it.
- **Kubernetes** doesn't build every container, but it orchestrates them all.
- **Steam** doesn't develop every game, but it hosts them.

**OmniAgent** doesn't compete with OpenAI, Anthropic, DeepSeek, or your favorite open-source model. **It makes all of them work together intelligently.**

---

## The 4 façades: one engine, four ways to use it

| Façade | Audience | What you get |
|--------|----------|--------------|
| **CLI** (`omniagent route "task"`) | Developers, power users | Full control, scriptable, fits in any pipeline |
| **Web app** (`omniagent web`) | Everyone, especially non-devs | 5-tab dashboard on `http://localhost:8765` — visualize routing, hardware, optimize |
| **YAML agents** (`*.yaml` in `~/.omniagent/agents/`) | Agent authors, teams | Declarative, shareable, version-controlled — see [docs/agents.md](docs/agents.md) |
| **MCP tools** (via any MCP client) | Tool integrators | 6 tools: route, classify, decide, audit, deploy, optimize |

Same Python engine. Four ways to use it. You pick the one that fits your workflow.

---

## The 90/9/1 design

- **90% of users** never touch the CLI. They open `http://localhost:8765`, type a task, see the routing, hit **Run it ▶**.
- **9% of users** open the **Optimize** tab, see what they're overspending on, and one-click install a cheaper agent.
- **1% of users** write their own YAML agents, publish them, share them.

The dashboard is the product. The YAML is the protocol. The CLI is the power tool.

---

## How it works (under the hood)

1. **Task arrives** — text in the CLI, the web, or via MCP
2. **TaskClassifier** — 10 categories, 5 complexities, detects vision / function-calling
3. **AgentRegistry** — finds the right agent (project > user > builtin, YAML-defined)
4. **SmartRouter** — picks the right model given the agent's constraints + your budget
5. **AdaptiveRouter** — combines all of the above into a single `RoutingDecision`
6. **LLM call** — local first, cloud only if budget + quality demand it
7. **CostTracker** — logs the spend, feeds back into the next routing decision
8. **Guardian++** — pre / during / post audit on every action (secret scan, command sandbox, commit verification)

**454 unit tests** + **19 integration tests** validate every step.
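
Steps 2 through 5 above can be pictured as a small pipeline. The sketch below is a toy stand-in, not OmniAgent's implementation: the keyword classifier, the registry contents, the model list, and the cost estimates are all invented for illustration, and only the names `RoutingDecision` and the `project > user > builtin` precedence come from this README. Cost tracking and Guardian++ auditing are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    category: str
    agent: str
    agent_source: str
    model: str
    estimated_cost_usd: float


# Toy keyword classifier; the real TaskClassifier has 10 categories.
KEYWORDS = {"review": "review", "test": "testing", "docstring": "docs"}


def classify(task: str) -> str:
    lowered = task.lower()
    for keyword, category in KEYWORDS.items():
        if keyword in lowered:
            return category
    return "general"


# Registries in precedence order: project > user > builtin (contents invented).
REGISTRIES = [
    ("project", {"review": "reviewer"}),
    ("user", {"review": "my-reviewer", "docs": "doc"}),
    ("builtin", {"review": "builtin-reviewer", "docs": "builtin-doc",
                 "testing": "tester", "general": "assistant"}),
]


def find_agent(category: str) -> tuple[str, str]:
    """Return (agent name, source) from the highest-precedence registry."""
    for source, agents in REGISTRIES:
        if category in agents:
            return agents[category], source
    raise LookupError(f"no agent for category {category!r}")


# (name, estimated cost in USD, is_local); local first, estimates made up.
MODELS = [("qwen2.5-coder:7b", 0.0, True), ("claude-haiku-4", 0.01, False)]


def pick_model(budget_usd: float, local_ok: bool = True) -> tuple[str, float]:
    """Local first; fall through to cloud only if local is ruled out."""
    for name, cost, is_local in MODELS:
        if is_local and not local_ok:
            continue
        if cost <= budget_usd:
            return name, cost
    raise RuntimeError("no model fits the budget")


def route(task: str, budget_usd: float, local_ok: bool = True) -> RoutingDecision:
    category = classify(task)
    agent, source = find_agent(category)
    model, cost = pick_model(budget_usd, local_ok)
    return RoutingDecision(category, agent, source, model, cost)


decision = route("review this code for security", budget_usd=0.10)
print(decision)
```

With these toy tables, the review task resolves to the project-level `reviewer` agent and the local model, since the project registry wins and a free local model always fits the budget.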

---

## Quickstart (60 seconds)

```bash
git clone https://github.com/landrover1984/omniagent
cd omniagent
pip install -e .
omniagent web
# open http://localhost:8765
```

Or use the CLI directly:
```bash
omniagent agent-route "review this code for security" --budget 0.10
omniagent agent-list                     # see all available agents
omniagent agent-install ./my-agent.yaml  # add your own
omniagent optimize                       # find cheaper routes
omniagent post-mortem                    # weekly AI intelligence
omniagent generate --from .              # code -> YAML agents
omniagent agent-decide "design a cache"  # see the routing (no LLM call)
```

**Zero API keys needed to start.** Local models via Ollama work out of the box.

---

## What ships today

| Layer | Status | Tests |
|-------|--------|-------|
| Agent Protocol (YAML agents) | Shipped | 18 |
| Task Classifier (10 categories) | Shipped | 20 |
| AdaptiveRouter (the brain) | Shipped | 8 |
| 5-tab Web UI + Post-Mortem API | Shipped | 16 endpoints |
| Cost Optimizer (the killer feature) | Shipped | 3 |
| **Post-Mortem (Weekly AI Intelligence)** | **Shipped v0.2.0** | **47** |
| **Agent Generator (code → YAML agents)** | **Shipped v0.2.1** | **52** |
| Anti-Hallucination Audit (Guardian++) | Shipped | 23 |
| Hybrid Deploy (local / VPS / AWS) | Shipped | 28 |
| MCP Server (6 tools) | Shipped | 18 |
| Private Fleet (multi-node) | Shipped v0.1.4 | 10 |
| **CLI commands** | **26+** | **75+** |
| **Total** | | **473 passing, 2 skipped** |

---

## Roadmap

| Phase | Theme | Status |
|-------|-------|--------|
| **v0.1.x** | **AI Infrastructure OS** — routing, cost, optimize, local-first | **Shipped** |
| **v0.2.0** | **Weekly AI Intelligence** — persistent cost ledger, post-mortem reports, savings opportunities | **Shipped** |
| **v0.2.1** | **Agent Generator** — code → custom YAML agents, deterministic, free | **Shipped** |
| v0.2.2 | AI Firewall — privacy, PII detection, compliance mode | Next |
| v0.3.x | Visual Dashboard — real-time cost graphs, agent analytics, team view | Planned |
| v0.5.x | Distributed Compute — idle GPU federation, opt-in mesh | Deferred |
| v0.6.x | Marketplace + Incentives — community YAMLs, reputation, rewards | Deferred |

We are **not** building another "AI wrapper". We are building the **coordination layer** that the entire AI ecosystem needs.

Distributed compute and marketplace are real, but they're not the wedge. The wedge is: **stop overspending on AI**. Get that right first.

---

## License

**MIT — 100% open source, forever.** No paid tier, no "enterprise edition", no bait-and-switch.

---

<div align="center">

**The models will change. The hardware will change. The coordination layer is permanent.**

[Star on GitHub](https://github.com/landrover1984/omniagent) · [Try the Cost Calculator](landing/index.html) · [Write your first agent](docs/agents.md)

</div>
