Metadata-Version: 2.4
Name: bare-agent
Version: 0.0.1
Summary: A framework-free agent runtime you can read, run, and leave. Own the loop, not the framework. Runs local on Ollama at $0 — or any frontier model.
Project-URL: Homepage, https://github.com/subratamondal1/bare-agent
Project-URL: Repository, https://github.com/subratamondal1/bare-agent
Author: Subrata Mondal
License: MIT
License-File: LICENSE
Keywords: agents,framework-free,litellm,llm,local-first,ollama,tool-calling
Requires-Python: >=3.12
Requires-Dist: litellm>=1.55
Requires-Dist: orjson>=3.10
Requires-Dist: pydantic-settings>=2.7
Requires-Dist: pydantic>=2.10
Requires-Dist: python-dotenv>=1.0
Requires-Dist: structlog>=24.4
Provides-Extra: api
Requires-Dist: fastapi>=0.115; extra == 'api'
Requires-Dist: httpx>=0.28; extra == 'api'
Requires-Dist: redis>=5; extra == 'api'
Requires-Dist: uvicorn[standard]>=0.32; extra == 'api'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/subratamondal1/bare-agent/main/docs/assets/logo.png" width="96" alt="Bare Agent" />
</p>

<h1 align="center">Bare Agent</h1>

<p align="center">
  <strong>Own the loop, not the framework.</strong>
</p>

<p align="center">
  A framework-free agent runtime you can read, run, and leave — a small library you<br/>
  import and call, plus a visual studio that ejects to plain Python with zero dependency on us.
</p>

<p align="center">
  <a href="LICENSE"><img src="https://img.shields.io/badge/license-MIT-blue?style=flat" alt="License: MIT"></a>
  <img src="https://img.shields.io/badge/python-3.12%2B-blue?style=flat" alt="Python 3.12+">
  <img src="https://img.shields.io/badge/tests-29%20passing-brightgreen?style=flat" alt="Tests: 29 passing">
  <img src="https://img.shields.io/badge/local--first-Ollama-orange?style=flat" alt="Local-first">
  <img src="https://img.shields.io/badge/studio-Next.js%2016-black?style=flat" alt="Studio: Next.js 16">
</p>

<p align="center">
  <a href="https://github.com/subratamondal1/bare-agent/actions/workflows/ci.yml"><img src="https://github.com/subratamondal1/bare-agent/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://github.com/subratamondal1/bare-agent/stargazers"><img src="https://img.shields.io/github/stars/subratamondal1/bare-agent?style=flat&color=yellow" alt="Stars"></a>
  <a href="https://github.com/subratamondal1/bare-agent/commits/main"><img src="https://img.shields.io/github/last-commit/subratamondal1/bare-agent?style=flat" alt="Last commit"></a>
</p>

<p align="center">
  <a href="#features">Features</a> •
  <a href="#quickstart">Quickstart</a> •
  <a href="#the-studio">Studio</a> •
  <a href="#how-it-works">How it works</a> •
  <a href="#eject">Eject</a> •
  <a href="#configuration">Configuration</a> •
  <a href="#development">Development</a>
</p>

---

Most agent frameworks own your `main()`, hide control flow behind metaclasses and DAG executors,
and obscure the actual prompts. `bare-agent` is the opposite: a small library — the agent loop, a
tool registry, a 3-axis budget, and a LiteLLM gateway, ~600 readable lines — that you **import and
call**. You own the loop. Every prompt is in plain sight. You can always **eject to plain Python**
and run it with **zero `bare_agent` dependency**.

On top of the library sits an optional **visual studio**: wire agents into a chain on a canvas,
attach tools, **Run** and watch tokens stream live, then eject the whole flow to a self-contained
`agent.py`. **Local-first** — it runs at zero cost on Ollama; OpenAI, Anthropic, and Gemini are
optional drop-ins through the same loop. Built on Python 3.12 · LiteLLM · FastAPI · Next.js 16 —
with **no agent framework** (no LangChain/LangGraph): the loop, the budget, and the failure
handling are owned directly.

<p align="center">
  <img src="https://raw.githubusercontent.com/subratamondal1/bare-agent/main/docs/assets/bare-agent-demo.gif" width="100%" alt="Bare Agent studio: chain a Solver and an Explainer agent on a canvas, attach the calculator, Run and watch each agent's turns, tool calls, and tokens stream live with real per-call cost, then Eject the whole flow to a self-contained Python script." />
</p>

<p align="center">
  <em>The studio, end to end: chain a <strong>Solver</strong> and an <strong>Explainer</strong>, attach the calculator, <strong>Run</strong> and watch each agent stream its turns, tool calls, and tokens live — with real per-call cost attribution (here on <code>gpt-5.4-mini</code>, ~$0.0006 for the whole chain) — then <strong>Eject to Python</strong>, a self-contained <code>agent.py</code> with zero <code>bare_agent</code> dependency. The same loop runs local-first on Ollama at $0.</em>
</p>

## Features

| Capability | Detail |
|---|---|
| **Framework-free agent loop** | A hand-written tool-use loop over LiteLLM with a 3-axis budget (turns / tokens / wall-clock) + hard cost cap, a retry/fallback ladder, and a self-registering, permission-gated tool registry. The loop is a stateless reducer over an explicit `messages: list[dict]`. |
| **Local-first, $0 — or BYO frontier key** | Every call goes through LiteLLM, so the model id picks the provider. `ollama_chat/qwen3` runs free and offline; `anthropic/…`, `openai/…`, `gemini/…` are drop-ins. No lock-in. |
| **Multi-agent chains** | Wire agents agent→agent; the runtime topologically orders them and feeds each answer into the next. Inline runs, queued runs, and ejected code all execute the same chain. |
| **Visual studio** | A React Flow canvas (Next.js 16 / React 19) to build chains, attach tools, and watch turns / tool calls / tokens stream live over SSE — one readable section per agent. |
| **Eject to plain Python** | Compile any graph to a standalone `agent.py` (litellm + pydantic only) — tool sources inlined, **zero `bare_agent` import**. Machine-checked to compile. The graph is a convenience, never a cage. |
| **HITL / permissions** | An `Approver` gates tool calls allow / ask / deny; successful tool output is wrapped `<untrusted_tool_output>` for prompt-injection containment. |
| **Horizontal scale** | An optional Redis-list job queue + worker pool; Kubernetes + **KEDA scale workers 0→N→0** on queue depth — the same shape as [Argus](https://github.com/subratamondal1/argus)'s searcher fan-out. |
| **Composition, not configuration** | Seams are Python `Protocol`s — swap the LLM, the approver, or the event sink by passing a different object. No god-object to subclass. |

## Quickstart

```bash
uv add bare-agent          # or: pip install bare-agent
```

A complete agent in ~30 lines — the docstring becomes the LLM's tool description:

```python
import asyncio
from pydantic import BaseModel, Field
from bare_agent import AgentLoop, Budget, LLMClient, ToolRegistry, get_settings

registry = ToolRegistry()

class AddArgs(BaseModel):
    a: int = Field(description="first addend")
    b: int = Field(description="second addend")

@registry.tool()
async def add(args: AddArgs) -> int:
    """Add two integers and return their sum."""
    return args.a + args.b

async def main() -> None:
    settings = get_settings()          # local Ollama by default; set BARE_AGENT_MODEL for frontier
    agent = AgentLoop(
        registry=registry,
        llm=LLMClient.from_settings(settings),
        budget=Budget.from_settings(settings),
        system_prompt="You are a precise assistant. Use tools for arithmetic.",
    )
    result = await agent.run("What is 17 + 25, then add 100 to that?")
    print(result.answer)               # -> "142"
    print(result.stop_reason, result.turns, f"${result.cost_usd}")  # -> completed 3 $0.0

asyncio.run(main())
```

Run it locally for free:

```bash
ollama pull qwen3        # one-time (qwen3:30b-a3b-thinking on a 32GB Mac)
make demo                # or: uv run python examples/quickstart.py
```

## The studio

```bash
make web      # FastAPI on :8000 + Next.js studio on :3000 → http://localhost:3000/studio
```

Open `http://localhost:3000/studio`: **Add** agents and wire them into a chain, attach catalog
tools, pick a model (local qwen3 at $0 or your frontier key), and **Run** — each agent streams its
turns, tool calls, and tokens live over SSE in its own section. The backend is standalone: `make
api` runs the control plane alone, and the library works with no UI at all.

## How it works

```
user input
   │
   ▼
┌──────────────┐   answer feeds   ┌──────────────┐
│   Agent 1    │ ───────────────► │   Agent 2    │ ──────────►  final answer
│  + tools     │   the next       │  + tools     │
└──────────────┘                  └──────────────┘
   each agent = ONE hand-written loop:
   explicit messages list · 3-axis budget + cost cap · permission-gated tool dispatch

   run it:   inline over SSE      ·  or  queue → worker pool → KEDA scales 0→N→0
   keep it:  Eject ──► agent.py   (litellm + pydantic only — ZERO bare_agent dependency)
```

The loop is a **stateless reducer** over an explicit `messages: list[dict]`. That one decision pays
three ways, all for free:

- **Durability** — the list is serializable, so checkpoint it and resume after a crash.
- **Eject-to-code** — the list *is* the program; there was never a framework underneath to lift out.
- **Testability** — feed a canned `messages` list (or a fake `CompletionClient`), assert.

No metaclass magic, no hidden DAG executor, no god-object to subclass, no state trapped in a
session. Extensibility is composition: `AgentLoop(llm=..., approver=..., registry=...)`.

### The 8 primitives (each usable on its own — not a god-object)

| # | Primitive | Where |
|---|---|---|
| ① | Tool registry — `@registry.tool()` → JSON-schema → permission-gated dispatch | `registry.py` |
| ② | Prompt assembly — the explicit, serializable `messages: list[dict]` | `loop.py` |
| ③ | Agent loop — `AsyncExitStack` + 3-axis budget + termination + cycle-stop | `loop.py` |
| ④ | Retry / fallback over LiteLLM (local Ollama **or** any frontier model) | `llm.py` |
| ⑤ | State / memory — checkpoint the `messages` list (durability for free) | `loop.py` |
| ⑥ | HITL / permissions — allow / ask / deny, an `Approver` on `ask` | `registry.py` |
| ⑦ | Observability — `structlog` + an optional `EventSink` (SSE-ready) | `events.py` |
| ⑧ | Eval gate — golden replay (roadmap) | — |

## Eject

Any flow — single agent or a chain — compiles to a standalone script that imports only `litellm`
and `pydantic`. Tool sources are inlined verbatim; there is **no `bare_agent` import**:

```bash
uv run --with litellm --with pydantic agent.py "your question"
```

In the studio, **Eject to Python** shows the generated code and downloads it. The generated file is
machine-checked to compile. You can read it, diff it, vendor it, and run it after you stop using
bare-agent entirely — that is the point.

## Configuration

Settings are read by [Pydantic Settings](src/bare_agent/config.py) from the environment
(`BARE_AGENT_` prefix) or `.env` (`cp .env.example .env`). The defaults are fully local and free.
Common overrides:

| Variable | Default | Purpose |
|---|---|---|
| `BARE_AGENT_MODEL` | `ollama_chat/qwen3` | LiteLLM model id. Local Ollama by default; `anthropic/…`, `openai/…`, `gemini/…` for hosted. |
| `BARE_AGENT_OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama server, passed as `api_base` for `ollama_chat/` models. |
| `BARE_AGENT_FALLBACK_MODELS` | `[]` | Ordered fallback model ids (JSON list) for the retry ladder. |
| `BARE_AGENT_MAX_TURNS` / `…_TOKENS` / `…_WALLCLOCK_S` / `…_COST_USD` | `8` / `120000` / `180` / `0.50` | The 3-axis budget + hard cost cap; the loop stops on the first to trip. |
| `BARE_AGENT_USE_QUEUE` | `false` | Route runs through the Redis queue + worker pool (KEDA-autoscalable) instead of inline. |
| `BARE_AGENT_REDIS_URL` | `redis://localhost:6379/0` | Redis DSN for the run queue + event pub/sub (queue mode). |

For a hosted model, set `BARE_AGENT_MODEL=anthropic/…` and export that provider's key
(`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GEMINI_API_KEY`) — LiteLLM reads it from the environment.

## Development

```bash
make ci          # lock-check + format-check + lint (ruff) + compile + typecheck (ty) + tests (pytest)
make test        # the 29-test suite — hermetic (the LLM and Redis are faked; no daemon needed)
make web         # backend + studio together for local hacking
make up / down   # the Docker stack (api + studio; Ollama stays on the host)
make queue-up    # the Docker stack WITH the KEDA-shaped worker plane (+ redis + worker)
make help        # all targets
```

Kubernetes manifests live in [`k8s/`](k8s/) — an inline deploy (api + studio) and the KEDA worker
plane (redis + worker). The studio has its own toolchain ([`apps/studio/AGENTS.md`](apps/studio/AGENTS.md));
the canonical agent rules for the whole repo are in [`AGENTS.md`](AGENTS.md).

<!-- Uncomment once the repo has stars (renders an empty chart at 0):
## Star history

<p align="center">
  <a href="https://star-history.com/#subratamondal1/bare-agent&Date">
    <img src="https://api.star-history.com/svg?repos=subratamondal1/bare-agent&type=Date" width="600" alt="Star history">
  </a>
</p>
-->

## License

MIT © 2026 Subrata Mondal — see [LICENSE](LICENSE). Built as the clean, reusable extraction of
[Argus](https://github.com/subratamondal1/argus)'s agent runtime.
