Metadata-Version: 2.4
Name: caudate-cli
Version: 0.1.21
Summary: A local-first cognitive agent with a learned router (Caudate) and Claude-SDK-shaped tool palette.
Author-email: Rave Manji <rahimtz93@googlemail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/raveuk/cognos
Project-URL: Repository, https://github.com/raveuk/cognos
Keywords: llm,agent,agentic,cognitive,local-first,tool-use
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: litellm>=1.40.0
Requires-Dist: chromadb>=0.5.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: SQLAlchemy>=2.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: click>=8.0.0
Requires-Dist: prompt_toolkit>=3.0.0
Requires-Dist: fastapi>=0.110.0
Requires-Dist: uvicorn>=0.29.0
Requires-Dist: python-multipart>=0.0.9
Requires-Dist: mcp>=1.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: huggingface_hub>=0.20.0
Requires-Dist: anthropic>=0.40.0
Requires-Dist: pypdf>=4.0.0
Requires-Dist: useful-moonshine-onnx>=0.2.0
Requires-Dist: kokoro>=0.3.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: piper-tts>=1.4.0
Requires-Dist: sounddevice>=0.5.0
Requires-Dist: webrtcvad-wheels>=2.0.10
Requires-Dist: diffusers>=0.30.0
Requires-Dist: transformers>=4.40.0
Requires-Dist: torch>=2.2.0
Requires-Dist: accelerate>=0.30.0
Requires-Dist: Pillow>=10.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# Caudate

A local-first cognitive agent with Claude-SDK feature parity, built on
Ollama via LiteLLM. Caudate runs entirely on your hardware by default — no
API keys, no network calls — but switches to Anthropic, OpenAI, or any
LiteLLM-supported provider with a one-line config change.

It's not a thin chat wrapper. The architecture explicitly separates:

- **Memory** — episodic, semantic, procedural, working
- **Planning** — DAG-based goal decomposition with replanning
- **Reflection** — meta-learning from past goal outcomes
- **Personality** — identity, mood, inner voice
- **Dual-process routing** — fast/slow models picked per call (System 1 / System 2)

…with a Claude-Code-style agentic loop on top: real-time tool calls,
streaming, sessions, hooks, MCP, subagents, permissions, and a
fully-featured CLI + HTTP API.

> Status: feature-complete against its original five-phase roadmap plus
> Claude SDK extras and Claude Code UX parity. See
> [`NEXT_ACTIONS.md`](NEXT_ACTIONS.md) for what's done and what's deferred.

---

## Quickstart

### Install from PyPI

```bash
pipx install caudate-cli
```

One install gets you everything: dual-brain routing, the Caudate NN
router, voice (Moonshine STT + Kokoro/Piper TTS), image generation
(diffusers + FLUX/SDXL), PDF extraction, native Anthropic SDK, and the
full tool palette. Heavy — pulls torch, transformers, diffusers — but
you don't have to think about extras.

Vision works out of the box: `DescribeImage` routes through whichever
vision-capable model you pick (`qwen3-vl`, `glm-5v`, `claude-haiku-4-5`,
GPT-4V, …) so the dependency is your LLM choice, not a separate install.

On first launch, `caudate` runs a one-time setup wizard that picks your
fast/slow models, downloads Caudate's weights from HuggingFace, and writes
`~/.caudate/settings.json`. After that:

```bash
caudate               # banner + REPL
caudate doctor        # diagnose what's wired (Ollama, Caudate, API keys)
caudate init --force  # re-run the wizard if you change your mind
```

Requirements:

- Python ≥ 3.10
- [Ollama](https://ollama.com) running locally if you want the local-only or
  hybrid preset (skip if you go hosted-only)
- An `ANTHROPIC_API_KEY` in your shell *only* if you pick a preset that uses an
  `anthropic/...` model

### Install from source (for development)

```bash
git clone https://github.com/raveuk/caudate-cli.git
cd caudate-cli
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
caudate init
```

### Talk to it

```bash
caudate                                       # default — drops into REPL
caudate interactive --model fast              # preset model
caudate interactive \
    --system1 ollama/qwen2.5-coder:1.5b \
    --system2 ollama/gemma3:27b              # explicit dual-brain
```

A REPL opens. Type to chat. Type `/help` for slash commands.

### 4. Or hit it over HTTP

```bash
caudate serve --port 8000
# in another terminal:
curl -X POST http://127.0.0.1:8000/chat \
    -H 'content-type: application/json' \
    -d '{"message":"what is in this directory?"}'
```

The HTTP server also hosts a [Web UI](#web-ui) at `http://127.0.0.1:8000/ui`.

### Use Caudate as the backend for Open WebUI

Open WebUI is a polished chat UI that talks to any OpenAI-compatible
endpoint. Caudate exposes both Anthropic-shape (`/v1/messages`) and
OpenAI-shape (`/v1/chat/completions`) endpoints, so it slots in cleanly:

```bash
# 1. Start Caudate's API server
caudate serve --port 8000

# 2. In Open WebUI's settings → Connections → OpenAI API
#    Base URL: http://localhost:8000/v1
#    API Key:  any non-empty string (Caudate ignores it)

# 3. Pick "claude-haiku-4-5", "claude-opus-4-7[1m]", or whichever
#    model id is wired in your ~/.caudate/settings.json
```

Anything you type in Open WebUI now goes through Caudate's dual-brain
routing, Caudate the NN router, and the full tool palette. Vision works
the same way — drop an image into Open WebUI's chat and Caudate routes
it to whichever vision-capable model you've configured.

For voice (`caudate talk`), image generation (`caudate draw`), and the
Forge autonomous-coding harness, run those commands directly — Open
WebUI doesn't surface them.

---

## What you get out of the box

### CLI

| Command                           | What it does |
| --------------------------------- | ------------ |
| `caudate interactive`              | REPL with streaming, slash commands, history, multi-line input |
| `caudate run <goal>`               | Single-shot DAG planner — decomposes a goal, runs it, reflects |
| `caudate talk`                     | Voice mode (Moonshine STT + Kokoro TTS, Whisper/Piper fallback) |
| `caudate draw "<prompt>"`          | Generate an image (diffusers / FLUX.1-schnell or SDXL-Turbo) |
| `caudate caudate {train,eval,status,export}` | Train/inspect Caudate, the learned router/advisor NN |
| `caudate serve [--port 8000]`      | FastAPI HTTP server with SSE streaming |
| `caudate sessions {list,delete,rename,export}` | Manage saved conversations |
| `caudate personality {show,set,reset}` | Inspect or tune identity / mood |
| `caudate models`                   | List detected Ollama models with capability flags |
| `caudate router`                   | Preview routing decisions without calling the LLM |
| `caudate bench`                    | Run the benchmark suite |
| `caudate cron {add,list,remove,run}` | Schedule recurring prompts |
| `caudate mcp-serve`                | Run Caudate as an MCP server |
| `caudate update`                   | Self-update (git pull or pip upgrade) |
| `caudate info`                     | List registered tools and learned strategies |

### Slash commands (inside the REPL)

`/help`, `/clear`, `/compact`, `/model <id|fast|balanced|powerful>`,
`/cost`, `/tools`, `/sessions`, `/export <md|json|html>`, `/files`,
`/permissions <mode>`, `/personality`, `/router`, `/diff <path>`,
`/status`, `/cron`, `/bg`, `/notify`, `/think on|off`, `/save`,
`/quit`. Type `/help` for the full list with descriptions.

### Tools the agent can call

~38 built-in tools, including: `Bash`, `Read`, `Write`, `Edit`, `Glob`,
`Grep`, `WebSearch`, `WebFetch`, `PythonExec`, `Think`, `Respond`,
`Agent` (subagents), `Draw`, `EditImage`, `DescribeImage`, `Speak`,
`TranscribeAudio`, `Storyboard`, `Sandbox`, `Calculator`, `DateTime`,
`HttpRequest`, `OpenAPI`, `Notebook`, `Cron`, `PushNotification`,
`AskUserQuestion`, `LoadSkill`, `UpdateMemory`, `MCP`, `Worktree`,
`PlanMode`, `FindAnywhere`, `SemanticSearch`, `SystemInfo`, `Task`,
`CognosCard`, `Artifact`, `Agentic`. Drop a `plugins/*.py` exposing
`PLUGIN = ToolInstance` to add your own.

### Caudate — the learned brain

Caudate ships with **Caudate**, a small PyTorch transformer that learns
your tool-use patterns turn-by-turn. It observes every conversation,
auto-trains in the background once it has enough samples, and graduates
through trust levels (SILENT → OBSERVER → WHISPER → ADVISOR → CONTROLLER)
based on rolling accuracy. At WHISPER it whispers a hint into the LLM
prompt; at ADVISOR it can override tier routing.

See `CAUDATE.md` for the full architecture, `nn/` for the code,
`data/nn/` for the live checkpoint and replay buffer.

### Multi-modal in / out

- **`@file` references** — `look at @config.py` inlines or attaches the file.
- **Drag-and-drop images / PDFs** — paths in the prompt are auto-uploaded via the Files API.
- **`POST /files`** — same Files API exposed over HTTP.
- **Citations** — pass `documents=[{id,title,text}]` and the model can emit `[[cite:doc:Lx]]` markers, post-processed into structured `CitationBlock` objects.

---

## Architecture

```
                ┌──────────────────────────────────────────────────┐
                │                 CognosAgent                       │
                │                                                   │
   user input  ─┼─►  AgenticLoop  ◄──►  Executor  ──►  tools/      │
                │       │                  ▲                        │
                │       │                  │                        │
                │       ▼                  │                        │
                │   Personality ─► hooks ──┘                        │
                │       │                                           │
                │       ▼                                           │
                │     LLM Router (DualLLMProvider)                  │
                │     ├── System 1: fast model                      │
                │     └── System 2: slow model                      │
                │                                                   │
                │   Memory: episodic | semantic | procedural | working
                │   Session persistence + context compaction        │
                │   Permissions (modes + allow/deny rules + audit)  │
                │   MCP clients (cognos_mcp/)                       │
                │   Subagents (workspace-isolated via git worktrees)│
                └──────────────────────────────────────────────────┘
```

Each subsystem is documented in `BUILD_LOG.md`. The
[Claude SDK Extras](NEXT_ACTIONS.md#claude-sdk-extras-done) and
[Claude Code UX Parity](NEXT_ACTIONS.md#claude-code-ux-parity-done) sections
in `NEXT_ACTIONS.md` enumerate what's wired and where.

---

## Configuration

Three layers, last wins:

1. Built-in defaults in `core/settings.py`
2. `~/.caudate/settings.json` — per-user
3. `./.caudate/settings.json` — per-project

Example:

```json
{
  "model": "ollama/gemma3:27b",
  "permission_mode": "default",
  "fallback_models": ["ollama/qwen2.5-coder:1.5b"],
  "permissions": {
    "allow": [{"tool": "Bash", "pattern": "^(ls|cat|grep)"}],
    "deny":  [{"tool": "Bash", "pattern": "rm -rf"}]
  },
  "statusline": "{model} | {mood} | tok={tokens} | ${cost:.4f}",
  "notifications": {"enabled": true, "on_long_task_seconds": 30}
}
```

CLI flags always override settings (`--model fast`, `--permissions plan`).

---

## Web UI

A zero-build single-page UI ships with the HTTP server:

```bash
caudate serve --port 8000
# open http://127.0.0.1:8000/ui
```

It speaks to `POST /chat/stream` (SSE), supports session resume,
file attachments, and slash-style commands. Source: `ui/web/`.

---

## IDE plugins

- `ide/vscode/` — TypeScript extension. Sidebar webview, "Ask about
  selection" right-click, configurable API URL / model / permission mode.
- `ide/jetbrains/` — Kotlin plugin for IntelliJ-platform IDEs (IDEA,
  PyCharm, GoLand, WebStorm, RustRover, …). Tool window, editor action,
  settings page.

Both are thin clients — they make HTTP calls to a running `caudate serve`
process, no LLM runs in the IDE.

---

## Optional extras

`pip install`-flagged features that are no-ops without their dep:

| Extra        | Unlocks |
| ------------ | ------- |
| `anthropic`  | Real prompt caching, native extended thinking, native `response_format` for `claude-*` model ids |
| `pypdf`      | PDF text extraction in the Files API |
| `prompt_toolkit` | Multi-line input + persistent history + Ctrl+R + slash completion |
| `fastapi` + `uvicorn` | The HTTP server (`caudate serve`) |
| `mcp`        | The MCP server / client (`caudate mcp-serve`) |
| `useful-moonshine-onnx` + `kokoro` + `piper-tts` + `sounddevice` | Voice mode (`caudate talk`) |
| `diffusers` + `transformers` + `torch` | Image generation (`caudate draw`) |
| `torch` + `sentence-transformers` | Caudate (the learned router NN) |

Caudate runs without any of them — they degrade gracefully.

---

## Project layout

```
core/             agent, agentic loop, sessions, hooks, permissions, files,
                  citations, settings, slash commands, …
execution/        tool registry + 12 built-in tools + plugin loader
llm/              LiteLLM provider, model registry, dual-process router,
                  fallback chains
memory/           episodic / semantic / procedural / working
planning/         DAG planner, task graph
reflection/       reflector, meta-learner
personality/      identity, mood, inner voice
cognos_mcp/       MCP server, client, bridge
api/              FastAPI HTTP server
bench/            benchmark suite
plugins/          drop-in tools (`PLUGIN = ToolInstance`)
ide/vscode/       VS Code extension
ide/jetbrains/    JetBrains plugin
ui/               terminal display + web UI
data/             local state — sessions, files, manifests, audit log
```

---

## Why local-first?

Three reasons:

1. **Privacy.** Code, conversations, and learned strategies live on disk.
2. **Cost.** A small Ollama model runs at $0/turn and answers in milliseconds for routine work.
3. **Sovereignty.** No vendor outage takes you offline; no rate limit slows you down.

The dual-process router exists so you can keep most turns on a small
local model and only escalate hard turns to a heavy one (which can
itself be local — or Anthropic/OpenAI when you're online).

---

## License

MIT — see [LICENSE](LICENSE). © 2026 Rave Manji.
