Metadata-Version: 2.3
Name: klaude-code
Version: 2.26.0
Summary: Minimal code agent CLI
Requires-Dist: anthropic[bedrock]>=0.66.0
Requires-Dist: chardet>=5.2.0,<6
Requires-Dist: diff-match-patch>=20241021
Requires-Dist: filelock>=3.20.3
Requires-Dist: fastapi
Requires-Dist: google-genai>=1.56.0
Requires-Dist: markdown-it-py>=4.0.0
Requires-Dist: openai>=2.24.0
Requires-Dist: prompt-toolkit>=3.0.52
Requires-Dist: pydantic>=2.11.7
Requires-Dist: python-socks>=2.8.1
Requires-Dist: pyyaml>=6.0.2
Requires-Dist: rich>=14.1.0
Requires-Dist: sse-starlette
Requires-Dist: trafilatura>=2.0.0
Requires-Dist: typer>=0.17.3
Requires-Dist: uvicorn
Requires-Dist: readability-lxml>=0.8.4.1
Requires-Python: >=3.13
Description-Content-Type: text/markdown

# Klaude Code

Minimal code agent CLI.

## Features
- **Multi-provider**: Anthropic Message API, OpenAI Responses API, OpenRouter, ChatGPT Codex OAuth etc.
- **Keep reasoning item in context**: Interleaved thinking support
- **Model-aware tools**: Claude Code tool set for Opus, `apply_patch` for GPT-5/Codex
- **Reminders**: Cooldown-based todo tracking, instruction reinforcement and external file change reminder
- **Sub-agents**: General Purpose, Finder, Code Reviewer, Code Simplifier (+ fork-context variant)
- **Recursive `@file` mentions**: Circular dependency protection, relative path resolution
- **External file sync**: Monitoring for external edits (linter, manual)
- **Interrupt handling**: Ctrl+C preserves partial responses and synthesizes tool cancellation results
- **Output truncation**: Large outputs saved to file system with snapshot links
- **Agent Skills**: Built-in + user + project Agent Skills (with implicit invocation by Skill tool or explicit invocation by typing `//skill` or `/skill`)
- **Prompt caching**: Append-only message history maximizes prefix cache hits (cached tokens cost 10% of base input)
- **Context management**: Auto-compaction, Rewind (rollback to checkpoint), Handoff (compress and continue in fresh context)
- **Auto memory**: Persistent cross-session memory per project (`~/.klaude/projects/<project>/memory/`)
- **Web UI**: Browser-based interface via `klaude web` or `/web` slash command
- **Sessions**: Resumable with `--continue`, forkable with `/fork-session`
- **Extras**: Slash commands, sub-agents, image paste, terminal notifications, auto-theming

## Installation

```bash
uv tool install klaude-code
```

To update:

```bash
uv tool upgrade klaude-code
```

Or use the built-in command:

```bash
klaude upgrade
```

### Development Install

```bash
git clone https://github.com/inspirepan/klaude-code.git
cd klaude-code
make install    # init submodules, build web frontend, install as editable
```

Or step by step:

```bash
git submodule update --init --recursive
uv sync                              # install Python deps
uv run python scripts/build_web.py   # build web frontend
uv tool install -e .                 # install CLI globally (editable)
```

Requires `pnpm` or `npm` for the web frontend build (`pnpm` preferred).

## Usage

```bash
klaude [--model [<name>]] [--continue] [--resume [<id>]]
```

**Options:**
- `--model`/`-m`: Choose a model.
  - `--model` (no value): opens the interactive selector.
  - `--model <value>`: resolves `<value>` to a single model; if it can't, it opens the interactive selector filtered by `<value>`.
- `--continue`/`-c`: Resume the most recent session.
- `--resume`/`-r`: Resume a session.
  - `--resume` (no value): select a session to resume for this project.
  - `--resume <id>`: resume a session by its ID directly.
- `--vanilla`: Minimal mode with only basic tools (Bash, Read, Edit, Write) and no system prompts.

**Model selection behavior:**
- Default: uses `main_model` from config.
- `--model` (no value): always prompts you to pick.
- `--model <value>`: tries to resolve `<value>` to a single model; if it can't, it prompts with a filtered list (and falls back to showing all models if there are no matches).

**Debug Options:**
- `--debug`/`-d`: Enable debug mode with verbose logging and LLM trace.
- `--debug-filter`: Filter debug output by type (comma-separated).


### Configuration

#### Quick Start (Zero Config)

Klaude comes with built-in provider configurations. Just set an API key environment variable and start using it:

```bash
# Pick one (or more) of these:
export ANTHROPIC_API_KEY=sk-ant-xxx      # Claude models
export OPENAI_API_KEY=sk-xxx             # GPT models
export OPENROUTER_API_KEY=sk-or-xxx      # OpenRouter (multi-provider)
export DEEPSEEK_API_KEY=sk-xxx           # DeepSeek models
export MOONSHOT_API_KEY=sk-xxx           # Moonshot/Kimi models
export MINIMAX_API_KEY=xxx               # MiniMax models
export GOOGLE_API_KEY=xxx                # Google Gemini models (or GEMINI_API_KEY)
export EXA_API_KEY=exa-xxx               # Exa Search (optional, WebSearch provider, preferred)
export BRAVE_API_KEY=BSA-xxx             # Brave Search (optional, WebSearch provider, fallback)

# Then just run:
klaude
```

On first run, you'll be prompted to select a model. Your choice is saved as `main_model`.

You can also configure fallback lists for helper models:

```yaml
fast_model:
  - haiku
  - gemini-flash
  - gpt-5-nano

compact_model:
  - gemini-flash
  - haiku
```

Klaude tries these entries in order and uses the first available model. `fast_model` is used for session-title generation; `compact_model` is used for compact/helper tasks.

#### Built-in Providers

| Provider         | Env Variable                                                                 | Models                                                                                                   |
|------------------|------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------|
| anthropic        | `ANTHROPIC_API_KEY`                                                          | sonnet, sonnet-no-thinking, opus, haiku                                                                 |
| openai           | `OPENAI_API_KEY`                                                             | gpt-5.4-high, gpt-5.4-xhigh, gpt-5.3-codex, gpt-5.3-codex-xhigh                                         |
| openrouter       | `OPENROUTER_API_KEY`                                                         | gpt-5.3-codex, gpt-5.3-codex-xhigh, gpt-5.4-high, gpt-5.4-xhigh, kimi, haiku, sonnet, sonnet-no-thinking, opus, gemini-pro, gemini-flash, grok, minimax, glm |
| google           | `GOOGLE_API_KEY` or `GEMINI_API_KEY`                                         | gemini-pro, gemini-flash                                                                                |
| google-vertex    | `GOOGLE_APPLICATION_CREDENTIALS`, `GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION` | gemini-pro, gemini-flash                                                                                |
| deepseek         | `DEEPSEEK_API_KEY`                                                           | deepseek                                                                                                 |
| moonshot         | `MOONSHOT_API_KEY`                                                           | kimi                                                                                                     |
| minimax          | `MINIMAX_API_KEY`                                                            | m2.7, m2.7:highspeed                                                                                     |
| cerebras         | `CEREBRAS_API_KEY`                                                           | glm                                                                                                      |
| claude-max       | N/A (OAuth)                                             | sonnet, sonnet-no-thinking, opus, haiku                                                                 |
| codex            | N/A (OAuth)                                                                  | gpt-5.3-codex, gpt-5.3-codex-xhigh, gpt-5.4-high, gpt-5.4-xhigh   |
| github-copilot   | N/A (OAuth)                                                                  | gpt-5.3-codex, gpt-5.3-codex-xhigh, gpt-5.4-high, gpt-5.4-xhigh, sonnet, sonnet-4.5, haiku, opus       |
| ark-api          | `ARK_API_KEY`                                                                | seed-pro, seed-code                                                                                      |
| ark-coding-plan  | `ARK_API_KEY`                                                                | seed-code, kimi                                                                                          |

List all configured providers and models:

```bash
klaude list
```

Models from providers without valid credentials are shown as dimmed/unavailable.

Bedrock is supported as a custom provider rather than a built-in one. See `docs/bedrock-setup.md`.

#### Authentication

Use the auth command to configure API keys or login to subscription-based providers:

```bash
# Interactive provider selection
klaude auth login

# Configure API keys
klaude auth login anthropic   # Set ANTHROPIC_API_KEY
klaude auth login openai      # Set OPENAI_API_KEY
klaude auth login google      # Set GOOGLE_API_KEY
klaude auth login openrouter  # Set OPENROUTER_API_KEY
klaude auth login deepseek    # Set DEEPSEEK_API_KEY
klaude auth login moonshot    # Set MOONSHOT_API_KEY
klaude auth login minimax    # Set MINIMAX_API_KEY

# OAuth login for subscription-based providers
klaude auth login codex       # ChatGPT Pro subscription
```

API keys are stored in `~/.klaude/klaude-auth.json` and used as fallback when environment variables are not set.

To logout from OAuth providers:

```bash
klaude auth logout codex
```

#### Custom Configuration

User config file: `~/.klaude/klaude-config.yaml`

Open in editor:

```bash
klaude conf
```

##### Model Configuration

You can add custom models to built-in providers or define new ones. Configuration is inherited from built-in providers by matching `provider_name`.

```yaml
# ~/.klaude/klaude-config.yaml
main_model: opus

fast_model:
  - haiku
  - gemini-flash
  - gpt-5-nano

compact_model:
  - gemini-flash
  - haiku

provider_list:
  # Add/Override models for built-in OpenRouter provider
  - provider_name: openrouter
    model_list:
      - model_name: qwen-coder
        model_id: qwen/qwen-2.5-coder-32b-instruct
        context_limit: 131072
        cost: { input: 0.3, output: 0.9 }
      - model_name: sonnet # Override built-in sonnet params
        model_id: anthropic/claude-3.5-sonnet
        context_limit: 200000

  # Add a completely new provider
  - provider_name: my-azure
    protocol: openai
    api_key: ${AZURE_OPENAI_KEY}
    base_url: https://my-instance.openai.azure.com/
    is_azure: true
    azure_api_version: "2024-02-15-preview"
    model_list:
      - model_name: gpt-4
        model_id: gpt-4-deploy-name
        context_limit: 128000
```

**Key Tips:**
- **Merging**: If `provider_name` matches a built-in provider, settings like `protocol` and `api_key` are inherited.
- **Overriding**: Use the same `model_name` as a built-in model to override its parameters.
- **Environment Variables**: Use `${VAR_NAME}` syntax for secrets.
- **Model Preference Lists**: `fast_model` and `compact_model` accept either a single string or a list of model selectors. When you provide a list, Klaude tries them in order and picks the first available one.

##### Sub-agent Model Configuration

`sub_agent_models` accepts registered sub-agent type names as keys. Current supported keys are:

- `general-purpose` - Autonomous multi-step task executor
- `general-purpose-fork-context` - Same as above but inherits parent conversation history
- `finder` - Fast codebase search and exploration
- `code-reviewer` - Identifies bugs in proposed changes
- `code-simplifier` - Refines code for clarity and consistency

If a sub-agent type is not configured, it falls back to the main agent model. Each key also accepts a list for fallback ordering.

```yaml
sub_agent_models:
  general-purpose: sonnet
  finder:
    - haiku
    - gemini-flash
  code-reviewer: opus
```

##### Supported Protocols

- `anthropic` - Anthropic Messages API
- `openai` - OpenAI Chat Completion API
- `responses` - OpenAI Responses API (for o-series, GPT-5, Codex)
- `codex_oauth` - OpenAI Codex CLI (OAuth-based, for ChatGPT Pro subscribers)
- `github_copilot_oauth` - GitHub Copilot (OAuth-based)
- `openrouter` - OpenRouter API (handling `reasoning_details` for interleaved thinking)
- `google` - Google Gemini API
- `google_vertex` - Google Vertex AI (uses GCP credentials)
- `bedrock` - AWS Bedrock for Claude (uses AWS credentials instead of api_key)

For a working Bedrock provider example, see `docs/bedrock-setup.md`.

List configured providers and models:

```bash
klaude list
```

### Cost Tracking

View aggregated usage statistics across all sessions:

```bash
# Show all historical usage data
klaude cost

# Show usage for the last 7 days only
klaude cost --days 7

# Alias for days
klaude cost --recent 7
```

### Slash Commands

Inside the interactive session (`klaude`), use these commands to streamline your workflow:

- `/...` supports mixed completion for commands + skills (command names take priority on conflicts).
- `//...` shows skill-only completion and triggers skills explicitly.

- `/copy` - Copy last assistant message to clipboard.
- `/compact` - Clear conversation history but keep a summary in context.
- `/fork-session` - Fork current session from a selected point.
- `/refresh-terminal` - Refresh terminal display.
- `/web` - Switch to web UI mode.
- `/new` - Start a new session (clears context).
- `/model` - Switch the active LLM during the session.
- `/sub-agent-model` - Configure sub-agent models at runtime.
- `/thinking` - Change thinking/reasoning level.
- `/status` - Show session usage statistics (cost, tokens, model breakdown).
- `/login` - Login to provider or configure API key.
- `/logout` - Logout from provider.
- `/continue` - Continue current session without a new user message.
- `/debug [filters]` - Toggle debug mode and configure debug filters.


### Input Shortcuts

| Key                  | Action                                      |
| -------------------- | ------------------------------------------- |
| `Enter`              | Submit input                                |
| `Shift+Enter`        | Insert newline (terminal-dependent)         |
| `Ctrl+J`             | Insert newline                              |
| `Ctrl+L`             | Open model picker overlay                   |
| `Ctrl+T`             | Open thinking level picker overlay          |
| `Ctrl+V`             | Paste image from clipboard                  |
| `Left/Right`         | Move cursor (wraps across lines)            |
| `Backspace`          | Delete character or selected text           |
| `c` (with selection) | Copy selected text to clipboard             |

### Sub-Agents

The main agent can spawn specialized sub-agents for specific tasks:

| Sub-Agent | Purpose |
|-----------|---------|
| **General Purpose** | Handle complex multi-step tasks autonomously |
| **General Purpose (Fork Context)** | Same as above, but inherits the parent agent's full conversation history |
| **Finder** | Fast codebase exploration - find files, search code, answer questions about the codebase |
| **Code Reviewer** | Identify real bugs in proposed changes |
| **Code Simplifier** | Refine recently changed code for clarity and consistency |

### Web UI

Klaude includes a browser-based interface as an alternative to the terminal TUI.

```bash
# Start web UI directly
klaude web

# With options
klaude web --port 9000 --host 0.0.0.0 --no-open
```

You can also switch from TUI to web mid-session with the `/web` slash command. The web UI provides the same capabilities as the TUI: multi-session management, file browsing, tool execution, and real-time streaming.

### Prompt Caching

Klaude is designed to maximize prefix cache hit rates across LLM API calls. Cached tokens are priced at ~10% of base input tokens, so high cache hit rates significantly reduce cost.

**Append-only message history.** The conversation history is strictly append-only. New messages, tool results, and attachments are always appended to the end of the message array, never inserted or modified in the middle. Any mutation to the head of the messages array (compressing old tool results, replacing images, reordering tool definitions) would invalidate the prefix cache and force a full re-tokenization.

Design choices that preserve prefix stability:
- **Stable system prompt**: The system prompt is composed of a static base prompt + stable tool strategy block + environment info, avoiding per-turn variation.
- **Stable JSON serialization**: Tool schemas and provider payloads use `canonicalize_json()` for deterministic key ordering across calls.
- **Cache control markers**: For Anthropic and OpenRouter (Claude models), `cache_control: {"type": "ephemeral"}` is placed on the system prompt and the last message part to hint the provider's caching boundary.
- **Compaction preserves prefix**: When context is compacted, the summary is prepended as a new first message while keeping the retained tail intact -- no existing message bytes are modified.
- **Fork-context sub-agents**: Sub-agents with `fork_context=True` inherit the parent's full system prompt and tool list to maximize prefix cache sharing.

The TUI displays cache hit rate per turn in the metadata line (e.g. `cache 12.5k (98%)`). Rates below 90% are highlighted as a warning.

### Context Management

The agent automatically manages context window limits:

- **Auto-compaction**: When the conversation approaches the model's context limit, older messages are summarized and replaced with a compact summary. The agent also recovers from context overflow errors by compacting and retrying.
- **Rewind**: The agent can roll back the conversation to a previous checkpoint (automatically inserted at key points). File system changes are preserved; only conversation history is rewound.
- **Handoff**: The agent can compress the current conversation into a summary and continue in a fresh context. Useful for very long sessions where context quality degrades.

### Auto Memory

Klaude maintains persistent memory per project across sessions. Memory files are stored in `~/.klaude/projects/<project-key>/memory/` with a `MEMORY.md` index file. The agent automatically loads relevant memories at session start and can save new memories during a session.

Memory types include user preferences, feedback/corrections, project context, and external references.

### Project Configuration Files

Klaude reads instruction files from your project directory to customize agent behavior:

| File | Purpose |
|------|---------|
| `AGENTS.md` | Project-level instructions checked into version control (shared with team) |
| `CLAUDE.md` | Personal instructions (typically gitignored) |

These files are loaded automatically and injected into the system prompt. They can be placed at the project root or in subdirectories for scoped instructions.
