Metadata-Version: 2.4
Name: mnemoai-assistant
Version: 0.5.0
Summary: Mnemo AI — a local agentic AI assistant (LangGraph + MCP) that learns and remembers, with multi-provider model support.
Project-URL: Homepage, https://github.com/brunopistone/mnemoai
Project-URL: Repository, https://github.com/brunopistone/mnemoai
Project-URL: Issues, https://github.com/brunopistone/mnemoai/issues
Author: Bruno Pistone
License-Expression: MIT
License-File: LICENSE
Keywords: agent,assistant,bedrock,cli,langgraph,llm,mcp,ollama,rag
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: aws-bedrock-token-generator>=1.1.0
Requires-Dist: boto3>=1.42.78
Requires-Dist: brave-search-python-client>=0.4.27
Requires-Dist: chromadb>=1.5.5
Requires-Dist: crawl4ai>=0.8.6
Requires-Dist: faiss-cpu>=1.13.2
Requires-Dist: httpx>=0.28.1
Requires-Dist: langchain-anthropic>=1.4.4
Requires-Dist: langchain-aws>=1.4.1
Requires-Dist: langchain-core>=1.2.23
Requires-Dist: langchain-litellm>=0.3.5
Requires-Dist: langchain-ollama>=1.0.1
Requires-Dist: langchain-openai>=1.1.12
Requires-Dist: langgraph>=1.1.3
Requires-Dist: litellm>=1.81.1
Requires-Dist: mcp[cli]>=1.26.0
Requires-Dist: mem0ai>=1.0.9
Requires-Dist: mypy-boto3-sagemaker-runtime>=1.42.54
Requires-Dist: numpy>=2.4.4
Requires-Dist: ollama>=0.6.1
Requires-Dist: openai>=2.30.0
Requires-Dist: prompt-toolkit>=3.0.52
Requires-Dist: psutil>=7.2.2
Requires-Dist: pygments>=2.20.0
Requires-Dist: pypdf2>=3.0.1
Requires-Dist: python-docx>=1.2.0
Requires-Dist: qdrant-client>=1.17.1
Requires-Dist: tiktoken>=0.12.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.6.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/brunopistone/mnemoai/main/images/mnemoai-logo.png" alt="Mnemo AI" width="120">
</p>

<h1 align="center">Mnemo AI</h1>

[![PyPI](https://img.shields.io/pypi/v/mnemoai-assistant.svg)](https://pypi.org/project/mnemoai-assistant/)
[![Python Version](https://img.shields.io/badge/python-3.11%2B-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

A local agentic AI assistant with MCP (Model Context Protocol) integration, RAG capabilities, and intelligent conversation management. Built on LangGraph with LangChain for multi-provider LLM support (Ollama, Amazon Bedrock, OpenAI, Anthropic, Amazon SageMaker AI, LiteLLM).

![Demo](https://raw.githubusercontent.com/brunopistone/mnemoai/main/images/assistant-demo.gif)

## 📑 Table of Contents

- [✨ Key Features](#-key-features)
- [📖 Project Structure](#-project-structure)
- [🏗️ Architecture](#️-architecture)
  - [High-Level Overview](#high-level-overview)
  - [Component Breakdown](#component-breakdown)
  - [Data Flow](#data-flow)
  - [Session Management](#session-management)
- [🚀 Quick Start](#-quick-start)
  - [Prerequisites](#prerequisites)
  - [Installation](#installation)
  - [Optional: Create System Command](#optional-create-system-command)
- [🔀 Feature Toggles](#-feature-toggles)
- [💡 Usage](#-usage)
  - [Basic Chat](#basic-chat)
  - [Commands](#commands)
  - [Keyboard Shortcuts](#keyboard-shortcuts)
  - [Verbose Mode](#verbose-mode)
- [🚀 Productivity Tools](#-productivity-tools)
  - [📋 Todo List Management](#-todo-list-management)
  - [🔎 Fast Search Tools](#-fast-search-tools)
  - [✏️ Precise File Editing](#️-precise-file-editing)
  - [🛡️ Enhanced Error Handling](#️-enhanced-error-handling)
  - [🔐 Action Confirmation (bash & file writes)](#-action-confirmation-bash--file-writes)
  - [🛡️ Git Safety](#️-git-safety)
  - [📝 Plan Mode](#-plan-mode)
  - [🔄 Background Tasks](#-background-tasks)
- [🔧 Configuration](#-configuration)
  - [Model Configuration](#model-configuration)
  - [Vision Model Configuration](#vision-model-configuration)
  - [Model Parameters](#model-parameters)
  - [General Parameters](#general-parameters)
  - [Embeddings Configuration](#embeddings-configuration)
  - [LLM Interaction Configuration](#llm-interaction-configuration)
  - [System Prompt](#system-prompt)
  - [RAG Configuration](#rag-configuration)
  - [Episodic Memory Configuration](#episodic-memory-configuration)
- [📚 Advanced Features](#-advanced-features)
  - [Query Routing](#query-routing)
  - [Orchestrator-Workers](#orchestrator-workers)
  - [Web Search Configuration](#web-search-configuration)
  - [Web Crawler Configuration](#web-crawler-configuration)
  - [External MCP Servers](#external-mcp-servers)
  - [RAG (Retrieval-Augmented Generation)](#rag-retrieval-augmented-generation)
  - [User Profile Learning](#user-profile-learning)
  - [Episodic Memory](#episodic-memory)
  - [ACE Playbook (Agentic Context Engineering)](#ace-playbook-agentic-context-engineering)
  - [Training Data Collection](#training-data-collection)
- [📦 Dependencies](#-dependencies)
- [🛠️ Development](#️-development)
  - [Testing](#testing)
  - [Adding New Tools](#adding-new-tools)
  - [Adding New File Readers](#adding-new-file-readers)
  - [Switching Model Providers](#switching-model-providers)
  - [Adding New Model Providers](#adding-new-model-providers)
- [🔧 Ollama Utilities (Optional)](#-ollama-utilities-optional)
  - [Ollama Environment Setup (macOS)](#ollama-environment-setup-macos)
  - [VRAM Cleaner](#vram-cleaner)
- [🐛 Troubleshooting](#-troubleshooting)
  - [Common Issues](#common-issues)
  - [Logging](#logging)
- [📄 License](#-license)
- [🤝 Contributing](#-contributing)
- [🙏 Acknowledgments](#-acknowledgments)

## ✨ Key Features

- **🤖 Multi-Model Support**: Ollama (local), Amazon Bedrock, OpenAI, Anthropic (Claude), Amazon SageMaker AI, LiteLLM (100+ providers)
- **🔧 MCP Tool System**: Extensible tool architecture via Model Context Protocol
- **📚 RAG (Retrieval-Augmented Generation)**: Automatic document indexing and semantic search (_if enabled_)
- **💬 Advanced Chat Interface**: Multiline input, command system, conversation save/load
- **🧠 User Profile Learning**: Automatic learning from interactions for personalized responses
- **🧩 Episodic Memory**: Learns from successful task completions and retrieves similar solutions
- **📖 ACE Playbook**: Learns strategies from successes AND failures via Agentic Context Engineering
- **📊 Training Data Collection**: Mark high-quality responses for SFT training
- **🔍 Web Search**: Integrated Brave Search API (_if available_)
- **🌐 Web Crawler**: Extract and index content from web pages
- **🖼️ Vision Support**: Image analysis with vision models (_if available_)
- **📁 File Operations**: Read/write/edit with support for text, CSV, JSON, PDF, DOCX
- **✏️ Precise File Editing**: Safe string replacement with validation and uniqueness checking
- **🔎 Fast Search Tools**: Glob pattern matching and ripgrep content search (10-100x faster than traditional grep)
- **📋 Todo Tracking**: Multi-step task management with real-time progress updates
- **⚡ Bash Execution**: Direct shell command execution with intelligent error handling
- **🛡️ Git Safety**: Protection against dangerous git operations with smart warnings
- **📝 Plan Mode**: Implementation planning workflow for complex tasks
- **🔄 Background Tasks**: Run long operations in parallel without blocking

## 📖 Project Structure

```
mnemoai/                      # repo root
├── pyproject.toml                          # Packaging + `mnemoai` CLI entry point
├── requirements.txt                        # Dependencies
├── README.md                               # This file
├── pytest.ini                              # Pytest configuration
├── requirements-dev.txt                    # Dev/test dependencies
│
├── src/mnemoai/              # The single package (src layout)
│   ├── __init__.py
│   ├── __main__.py                         # `python -m mnemoai`
│   ├── main.py                             # Entry point (cli())
│   │
│   ├── client/                             # Client layer
│   │   ├── client.py                       # LangGraphClient facade (lifecycle, MCP, query)
│   │   ├── mcp_tool_wrapper.py             # MCP→LangChain adapter + MultiMCPClient (built-in + external servers)
│   │   ├── mcp_config.py                   # Loads external MCP servers from mcp.json
│   │   ├── agent/                          # Agent loop
│   │   │   ├── agent.py                    # LangGraph StateGraph agent with streaming
│   │   │   ├── router.py                   # Query classifier and routing
│   │   │   ├── orchestrator.py             # Task decomposition and worker orchestration
│   │   │   └── reasoning_utils.py          # Reasoning/thinking helpers for aux LLM calls
│   │   ├── ui/                             # User interface
│   │   │   ├── chat_interface.py           # Chat loop
│   │   │   └── spinner.py                  # Loading animations
│   │   ├── managers/                       # Business logic
│   │   │   ├── agent_conversation_manager.py  # Conversation state and token tracking
│   │   │   └── user_profile_manager.py     # User profiling and learning
│   │   └── memory/                         # Memory systems
│   │       ├── episodic_memory.py          # Episodic memory manager
│   │       ├── reflector.py                # ACE Reflector - extracts strategies
│   │       ├── playbook_store.py           # ACE Playbook - stores learned strategies
│   │       ├── faiss_store.py              # FAISS episodic store
│   │       └── chroma_store.py             # ChromaDB episodic store
│   │
│   ├── server/                             # MCP server layer
│   │   ├── server.py                       # FastMCP server (run as a subprocess)
│   │   ├── error_handler.py                # @tool_error_handler decorator (shared)
│   │   └── tools/                          # Tool implementations
│   │       ├── tools_manager.py            # Tool registration
│   │       ├── fs_read.py / fs_write.py / file_edit.py / file_search.py
│   │       ├── execute_bash.py / git_safety.py / todo_manager.py / plan_mode.py
│   │       ├── background_tasks.py / web_crawler.py / web_search.py
│   │       ├── describe_image.py / rag_tool.py
│   │       ├── rag/                        # RAG system (session, vector_store_controller, stores)
│   │       └── readers/                    # File readers (csv/json/pdf/docx/line/dir/search + chunking)
│   │
│   ├── models/                             # Model layer
│   │   ├── provider_params.py              # Single source of truth: per-provider config keys
│   │   ├── mantle_factory.py               # Bedrock Mantle model factory (multi-protocol)
│   │   ├── controllers/                    # Provider-dispatching controllers
│   │   │   ├── base_model_controller.py    # Minimal shared base
│   │   │   ├── llm_controller.py           # LLM initialization
│   │   │   ├── vision_model_controller.py  # Vision model initialization
│   │   │   └── embeddings_controller.py    # Embeddings initialization
│   │   └── chat_models/                    # Concrete LangChain ChatModel subclasses
│   │       ├── chat_ollama_wrapper.py      # Ollama model with penalty support
│   │       └── sagemaker_chat.py           # SageMaker ChatModel for LangChain
│   │
│   └── utils/                              # Utilities
│       ├── config.py                       # Config loader
│       ├── configurator.py                 # First-run setup + /config & /model flows
│       ├── paths.py                        # Central path helper (~/.mnemoai)
│       ├── logger.py                       # Logging utilities
│       ├── bm25.py                         # Lightweight BM25 (hybrid search)
│       ├── config.yaml.example             # Config templates (also .bedrock / .bedrock.mantle)
│       ├── mcp.json.example                # External MCP servers template
│       └── formatting/                     # Text formatting (code/url/response)
│
├── tests/                                  # Test suite (pytest)
│   ├── conftest.py                         # Puts src/ on sys.path
│   ├── unit/                               # Fast, deterministic, no deps
│   └── integration/                        # Live agent + Ollama + MCP
│
├── docs/                                   # ARCHITECTURE.md (detailed file map)
└── bash/                                   # Helper scripts
    ├── system-command-app/                 # `mnemoai` wrapper script
    ├── ollama-freeup-vram/                 # VRAM management
    └── ollama-env-mac/                     # Ollama config
```

## 🏗️ Architecture

### High-Level Overview

```
┌─────────────────────────────────────────────────────────────┐
│                         main.py                             │
│                    (Application Entry)                      │
└─────────────────────────────┬───────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
              ▼                               ▼
      ┌─────────────────┐            ┌──────────────────┐
      │ LangGraphClient │◄──────────►│  MCP Server      │
      │  (client.py)    │            │  (server.py)     │
      └────────┬────────┘            └────────┬─────────┘
               │                              │
          ┌────┴─────┐                        ▼
          │          │                   ┌──────────┐
          ▼          ▼                   │  Tools   │
      ┌────────┐ ┌──────────┐            └────┬─────┘
      │  UI    │ │ Managers │                 │
      └────────┘ └──────────┘            ┌────┴────┐
          │          │                   │         │
          └────┬─────┘                   ▼         ▼
               ▼                    ┌──────────┐ ┌─────┐
          ┌──────────┐              │ Readers  │ │ RAG │
          │LangGraph │              └──────────┘ └─────┘
          │  Agent   │
          └──────────┘
```

## 🚀 Quick Start

### Prerequisites

**Required:**

- Python 3.11+
- At least **one LLM provider** configured and accessible (see below)

**LLM Providers (choose at least one):**

| Provider                                            | Requirements                                                                      |
| --------------------------------------------------- | --------------------------------------------------------------------------------- |
| **Ollama** (local, recommended for getting started) | [Install Ollama](https://ollama.ai), then pull a model: `ollama pull qwen3:4b`    |
| **Amazon Bedrock**                                  | AWS CLI configured (`aws configure`) with Bedrock access in your region           |
| **Amazon SageMaker AI**                             | AWS CLI configured with a deployed SageMaker endpoint                             |
| **OpenAI**                                          | Set `OPENAI_API_KEY` environment variable                                         |
| **Anthropic** (Claude API)                          | Set `ANTHROPIC_API_KEY` environment variable                                      |
| **LiteLLM**                                         | Depends on the underlying provider (see [LiteLLM docs](https://docs.litellm.ai/)) |

**Optional:**

- **ripgrep** — 10-100x faster content search (see installation below)
- **Embedding model** — Required if you enable RAG, Episodic Memory, or ACE Playbook (see [Feature Toggles](#-feature-toggles))
- **Vision model** — Required for image analysis (`describe_image` tool)
- **Brave Search API key** — Required for web search ([get one here](https://brave.com/search/api/))

### Installation

#### Recommended: install from PyPI

The published package is **[`mnemoai-assistant`](https://pypi.org/project/mnemoai-assistant/)** (the import name and the CLI command are both `mnemoai`). No clone needed — install it into an isolated environment and get the `mnemoai` command on your PATH:

```bash
uv tool install mnemoai-assistant     # or: pipx install mnemoai-assistant
```

Or into the current environment with pip:

```bash
pip install mnemoai-assistant
```

Then configure a user config (see step 4 below) and run:

```bash
mnemoai            # verbose (shows thinking)
mnemoai --no-verbose
```

To upgrade: `uv tool upgrade mnemoai-assistant` (or `pip install -U mnemoai-assistant`). To remove: `uv tool uninstall mnemoai-assistant`.

> This is the best choice if you just want to use the assistant. Install from a checkout (below) instead if you plan to edit the source.

#### Install from a checkout

1. **Clone the repository**:

```bash
git clone https://github.com/brunopistone/mnemoai.git
cd mnemoai
```

2. **Install the assistant** (choose one):

#### Option 1: install as a CLI command (`uv tool install`)

This installs the project into its own isolated environment and puts `mnemoai` on your PATH, so you can run it from any directory (macOS and Linux) without activating anything:

```bash
uv tool install .        # or: pipx install .
```

Then configure a user config (see step 4) and run:

```bash
mnemoai            # verbose (shows thinking)
mnemoai --no-verbose
```

To upgrade after pulling changes: `uv tool install --force .`. To remove: `uv tool uninstall mnemoai-assistant` (the tool is registered under the package name, not the command name).

> Pick "run from a checkout" below instead if you plan to actively edit the code, since that runs your working tree directly with no reinstall step.

#### Option 2: run from a checkout

Set up an environment (choose one of Options A to C below) so you can run the assistant directly from the repo while editing the source live. Because the code uses a `src/` layout, run it as a module with `src/` on the path:

```bash
PYTHONPATH=src python -m mnemoai            # verbose
PYTHONPATH=src python -m mnemoai --no-verbose
```

(Or `pip install -e .` once, then just `mnemoai`.)

**Option A: venv**

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

**Option B: uv**

```bash
uv venv
uv pip install -r requirements.txt
```

**Option C: conda**

```bash
conda create -n mnemoai python=3.11
conda activate mnemoai
pip install -r requirements.txt
```

**Get the `mnemoai` command for a checkout install**

So you don't have to `cd` into the repo every time, symlink the bundled wrapper script onto your PATH. It activates the project environment, then runs the app (`PYTHONPATH=src python -m mnemoai`):

```bash
chmod +x bash/system-command-app/mnemoai-wrapper.sh
ln -sf "$(pwd)/bash/system-command-app/mnemoai-wrapper.sh" /usr/local/bin/mnemoai
```

Now `mnemoai` works from any directory and always reflects your latest edits. The wrapper auto-activates a project-local `.venv` (Options A and B) if present, otherwise it falls back to a conda env named `mnemoai` (Option C) — edit the script if your environment differs.

3. **Install ripgrep (optional but recommended for fast search)**:

Ripgrep provides 10-100x faster content search than traditional grep. The `grep_search` tool uses it when available.

**macOS:**

```bash
brew install ripgrep
```

**Ubuntu/Debian:**

```bash
sudo apt install ripgrep
```

**Fedora/RHEL:**

```bash
sudo dnf install ripgrep
```

**Windows (via Chocolatey):**

```bash
choco install ripgrep
```

**From source:**

```bash
cargo install ripgrep
```

**Verify installation:**

```bash
rg --version  # Should show ripgrep version
```

If ripgrep is not installed, the assistant will automatically fall back to using `execute_bash` with standard `grep`, but performance will be significantly slower.
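The fallback described above can be pictured with a small sketch. This is not the project's code: `build_search_command` and `search` are hypothetical names, and the exact flags the assistant passes are an assumption. It only shows the general pattern of probing for `rg` on PATH and falling back to `grep`.

```python
import shutil
import subprocess


def build_search_command(pattern: str, path: str = ".") -> list[str]:
    """Return a ripgrep command when `rg` is on PATH, else a grep fallback."""
    if shutil.which("rg"):
        # -n prints line numbers; ripgrep recurses into directories by default.
        return ["rg", "-n", "--", pattern, path]
    # -r recurses, -n prints line numbers.
    return ["grep", "-rn", "--", pattern, path]


def search(pattern: str, path: str = ".") -> str:
    """Run the search and return the raw matches as text."""
    cmd = build_search_command(pattern, path)
    # Both tools exit with status 1 when nothing matches, so do not raise on it.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `search("TODO", "src")` would return matching lines from either tool. Only the speed differs.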

4. **Configure the application**:

**First-run setup (easiest).** If you start the assistant and no config is found, an interactive configurator runs automatically. It walks you through:

- The LLM provider (Ollama / Bedrock / Mantle / OpenAI / Anthropic / Amazon SageMaker AI / LiteLLM) plus chat model
- Connection details: Ollama host/port; AWS region; for Mantle, the API protocol (chat_completions / responses / anthropic); SageMaker region + input format; LiteLLM API base/key; OpenAI uses `OPENAI_API_KEY`; Anthropic uses `ANTHROPIC_API_KEY` with an optional base URL
- Optional max output tokens (blank or `none` uses the provider default) and a mandatory max context window (defaults to 65536)
- The vision model (reusing the chat model's host/region, with its own Mantle protocol and optional max output tokens)
- Your profile name
- An optional Brave Search key
- Each feature toggle (RAG, episodic memory, ACE playbook, web crawler, query routing, orchestration, user profiling)

Every prompt is pre-filled with the template's default, so you can press Enter through the ones you don't care about. It then writes a ready-to-use `~/.mnemoai/config/config.yaml` from the matching template. Just run:

```bash
mnemoai      # or, from a checkout: PYTHONPATH=src python -m mnemoai
```

and follow the prompts. You can re-edit the generated file any time to fine-tune models, prompts, and feature toggles.

**Manual setup.** Prefer to write it yourself? Copy a template (they live inside the package, under `src/mnemoai/utils/`):

```bash
cp src/mnemoai/utils/config.yaml.example src/mnemoai/utils/config.yaml
```

Edit that `config.yaml` with your settings. This file is git-ignored to protect your API keys.

The config file is resolved in this order (first match wins):

1. `$MNEMOAI_CONFIG` — explicit path (handy for switching between provider configs)
2. `~/.mnemoai/config/config.yaml` — **user config used by the installed `mnemoai` command**
3. `~/.mnemoai/config.yaml` — legacy pre-subfolder location (still read if present)
4. `<package>/utils/config.yaml` — package-relative fallback (used when running from a checkout)
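The lookup order above amounts to a first-match-wins search over candidate paths. The sketch below illustrates that idea only: `resolve_config_path` is a hypothetical helper, not the loader in `utils/config.py`. Skipping a `$MNEMOAI_CONFIG` value that points at a missing file is an assumption, since the real loader may raise an error instead.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_config_path(package_dir: Path, home: Optional[Path] = None) -> Optional[Path]:
    """Return the first existing config file, following the documented order."""
    home = home or Path.home()
    candidates: list[Path] = []

    # 1. Explicit override via environment variable.
    env_path = os.environ.get("MNEMOAI_CONFIG")
    if env_path:
        candidates.append(Path(env_path).expanduser())

    candidates += [
        home / ".mnemoai" / "config" / "config.yaml",  # 2. user config
        home / ".mnemoai" / "config.yaml",             # 3. legacy location
        package_dir / "utils" / "config.yaml",         # 4. package-relative fallback
    ]

    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None  # nothing found; the first-run configurator would take over
```

For example, `MNEMOAI_CONFIG=~/configs/bedrock.yaml mnemoai` would select the first branch, which is what makes the variable handy for switching between provider configs.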

On first run mnemoai seeds `~/.mnemoai/config/` and `~/.mnemoai/mcp/` with copies of the bundled examples (`config.yaml*.example`, `mcp.json.example`) so you have them to read right next to your live files. If you installed the CLI with `uv tool install` (the recommended option), put your config in the user location:

```bash
# Examples are auto-copied on first run; just copy one to config.yaml and edit:
cp ~/.mnemoai/config/config.yaml.example        ~/.mnemoai/config/config.yaml
# or, for Bedrock / Mantle:
# cp ~/.mnemoai/config/config.yaml.bedrock.example        ~/.mnemoai/config/config.yaml
# cp ~/.mnemoai/config/config.yaml.bedrock.mantle.example ~/.mnemoai/config/config.yaml
```

At minimum, configure your LLM provider:

**For Ollama (quickest setup):**

```bash
# Pull a model first
ollama pull qwen3:4b
```

```yaml
# utils/config.yaml (minimal)
MODEL_ID:
  NAME: qwen3:4b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.6

# Profile name (used for session data isolation)
PROFILE:
  NAME: default

# Everything else can be left at defaults or disabled
ENABLE_RAG: false
ENABLE_EPISODIC_MEMORY: false
ENABLE_PLAYBOOK: false
ENABLE_WEB_SEARCH: false
ENABLE_WEB_CRAWL: false
```

See [Configuration](#-configuration) for all options and [Feature Toggles](#-feature-toggles) for enabling advanced features.

5. **Run the assistant**:

If you installed with `uv tool install` (recommended), run the command from anywhere:

```bash
mnemoai
```

If you set up a checkout and symlinked the wrapper, the same command works. Otherwise, run it from the repo directory:

```bash
PYTHONPATH=src python -m mnemoai
```

See `bash/system-command-app/README.md` for details on the wrapper script.

## 🔀 Feature Toggles

All advanced features can be independently enabled or disabled in your `config.yaml` (`~/.mnemoai/config/config.yaml` for the installed CLI, or `utils/config.yaml` in a checkout, both copied from `config.yaml.example`). Here is a quick reference:

| Feature                                                  | Config Key                         | Default             | Dependencies                              |
| -------------------------------------------------------- | ---------------------------------- | ------------------- | ----------------------------------------- |
| **RAG** (document indexing & search)                     | `ENABLE_RAG: true`                 | `true`              | Embedding model (`RAG.EMBED_MODEL_ID`)    |
| **Episodic Memory** (learn from past tasks)              | `ENABLE_EPISODIC_MEMORY: true`     | `true`              | Embedding model (`RAG.EMBED_MODEL_ID`)    |
| **ACE Playbook** (learn strategies from success/failure) | `ENABLE_PLAYBOOK: true`            | `true`              | None (embeddings optional for refinement) |
| **User Profiling** (personalized responses)              | `PROFILE.USE_PROFILING: true`      | `true`              | Activates after 5+ interactions           |
| **Web Search**                                           | `ENABLE_WEB_SEARCH: true`          | `true`              | `BRAVE_API_KEY` configured                |
| **Web Crawler**                                          | `ENABLE_WEB_CRAWL: true`           | `true`              | None                                      |
| **Vision** (image analysis)                              | Configure `VISION_MODEL_ID`        | Disabled if not set | Vision-capable model                      |
| **Bash Confirmation** (prompt before each shell command) | `REQUIRE_BASH_CONFIRMATION: true`  | `true`              | None (auto-skips when non-interactive)    |
| **Write Confirmation** (prompt before each file write)   | `REQUIRE_WRITE_CONFIRMATION: true` | `true`              | None (auto-skips when non-interactive)    |
| **Verbose Mode** (show thinking process)                 | CLI flag `--no-verbose`            | Enabled             | Supported by model                        |

**Dependency note:** RAG, Episodic Memory, and ACE Playbook refinement all rely on an embedding model. If the embedding model is unavailable, the system falls back to SHA256-based deterministic embeddings, which degrades semantic search quality. Configure `RAG.EMBED_MODEL_ID` in `config.yaml` to use a real embedding model (see [Embeddings Configuration](#embeddings-configuration)).
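To see why a hash-based fallback degrades search quality, consider the minimal sketch below. It is an illustration, not the project's implementation: `hash_embedding`, the 384-dimension default, and the byte-to-float scaling are all assumptions. The point is that identical text always maps to the same vector, while similar text lands somewhere unrelated.

```python
import hashlib
import math


def hash_embedding(text: str, dim: int = 384) -> list[float]:
    """Map text to a deterministic unit-length vector derived from SHA-256."""
    values: list[float] = []
    counter = 0
    while len(values) < dim:
        # Re-hash with a counter prefix to produce as many bytes as needed.
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        # Scale each byte from [0, 255] to [-1.0, 1.0].
        values.extend(byte / 127.5 - 1.0 for byte in digest)
        counter += 1
    values = values[:dim]
    # Normalize so cosine similarity reduces to a dot product.
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]
```

With vectors like these, retrieval still works for exact repeats of a chunk, but "install ripgrep" and "installing ripgrep" share no structure, so nearest-neighbor results carry little semantic meaning.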

## 💡 Usage

### Basic Chat

Simply type your questions and press Enter. The assistant will respond using available tools when needed.

```
You: What files are in the current directory?
Assistant: [Uses fs_read tool to list directory contents]

You: Read the README.md file
Assistant: [Uses fs_read tool and displays content]
```

### Commands

| Command            | Description                                                                                                                                                                              |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `/exit` or `/quit` | Exit the application                                                                                                                                                                     |
| `/clear`           | Clear conversation history and RAG index                                                                                                                                                 |
| `/save`            | Save current conversation                                                                                                                                                                |
| `/load <path>`     | Load a saved conversation                                                                                                                                                                |
| `/good`            | Mark last response as good (for SFT training)                                                                                                                                            |
| `/compact [focus]` | Summarize older turns to shrink context (optional focus instructions)                                                                                                                    |
| `/config`          | Re-run the interactive configurator (overwrites `config.yaml`, then restarts the app in place to apply)                                                                                  |
| `/model`           | Override just one model — chat (LLM), vision, or embeddings — leaving the rest of `config.yaml` untouched, then restart in place                                                         |
| `/params`          | Tune a model's inference parameters (temperature, top_p, top_k, penalties, reasoning, stop, stream, …) — only the params the chosen provider supports are offered, then restart in place |
| `/mcp`             | List the configured MCP servers (built-in + any from `mcp.json`), their connection status, and tool counts                                                                               |

### Keyboard Shortcuts

- `Ctrl+J`: Insert new line in input
- `Enter`: Submit message
- `Ctrl+C`: Interrupt operation (press twice to exit)

### Verbose Mode

Control thinking process visibility:

```bash
mnemoai              # Verbose mode (shows thinking)
mnemoai --no-verbose # Hide thinking process
# from a checkout: PYTHONPATH=src python -m mnemoai [--no-verbose]
```
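A flag that defaults to "on" and is switched off by `--no-verbose` is commonly modeled with `argparse` as shown below. This is a sketch of the pattern only. How `main.py` actually parses its arguments is not documented here, and `parse_args` is a hypothetical helper.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse CLI flags; verbose stays True unless --no-verbose is given."""
    parser = argparse.ArgumentParser(prog="mnemoai")
    parser.add_argument(
        "--no-verbose",
        dest="verbose",
        action="store_false",  # store_false makes the default True
        help="hide the model's thinking process",
    )
    return parser.parse_args(argv)
```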

### Component Breakdown

#### 1. **Client Layer** (`client/`)

The client manages the conversation flow and user interaction.

- **`client.py`**: Core LangGraph client
  - Initializes MCP connection
  - Manages conversation state
  - Handles model configuration
  - Coordinates managers (profile, conversation)
- **`agent.py`**: LangGraph agent implementation
  - State graph with agent and tools nodes
  - Streaming support with reasoning display
  - Code syntax highlighting
- **`router.py`**: Query classifier and routing
  - Classifies queries into categories (simple_qa, code, research, knowledge, full)
  - Routes each category to a specialized tool subset
  - Configurable classifier prompt via `ROUTING_PROMPT` in config
- **`orchestrator.py`**: Task decomposition and worker orchestration
  - Decomposes complex tasks into ordered subtasks with category assignments
  - Configurable orchestrator and aggregator prompts via config
- **`reasoning_utils.py`**: Shared reasoning/thinking helpers
  - Temporarily disables reasoning for auxiliary LLM calls (routing, task decomposition) so output lands in the response content
  - Extracts visible text from `<think>` tags and Bedrock thinking blocks
- **`mcp_tool_wrapper.py`**: MCP to LangChain adapter
  - Wraps MCP tools as LangChain BaseTool
  - Handles async/sync conversion
- **`ui/`**: User interface components
  - `chat_interface.py`: Interactive chat loop with command handling
  - `spinner.py`: Loading animations
- **`managers/`**: Business logic
  - `agent_conversation_manager.py`: Conversation state and token tracking
  - `user_profile_manager.py`: Automatic user profiling and learning
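As an illustration of the `<think>`-tag handling listed under `reasoning_utils.py`, the sketch below strips inline reasoning spans from a model response. It is an assumption about the general approach, not the module's code: `visible_text` is a hypothetical name, and Bedrock thinking blocks (which arrive as structured content rather than tags) are not covered.

```python
import re

# Non-greedy match so several separate <think> spans are removed independently.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def visible_text(raw: str) -> str:
    """Drop <think>...</think> spans and return the remaining text."""
    return _THINK_RE.sub("", raw).strip()
```

An unclosed `<think>` tag is left untouched by this pattern; a production helper would need to decide how to treat truncated output.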

#### 2. **Server Layer** (`server/`)

MCP server that provides tools to the LLM.

- **`server.py`**: FastMCP server initialization
- **`error_handler.py`**: `@tool_error_handler` decorator (shared by all tools)
- **`tools/`**: Tool implementations
  - `tools_manager.py`: Centralized tool registration and utilities
  - `fs_read.py`: File reading (text, CSV, JSON, PDF, DOCX)
  - `fs_write.py`: File writing (dry-run preview); writes are hard-gated client-side by `REQUIRE_WRITE_CONFIRMATION`
  - `file_edit.py`: Precise string replacement with validation and uniqueness checking
  - `execute_bash.py`: Shell command execution with intelligent error handling
  - `file_search.py`: Fast file/content search (glob patterns + ripgrep)
  - `todo_manager.py`: Todo list management for multi-step tasks
  - `web_search.py`: Brave Search integration
  - `web_crawler.py`: Web page content extraction with RAG integration
  - `describe_image.py`: Vision model image analysis
  - `rag_tool.py`: RAG tools registration
  - **`rag/`**: RAG system
    - `session.py`: Session-scoped RAG management with hybrid search
    - `vector_store_controller.py`: Vector store abstraction layer
    - `faiss_store.py`: FAISS vector store implementation
    - `chroma_store.py`: ChromaDB vector store implementation
  - **`readers/`**: Specialized file readers
    - `line_reader.py`, `directory_reader.py`, `search_reader.py`
    - `csv_reader.py`, `json_reader.py`
    - `pdf_reader.py`, `docx_reader.py`
    - `chunking_helper.py`: Document chunking for RAG

#### 3. **Models Layer** (`models/`)

Model controllers and custom implementations.

- `provider_params.py`: Single source of truth for the config keys each provider consumes (per modality); controllers build their client kwargs from it via `build_kwargs`, and `/model` prunes unsupported keys from it
- `mantle_factory.py`: Bedrock Mantle factory (chat_completions / responses / anthropic protocols), shared by the LLM and vision controllers
- **`controllers/`** (provider-dispatching model initialization):
  - `base_model_controller.py`: Minimal shared base type for the controllers
  - `llm_controller.py`: LLM model initialization (Bedrock, Mantle, Ollama, OpenAI, Anthropic, SageMaker AI, LiteLLM)
  - `vision_model_controller.py`: Vision model initialization
  - `embeddings_controller.py`: Embedding model initialization for RAG
- **`chat_models/`** (concrete LangChain `ChatModel` subclasses):
  - `chat_ollama_wrapper.py`: Extends ChatOllama with `presence_penalty` and `frequency_penalty` support
  - `sagemaker_chat.py`: Full LangChain `BaseChatModel` for SageMaker endpoints (streaming, tool calling, reasoning)

#### 4. **Utils Layer** (`utils/`)

Shared utilities and configuration.

- `config.py`: Configuration loader
- `configurator.py`: First-run interactive setup (when no config resolves) and the `/config` (full reconfigure) and `/model` (override one model section) chat commands
- `paths.py`: Central path helper — single source of truth for the app home (`~/.mnemoai`, override with `$MNEMOAI_HOME`) and all runtime subdirectories (config, plans, tasks, per-profile, per-model)
- `config.yaml.example`: Configuration template (copy to `config.yaml` and add your settings; `.bedrock` and `.bedrock.mantle` variants also provided)
- `bm25.py`: Lightweight BM25 implementation for hybrid (semantic + keyword) search
- `logger.py`: Logging utilities (stderr output)
- **`formatting/`**: Text formatting
  - `code_formatter.py`: Code syntax highlighting
  - `url_formatter.py`: URL highlighting
  - `response_parser.py`: Response processing
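
The path resolution described for `paths.py` can be pictured with a small sketch. It assumes only what is stated above (home is `~/.mnemoai`, overridable with `$MNEMOAI_HOME`); the function names `app_home` and `profile_dir` are hypothetical and are not taken from the module.

```python
import os
from pathlib import Path


def app_home(env=None) -> Path:
    """Return the app home: $MNEMOAI_HOME if set, otherwise ~/.mnemoai."""
    env = os.environ if env is None else env
    override = env.get("MNEMOAI_HOME")
    return Path(override).expanduser() if override else Path.home() / ".mnemoai"


def profile_dir(profile_name: str, env=None) -> Path:
    """Per-profile directory under the app home (hypothetical helper)."""
    return app_home(env) / profile_name
```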

### Data Flow

1. **User Input** → `ChatInterface` → `LangGraphClient`
2. **Client** → Invokes LangGraph agent with MCP tools
3. **Classifier** → Routes query to a category (simple_qa, code, research, knowledge, full) (_if routing enabled_)
4. **Orchestrator** → For `full` tasks: decomposes into subtasks, spawns workers, aggregates results (_if orchestration enabled_)
5. **LangGraph** → Executes agent node with route-specific tools, decides to use tools
6. **MCP Server** → Executes tool (e.g., fs_read, web_search, RAG)
7. **Tool Result** → Returned to agent via tools node
8. **LangGraph** → Continues agent loop until response complete
9. **Response** → Displayed to user via `ChatInterface`

### Session Management

Each chat session has a unique ID used for:

- RAG document indexing (session-scoped)
- Chunk caching for file summarization
- Training data collection (SFT markers)

Session data is stored in `~/.mnemoai/{profile_name}/`:

```
~/.mnemoai/
└── {profile_name}/
    ├── conversations/           # Saved conversations
    ├── profiles/                # User profiles
    ├── todos/                   # Todo list data
    ├── rag_session_id.txt       # Current RAG session
    ├── rag_store_*.faiss        # FAISS vector index (or ChromaDB directory)
    ├── chunk_cache_*.db         # SQLite chunk cache
    └── models/                  # Per-model memory (isolated by chat model)
        └── {sanitized_model}/   # e.g. global.anthropic.claude-fable-5
            ├── episodic_memory/ # Episodic memory store (FAISS or ChromaDB)
            └── playbook/        # ACE playbook strategies and metrics
```

> **Model-scoped memory:** episodic memory and the playbook live under `models/{model}/` so trying a different chat model doesn't contaminate the memory/strategies learned with another. Conversations, todos, RAG, and the user profile remain shared across models.

#### Context Compaction

To keep long conversations within the model's context window, the assistant compacts history by summarizing it:

- **Automatic** — after a turn pushes the conversation past `MAX_CONVERSATION_TOKENS`, older messages are summarized into the system prompt while the most recent `LLM.KEEP_RECENT_MESSAGES` turns are kept verbatim.
- **Manual** — run `/compact` any time (optionally `/compact <focus instructions>` to steer what the summary emphasizes). Manual compaction keeps a smaller recent window (`LLM.MANUAL_COMPACT_KEEP_RECENT`).

The kept-verbatim window is bounded by **both** a message count and a token budget (`LLM.KEEP_RECENT_TOKEN_BUDGET`, default 25% of `MAX_CONVERSATION_TOKENS`). Walking newest→oldest, a message that would exceed the budget is summarized instead of kept — so a single oversized recent message (e.g. a pasted document that alone fills the context window) cannot survive compaction verbatim.
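
The double bound on the kept-verbatim window can be sketched as follows. This is an illustration, not the project's code: messages are plain strings, the token counter is a crude placeholder, and the walk is assumed to stop at the first message that does not fit so the kept window stays contiguous. The name `split_for_compaction` is hypothetical.

```python
def split_for_compaction(messages, keep_recent, token_budget, count_tokens=None):
    """Split messages into (to_summarize, kept_verbatim).

    Walks newest -> oldest and keeps a message only while both the message
    count and the token budget allow it. Assumption: the walk stops at the
    first message that does not fit, so the kept window stays contiguous.
    """
    if count_tokens is None:
        # Placeholder estimate (~4 characters per token), not the app's counter.
        count_tokens = lambda text: max(1, len(text) // 4)

    kept = []
    used = 0
    for message in reversed(messages):
        cost = count_tokens(message)
        if len(kept) >= keep_recent or used + cost > token_budget:
            break
        kept.append(message)
        used += cost

    kept.reverse()  # restore chronological order
    to_summarize = messages[: len(messages) - len(kept)]
    return to_summarize, kept
```

With this shape, a single oversized latest message leaves the kept window empty, so the whole history goes to the summarizer.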

The summary preserves topics, decisions, and **tool calls/results** (which tools ran, their inputs, and outcomes), so the agent retains actionable context after compacting.

## 🚀 Productivity Tools

The assistant includes specialized tools for efficient code and file manipulation:

### 📋 Todo List Management

Track multi-step tasks with automatic status management:

**Tools:**

- `todo_write(todos)`: Update the todo list
- `todo_read()`: View current todos
- `todo_clear()`: Clear all todos

**Features:**

- Three states: `pending`, `in_progress`, `completed`
- Enforces exactly ONE task in progress at a time
- Real-time progress tracking
- Stored in `~/.mnemoai/{profile_name}/todos/current_todos.json`
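
The single-task-in-progress rule can be illustrated with a small validator. The todo schema (dicts with a `status` key) and the name `validate_todos` are assumptions for this sketch. It rejects only lists with more than one `in_progress` item; whether the real tool also rejects a list with none is not stated here.

```python
VALID_STATES = {"pending", "in_progress", "completed"}


def validate_todos(todos):
    """Return a list of problems; an empty list means the todo list is acceptable."""
    problems = []
    for index, todo in enumerate(todos):
        if todo.get("status") not in VALID_STATES:
            problems.append(f"todo {index}: unknown status {todo.get('status')!r}")
    in_progress = [t for t in todos if t.get("status") == "in_progress"]
    if len(in_progress) > 1:
        problems.append(f"{len(in_progress)} todos are in_progress; only one is allowed")
    return problems
```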

**Usage Example:**

```
You: Implement user authentication
Assistant: [Creates todos for: database setup, API endpoints, frontend integration, testing]
Assistant: [Marks first todo as in_progress]
Assistant: [Completes each step, updating todos in real-time]
```

### 🔎 Fast Search Tools

High-performance file and content searching:

#### Glob Search (File Names)

Find files by name patterns:

```python
glob_search(pattern="**/*.py")  # All Python files recursively
glob_search(pattern="src/**/*.ts", max_results=100)  # TypeScript in src/
glob_search(pattern="test_*.py", sort_by_mtime=False)  # Unsorted for speed
```

**Parameters:**

- `pattern`: Glob pattern (e.g., `**/*.py`, `*.{yaml,json}`)
- `path`: Directory to search (default: current directory)
- `max_results`: Limit results (default: 1000, use 0 for unlimited)
- `sort_by_mtime`: Sort by modification time (default: True)

**Performance:** Best for project/codebase searches. For system-wide searches (entire home directory), the assistant automatically uses the `find` command instead.

#### Grep Search (File Content)

Search within file contents using ripgrep:

```python
grep_search(pattern="class Foo")  # Find class definitions
grep_search(pattern="TODO|FIXME", file_pattern="*.py", case_insensitive=True)
grep_search(pattern="import React", output_mode="content")  # Show matched lines
```

**Parameters:**

- `pattern`: Regex pattern to search for
- `path`: Directory to search (default: current directory)
- `file_pattern`: Filter by file type (e.g., `*.py`, `*.{ts,tsx}`)
- `case_insensitive`: Case-insensitive search (default: False)
- `output_mode`: `files_with_matches` (default), `content`, or `count`
- `context_lines`: Lines of context around matches
- `max_results`: Maximum matches per file (default: 100)

**Requirements:** `ripgrep` must be installed (see Installation section).

**Performance:** 10-100x faster than traditional grep for large codebases.

### ✏️ Precise File Editing

Safe string replacement with validation:

```python
file_edit(
    file_path="/path/to/file.py",
    old_string="def old_function():\n    pass",
    new_string="def new_function():\n    return True",
    replace_all=False  # Requires uniqueness (default)
)
```

**Safety Features:**

- Validates file exists before editing
- Checks that `old_string` exists in file
- Enforces uniqueness (prevents accidental multiple replacements)
- Provides detailed error messages with troubleshooting steps
- Returns line count changes

**Best Practice Workflow:**

1. Read the file first with `fs_read`
2. Copy the EXACT text you want to replace (including whitespace)
3. Create the new version with your changes
4. Call `file_edit` with exact strings

**Error Handling:** If the string isn't unique, the tool provides the line numbers where it appears so you can add more context.
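
The uniqueness check and the line-number hint can be sketched on an in-memory string. This is a minimal illustration of the behavior described above, not the tool's implementation; `replace_unique` and `match_line_numbers` are hypothetical names, and the error wording is made up.

```python
def match_line_numbers(text, needle):
    """1-based line numbers where each non-overlapping match of needle starts."""
    numbers = []
    start = text.find(needle)
    while start != -1:
        numbers.append(text.count("\n", 0, start) + 1)
        start = text.find(needle, start + len(needle))
    return numbers


def replace_unique(text, old_string, new_string, replace_all=False):
    """Replace old_string in text, refusing ambiguous edits unless replace_all is set."""
    if not old_string:
        raise ValueError("old_string must not be empty")
    count = text.count(old_string)
    if count == 0:
        raise ValueError("old_string not found; re-read the file and copy the exact text")
    if count > 1 and not replace_all:
        lines = match_line_numbers(text, old_string)
        raise ValueError(f"old_string appears {count} times (lines {lines}); add more context")
    return text.replace(old_string, new_string)
```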

### 🛡️ Enhanced Error Handling

All tools provide intelligent error messages with troubleshooting guidance:

**Example Error Response:**

```json
{
  "error": true,
  "error_type": "FileNotFoundError",
  "message": "File or directory not found: /path/to/file.txt",
  "next_steps": [
    "Verify the file path is correct",
    "Use glob_search to find files by pattern",
    "Check with execute_bash('ls -la /parent/dir')",
    "Ensure you have read permissions"
  ],
  "original_error": "..."
}
```

**Handled Error Types:**

- FileNotFoundError
- PermissionError
- IsADirectoryError
- JSONDecodeError
- Encoding errors
- Command execution errors
- Timeout errors
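
A decorator in the spirit of `@tool_error_handler` might look like the sketch below. It reproduces the shape of the example response above, but the hint table, the fallback hint, and the synchronous-only wrapper are assumptions; the actual decorator in `error_handler.py` may differ.

```python
import functools

# Hypothetical hint table; the real tool's messages and steps may differ.
NEXT_STEPS = {
    FileNotFoundError: [
        "Verify the file path is correct",
        "Use glob_search to find files by pattern",
    ],
    PermissionError: ["Ensure you have read permissions"],
}


def tool_error_handler(func):
    """Wrap a (synchronous) tool so exceptions come back as a structured payload."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            steps = next(
                (hints for exc_type, hints in NEXT_STEPS.items() if isinstance(exc, exc_type)),
                ["Check the tool arguments and try again"],
            )
            return {
                "error": True,
                "error_type": type(exc).__name__,
                "message": str(exc),
                "next_steps": steps,
                "original_error": repr(exc),
            }

    return wrapper
```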

### 🔐 Action Confirmation (bash & file writes)

Destructive tools ask for explicit confirmation before they run (Claude Code-style) — shell commands (`execute_bash`) and file modifications (`fs_write`, `file_edit`):

```
▶ Run shell command?
  rm -rf build/
  Proceed? (y/N):

▶ Write to file?
  create ~/script.py
  Proceed? (y/N):
```

Only an explicit `y`/`yes` proceeds; anything else (including Enter) declines, and the model is told the user declined. This is a **hard gate enforced client-side** — the prompt always fires regardless of what the model does, because the client owns the terminal (the MCP server is a piped subprocess and can't prompt). For `fs_write` only the actual write is gated, not its `dry_run` preview.

- Toggles: `REQUIRE_BASH_CONFIRMATION` and `REQUIRE_WRITE_CONFIRMATION` (both default `true`). Set either to `false` for trusted/automation setups.
- Non-interactive runs (no TTY — tests, pipes, CI) auto-proceed so they don't hang.
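
The gate's decision logic can be summarized in a few lines. This is a sketch under stated assumptions: `confirm_action` is a hypothetical name, the `ask` and `is_tty` parameters exist only to make the function testable, and treating answers case-insensitively is a guess.

```python
import sys


def confirm_action(title, detail, require_confirmation=True, ask=input, is_tty=None):
    """Return True only when the action may proceed."""
    if not require_confirmation:
        return True  # toggle set to false (trusted/automation setups)
    if is_tty is None:
        is_tty = sys.stdin.isatty()
    if not is_tty:
        return True  # non-interactive run: auto-proceed so it does not hang
    answer = ask(f"▶ {title}\n  {detail}\n  Proceed? (y/N): ")
    # Only an explicit y/yes proceeds; Enter or anything else declines.
    return answer.strip().lower() in {"y", "yes"}
```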

### 🛡️ Git Safety

Safe git operations with protection against common mistakes:

**Tools:**

- `git_safe(command="...")` - Execute git commands with safety checks
- `git_status_safe()` - Comprehensive status with warnings
- `git_commit_safe(message="...", add_all=True)` - Safe commits with staging

**Protected Operations:**

| Operation                  | Protection                         |
| -------------------------- | ---------------------------------- |
| Force push to main/master  | Blocked                            |
| `git reset --hard`         | Warning + confirmation required    |
| `git push --force`         | Warning (use `--force-with-lease`) |
| `git commit --amend`       | Checks if already pushed           |
| Skip hooks (`--no-verify`) | Warning                            |
| Force delete branch (`-D`) | Warning                            |

**Example:**

```python
# Safe - uses git_safe with protections
git_safe(command="push origin feature-branch")

# Dangerous - requires confirmation
git_safe(command="reset --hard HEAD~1", allow_dangerous=True, reason="Discarding failed experiment")
```

### 📝 Plan Mode

Implementation planning workflow for complex tasks:

**Workflow:**

1. `enter_plan_mode(task_description="Add user authentication")`
2. Explore codebase with search tools
3. `add_plan_step(step_number=1, title="Create user model", description="...")`
4. `add_plan_file(file_path="models/user.py", action="create")`
5. `add_plan_risk(risk="Migration needed", mitigation="Add migration script")`
6. `present_plan()` - Show user for approval
7. `approve_plan()` + `exit_plan_mode()` - Start implementing

**When to Use:**

- New feature with multiple files
- Architectural decisions needed
- Multi-step refactoring
- Unclear requirements

**Plan Storage:** `~/.mnemoai/plans/current_plan.json`

**Task Output:** `~/.mnemoai/tasks/`

### 🔄 Background Tasks

Run long operations in parallel without blocking:

**Tools:**

- `start_background_task(command="...", description="...")` - Start task
- `get_task_status(task_id="...")` - Check progress
- `get_task_output(task_id="...")` - Get output
- `list_background_tasks()` - See all tasks
- `cancel_background_task(task_id="...")` - Stop task
- `wait_for_task(task_id="...", timeout_seconds=300)` - Wait for completion

**When to Use:**

- Running full test suites
- Building large projects
- Installing dependencies
- Running linters on entire codebase
- Any command > 30 seconds

**Example:**

```python
# Start tests in background
result = start_background_task(command="pytest", description="Running tests")
# Returns: {"task_id": "abc123", ...}

# Check status later
get_task_status(task_id="abc123")

# Get output when done
get_task_output(task_id="abc123", tail_lines=50)
```

**Task Storage:** Output logs saved to `~/.mnemoai/tasks/`

## 🔧 Configuration

### Model Configuration

The assistant supports multiple model types:

#### Amazon Bedrock

```yaml
MODEL_ID:
  NAME: us.amazon.nova-pro-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.1
```

> **Note:** Newer Claude models on Bedrock reject `temperature` as deprecated. Omit `TEMPERATURE` for those — it is only sent when explicitly configured.

> **Using a named AWS profile (Bedrock, SageMaker, Mantle).** These providers use the standard boto3 credential chain (default profile / env vars / instance role). To select a specific named profile instead, set `AWS_PROFILE` via the config `ENV:` section — values there are exported as environment variables at startup, and boto3 picks them up automatically. No model-level config key is needed:
>
> ```yaml
> ENV:
>   AWS_PROFILE: my-bedrock-profile
>   # AWS_REGION: us-east-1   # any AWS env var works here too
> ```

> **Using a Bedrock API key (instead of AWS credentials).** Bedrock supports short-term API keys (a `bedrock-api-key-...` value from the console). For **standard Bedrock** (`TYPE: bedrock`), set it as `AWS_BEARER_TOKEN_BEDROCK` — `langchain-aws` reads it automatically, no model config needed:
>
> ```yaml
> ENV:
>   AWS_BEARER_TOKEN_BEDROCK: bedrock-api-key-XXXXXXXX
> ```
>
> (For **Mantle**, the same key is supplied differently — see the Mantle section below.)
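
Both notes above rely on the `ENV:` section being exported as environment variables at startup. A minimal sketch of that step is shown below, assuming the config has already been parsed into a dict. The name `export_env` is hypothetical, and overwriting variables that are already set is an assumption.

```python
import os


def export_env(config, environ=None):
    """Copy every key under ENV into the process environment as strings."""
    environ = os.environ if environ is None else environ
    for key, value in (config.get("ENV") or {}).items():
        environ[str(key)] = str(value)  # assumption: config wins over existing values
    return environ
```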

#### Amazon Bedrock Mantle

Bedrock Mantle is an **OpenAI-compatible** API (not the Bedrock Converse API). By default it authenticates with a short-lived bearer token minted from your standard AWS credentials via [`aws-bedrock-token-generator`](https://pypi.org/project/aws-bedrock-token-generator/), so your normal `aws configure` / SSO setup works — no extra keys to manage. Use `TYPE: mantle` and a bare model ID from the Mantle catalog.

```yaml
MODEL_ID:
  NAME: qwen.qwen3-32b # bare Mantle model id (e.g. anthropic.claude-opus-4-8)
  TYPE: mantle
  REGION: us-east-1
  MAX_TOKENS: 8192
```

**Authenticating with a Bedrock API key (no AWS credentials).** Instead of minting a token, you can supply a short-term Bedrock API key directly. Mantle reads it from the `BEDROCK_API_KEY` environment variable (set it via the config `ENV:` section), or from a per-model `API_KEY` field. When a key is present it's used as-is; otherwise the app falls back to minting from AWS credentials. (Note: standard Bedrock uses `AWS_BEARER_TOKEN_BEDROCK` for the same key — Mantle uses `BEDROCK_API_KEY`.)

```yaml
# Option A — environment variable (applies to all Mantle calls)
ENV:
  BEDROCK_API_KEY: bedrock-api-key-XXXXXXXX

# Option B — per-model key
MODEL_ID:
  NAME: qwen.qwen3-32b
  TYPE: mantle
  REGION: us-east-1
  API_KEY: bedrock-api-key-XXXXXXXX
```

**API protocols.** Mantle serves models under three protocols. Select with `API_PROTOCOL` (works for both chat and vision):

- `chat_completions` (default) — base `/v1`, OpenAI Chat Completions API. Most models (Qwen, Gemma, GPT-OSS, DeepSeek, …).
- `responses` — base `/openai/v1`, OpenAI Responses API. Required by models that only expose Responses, such as `openai.gpt-5.4`.
- `anthropic` — base `/anthropic`, Anthropic Messages API. For Claude models (e.g. `anthropic.claude-haiku-4-5`).

```yaml
# OpenAI Responses model (e.g. GPT-5.4)
MODEL_ID:
  NAME: openai.gpt-5.4
  TYPE: mantle
  REGION: us-west-2 # gpt-5.4 is in us-west-2, not us-east-1
  API_PROTOCOL: responses
  MAX_TOKENS: 8192

# Anthropic Claude model
MODEL_ID:
  NAME: anthropic.claude-haiku-4-5
  TYPE: mantle
  REGION: us-east-1
  API_PROTOCOL: anthropic
  MAX_TOKENS: 8192
```

- `ENDPOINT_URL` is optional; it defaults to `https://bedrock-mantle.<REGION>.api.aws/{v1 | openai/v1 | anthropic}` depending on the protocol.
- The Mantle catalog (Qwen, Mistral, DeepSeek, GLM, Gemma, Claude, GPT-5.4, …) differs from standard Bedrock and varies by account/region.
- `TYPE: mantle` works for both `MODEL_ID` (chat) and `VISION_MODEL_ID` (image description) — vision-capable models like `qwen.qwen3-vl-235b-a22b-instruct` are supported.
- **Caveats:** Pick the right `API_PROTOCOL` per model (using the wrong one returns a 400 "does not support the '/v1/…' API" error). `anthropic` requires the `langchain-anthropic` package (in `requirements.txt`). Models like `anthropic.claude-fable-5` also require the account's data-retention mode to be `provider_data_share`, otherwise they report `unavailable`.
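
The default endpoint rule can be written out as a small function. The URL pattern and the three protocol paths come from the list above; the function name `mantle_endpoint` and its signature are illustrative and do not reflect the actual `mantle_factory.py` API.

```python
PROTOCOL_PATHS = {
    "chat_completions": "v1",
    "responses": "openai/v1",
    "anthropic": "anthropic",
}


def mantle_endpoint(region="us-east-1", api_protocol="chat_completions", endpoint_url=None):
    """Return the explicit ENDPOINT_URL if given, else the documented default."""
    if endpoint_url:
        return endpoint_url
    try:
        path = PROTOCOL_PATHS[api_protocol]
    except KeyError:
        raise ValueError(f"unknown API_PROTOCOL: {api_protocol!r}") from None
    return f"https://bedrock-mantle.{region}.api.aws/{path}"
```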

> For **standard** Bedrock (Converse API), `ENDPOINT_URL` is also accepted on `MODEL_ID`/`VISION_MODEL_ID` with `TYPE: bedrock` to override the default endpoint.

#### Ollama (Local)

```yaml
MODEL_ID:
  NAME: qwen3-4b-thinking-2507-q6-k:latest
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  TOP_P: 0.95
```

#### OpenAI

```yaml
MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium
# Requires OPENAI_API_KEY environment variable
```

#### Anthropic (Claude API)

The direct Anthropic API (`api.anthropic.com`) via `langchain-anthropic`. This is **distinct from the Bedrock Mantle `anthropic` protocol** (which reaches Claude through Bedrock) — `TYPE: anthropic` talks to Anthropic directly. `STOP` maps to Anthropic's `stop_sequences`, and extended thinking is enabled with `REASONING` (+ optional `REASONING_EFFORT` / `THINKING_TOKENS`).

```yaml
MODEL_ID:
  NAME: claude-opus-4-8
  TYPE: anthropic
  MAX_TOKENS: 4096
  TEMPERATURE: 0.4
  # REASONING: true          # enable extended thinking
  # REASONING_EFFORT: high   # low | medium | high | max
  # ENDPOINT_URL: https://...  # optional custom base URL
# Requires ANTHROPIC_API_KEY env var, or set MODEL_ID.API_KEY
```

#### Amazon SageMaker AI

```yaml
MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  REPETITION_PENALTY: 1.1
  PRESENCE_PENALTY: 1.5
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096
```

#### LiteLLM (100+ Providers)

```yaml
MODEL_ID:
  NAME: openai/your-model-name
  TYPE: litellm
  API_BASE: http://localhost:8000/v1
  API_KEY: your-api-key
  TEMPERATURE: 0.1
  MAX_TOKENS: 4096
```

### Vision Model Configuration

For Bedrock:

```yaml
VISION_MODEL_ID:
  NAME: global.anthropic.claude-haiku-4-5-20251001-v1:0
  TYPE: bedrock
  REGION: us-east-1
  TEMPERATURE: 0.3
```

For Ollama:

```yaml
VISION_MODEL_ID:
  NAME: qwen3-vl:2b
  TYPE: ollama
  HOST: localhost
  PORT: 11434
  TEMPERATURE: 0.3
```

For OpenAI:

```yaml
VISION_MODEL_ID:
  NAME: gpt-5-mini-2025-08-07
  TYPE: openai
  STREAM: true
  REASONING_EFFORT: medium
```

For Anthropic (Claude is multimodal):

```yaml
VISION_MODEL_ID:
  NAME: claude-opus-4-8
  TYPE: anthropic
  MAX_TOKENS: 1500
  TEMPERATURE: 0.3
# Requires ANTHROPIC_API_KEY env var, or set VISION_MODEL_ID.API_KEY
```

For SageMaker AI (endpoint must serve a vision-capable model accepting the OpenAI image format):

```yaml
VISION_MODEL_ID:
  NAME: your-endpoint-name
  TYPE: sagemaker
  REGION: us-east-1
  INPUT_FORMAT: openai_chat
  TEMPERATURE: 0.3
```

For LiteLLM (any of its vision-capable models):

```yaml
VISION_MODEL_ID:
  NAME: openai/gpt-4o # provider-prefixed model id
  TYPE: litellm
  API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
  API_KEY: your-api-key # optional (else the provider's env var)
```

### Model Parameters

This is the full reference for what you can put under `MODEL_ID`,
`VISION_MODEL_ID`, and `RAG.EMBED_MODEL_ID`. Only `NAME` and `TYPE` are
required; everything else is optional and omitted keys fall back to the
provider/model default. The interactive configurator (`/config`, `/model`)
sets the common ones — use this reference to hand-tune `config.yaml` for
anything else a provider or model supports.

#### Identity, connection & auth

| Parameter      | Applies to `TYPE`                | Description                                                                                                                                      |
| -------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `NAME`         | all (**required**)               | Model id / Ollama model / Bedrock model id / Mantle bare id / SageMaker endpoint name                                                            |
| `TYPE`         | all (**required**)               | `ollama`, `bedrock`, `mantle`, `openai`, `anthropic`, `sagemaker`, `litellm` (embeddings: `ollama`, `bedrock`, `openai`, `sagemaker`, `litellm`) |
| `HOST`         | `ollama`                         | Ollama host (default `localhost`)                                                                                                                |
| `PORT`         | `ollama`                         | Ollama port (default `11434`)                                                                                                                    |
| `REGION`       | `bedrock`, `mantle`, `sagemaker` | AWS region (default `us-east-1`)                                                                                                                 |
| `API_PROTOCOL` | `mantle`                         | `chat_completions` (default), `responses`, or `anthropic`                                                                                        |
| `ENDPOINT_URL` | `bedrock`, `mantle`, `anthropic` | Override the default endpoint URL (Anthropic: custom base URL)                                                                                   |
| `API_KEY`      | `mantle`, `anthropic`, `litellm` | Mantle: Bedrock API key (else `BEDROCK_API_KEY` env / minted token). Anthropic: else `ANTHROPIC_API_KEY` env. LiteLLM: provider key              |
| `API_BASE`     | `litellm`                        | LiteLLM API base URL                                                                                                                             |
| `INPUT_FORMAT` | `sagemaker`                      | `openai_chat` (default) or `huggingface`                                                                                                         |

> Standard Bedrock also reads the `AWS_BEARER_TOKEN_BEDROCK` env var, and all AWS
> providers honor `AWS_PROFILE` — see the API-key/profile notes under Amazon Bedrock.

#### Inference parameters

Optional generation settings. The **Honored by** column lists the providers that
actually send each one (others ignore it). These apply to `MODEL_ID` and
`VISION_MODEL_ID`; **`EMBED_MODEL_ID` takes none of them** (embeddings only use
`NAME`/`TYPE` + connection).

This table is derived from `models/provider_params.py` — the single source of
truth that the controllers build their client kwargs from — so it reflects
exactly what each provider's init path forwards. (`mantle` reads
`TEMPERATURE`/`MAX_TOKENS`/`TOP_P` via the Mantle factory.)

| Parameter            | Description                            | Honored by (`MODEL_ID`)                                        |
| -------------------- | -------------------------------------- | -------------------------------------------------------------- |
| `MAX_TOKENS`         | Max output tokens to generate          | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm |
| `TEMPERATURE`        | Sampling temperature                   | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm |
| `TOP_P`              | Top-p (nucleus) sampling               | ollama, bedrock, mantle, openai, anthropic, sagemaker, litellm |
| `TOP_K`              | Top-k sampling                         | ollama, anthropic, sagemaker                                   |
| `STOP`               | Stop sequences (YAML list)             | ollama, bedrock, anthropic, sagemaker, litellm                 |
| `STREAM`             | Stream tokens (default `true`)         | mantle, openai, anthropic, litellm                             |
| `PRESENCE_PENALTY`   | Presence penalty                       | ollama, openai                                                 |
| `FREQUENCY_PENALTY`  | Frequency penalty                      | ollama                                                         |
| `REPETITION_PENALTY` | Repetition penalty                     | ollama, litellm                                                |
| `REASONING`          | Enable extended thinking (boolean)     | bedrock, anthropic                                             |
| `THINKING_TOKENS`    | Thinking token budget (default `2048`) | bedrock, anthropic                                             |
| `REASONING_EFFORT`   | `low`/`medium`/`high`/`max`            | openai, anthropic (also maps to Bedrock thinking budget)       |

`VISION_MODEL_ID` supports the same seven providers as `MODEL_ID`. It accepts a
subset of params: `MAX_TOKENS`/`TEMPERATURE`/`TOP_P` across providers, plus
`TOP_K` on ollama/anthropic/sagemaker and `STOP` on ollama/sagemaker. Connection
keys follow the provider (host/port, region, Mantle protocol, SageMaker
`INPUT_FORMAT`, LiteLLM/Anthropic `API_BASE`/`API_KEY`/base URL).

> **Provider-appropriate tuning matters.** Newer Claude and GPT models reject
> `TEMPERATURE` outright; `STOP`, penalties, and `TOP_K` are largely
> Ollama/SageMaker concepts. When `/model` switches a section's provider it
> drops the keys the new provider doesn't consume for you, but for everything
> else edit `config.yaml` to match what your specific provider/model accepts.
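
The pruning that `/model` performs on a provider switch can be pictured with the sketch below. The mapping is transcribed from the inference-parameters table above; the names `HONORED_BY` and `prune_inference_params` are hypothetical and are not the `provider_params.py` API. Connection keys such as `HOST` or `REGION` are left untouched here for brevity.

```python
ALL_PROVIDERS = frozenset(
    {"ollama", "bedrock", "mantle", "openai", "anthropic", "sagemaker", "litellm"}
)

# Transcribed from the inference-parameters table (MODEL_ID column).
HONORED_BY = {
    "MAX_TOKENS": ALL_PROVIDERS,
    "TEMPERATURE": ALL_PROVIDERS,
    "TOP_P": ALL_PROVIDERS,
    "TOP_K": {"ollama", "anthropic", "sagemaker"},
    "STOP": {"ollama", "bedrock", "anthropic", "sagemaker", "litellm"},
    "STREAM": {"mantle", "openai", "anthropic", "litellm"},
    "PRESENCE_PENALTY": {"ollama", "openai"},
    "FREQUENCY_PENALTY": {"ollama"},
    "REPETITION_PENALTY": {"ollama", "litellm"},
    "REASONING": {"bedrock", "anthropic"},
    "THINKING_TOKENS": {"bedrock", "anthropic"},
    # The table notes this also maps to the Bedrock thinking budget.
    "REASONING_EFFORT": {"openai", "anthropic", "bedrock"},
}


def prune_inference_params(section, new_type):
    """Copy a model section for new_type, dropping inference keys it does not honor."""
    pruned = {}
    for key, value in section.items():
        if key in HONORED_BY and new_type not in HONORED_BY[key]:
            continue
        pruned[key] = value
    pruned["TYPE"] = new_type
    return pruned
```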

The context window is set separately, at the top level (it's not part of a model
section): `MAX_CONVERSATION_TOKENS` (see General Parameters below).

### General Parameters

```yaml
# Context window size (passed to model as num_ctx for Ollama)
MAX_CONVERSATION_TOKENS: 65536

# Maximum tokens when reading documents (CSV, JSON, text files)
DOC_MAX_TOKENS: 16384

# Profile configuration
PROFILE:
  NAME: default # Used for session data isolation (~/.mnemoai/{NAME}/)
  USE_PROFILING: true # Enable automatic user profiling
```

### Embeddings Configuration

Embeddings settings are nested under the `RAG` section:

```yaml
RAG:
  EMBEDDINGS:
    CACHE_ENABLED: true # LRU cache for embedding vectors (avoids re-embedding same text)
    CACHE_SIZE: 1000 # Maximum cached embeddings
    FALLBACK_ENABLED: true # Fall back to SHA256 if embedding model unavailable
    FALLBACK_TYPE: "sha256" # Fallback type (sha256, random, zeros)
```

### LLM Interaction Configuration

```yaml
LLM:
  ENABLE_THINKING: true # Enable thinking tags (verbose mode)
  RETRY_ENABLED: true # Retry failed LLM calls
  MAX_RETRIES: 3 # Maximum retry attempts
  RETRY_DELAY: 1.0 # Seconds between retries
  RETRY_BACKOFF: 2.0 # Exponential backoff multiplier
  SUMMARIZATION_THINK: false # Include thinking in summarization
  TOKEN_COUNTING:
    OLLAMA_APPROXIMATION: 1.3 # Chars-to-tokens multiplier for Ollama
    FALLBACK_MODEL: "gpt-4" # Tiktoken model for fallback counting
```
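
How the retry settings combine can be sketched as follows. It assumes `MAX_RETRIES` counts retries after the initial attempt and that the wait before retry *n* (counting from zero) is `RETRY_DELAY * RETRY_BACKOFF**n`; the function names are hypothetical.

```python
import time


def retry_delays(max_retries=3, retry_delay=1.0, retry_backoff=2.0):
    """Seconds to wait before each retry: delay * backoff**attempt."""
    return [retry_delay * retry_backoff**attempt for attempt in range(max_retries)]


def call_with_retry(func, max_retries=3, retry_delay=1.0, retry_backoff=2.0, sleep=time.sleep):
    """Call func, retrying on any exception up to max_retries times."""
    delays = retry_delays(max_retries, retry_delay, retry_backoff)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
```

With the defaults shown in the config above, the waits would be 1, 2, and 4 seconds.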

### System Prompt

The system prompt in `config.yaml` defines the assistant's behavior. Customize the `SYSTEM_PROMPT` field to change the assistant's personality, instructions, and tool usage patterns. Key sections in the default prompt:

- `<identity>`: Basic identity and core principles
- `<reasoning_discipline>`: Thinking rules and loop detection
- `<output_format>`: Response formatting requirements
- `<information_sources>`: RAG vs web vs internal knowledge decision tree
- `<file_operations>`: Read/write/edit workflow rules
- `<search_tools>`: Glob and grep usage guidance
- `<git_operations>`: Git safety rules
- `<task_management>`: Todo, plan mode, and background task rules
- `<error_handling>`: Error response guidelines
- `<communication>`: Style and security rules

### RAG Configuration

```yaml
ENABLE_RAG: true # Master toggle for RAG system
RAG:
  MAX_TOKENS: 8192 # Threshold: documents above this are ingested into RAG
  CHUNK_TOKENS: 1024 # Chunk size in tokens (recommended: 512-2048)
  SEARCH:
    SEMANTIC_WEIGHT: 0.5 # Semantic similarity weight (0-1)
    KEYWORD_WEIGHT: 0.5 # BM25 keyword weight (0-1)
  VECTOR_STORE:
    TYPE: chromadb # Vector store backend: "faiss" or "chromadb"
  EMBEDDINGS:
    CACHE_ENABLED: true
    CACHE_SIZE: 1000
    FALLBACK_ENABLED: true
    FALLBACK_TYPE: "sha256"
```

**Requires:** An embedding model configured via `RAG.EMBED_MODEL_ID` (see [Embeddings Model](#embeddings-model)).
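
To make the `SEARCH` weights concrete, the sketch below blends a semantic score and a keyword (BM25) score into one ranking value. It assumes both scores are already normalized to the 0-1 range and divides by the weight sum so the result stays in that range; the actual scoring in `session.py` may differ, and the function names are hypothetical.

```python
def hybrid_score(semantic, keyword, semantic_weight=0.5, keyword_weight=0.5):
    """Weighted blend of two scores that are assumed to be normalized to [0, 1]."""
    total = semantic_weight + keyword_weight
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return (semantic_weight * semantic + keyword_weight * keyword) / total


def rank(candidates, semantic_weight=0.5, keyword_weight=0.5):
    """Sort (doc_id, semantic, keyword) tuples by blended score, best first."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c[1], c[2], semantic_weight, keyword_weight),
        reverse=True,
    )
```

Shifting weight toward the semantic side favors chunks that match in meaning even when they share few exact terms with the query.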

### Episodic Memory Configuration

```yaml
ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
  # Similarity Thresholds
  DUPLICATE_THRESHOLD: 0.95 # Higher = stricter duplicate detection
  RETRIEVAL_THRESHOLD: 0.7 # Minimum similarity to retrieve episodes
  FOLLOW_UP_THRESHOLD: 0.4 # Similarity to detect follow-up questions (skips injection)
  REDUNDANCY_THRESHOLD: 0.5 # Filter episodes redundant with conversation
  # Hybrid Search Weights
  SEMANTIC_WEIGHT: 0.7 # Semantic similarity weight (0-1)
  KEYWORD_WEIGHT: 0.3 # Keyword matching weight (0-1)
  # Token and Size Limits
  MAX_TOKENS_PER_EPISODE: 400 # Max tokens for episode text
  MAX_EPISODES: 1000 # Maximum stored episodes
  MAX_AGE_DAYS: 90 # Maximum episode age in days
  # Success Detection
  SUCCESS_MARKERS: # Phrases that indicate task success
    - thanks
    - perfect
    - great
    - worked
  CORRECTION_MARKERS: # Phrases that indicate errors
    - wrong
    - error
    - fix
    - actually
  # Storage Behavior
  IMMEDIATE_STORAGE: true # Store episodes immediately
  MIN_TOOLS_OR_LENGTH: 300 # Min response length if no tools used
  # Query Enhancement
  ENABLE_QUERY_EXPANSION: true # Expand queries with synonyms
  QUERY_EXPANSION_TERMS: 3 # Max terms to add per query
```

**Requires:** An embedding model configured via `RAG.EMBED_MODEL_ID` (see [Embeddings Model](#embeddings-model)).

**How it works:**

- Automatically stores successful task completions with full conversation context
- Uses hybrid search (70% semantic + 30% BM25) to find similar past tasks
- **Conversation-aware injection**: Only injects episodic memory when relevant
  - Detects follow-up questions and skips injection (uses conversation context instead)
  - Filters out episodes redundant with current conversation
  - Uses semantic similarity (with embeddings) or Jaccard similarity (fallback)
- Injects compact context showing: task → tools used → outcome
- Automatic cleanup: keeps max 1000 episodes, removes entries older than 90 days
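
The Jaccard fallback and the follow-up check can be sketched as below. This is an illustration only: whitespace tokenization, the helper names `jaccard` and `should_inject`, and treating a similarity at or above `FOLLOW_UP_THRESHOLD` as a follow-up are all assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def should_inject(query: str, last_turn: str,
                  follow_up_threshold: float = 0.4) -> bool:
    """Skip episodic injection when the query looks like a follow-up
    to the previous turn (similarity at or above the threshold)."""
    return jaccard(query, last_turn) < follow_up_threshold
```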

**Success detection:**

- User feedback: "thanks", "perfect", "great", "worked"
- No error markers in response
- All tools executed successfully
- Filters out simple greetings and short responses
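
A rough sketch of these checks, using the marker lists and `MIN_TOOLS_OR_LENGTH` from the configuration above. The function names and the whole-word matching are assumptions; the actual heuristics may differ.

```python
import re

SUCCESS_MARKERS = ["thanks", "perfect", "great", "worked"]
CORRECTION_MARKERS = ["wrong", "error", "fix", "actually"]
MIN_TOOLS_OR_LENGTH = 300


def is_storable(response: str, tools_used: list[str], tools_ok: bool) -> bool:
    """Reject failed tool runs and short, tool-free replies such as greetings."""
    if not tools_ok:
        return False
    return bool(tools_used) or len(response) >= MIN_TOOLS_OR_LENGTH


def feedback_signal(user_message: str) -> str:
    """Classify a user message as 'success', 'correction', or 'neutral'.

    Assumption: markers match whole words, and corrections take priority.
    """
    words = set(re.findall(r"[a-z']+", user_message.lower()))
    if words & set(CORRECTION_MARKERS):
        return "correction"
    if words & set(SUCCESS_MARKERS):
        return "success"
    return "neutral"
```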

#### Embeddings Model

All embedding configuration is nested under the `RAG` key.

For Bedrock:

```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: amazon.titan-embed-text-v2:0
    TYPE: bedrock
    REGION: us-east-1
```

For Ollama:

```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama
    HOST: localhost
    PORT: 11434
```

For OpenAI:

```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: text-embedding-ada-002
    TYPE: openai
```

For SageMaker:

```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: your-endpoint-name
    TYPE: sagemaker
    REGION: us-east-1
```

For LiteLLM (any of its 100+ providers via one OpenAI-style API):

```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: openai/text-embedding-3-small # provider-prefixed model id
    TYPE: litellm
    API_BASE: http://localhost:4000 # optional (proxy / self-hosted)
    API_KEY: your-api-key # optional (else the provider's env var)
```

**Vector Store Options:**

- **ChromaDB** (default): Persistent vector database with built-in metadata support
- **FAISS**: Fast, in-memory vector search with disk persistence

Switch between stores by changing `RAG.VECTOR_STORE.TYPE` in config. The system uses a controller pattern, so all RAG functionality works identically regardless of the store.

## 📚 Advanced Features

### Query Routing

When enabled, the assistant classifies each query before processing it and routes it to a specialized tool subset. This reduces noise for the model and improves response quality.

**Categories:**

| Route       | Description                                 | Tools Available                                      |
| ----------- | ------------------------------------------- | ---------------------------------------------------- |
| `simple_qa` | Greetings, explanations, general knowledge  | None (direct LLM answer)                             |
| `code`      | File ops, code editing, git, shell commands | fs_read, fs_write, file_edit, bash, git, search, etc |
| `research`  | Web search, URL fetching                    | web_search, web_crawler                              |
| `knowledge` | Document reading, indexing, RAG queries     | pdf/csv/docx/json readers, RAG tools, fs_read        |
| `full`      | Multi-category or ambiguous tasks           | All tools (fallback)                                 |

**How it works:**

1. A lightweight LLM call classifies the query into one of the categories above
2. The agent node binds only the tools for that category
3. If a query spans multiple categories, it routes to `full` (all tools)
4. The classifier prompt is customizable via `ROUTING_PROMPT` in `config.yaml`
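
The route-to-tools binding in step 2 can be pictured as a lookup table. The sketch below is illustrative: `ROUTE_TOOLS` and `tools_for_route` are hypothetical names, the tool lists are abbreviated from the table above, and falling back to all tools for an unrecognized label is an assumption.

```python
# Abbreviated tool subsets per route; "full" is handled by the fallback.
ROUTE_TOOLS: dict[str, list[str]] = {
    "simple_qa": [],
    "code": ["fs_read", "fs_write", "file_edit", "bash", "git"],
    "research": ["web_search", "web_crawler"],
    "knowledge": ["fs_read", "search_in_documents", "list_documents"],
}


def tools_for_route(route: str, all_tools: list[str]) -> list[str]:
    """Return the tools to bind for a route, preserving all_tools order.

    Any route without a dedicated subset (including "full") gets every tool.
    """
    if route in ROUTE_TOOLS:
        allowed = set(ROUTE_TOOLS[route])
        return [tool for tool in all_tools if tool in allowed]
    return list(all_tools)
```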

**Configuration:**

```yaml
ENABLE_ROUTING: true
ROUTING_PROMPT: |
  # Custom classifier prompt (optional, has a sensible default)
  ...
```

### Orchestrator-Workers

When enabled alongside routing, tasks classified as `full` (spanning multiple categories) are automatically decomposed into focused subtasks executed by specialized workers.

**How it works:**

1. **Orchestrator**: An LLM call decomposes the complex query into ordered subtasks, each assigned a category (code, research, knowledge, etc.)
2. **Workers**: Each subtask is executed by a worker agent with only the tools for its category. Workers run sequentially — each receives context from previously completed subtasks.
3. **Aggregator**: If there were multiple subtasks, a final LLM call synthesizes all worker results into a single coherent response.
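
The three stages can be sketched as a simple sequential loop. This is a simplified illustration, not the project's implementation: `Subtask`, `run_subtasks`, and the `worker`/`aggregate` callables are hypothetical stand-ins for the LLM-backed components.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Subtask:
    description: str
    category: str  # e.g. "code", "research", "knowledge"


def run_subtasks(
    subtasks: list[Subtask],
    worker: Callable[[Subtask, list[str]], str],
    aggregate: Callable[[list[str]], str],
) -> str:
    """Run subtasks in order, then synthesize if there was more than one."""
    results: list[str] = []
    for task in subtasks:
        # Each worker receives the results of previously completed subtasks.
        results.append(worker(task, list(results)))
    if len(results) > 1:
        return aggregate(results)
    return results[0] if results else ""
```

A single subtask skips the aggregator, which matches the "if there were multiple subtasks" condition in step 3.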

**Example flow for "Read this PDF and write a summary to a file":**

```
Orchestrator decomposes into:
  [Step 1/2: Read and summarize the PDF document]        → knowledge worker
  [Step 2/2: Write the summary to summary.md]            → code worker
  [Synthesizing results...]                               → aggregator
```

**Configuration:**

```yaml
ENABLE_ROUTING: true # Required
ENABLE_ORCHESTRATION: true # Activates orchestrator for 'full' route
# ORCHESTRATOR_PROMPT: |      # Optional: customize decomposition prompt
# AGGREGATOR_PROMPT: |        # Optional: customize synthesis prompt
```

**When orchestration is disabled**, `full` routes use all tools in a single agent loop, exactly as they do with routing alone.

### Web Search Configuration

This tool uses the Brave Search API. Obtain an API key from the [Brave Search Developer Portal](https://brave.com/search/api/).

```yaml
BRAVE_API_KEY: your-api-key-here # For web search
```

### Web Crawler Configuration

Enable web page content extraction with automatic RAG integration:

```yaml
ENABLE_WEB_CRAWL: true
```

When enabled, the `web_crawler` tool:

- Extracts content from web pages as markdown
- Automatically ingests large pages (>8K tokens) into RAG (if enabled)
- Uses the same chunking configuration as PDF/DOCX readers

> **Browser dependency.** Crawling uses a headless Chromium via Playwright,
> whose browser binary is a separate ~260MB download not pulled in by
> `pip` / `uv tool install`. The tool installs it automatically on the first
> crawl after a fresh install/upgrade. If that auto-install fails (e.g.
> offline), run it manually in the same environment:
> `python -m playwright install chromium` (for an installed CLI:
> `~/.local/share/uv/tools/mnemoai/bin/python -m playwright install chromium`).

### External MCP Servers

mnemoai always runs its own built-in MCP server (file ops, bash, git, web, RAG,
vision, planning). You can add **more** MCP servers by creating
`~/.mnemoai/mcp/mcp.json` with the standard `mcpServers` schema (an
`mcp.json.example` is seeded there on first run). Their tools are merged with the
built-in ones and made available to the agent.

```json
{
  "mcpServers": {
    "brave-search": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-brave-search"],
      "env": { "BRAVE_API_KEY": "your_brave_api_key" }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/dir"],
      "disabled": true
    }
  }
}
```

Per-server fields: `command` (required), `args` (optional list), `env`
(optional; merged over the process environment), and `disabled` (optional;
`true` skips the server). The seeded `mcp.json.example` template is copied
from the bundled `src/mnemoai/utils/mcp.json.example`.
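
As a rough sketch of how these fields could be interpreted, the function below parses the JSON, drops disabled servers, and merges `env` over the process environment. `load_external_servers` is a hypothetical helper written for illustration, not the loader mnemoai uses.

```python
import json
import os


def load_external_servers(config_text: str) -> dict[str, dict]:
    """Parse an mcp.json document and return the enabled servers."""
    servers = json.loads(config_text).get("mcpServers", {})
    resolved: dict[str, dict] = {}
    for name, spec in servers.items():
        if spec.get("disabled", False):
            continue  # "disabled": true skips the server
        resolved[name] = {
            "command": spec["command"],  # required field
            "args": list(spec.get("args", [])),
            # Per-server env entries override the process environment.
            "env": {**os.environ, **spec.get("env", {})},
        }
    return resolved
```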

Behavior:

- **Additive** — the built-in server is always on; external servers run
  alongside it. Tools from all servers are merged into one list.
- **Resilient** — if an external server fails to start (bad command, missing
  binary, crash), it's logged in red and skipped; the app still runs with the
  built-in server and any others that connected.
- **No shadowing** — if an external tool's name collides with a built-in one,
  the external tool is exposed as `servername__tool` so core tools are never
  overridden (the server is still called with the original tool name).
- **Works with routing & orchestration** — external tools are appended to every
  non-empty query route, and when orchestration is enabled the task decomposer
  is told which external tools exist and steers subtasks that need them to the
  `full` category (which binds every tool). So external tools stay reachable
  whether routing/orchestration is on or off.
- Run **`/mcp`** in the chat to see configured servers, status, and tool counts.
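
The no-shadowing rule can be illustrated with a small name-merging function. This is a sketch under assumptions: `merge_tools` is a hypothetical helper, and it also prefixes a tool that collides with an earlier external server, which the description above does not specify.

```python
def merge_tools(builtin: list[str],
                external: dict[str, list[str]]) -> dict[str, tuple[str, str]]:
    """Map each exposed tool name to (server, original tool name)."""
    merged = {name: ("builtin", name) for name in builtin}
    for server, tools in external.items():
        for tool in tools:
            # On a name collision, expose the tool as servername__tool
            # while remembering the original name for the actual call.
            exposed = tool if tool not in merged else f"{server}__{tool}"
            merged[exposed] = (server, tool)
    return merged
```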

### RAG (Retrieval-Augmented Generation)

The RAG system automatically indexes documents for semantic search with **hybrid search** (semantic embeddings + BM25 keyword scoring).

**How it works:**

1. Read a PDF/DOCX file → Automatically chunked and indexed
2. Ask questions → Assistant searches indexed documents first using hybrid search
3. Session-scoped → Cleared on `/clear` or exit

**RAG Tools:**

- `list_documents()`: Show indexed documents
- `search_in_documents(query, top_k)`: Hybrid semantic + BM25 search
- `clear_documents()`: Clear RAG index

**Configuration:**

- `RAG.CHUNK_TOKENS`: Chunk size (recommended: 512-2048)
- `RAG.VECTOR_STORE.TYPE`: Either `faiss` or `chromadb`
- `RAG.SEARCH.SEMANTIC_WEIGHT` / `RAG.SEARCH.KEYWORD_WEIGHT`: Configurable hybrid weights
- Recursive chunking with 10% overlap
- Hybrid search: BM25 (Okapi BM25 with TF-IDF, term saturation, length normalization) + semantic similarity
- Independent candidate retrieval from both BM25 and embeddings, merged and re-ranked
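
For readers unfamiliar with BM25, here is a compact sketch of Okapi BM25 scoring over pre-tokenized documents. It is not the project's implementation; `bm25_scores` is a hypothetical name, and `k1=1.5` and `b=0.75` are conventional defaults assumed for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query terms."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avg_len) if avg_len else k1
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term saturation: repeated occurrences add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

A document that repeats a query term three times scores higher than one that mentions it once, but by less than a factor of three.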

**Vector Store Options:**

- **ChromaDB**: Persistent vector database with metadata support (default)
- **FAISS**: Fast in-memory search with disk persistence

The system uses a **VectorStoreController** for easy switching between stores. All functionality (indexing, searching, clearing) works identically regardless of the chosen store.

### User Profile Learning

After 5+ interactions, the assistant builds a profile:

- **Cognitive style**: Analytical, creative, pragmatic, systematic
- **Domain expertise**: Python, AWS, DevOps, ML, etc.
- **Learning style**: Visual, hands-on, theoretical
- **Communication patterns**: Tone, complexity, question styles
- **Code preferences**: Testing, documentation, type hints

The profile is automatically injected into the system prompt for personalization.

### Episodic Memory

The episodic memory system learns from successful task completions and retrieves similar solutions for future queries.

**How it works:**

1. **Automatic Storage**: After each successful interaction, stores:
   - Initial user query
   - Full conversation context
   - Tools used with arguments
   - Final solution
   - Timestamp

2. **Hybrid Search**: Retrieves similar episodes using:
   - 70% semantic similarity (task intent)
   - 30% BM25 keyword scoring (tool names, action verbs)

3. **Context Injection**: Before processing queries, injects compact context:

   ```
   [Episodic Memory - Similar Past Tasks]
   1. "read DOCX about ML" → fs_read → success (similarity: 0.85)
   2. "analyze PDF report" → fs_read, web_search → success (similarity: 0.78)
   ```

4. **Automatic Cleanup**: Maintains bounded memory:
   - Max 1000 episodes
   - Removes entries older than 90 days
   - Runs on startup
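
The cleanup in step 4 can be sketched as an age filter followed by a size cap. `prune_episodes` and the `timestamp` field are hypothetical, and keeping the newest episodes when the cap is exceeded is an assumption.

```python
from datetime import datetime, timedelta


def prune_episodes(episodes: list[dict], now: datetime,
                   max_episodes: int = 1000,
                   max_age_days: int = 90) -> list[dict]:
    """Drop episodes older than max_age_days, then keep the newest max_episodes."""
    cutoff = now - timedelta(days=max_age_days)
    fresh = [e for e in episodes if e["timestamp"] >= cutoff]
    fresh.sort(key=lambda e: e["timestamp"], reverse=True)
    return fresh[:max_episodes]
```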

**Success Detection:**

- User feedback: "thanks", "perfect", "great", "worked"
- No error markers in response
- All tools executed successfully
- Filters out greetings and simple acknowledgments (<300 chars, no tools)

**Storage Location:**

- FAISS: `~/.mnemoai/{profile}/models/{model}/episodic_memory/episodic.index`
- ChromaDB: `~/.mnemoai/{profile}/models/{model}/episodic_memory/`

**Configuration:**

```yaml
ENABLE_EPISODIC_MEMORY: true
EPISODIC_MEMORY:
  STORE_TYPE: chromadb # or faiss
RAG:
  EMBED_MODEL_ID: # Required for both stores
    NAME: mxbai-embed-large
    TYPE: ollama
```

### ACE Playbook (Agentic Context Engineering)

The ACE Playbook learns strategies from both successes AND failures, implementing the Agentic Context Engineering framework for continuous improvement.

**How it works:**

1. **Reflector**: After each interaction, analyzes tool executions:
   - Detects failure patterns (file not found, string not found, permission denied, etc.)
   - Identifies successful strategies for specific tools (file_edit, execute_bash)
   - Extracts specific, actionable insights (not generic summaries)
   - Tracks metrics (success/failure rates, failure types) in `metrics.json`

2. **Playbook Store**: Maintains structured strategy entries:

   ```json
   {
     "context": "editing python files",
     "strategy": "Read the file first to get exact string including whitespace before using str_replace",
     "source": "Failed file_edit on 2026-02-01: string_not_found",
     "outcome": "failure",
     "tools": ["file_edit"],
     "confidence": 0.9
   }
   ```

3. **Context Injection**: Injects relevant strategies into the system prompt at startup:

   ```
   [Playbook - Learned Strategies]
   Avoid these patterns:
     ✗ [editing files]: Read the file first to get exact string before str_replace
   Effective strategies:
     ✓ [searching files]: Use glob_search instead of find for better performance
   ```

4. **Lazy Refinement**: Only deduplicates when hitting token limits, using semantic similarity if embeddings are configured.
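
To make the injected block from step 3 concrete, the sketch below renders playbook entries in that layout. `format_playbook` is a hypothetical helper, and ranking entries by `confidence` before applying the `MAX_INJECT` cap is an assumption.

```python
def format_playbook(entries: list[dict], max_inject: int = 10) -> str:
    """Render the highest-confidence entries as a prompt block."""
    top = sorted(entries, key=lambda e: e.get("confidence", 0.0),
                 reverse=True)[:max_inject]
    lines = ["[Playbook - Learned Strategies]"]
    failures = [e for e in top if e["outcome"] == "failure"]
    successes = [e for e in top if e["outcome"] == "success"]
    if failures:
        lines.append("Avoid these patterns:")
        lines += [f"  ✗ [{e['context']}]: {e['strategy']}" for e in failures]
    if successes:
        lines.append("Effective strategies:")
        lines += [f"  ✓ [{e['context']}]: {e['strategy']}" for e in successes]
    return "\n".join(lines)
```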

**What gets stored:**

- **Failures**: Specific patterns like `string_not_found`, `file_not_found`, `permission_denied`, `command_failed`, etc.
- **Successes**: Only for tools with reusable patterns (file_edit, execute_bash with specific commands)
- **Not stored**: Generic successes without actionable strategies

**Key Differences from Episodic Memory:**

| Feature     | Episodic Memory          | ACE Playbook            |
| ----------- | ------------------------ | ----------------------- |
| Stores      | Full task completions    | Granular strategies     |
| Learns from | Successes only           | Successes AND failures  |
| Format      | Conversation context     | Structured rules        |
| Retrieval   | Hybrid (semantic + BM25) | Context + tool matching |

**Configuration:**

```yaml
ENABLE_PLAYBOOK: true
PLAYBOOK:
  MAX_ENTRIES: 500 # Maximum entries before refinement
  SIMILARITY_THRESHOLD: 0.85 # Threshold for merging similar strategies
  MAX_INJECT: 10 # Maximum entries to inject per query
```

**Storage Location:**

- Strategies: `~/.mnemoai/{profile}/models/{model}/playbook/playbook.json`
- Metrics: `~/.mnemoai/{profile}/models/{model}/playbook/metrics.json`

### Training Data Collection

#### Supervised Fine-Tuning (SFT)

- Use `/good` to mark high-quality responses
- Saved conversations include quality markers
- Extract labeled interactions for training

## 📦 Dependencies

All Python dependencies are listed in `requirements.txt`. The new productivity tools use only standard library features:

| Tool             | Python Packages                 | External Tools     |
| ---------------- | ------------------------------- | ------------------ |
| TodoWrite        | Standard library only           | None               |
| Edit Tool        | Standard library only           | None               |
| Glob Search      | Standard library (`glob`)       | None               |
| Grep Search      | Standard library (`subprocess`) | ripgrep (optional) |
| Error Handler    | Standard library (`functools`)  | None               |
| Git Safety       | Standard library (`subprocess`) | git                |
| Plan Mode        | Standard library (`json`, `os`) | None               |
| Background Tasks | Standard library (`threading`)  | None               |

**External Tools:**

- **ripgrep**: Optional but recommended for the `grep_search` tool. Install it via your system package manager (see the Installation section). If it is not installed, the assistant automatically falls back to slower alternatives.

**Core Python Packages:**

- `langgraph`: Agent orchestration framework
- `langchain-core`: LLM abstraction layer
- `langchain-ollama`: Ollama integration
- `langchain-aws`: AWS Bedrock integration
- `langchain-openai`: OpenAI integration (also used for Bedrock Mantle OpenAI/Responses protocols)
- `langchain-anthropic`: Anthropic integration (Bedrock Mantle `anthropic` protocol)
- `aws-bedrock-token-generator`: Bearer-token auth for Bedrock Mantle
- `mcp[cli]`: Model Context Protocol
- `ollama`: Local LLM support
- `boto3`: AWS Bedrock/SageMaker
- `tiktoken`: Token counting
- `chromadb`, `faiss-cpu`: Vector stores for RAG
- `PyPDF2`, `python-docx`: Document readers
- `Pygments`: Code syntax highlighting
- `prompt_toolkit`: Interactive CLI
- `brave-search-python-client`: Web search
- `crawl4ai`: Web crawling

## 🛠️ Development

### Testing

The test suite uses `pytest` and is split into two tiers under `tests/`:

- **`tests/unit/`** — fast, deterministic tests for pure logic (BM25, reasoning helpers, response parsing, subtask parsing, the tool error handler, git-safety command classification, file editing/search, bash timeout handling, and episodic-memory heuristics). No LLM, Ollama, or network required, so they run in seconds and don't need a `config.yaml`.
- **`tests/integration/`** — end-to-end tests that drive the real agent against a live Ollama server and the MCP subprocess (routing, tool calls, bash timeout, no silent empty turns). Marked with `@pytest.mark.integration` and **auto-skipped** unless a runtime `utils/config.yaml` exists and the configured Ollama host is reachable.

```bash
# Install test dependencies
pip install -r requirements-dev.txt

# Run everything (integration auto-skips if Ollama/config aren't available)
python -m pytest

# Unit tier only (fast — good for CI and pre-commit)
python -m pytest tests/unit

# Integration tier only (requires Ollama running + a real config.yaml)
python -m pytest -m integration

# Run a single file
python -m pytest tests/unit/test_bm25.py
```

When adding new code, keep import-time side effects independent of `config.yaml` so the module stays unit-testable.
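
The reachability half of the integration auto-skip can be approximated with a plain TCP probe. This is a sketch, not the project's `conftest.py`; `host_reachable` is a hypothetical helper.

```python
import socket


def host_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

In a `conftest.py`, the result could feed `pytest.mark.skipif(not host_reachable("localhost", 11434), reason="Ollama not reachable")`, where the host and port are the Ollama defaults shown earlier in this README.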

### Adding New Tools

1. Create tool file in `server/tools/`:

```python
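from mcp.server.fastmcp import FastMCP


def register_your_tool(mcp: FastMCP) -> None:
    """Attach the tool to the given MCP server instance."""

    @mcp.tool()
    async def your_tool(param: str) -> str:
        """Tool description for the LLM."""
        # Placeholder implementation: replace with your tool's logic
        result = f"Received: {param}"
        return result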
```

2. Register in `tools_manager.py`:

```python
from .your_tool import register_your_tool
register_your_tool(mcp)
```

### Adding New File Readers

1. Create reader in `server/tools/readers/`:

```python
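async def read_your_format(path: str) -> str:
    """Read your custom format."""
    # Placeholder implementation: replace with format-specific parsing
    with open(path, encoding="utf-8") as f:
        content = f.read()
    return content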
```

2. Register in `fs_read.py`:

```python
from .readers.your_reader import read_your_format
# Add to file type detection logic
```

### Switching Model Providers

The application uses **controller classes** for centralized model management. To switch providers, just update `config.yaml`:

**For LLM:**

```yaml
MODEL_ID:
  NAME: your-model-name
  TYPE: ollama # or bedrock, sagemaker
```

**For Vision:**

```yaml
VISION_MODEL_ID:
  NAME: your-vision-model
  TYPE: ollama # or sagemaker
```

**For Embeddings:**

```yaml
RAG:
  EMBED_MODEL_ID:
    NAME: mxbai-embed-large
    TYPE: ollama
```

The controllers (`llm_controller.py`, `vision_model_controller.py`, `embeddings_controller.py`) handle all provider-specific initialization automatically.

### Adding New Model Providers

1. Update the appropriate controller in `models/`:

```python
def initialize_model(self):
    if self.model_type == "your_provider":
        # Your provider initialization
        self.model = YourProviderModel(...)
```

2. Add configuration in `config.yaml`

## 🔧 Ollama Utilities (Optional)

The `bash/` directory contains helper scripts for Ollama users on macOS and Linux.

### Ollama Environment Setup (macOS)

Sets Ollama performance environment variables at boot and launches the Ollama app:

```bash
# Variables set: OLLAMA_FLASH_ATTENTION=1, OLLAMA_KV_CACHE_TYPE=q8_0, OLLAMA_NUM_GPU=999
```

**Setup:**

1. Edit `bash/ollama-env-mac/ollama.environment.plist` (no changes needed for defaults)
2. Copy to LaunchAgents:

```bash
cp bash/ollama-env-mac/ollama.environment.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ollama.environment.plist
```

### VRAM Cleaner

Automatically unloads idle Ollama models from VRAM to free GPU memory. Useful when running multiple models or when GPU memory is limited.

**macOS (LaunchAgent, runs every 60 seconds):**

1. Edit `bash/ollama-freeup-vram/com.ollama.vramcleaner.plist`:
   - Replace `<PATH_TO_FOLDER>` with the actual path to this repository
   - Replace `<PATH_TO_USER_HOME>` with your home directory
2. Install:

```bash
cp bash/ollama-freeup-vram/com.ollama.vramcleaner.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.ollama.vramcleaner.plist
```

**Linux (systemd):**

1. Edit `bash/ollama-freeup-vram/ollama-vram-cleaner.service`:
   - Replace `<PATH_TO_FOLDER>` with the actual path
2. Install:

```bash
sudo cp bash/ollama-freeup-vram/ollama-vram-cleaner.service /etc/systemd/system/
sudo systemctl enable ollama-vram-cleaner
sudo systemctl start ollama-vram-cleaner
```

See `bash/ollama-freeup-vram/README.md` and `bash/ollama-env-mac/README.md` for more details.

## 🐛 Troubleshooting

### Common Issues

**MCP Connection Errors**

- Verify that the Python path in `client.py` matches your environment
- Check that the server path is correct
- Ensure all dependencies are installed (`pip install -r requirements.txt`)

**Model Loading Issues**

- Verify model name and type in `config.yaml`
- For Ollama: Ensure Ollama is running (`ollama serve`) and the model is pulled (`ollama pull model-name`)
- For AWS Bedrock: Check credentials (`aws sts get-caller-identity`), region, and model access
- For OpenAI: Ensure `OPENAI_API_KEY` environment variable is set

**RAG / Episodic Memory Not Working**

- Ensure `ENABLE_RAG: true` (or `ENABLE_EPISODIC_MEMORY: true`) in config
- Verify embedding model is configured and available (`RAG.EMBED_MODEL_ID` in config)
- For Ollama embeddings: ensure the embedding model is pulled (`ollama pull mxbai-embed-large`)
- Check logs for "fallback embeddings" warnings — this means the real model is unreachable
- Verify documents are being indexed with `list_documents()`

**Permission Errors**

- Ensure write permissions for `~/.mnemoai/` (the app home: config, plans, tasks, per-profile state)
- Check file paths in configuration

**Import Errors on Startup**

- Some dependencies (chromadb, faiss-cpu, crawl4ai) can be tricky to install. Check platform-specific instructions.
- On Apple Silicon: `faiss-cpu` may require `pip install faiss-cpu --no-cache-dir`

### Logging

Logs are written to stderr at a configurable level:

```bash
LOG_LEVEL=DEBUG mnemoai  # Detailed logs
LOG_LEVEL=INFO mnemoai   # Normal logs (default)
```
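
For reference, honoring `LOG_LEVEL` in Python might look like the sketch below. It is an assumption about the mechanism, not the project's code; `configure_logging` is a hypothetical name, and unknown level names fall back to `INFO`.

```python
import logging
import os
import sys


def configure_logging() -> int:
    """Configure root logging on stderr from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        level = logging.INFO  # unknown names fall back to the default
    # force=True replaces any handlers installed by earlier basicConfig calls.
    logging.basicConfig(level=level, stream=sys.stderr, force=True)
    return level
```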

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🤝 Contributing

This is a personal development project. If you'd like to use or extend it, feel free to fork the repository and adapt it to your needs!

If you use this code in your own projects, attribution to the original repository is appreciated but not required.

## 🙏 Acknowledgments

- Built with [LangGraph](https://github.com/langchain-ai/langgraph) and [LangChain](https://github.com/langchain-ai/langchain)
- Uses [FastMCP](https://github.com/jlowin/fastmcp) for Model Context Protocol
- Powered by [Ollama](https://ollama.ai), [Amazon Bedrock](https://aws.amazon.com/bedrock/), and [Amazon SageMaker AI](https://aws.amazon.com/sagemaker/ai/)
