Metadata-Version: 2.4
Name: llama-simple-chat-bot
Version: 0.1.0
Summary: A configurable local chatbot library with lightweight memory indexing.
Author: GGN_2015
License: MIT License
        
        Copyright (c) 2026 GGN_2015
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: chatbot,local-llm,memory,gguf,llama-cpp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Topic :: Communications :: Chat
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: local
Requires-Dist: llama-cpp-python>=0.3.0; extra == "local"
Dynamic: license-file

# llama-simple-chat-bot

A small Python library for configurable local chatbots.

`llama-simple-chat-bot` runs an open-source language model on the local machine and
adds a lightweight persistent memory index. It is designed for Windows and Linux
on amd64 machines, and it does not require a GPU. The default runtime path uses
GGUF models through `llama-cpp-python` with `n_gpu_layers` set to `0`.

> [!NOTE]
> The library does not call remote OpenAI APIs. Model inference is local, while
> memory indexing is handled with lightweight files on disk.

## Features

- JSON bot profiles: name, description, personality, birthday, skills, species,
  model settings, and memory directory.
- Local LLM backends: `llama_cpp_python`, `llama_cpp_cli`, and a deterministic
  `echo` backend for tests.
- Persistent memory: every exchange is logged, split into segments, summarized,
  indexed, and searched during future conversations.
- Associative recall: related past segments can be injected into the prompt as
  context before the model answers.
- CLI and Python API.
- No required third-party dependencies for the core package. Local inference is
  available through the optional `local` extra.

## Recommended Local Models

These presets are small enough for CPU-only GGUF inference. The larger ones
trade speed for better chat quality:

- `qwen2.5-0.5b-instruct-q4_k_m`: about 491 MB, multilingual, good default for
  Chinese and English.
- `qwen2.5-1.5b-instruct-q4_k_m`: about 1120 MB, much better than 0.5B at
  identity stability, use of recalled memory, and ordinary chat quality.
- `qwen2.5-3b-instruct-q4_k_m`: about 2100 MB, a stronger choice for roleplay,
  Chinese chat, and basic reasoning if you can accept slower CPU inference.
- `smollm2-360m-instruct-q4_k_m`: about 271 MB, very small and fast for quick
  experiments.

The project can also use any local GGUF file supported by `llama.cpp`.

> [!TIP]
> If you care about actual chat quality, start with
> `qwen2.5-1.5b-instruct-q4_k_m`. Use `qwen2.5-3b-instruct-q4_k_m` when role
> consistency and answer quality matter more than speed. Keep
> `qwen2.5-0.5b-instruct-q4_k_m` for lightweight testing, and
> `smollm2-360m-instruct-q4_k_m` only for very small experiments.

## Setup

Create a virtual environment before installing optional local inference
dependencies.

> [!IMPORTANT]
> Install optional dependencies inside a virtual environment. The project does
> not require modifying your base Python environment.

The recommended CPU-only path installs the core package first. The first real
local-model run then installs the prebuilt CPU `llama-cpp-python` wheel
automatically if it is missing:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install -e .
```

On Linux:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e .
```

> [!TIP]
> The automatic installer runs `python -m pip install -r requirements-local-cpu.txt`.
> That requirements file uses
> `https://abetlen.github.io/llama-cpp-python/whl/cpu` plus
> `--only-binary llama-cpp-python`. If no compatible wheel exists for your Python
> version and platform, pip fails instead of starting a slow local build.

If the CPU wheel is unavailable on your machine, either install `llama.cpp`
separately and set `"backend": "llama_cpp_cli"` in the config, or use
`python -m pip install -e '.[local]'` when you intentionally want to build
`llama-cpp-python` from source.

> [!NOTE]
> The `local` extra is kept for packaging compatibility, but the documented
> quick path uses `requirements-local-cpu.txt` because pip dependency metadata
> cannot store a custom wheel index URL.

> [!WARNING]
> CPU-only inference is usable with small GGUF models, but it is still slower
> than GPU inference. Keep `model.n_gpu_layers` at `0` when the machine has no
> compatible GPU.

## Quick Start

Write an example config:

```bash
llama-simple-chat-bot init-config examples/my_bot.json
```

List model presets:

```bash
llama-simple-chat-bot models
```

Download a small GGUF model:

```bash
llama-simple-chat-bot download-model qwen2.5-0.5b-instruct-q4_k_m --models-dir models
```

For a better local chat model:

```bash
llama-simple-chat-bot download-model qwen2.5-1.5b-instruct-q4_k_m --models-dir models
```

Or a stronger 3B preset:

```bash
llama-simple-chat-bot download-model qwen2.5-3b-instruct-q4_k_m --models-dir models
```

Start chatting:

```bash
llama-simple-chat-bot chat --config examples/my_bot.json
```

## Shortest Start

> [!TIP]
> Use this section when you only want the shortest path from a fresh checkout to
> a running bot.

For a real local model run:

```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -e .
llama-simple-chat-bot init-config bot.json
llama-simple-chat-bot download-model qwen2.5-0.5b-instruct-q4_k_m --models-dir models
llama-simple-chat-bot chat --config bot.json
```

On Windows PowerShell, use:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -e .
llama-simple-chat-bot init-config bot.json
llama-simple-chat-bot download-model qwen2.5-0.5b-instruct-q4_k_m --models-dir models
llama-simple-chat-bot chat --config bot.json
```

If you want a more usable default on CPU, replace the download step with:

```bash
llama-simple-chat-bot download-model qwen2.5-1.5b-instruct-q4_k_m --models-dir models
```

For a dependency-free smoke test that does not load a model:

```bash
python -m pip install -e .
llama-simple-chat-bot ask --config examples/echo_config.json "hello"
```

> [!NOTE]
> The `echo` backend is only a deterministic smoke-test backend. It verifies the
> CLI, config loading, and memory plumbing without loading a language model.

## Example Profiles

The `examples/` directory includes ready-to-edit bot profiles:

- [`examples/bot_config.json`](examples/bot_config.json): general local assistant.
- [`examples/nekomimi_config.json`](examples/nekomimi_config.json): Chinese-language catgirl chat companion.
- [`examples/coding_mentor_config.json`](examples/coding_mentor_config.json):
  pragmatic coding mentor.
- [`examples/study_partner_config.json`](examples/study_partner_config.json):
  structured study partner.
- [`examples/storyteller_config.json`](examples/storyteller_config.json):
  collaborative fiction and worldbuilding companion.
- [`examples/echo_config.json`](examples/echo_config.json): dependency-free
  smoke-test bot.

Run any profile with:

```bash
llama-simple-chat-bot chat --config examples/nekomimi_config.json
```

To debug memory retrieval while chatting, add `--verbose`:

```bash
llama-simple-chat-bot chat --config examples/nekomimi_config.json --verbose
```

Verbose mode prints diagnostics before the model starts generating: the memory
index path and detected encoding, the query terms, whether the turn used
recent-overview or keyword retrieval, scored candidate segments, selected prompt
hits, and the recalled memory block injected into the model context. The same
diagnostics are available for one-shot asks:

```bash
llama-simple-chat-bot ask --config examples/my_bot.json --verbose "What do you remember about Python packaging?"
```

Send one message:

```bash
llama-simple-chat-bot ask --config examples/my_bot.json "What do you remember about me?"
```

Search memory without loading the model:

```bash
llama-simple-chat-bot memory-search --config examples/my_bot.json "Python packaging"
```

## JSON Config

See [`examples/bot_config.json`](examples/bot_config.json).

Important fields:

- `name`, `description`, `personality`, `birthday`, `skills`, and `species`
  are injected at runtime as authoritative system rules, so the bot knows its
  configured identity.
- `memory_dir` controls where `index.json` and segment `.jsonl` files are
  stored.
- `model.backend` selects `llama_cpp_python`, `llama_cpp_cli`, or `echo`.
- `model.model_path` points to a local GGUF file.
- Preset downloads are available through `llama-simple-chat-bot download-model` for
  `qwen2.5-0.5b-instruct-q4_k_m`, `qwen2.5-1.5b-instruct-q4_k_m`,
  `qwen2.5-3b-instruct-q4_k_m`, and `smollm2-360m-instruct-q4_k_m`.
- `model.n_gpu_layers` defaults to `0`, which keeps inference on CPU.
- `system_rules` is the place to enforce speech style and role constraints, such
  as asking a catgirl profile to naturally end replies with `喵` ("meow").
- `memory.segment_exchange_limit` controls when a new memory segment starts.
- `memory.summary_mode` can be `extractive` or `llm`. `extractive` is faster;
  `llm` asks the local model to rewrite the segment summary.
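
As a rough illustration of how identity fields and `system_rules` could end up
in a system prompt, here is a minimal sketch. The `Profile` dataclass, the
`render_system_prompt` helper, the list type of `system_rules`, and the prompt
wording are all assumptions made for this example; the library's actual prompt
format may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical stand-in for the identity fields of a bot profile."""

    name: str
    description: str
    personality: str
    birthday: str
    species: str
    skills: list[str] = field(default_factory=list)
    system_rules: list[str] = field(default_factory=list)


def render_system_prompt(profile: Profile) -> str:
    """Turn identity fields and extra rules into one system message."""
    lines = [
        "You must follow these identity rules:",
        f"- Your name is {profile.name}.",
        f"- Description: {profile.description}",
        f"- Personality: {profile.personality}",
        f"- Birthday: {profile.birthday}",
        f"- Species: {profile.species}",
    ]
    if profile.skills:
        lines.append(f"- Skills: {', '.join(profile.skills)}")
    # Style and role constraints come last so they read as explicit rules.
    lines.extend(f"- {rule}" for rule in profile.system_rules)
    return "\n".join(lines)


profile = Profile(
    name="Mira",
    description="A practical local assistant with persistent memory.",
    personality="warm, curious, and concise",
    birthday="2026-06-02",
    species="local digital companion",
    skills=["Python", "summarization"],
    system_rules=["Keep answers under five sentences."],
)
print(render_system_prompt(profile))
```

Placing the identity lines in the system message rather than in the user turn
is what lets a small model treat them as fixed rules instead of conversation
content.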

Relative paths inside a config file are resolved relative to that config file.
JSON config files can be encoded as UTF-8, UTF-8 with BOM, GB2312, or GBK.
Memory `index.json` and segment `.jsonl` files are read with the same encoding
fallbacks and are written back as UTF-8.

> [!NOTE]
> GB2312 and GBK support is intended for Chinese JSON config files produced by
> older Windows editors or tooling.
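
The encoding fallbacks and relative-path rule described above can be sketched
with the standard library alone. The function names and the order in which
encodings are tried are assumptions for illustration, not the library's
internal loader.

```python
import json
from pathlib import Path

# "utf-8-sig" decodes UTF-8 with or without a BOM, so it covers both cases.
FALLBACK_ENCODINGS = ("utf-8-sig", "gb2312", "gbk")


def read_json_with_fallbacks(path: Path) -> tuple[dict, str]:
    """Return the parsed JSON object and the encoding that decoded it."""
    raw = path.read_bytes()
    for encoding in FALLBACK_ENCODINGS:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # Try the next candidate encoding.
        return json.loads(text), encoding
    raise ValueError(f"Could not decode {path} with {FALLBACK_ENCODINGS}")


def resolve_relative(config_path: Path, value: str) -> Path:
    """Resolve a path from the config relative to the config file itself."""
    candidate = Path(value)
    if candidate.is_absolute():
        return candidate
    return (config_path.parent / candidate).resolve()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        config_path = Path(tmp) / "bot.json"
        # Simulate a config saved by an older Windows editor in GBK.
        config_path.write_bytes('{"memory_dir": "./memory"}'.encode("gbk"))
        data, encoding = read_json_with_fallbacks(config_path)
        print(encoding, resolve_relative(config_path, data["memory_dir"]))
```

Trying UTF-8 first matters because GBK accepts almost any byte sequence, so a
GBK-first order would silently mis-decode valid UTF-8 files.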

## Python API

```python
from llama_simple_chat_bot import BotConfig, ChatBot

config = BotConfig.from_file("examples/bot_config.json")
bot = ChatBot(config)

reply = bot.ask("Remember that I prefer SQLite for small apps.")
print(reply)

for hit in bot.search_memory("SQLite"):
    print(hit.summary)
```

You can also build the config in code:

```python
from llama_simple_chat_bot import BotConfig, ChatBot, MemoryConfig, ModelConfig

config = BotConfig(
    name="Mira",
    description="A practical local assistant with persistent memory.",
    personality="warm, curious, and concise",
    birthday="2026-06-02",
    skills=["Python", "summarization"],
    species="local digital companion",
    memory_dir="./memory/mira",
    model=ModelConfig(
        backend="llama_cpp_python",
        model_path="./models/qwen2.5-0.5b-instruct-q4_k_m.gguf",
        chat_format="chatml",
        n_gpu_layers=0,
    ),
    memory=MemoryConfig(summary_mode="extractive"),
)

bot = ChatBot(config)
print(bot.ask("Hello."))
```

## Memory Layout

The configured memory directory contains:

- `index.json`: all conversation log entries plus segment metadata, summaries,
  keywords, and log file references.
- `segments/*.jsonl`: append-only per-segment logs.
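
To make the append-only segment idea concrete, the sketch below writes one
JSON object per line and starts a new segment file once a limit is reached,
in the spirit of `memory.segment_exchange_limit`. The file naming scheme, the
record fields, and the default limit are hypothetical, and the sketch does not
update `index.json`.

```python
import json
from pathlib import Path


def append_exchange(memory_dir: Path, user: str, bot: str, limit: int = 8) -> Path:
    """Append one exchange to the newest segment, rolling over at `limit`."""
    segments_dir = memory_dir / "segments"
    segments_dir.mkdir(parents=True, exist_ok=True)

    # Zero-padded names keep lexical order equal to creation order.
    existing = sorted(segments_dir.glob("segment-*.jsonl"))
    current = existing[-1] if existing else segments_dir / "segment-0001.jsonl"

    if current.exists():
        with current.open(encoding="utf-8") as handle:
            count = sum(1 for line in handle if line.strip())
        if count >= limit:
            # The newest segment is full, so start the next one.
            current = segments_dir / f"segment-{len(existing) + 1:04d}.jsonl"

    record = {"user": user, "bot": bot}
    # Append mode never rewrites earlier lines, so the log stays append-only.
    with current.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return current


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        for turn in range(5):
            path = append_exchange(Path(tmp), f"question {turn}", "answer", limit=2)
            print(path.name)
```

Because each line is a complete JSON document, a crash in the middle of a
write can damage at most the last line of one segment.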

At response time, the bot searches the index for direct matches and related
associative matches, formats the best hits, and injects them into the local
model's system context.

For broad questions like "what did we talk about before?", memory recall uses
recent segments as an overview. For questions with a concrete topic, such as
"did we talk about SQLite?", it uses keyword retrieval so old but relevant
segments can beat newer unrelated chats.
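
The two recall modes can be illustrated with a toy retriever. Everything here
is an assumption for the sake of the example: the `Segment` shape, the
stopword list, and the overlap-count scoring are simplified stand-ins, not the
library's actual ranking.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical index entry: one summarized memory segment."""

    segment_id: int  # Larger ids are newer.
    summary: str
    keywords: set[str]


# Illustrative stopword list; a real one would be larger and multilingual.
STOPWORDS = {"a", "about", "before", "did", "do", "talk", "the", "we", "what", "you"}


def query_terms(question: str) -> set[str]:
    """Lowercase the question and keep only words that carry a topic."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    return {word for word in words if word not in STOPWORDS}


def recall(question: str, segments: list[Segment], top_k: int = 2) -> list[Segment]:
    """Pick segments by keyword overlap, or by recency for broad questions."""
    terms = query_terms(question)
    newest_first = sorted(segments, key=lambda s: s.segment_id, reverse=True)
    if not terms:
        # Broad question: no topic words left, so give a recent overview.
        return newest_first[:top_k]
    scored = [(len(terms & s.keywords), s) for s in newest_first]
    hits = [pair for pair in scored if pair[0] > 0]
    # Stable sort keeps newer segments first among equal scores.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [segment for _, segment in hits[:top_k]]


segments = [
    Segment(1, "User prefers SQLite for small apps.", {"sqlite", "apps"}),
    Segment(2, "Chatted about weekend cooking.", {"cooking", "weekend"}),
    Segment(3, "Planned a trip to Kyoto.", {"trip", "kyoto"}),
]
print([s.segment_id for s in recall("what did we talk about before?", segments)])
print([s.segment_id for s in recall("did we talk about SQLite?", segments)])
```

In this sketch the oldest segment wins the SQLite question because scoring
ignores age whenever topic words are present, which mirrors the behavior
described above.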

> [!WARNING]
> Memory files contain conversation content. Do not commit real user memory
> directories, downloaded model files, or private chat logs.

## Tests

The test suite uses only the built-in `unittest` module and the `echo` backend:

```bash
python -m unittest
```

## Acknowledgements

This project builds on the local inference ecosystem around `llama.cpp`, the
Python bindings provided by `llama-cpp-python`, open GGUF model releases from
the Qwen and SmolLM communities, and the Python standard library.
