Metadata-Version: 2.4
Name: sinapsis-llama-cpp
Version: 0.5.0
Summary: Sinapsis templates for llama.cpp text generation.
Author-email: SinapsisAI <dev@sinapsis.tech>
Project-URL: Homepage, https://sinapsis.tech
Project-URL: Documentation, https://docs.sinapsis.tech/docs
Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-chatbots.git
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: huggingface-hub>=0.32.4
Requires-Dist: llama-cpp-python>=0.3.33
Requires-Dist: sinapsis>=0.2.26
Requires-Dist: sinapsis-chatbots-base[torch]
Provides-Extra: server
Requires-Dist: llama-cpp-python[server]>=0.3.27; extra == "server"
Provides-Extra: all
Requires-Dist: sinapsis-llama-cpp[server]; extra == "all"
Dynamic: license-file

<h1 align="center">
<br>
<a href="https://sinapsis.tech/">
  <img
    src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
    alt="" width="300">
</a>
<br>
Sinapsis LLaMA CPP
<br>
</h1>

<h4 align="center">Sinapsis templates for local GGUF-backed text completion, streaming, and MCP with llama-cpp-python.</h4>

<p align="center">
<a href="#installation">🐍 Installation</a> •
<a href="#features">🚀 Features</a> •
<a href="#example">📚 Usage example</a> •
<a href="#documentation">📙 Documentation</a> •
<a href="#license">🔍 License</a>
</p>

The `sinapsis-llama-cpp` package provides Sinapsis templates built on top of [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) for running local or Hugging Face-hosted GGUF models through `LLMConversationPacket`.

<h2 id="installation">🐍 Installation</h2>

Install using your preferred package manager. We strongly recommend using <code>uv</code>.

Install the base package:

```bash
uv pip install sinapsis-llama-cpp --extra-index-url https://pypi.sinapsis.tech
```

Or with raw <code>pip</code>:

```bash
pip install sinapsis-llama-cpp --extra-index-url https://pypi.sinapsis.tech
```

> [!IMPORTANT]
> If you also want the upstream `llama-cpp-python` HTTP server, install the optional `server` extra:
>
> ```bash
> uv pip install sinapsis-llama-cpp[server] --extra-index-url https://pypi.sinapsis.tech
> ```
>
> Or install all optional dependencies:
>
> ```bash
> uv pip install sinapsis-llama-cpp[all] --extra-index-url https://pypi.sinapsis.tech
> ```

<h2 id="features">🚀 Features</h2>

<h3>Templates Supported</h3>

- **LLaMACPPTextCompletion**: Standard llama.cpp chat completion using `LLMConversationPacket`.
- **LLaMACPPStreamingTextCompletion**: Async streaming variant that yields partial packets during generation.
- **LLaMACPPTextCompletionWithMCP**: llama.cpp chat completion with packet-native MCP tool state.

<details>
<summary><strong><span style="font-size: 1.25em;">🧩 Common Attributes</span></strong></summary>

- `init_args` (`LLaMACPPInitArgs`, required): llama.cpp runtime arguments.
  - `llm_model_name` (`str`, required): Hugging Face repo id or local directory containing the GGUF file.
  - `llm_model_file` (`str`, required): GGUF file name to load.
  - Additional runtime controls include `n_ctx`, `n_threads`, `n_gpu_layers`, `flash_attn_type`,
    `tensor_split`, `use_mmap`, `use_mlock`, `seed`, and `chat_format`.
- `completion_args` (`LLaMACPPCompletionArgs`, required): Request-time generation parameters such as
  `max_tokens`, `temperature`, `top_p`, `top_k`, `min_p`, penalties, stop sequences, and structured-output settings.
- `reasoning_start_tag` / `reasoning_end_tag` (`str | None`, optional): Tags used to extract reasoning into
  `LLMConversationPacket.reasoning` before the final response is cleaned.

</details>

> [!TIP]
> Use CLI command ``` sinapsis info --all-template-names``` to show a list with all the available Template names installed with Sinapsis LLaMA CPP.

<h2 id="example">📚 Usage example</h2>

The following agent runs one local llama.cpp text-completion step using `LLMConversationInput`.

<details id='usage'><summary><strong><span style="font-size: 1.0em;"> Config</span></strong></summary>

```yaml
agent:
  name: text_completion
  description: Single-shot llama-cpp text completion for Q&A and text generation.

templates:
  - template_name: InputTemplate
    class_name: InputTemplate
    attributes: {}

  - template_name: LLMConversationInput
    class_name: LLMConversationInput
    template_input: InputTemplate
    attributes:
      prompt: Give three short tips for staying organized during a busy week.
      system_prompt: You are a helpful assistant.

  - template_name: LLaMACPPTextCompletion
    class_name: LLaMACPPTextCompletion
    template_input: LLMConversationInput
    attributes:
      init_args:
        llm_model_name: unsloth/Qwen3.5-9B-GGUF
        llm_model_file: Qwen3.5-9B-Q4_K_M.gguf
        n_ctx: 8192
        n_threads: 8
        n_gpu_layers: -1
        flash_attn_type: -1
        seed: 10
      completion_args:
        max_tokens: 4096
        temperature: 0.2
        seed: 10

```

</details>
<h2 id="documentation">📙 Documentation</h2>

Documentation for this and other sinapsis packages is available on the [sinapsis website](https://docs.sinapsis.tech/docs)

Tutorials for different projects within sinapsis are available at [sinapsis tutorials page](https://docs.sinapsis.tech/tutorials)

<h2 id="license">🔍 License</h2>

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.

For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.
