Metadata-Version: 2.4
Name: sinapsis-unsloth
Version: 0.1.0
Summary: Package for fine-tuning, running and exporting Large Language Models with Unsloth.
Author-email: SinapsisAI <dev@sinapsis.tech>
Project-URL: Homepage, https://sinapsis.tech
Project-URL: Documentation, https://docs.sinapsis.tech/docs
Project-URL: Tutorials, https://docs.sinapsis.tech/tutorials
Project-URL: Repository, https://github.com/Sinapsis-AI/sinapsis-unsloth.git
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: sinapsis>=0.2.0
Requires-Dist: unsloth>=2025.6.2
Requires-Dist: torch>=2.6.0
Requires-Dist: setuptools>=80.9.0
Provides-Extra: huggingface
Requires-Dist: unsloth[huggingface]>=2025.6.2; extra == "huggingface"
Provides-Extra: flashattention
Requires-Dist: ninja>=1.13.0; extra == "flashattention"
Requires-Dist: packaging>=25.0; extra == "flashattention"
Requires-Dist: flash-attn>=2.8.3; extra == "flashattention"
Requires-Dist: torch==2.9.0; extra == "flashattention"
Provides-Extra: sinapsis-data-readers
Requires-Dist: sinapsis-data-readers>=0.1.10; extra == "sinapsis-data-readers"
Provides-Extra: all
Requires-Dist: sinapsis-unsloth[huggingface]; extra == "all"
Requires-Dist: sinapsis-unsloth[flashattention]; extra == "all"
Requires-Dist: sinapsis-unsloth[sinapsis-data-readers]; extra == "all"
Dynamic: license-file

<h1 align="center">
<br>
<br>
<a href="https://sinapsis.tech/">
  <img
    src="https://github.com/Sinapsis-AI/brand-resources/blob/main/sinapsis_logo/4x/logo.png?raw=true"
    alt="" width="300">
</a>
<br>
Sinapsis Unsloth
<br>
</h1>

<h4 align="center">Templates for optimized LLM fine-tuning and deployment.</h4>

<p align="center">
<a href="#installation">🐍 Installation</a> •
<a href="#features">🚀 Features</a> •
<a href="#example">📚 Usage example</a> •
<a href="#documentation">📙 Documentation</a> •
<a href="#license">🔍 License</a>
</p>

The `sinapsis-unsloth` module provides ready-to-use templates for **continued pretraining**, **instruct fine-tuning**, **conversational fine-tuning**, inference and **model export** to GGUF, merged, and quantized formats using [Unsloth](https://docs.unsloth.ai/).

<h2 id="installation">🐍 Installation</h2>

Install using your package manager of choice. We recommend `uv` for faster installations.

<details open>
<summary id="configuration"><strong><span style="font-size: 1.25em;">Standard Installation (Pre-built Wheels)</span></strong></summary>

This method automatically installs optimized pre-built wheels for `flash-attn`, skipping the long compilation times.

Supported: Linux (x86_64), Python 3.10 - 3.12, CUDA 12.x

Using <code>uv</code>:

```bash
# Install Flash Attention (Example for Python 3.10 + CUDA 12.4)
uv pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.5.4/flash_attn-2.8.3+cu124torch2.9-cp310-cp310-linux_x86_64.whl

# Install Sinapsis Unsloth
uv pip install sinapsis-unsloth[all] --extra-index-url https://pypi.sinapsis.tech
```

Using raw <code>pip</code>:
```bash
# Install Flash Attention (Example for Python 3.10 + CUDA 12.4)
pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.5.4/flash_attn-2.8.3+cu124torch2.9-cp310-cp310-linux_x86_64.whl

# Install Sinapsis Unsloth
pip install sinapsis-unsloth[all] --extra-index-url https://pypi.sinapsis.tech
```

</details>

<details>
<summary id="configuration"><strong><span style="font-size: 1.25em;">Manual Build (From Source)</span></strong></summary>

Use this if you are on an unsupported platform (e.g., Windows, non-standard CUDA versions) or need to compile flash-attn yourself.

Using <code>uv</code>:
```bash
export MAX_JOBS=4 # Adjust based on your RAM specs
uv pip install torch packaging ninja setuptools
uv pip install sinapsis-unsloth[all] --extra-index-url https://pypi.sinapsis.tech
```

Using raw <code>pip</code>:
```bash
export MAX_JOBS=4 # Adjust based on your RAM specs
pip install torch packaging ninja setuptools
pip install sinapsis-unsloth[all] --extra-index-url https://pypi.sinapsis.tech
```

</details>

<h2 id="features">🚀 Features</h2>

The templates support all capabilities from Unsloth for efficient LLM fine-tuning, inference, and model export, including:

- **Optimized Training**: 4-bit (QLoRA), 8-bit, 16-bit, and full precision fine-tuning
- **Hardware Efficiency**: Reduced GPU memory usage with Unsloth's optimized kernels
- **Flexible Export**: GGUF quantization and Merged model export options for deployment
- **High-Performance Inference**: Native 4-bit inference with dynamic chat templating and streaming

<h3> Templates Supported</h3>

<h4>Training</h4>

* **UnslothPretrainer**: Designed for continued pretraining (domain adaptation) on raw text. Features efficient sequence packing and specific learning rate controls for embeddings.

* **UnslothInstructTrainer**: Optimized for instruction fine-tuning. Processes standard instruction-input-response triplets with configurable preambles and dynamic formatting (handling optional inputs gracefully).

* **UnslothConversationTrainer**: Specialized for conversational AI fine-tuning with chat datasets. Supports both **ShareGPT** and **Alpaca** formats (with auto-conversion), handles dynamic chat templating, and supports response-only loss masking.

<h4>Inference</h4>

* **UnslothInferenceCompletion**: Raw text completion template for base models or custom formatting needs.

* **UnslothInferenceInstruct**: Streamlined inference for instruction-tuned models using standard task preambles.

* **UnslothInferenceConversational**: Manages multi-turn chat history, system prompts, and dynamic chat template application for conversational models.

* **UnslothInferenceReasoning**: Extends conversational inference to support Chain-of-Thought (CoT) models (e.g., DeepSeek-R1), handling the extraction of internal reasoning traces.

<h4>Export</h4>

* **UnslothExportGGUF**: Exports models to GGUF format for efficient CPU/Edge inference (e.g., Llama.cpp). Supports configurable quantization methods (q4_k_m, q8_0, etc.).

* **UnslothExportMerged**: Merges LoRA adapters back into the base model (16-bit or 4-bit) for deployment on vLLM, or pushes directly to the Hugging Face Hub.

<details>
<summary id="configuration"><strong><span style="font-size: 1.25em;">🌍 General Attributes</span></strong></summary>

The `model_args` attribute controls how Unsloth loads and configures the model.

- **`model_name`** (`str`, required): Model ID or local path.
- **`cache_dir`** (`str`): Cache directory. Default: `SINAPSIS_CACHE_DIR`.
- **`max_seq_length`** (`int`): Maximum sequence length. Default: `2048`.
- **`dtype`** (`"auto" | "bfloat16" | "float16"`): Weight precision. Default: `"auto"`.
- **`load_in_4bit`** (`bool`): Enable 4-bit quantization. Default: `True`.
- **`load_in_8bit`** (`bool`): Enable 8-bit quantization. Default: `False`.
- **`load_in_16bit`** (`bool`): Load weights in FP16. Default: `False`.
- **`full_finetuning`** (`bool`): Enable full fine-tuning. Default: `False`.
- **`device_map`** (`str`): Device placement strategy. Default: `"sequential"`.
- **`use_gradient_checkpointing`** (`str`): Checkpointing mode. Default: `"unsloth"`.
- **`fast_inference`** (`bool`): Enable optimized inference. Default: `False`.
- **`gpu_memory_utilization`** (`float`): Max GPU memory fraction. Default: `0.5`.
- **`random_state`** (`int`): Random seed. Default: `3407`.
- **`max_lora_rank`** (`int`): Maximum LoRA rank. Default: `64`.

</details>

<details>
<summary id="configuration"><strong><span style="font-size: 1.25em;">⚙️ Fine-tuning Attributes</span></strong></summary>

These attributes apply to all fine-tuning templates:

- **`lora_args`** (`UnslothLoraArgs`)
  - LoRA configuration (rank, alpha, dropout, target modules, gradient checkpointing).

- **`trainer_args`** (`UnslothTrainerArgs`)
  - Trainer options (text field, packing, sequence length, loss type).

- **`training_args`** (`UnslothTrainingArgs`)
  - Hugging Face training parameters (batch size, learning rate, logging, saving).

- **`train_dataset`** (`DatasetConfig`)
  - Dataset configuration, including:
    - `loader_args` (source and loading parameters)
    - `map_args` (preprocessing)
    - `shuffle` (shuffling behavior)
    - `pre_tokenize` (tokenization options)

- **`resume_from_checkpoint`** (`bool`)
  - Resume training from the last checkpoint.

- **`save_path`** (`str`)
  - Directory where fine-tuned adapters or weights will be saved.

</details>

<details>
<summary id="inference"><strong><span style="font-size: 1.25em;">🧠 Inference Attributes</span></strong></summary>

These attributes configure Unsloth-based inference templates.

- **`rag_context_key`** (`str | None`)
  - Metadata key used to retrieve optional RAG context.

- **`generate_args`** (`UnslothGenerateArgs`)
  - Token generation settings (sampling, length, stopping, temperature, penalties).

- **`stream`** (`bool`)
  - Enables token-by-token console streaming during generation.

</details>


<details>
<summary id="export"><strong><span style="font-size: 1.25em;">📦 Export Attributes</span></strong></summary>

These attributes configure Unsloth-based model export templates.

- **`export_args`** (`UnslothExportBaseArgs`)
  - Export parameters such as save path, shard size, and memory limits.

- **`push_to_hub`** (`bool`)
  - Enables pushing the exported model to the Hugging Face Hub.

</details>

> [!TIP]
> Use CLI command ``` sinapsis info --all-template-names``` to show a list with all the available Template names installed with Sinapsis Unsloth.

> [!TIP]
> Use CLI command ```sinapsis info --example-template-config TEMPLATE_NAME``` to produce an example Agent config for the Template specified in ***TEMPLATE_NAME***.

For example, for ***UnslothPretrainer*** use ```sinapsis info --example-template-config UnslothPretrainer``` to produce the following example config:

```yaml
agent:
  name: my_test_agent
templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: UnslothPretrainer
  class_name: UnslothPretrainer
  template_input: InputTemplate
  attributes:
    model_args:
      model_name: '`replace_me:<class ''str''>`'
      cache_dir: /path/to/sinapsis/.cache
      max_seq_length: 2048
      dtype: auto
      load_in_4bit: true
      load_in_8bit: false
      load_in_16bit: false
      full_finetuning: false
      device_map: sequential
      use_gradient_checkpointing: unsloth
      fast_inference: false
      gpu_memory_utilization: 0.5
      random_state: 3407
      max_lora_rank: 64
    lora_args:
      r: 16
      target_modules:
      - q_proj
      - k_proj
      - v_proj
      - o_proj
      - gate_proj
      - up_proj
      - down_proj
      lora_alpha: 16
      lora_dropout: 0.0
      bias: none
      use_gradient_checkpointing: unsloth
      random_state: 3407
      use_rslora: false
      modules_to_save: null
      loftq_config: '`replace_me:<class ''dict''>`'
    trainer_args:
      dataset_text_field: text
      dataset_num_proc: null
      max_length: 1024
      packing: false
      packing_strategy: bfd
      eval_packing: false
      completion_only_loss: null
      assistant_only_loss: false
      loss_type: nll
      activation_offloading: false
    training_args:
      output_dir: trainer_output
      overwrite_output_dir: false
      eval_strategy: 'no'
      eval_steps: null
      per_device_train_batch_size: 8
      per_device_eval_batch_size: 8
      gradient_accumulation_steps: 1
      eval_accumulation_steps: null
      torch_empty_cache_steps: null
      learning_rate: 5.0e-05
      weight_decay: 0.0
      max_grad_norm: 1.0
      num_train_epochs: 3.0
      max_steps: null
      lr_scheduler_type: linear
      warmup_ratio: 0.0
      warmup_steps: null
      logging_strategy: steps
      logging_first_step: false
      logging_steps: 500
      save_strategy: steps
      save_steps: 500
      save_only_model: false
      use_cpu: false
      seed: 3407
      data_seed: null
      bf16: false
      fp16: false
      dataloader_drop_last: false
      dataloader_num_workers: 0
      remove_unused_columns: true
      load_best_model_at_end: false
      metric_for_best_model: loss
      optim: adamw_torch
      report_to: none
      push_to_hub: false
      hub_model_id: null
      embedding_learning_rate: 5.0e-05
    train_dataset:
      loader_args:
        path: '`replace_me:<class ''str''>`'
        name: null
        data_dir: null
        data_files: null
        split: null
        cache_dir: /path/to/sinapsis/.cache
        features: null
        num_proc: null
      map_args:
        desc: null
        batched: false
        batch_size: 1000
        num_proc: 0
        keep_in_memory: false
        load_from_cache_file: true
      shuffle:
        enabled: false
        args:
          seed: null
          keep_in_memory: false
          load_from_cache_file: true
      pre_tokenize:
        enabled: false
        args:
          add_special_tokens: true
          padding: do_not_pad
          truncation: do_not_truncate
          max_length: null
          stride: 0
          is_split_into_words: false
          padding_side: null
          verbose: true
        map_args:
          desc: null
          batched: false
          batch_size: 1000
          num_proc: 0
          keep_in_memory: false
          load_from_cache_file: true
    resume_from_checkpoint: false
    save_path: '`replace_me:<class ''str''>`'
```

<h2 id="example">📚 Usage example</h2>

The following agent exports the `unsloth/DeepSeek-R1-Distill-Qwen-1.5B` model in GGUF format with no quantization at the `artifacts/DeepSeek-R1-Distill-Qwen-1.5B-gguf` path.

<details id='usage'><summary><strong><span style="font-size: 1.0em;"> Config</span></strong></summary>

```yaml
agent:
  name: model_export_agent
  description: Agent to handle model export and conversion workflows

templates:
- template_name: InputTemplate
  class_name: InputTemplate
  attributes: {}
- template_name: UnslothExportGGUF
  class_name: UnslothExportGGUF
  template_input: InputTemplate
  attributes:
    model_args:
      model_name: unsloth/DeepSeek-R1-Distill-Qwen-1.5B
      dtype: "bfloat16"
      load_in_4bit: false
      gpu_memory_utilization: 1
    export_args:
      save_path : artifacts/DeepSeek-R1-Distill-Qwen-1.5B-gguf
      maximum_memory_usage: 1
      quantization_method: not_quantized
    push_to_hub: false
```
</details>

You can see additional fine-tuning agent configurations at the [configs](https://github.com/Sinapsis-AI/sinapsis-unsloth/tree/main/src/sinapsis_unsloth/configs) directory.

<h2 id="documentation">📙 Documentation</h2>

Documentation for this and other sinapsis packages is available on the [sinapsis website](https://docs.sinapsis.tech/docs)

Tutorials for different projects within sinapsis are available at [sinapsis tutorials page](https://docs.sinapsis.tech/tutorials)


<h2 id="license">🔍 License</h2>

This project is licensed under the AGPLv3 license, which encourages open collaboration and sharing. For more details, please refer to the [LICENSE](LICENSE) file.

For commercial use, please refer to our [official Sinapsis website](https://sinapsis.tech) for information on obtaining a commercial license.
