Metadata-Version: 2.4
Name: trillim
Version: 0.1.1
Summary: The fastest inference framework to run BitNet models on CPUs
Project-URL: Repository, https://github.com/Vineet-Vinod/Trillim
Project-URL: Issues, https://github.com/Vineet-Vinod/Trillim/issues
Author-email: Vineet V <vineetv314@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Trillim.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        ---
        
        Proprietary Components
        
        The following components are NOT covered by the MIT License above and are
        governed by the Trillim Proprietary EULA below:
        
          - Pre-compiled binaries          trillim/_bin/inference, trillim/_bin/trillim-quantize
          - Wheel build script             scripts/build_wheels.py
        
        ---
        
        Trillim Proprietary End-User License Agreement (EULA)
        
        Copyright (c) 2026 Trillim. All rights reserved.
        
        1. GRANT OF LICENSE.  Trillim ("Trillim") grants you a non-exclusive,
           non-transferable, revocable license to use the closed components listed
           above solely for the purpose of running Trillim-compatible models on your
           own hardware.  You may use the closed components as part of applications
           you build, provided those applications do not expose the closed components
           as a standalone service or library.
        
        2. RESTRICTIONS.  You may NOT:
           (a) reverse engineer, decompile, disassemble, or otherwise attempt to
               derive the source code of any closed component, whether distributed
               as source or as a compiled binary;
           (b) redistribute, sublicense, rent, lease, or lend the closed components
               outside of the official Trillim package (i.e., the package distributed
               via PyPI under the name "trillim" or via Trillim's official GitHub
               releases);
           (c) modify, create derivative works of, or remove any proprietary notices
               from the closed components;
           (d) use the closed components to build a competing product that replicates
               the core functionality of Trillim's kernel library or quantizer.
        
        3. OWNERSHIP.  Trillim retains all right, title, and interest in and to the
           closed components, including all intellectual property rights therein.
        
        4. NO WARRANTY.  THE CLOSED COMPONENTS ARE PROVIDED "AS IS" WITHOUT WARRANTY
           OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
           OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT.
        
        5. LIMITATION OF LIABILITY.  IN NO EVENT SHALL TRILLIM BE LIABLE FOR ANY
           INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES ARISING
           OUT OF OR RELATED TO YOUR USE OF THE CLOSED COMPONENTS, REGARDLESS OF THE
           THEORY OF LIABILITY.
        
        6. TERMINATION.  This license terminates automatically if you violate any of
           its terms.  Upon termination, you must destroy all copies of the closed
           components in your possession.
License-File: LICENSE
License-File: THIRD_PARTY_LICENSES
Keywords: 1-bit,bitnet,cpu,inference,llm,ternary
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: fastapi==0.128.0
Requires-Dist: faster-whisper==1.2.1
Requires-Dist: huggingface-hub==0.36.0
Requires-Dist: jinja2==3.1.0
Requires-Dist: pocket-tts==1.0.3
Requires-Dist: prompt-toolkit==3.0.52
Requires-Dist: transformers==4.57.1
Requires-Dist: uvicorn[standard]==0.40.0
Provides-Extra: dev
Requires-Dist: ruff==0.15.0; extra == 'dev'
Description-Content-Type: text/markdown

# Trillim

High-performance CPU inference engine for BitNet models. Runs ternary-quantized models ({-1, 0, 1} weights) using platform-specific SIMD optimizations (AVX2 on x86, NEON on ARM).
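Ternary quantization maps each weight to {-1, 0, 1} plus a per-tensor scale, which is what lets the kernels replace multiplications with additions and sign flips. Trillim's own quantizer is proprietary, so as a rough illustration here is the standard BitNet b1.58 "absmean" scheme in plain Python (a sketch of the general technique, not Trillim's implementation):

```python
def ternary_quantize(weights):
    """Quantize floats to {-1, 0, 1} with an absmean scale (BitNet b1.58 style).

    Returns (quantized_weights, scale); dequantize as q * scale.
    """
    # Scale is the mean absolute weight; guard against an all-zero tensor.
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    # Round each weight to the nearest ternary level and clip to [-1, 1].
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale
```

With `scale = 0.5`, the weights `[0.5, -0.5, 0.0, 1.0]` quantize to `[1, -1, 0, 1]`.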

## Quick Start

### Prerequisites

- Python 3.12+ — the examples below use [`uv`](https://github.com/astral-sh/uv), but pip or any other package manager works too

### Install and run

```bash
# Install trillim
uv add trillim

# Pull a pre-quantized model
uv run trillim pull Trillim/BitNet-TRNQ

# Chat
uv run trillim chat Trillim/BitNet-TRNQ
```

### Quantize your own model

If you have a HuggingFace BitNet model with safetensors weights:

```bash
# Quantize model weights → qmodel.tensors + rope.cache
uv run trillim quantize <path-to-model> --model

# Optionally extract a PEFT LoRA adapter → qmodel.lora
uv run trillim quantize <path-to-model> --adapter <path-to-adapter>
```

## API Server

Trillim includes an OpenAI-compatible API server:

```bash
# Start the server
uv run trillim serve <model-dir>

# With voice pipeline (speech-to-text + text-to-speech)
uv run trillim serve <model-dir> --voice
```

Endpoints:
- `POST /v1/chat/completions` — chat completions (streaming supported)
- `POST /v1/completions` — text completions
- `GET /v1/models` — list loaded models
- `POST /v1/models/load` — hot-swap models and LoRA adapters at runtime
- `POST /v1/audio/transcriptions` — speech-to-text (with `--voice`)
- `POST /v1/audio/speech` — text-to-speech (with `--voice`)
- `GET /v1/voices` — list available TTS voices
- `POST /v1/voices` — register a custom voice from an audio sample (requires accepting the pocket-tts terms on Hugging Face)
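Because the server speaks the OpenAI wire format, any OpenAI-compatible client can talk to it. Below is a minimal stdlib-only sketch; the base URL `http://localhost:8000` and the model name are assumptions, so check `trillim serve --help` for the actual host/port defaults:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed default; verify with `trillim serve --help`


def build_chat_request(messages, model="Trillim/BitNet-TRNQ", stream=False):
    """Build a chat-completions payload in the OpenAI wire format."""
    return {"model": model, "messages": messages, "stream": stream}


def chat(messages):
    """POST a chat request and return the first choice's message content."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(messages)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# reply = chat([{"role": "user", "content": "Hello!"}])  # requires a running server
```

The same payload shape works with the official `openai` Python client by pointing its `base_url` at the server.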

## Python SDK

The server is built on a composable SDK. Each capability (LLM, Whisper, TTS) is a standalone component:

```python
from trillim import Server, LLM, TTS, Whisper

# Inference only
Server(LLM("models/BitNet")).run()

# Inference + voice
Server(LLM("models/BitNet"), Whisper(), TTS()).run()

# TTS only
Server(TTS()).run()
```

## LoRA Adapters

Trillim supports PEFT LoRA adapters as bf16 corrections on top of the ternary base model:

```bash
# Ensure qmodel.lora is in the directory 
# (uv run trillim quantize ... will do this)
uv run trillim chat Trillim/BitNet-TRNQ --lora
```
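Conceptually, a LoRA adapter adds a low-rank update on top of the frozen base weights: `W_eff = W + (alpha/rank) * B @ A`. The pure-Python sketch below shows that math on nested lists (the function name and signature are illustrative; Trillim applies the correction inside its proprietary kernels, in bf16, not like this):

```python
def apply_lora(w, a, b, alpha, rank):
    """Return W + (alpha/rank) * B @ A for plain nested-list matrices.

    w: base weights (rows x cols), a: LoRA A (rank x cols), b: LoRA B (rows x rank).
    """
    scale = alpha / rank
    rows, cols = len(w), len(w[0])
    inner = len(a)  # the adapter rank
    out = [row[:] for row in w]  # copy so the base weights stay frozen
    for i in range(rows):
        for j in range(cols):
            out[i][j] += scale * sum(b[i][k] * a[k][j] for k in range(inner))
    return out
```

Because the update is rank-`r`, the adapter stores only `rows*r + r*cols` extra values per corrected matrix.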

## Supported Architectures

- `BitnetForCausalLM` — BitNet with ternary weights and ReLU² activation
- `LlamaForCausalLM` — Llama-style with SiLU activation

## Platform Support

| Platform | Status |
|----------|--------|
| x86_64 (AVX2) | Supported |
| ARM64 (NEON) | Supported |

Thread count is auto-detected as `num_cores - 2`. Override by passing a `--threads N` CLI argument.
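The documented default can be reproduced in a couple of lines; this is a sketch of the likely logic, not Trillim's actual detection code (note `os.cpu_count()` can return `None` in restricted environments, so the sketch guards against that and floors the result at one thread):

```python
import os


def default_thread_count() -> int:
    """Mirror the documented default: total cores minus two, floored at one."""
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, cores - 2)
```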

## License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (`inference`, `trillim-quantize`) bundled in the pip package are **proprietary** — you may use them as part of Trillim but may not reverse-engineer or redistribute them separately. See [LICENSE](LICENSE) for full terms.
