Metadata-Version: 2.4
Name: trillim
Version: 0.6.0
Summary: The fastest inference framework to run BitNet models on CPUs
Project-URL: Repository, https://github.com/Vineet-Vinod/Trillim
Project-URL: Issues, https://github.com/Vineet-Vinod/Trillim/issues
Author-email: Vineet V <vineetv314@gmail.com>
License: MIT License
        
        Copyright (c) 2026 Trillim.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
        ---
        
        Proprietary Components
        
        The following components are NOT covered by the MIT License above and are
        governed by the Trillim Proprietary EULA below:
        
          - Pre-compiled binaries          trillim/_bin/inference, trillim/_bin/trillim-quantize
          - Wheel build script             scripts/build_wheels.py
        
        ---
        
        Trillim Proprietary End-User License Agreement (EULA)
        
        Copyright (c) 2026 Trillim. All rights reserved.
        
        1. GRANT OF LICENSE.  Trillim ("Trillim") grants you a non-exclusive,
           non-transferable, revocable license to use the closed components listed
           above solely for the purpose of running Trillim-compatible models on your
           own hardware.  You may use the closed components as part of applications
           you build, provided those applications do not expose the closed components
           as a standalone service or library.
        
        2. RESTRICTIONS.  You may NOT:
           (a) reverse engineer, decompile, disassemble, or otherwise attempt to
               derive the source code of any closed component, whether distributed
               as source or as a compiled binary;
           (b) redistribute, sublicense, rent, lease, or lend the closed components
               outside of the official Trillim package (i.e., the package distributed
               via PyPI under the name "trillim" or via Trillim's official GitHub
               releases);
           (c) modify, create derivative works of, or remove any proprietary notices
               from the closed components;
           (d) use the closed components to build a competing product that replicates
               the core functionality of Trillim's kernel library or quantizer.
        
        3. OWNERSHIP.  Trillim retains all right, title, and interest in and to the
           closed components, including all intellectual property rights therein.
        
        4. NO WARRANTY.  THE CLOSED COMPONENTS ARE PROVIDED "AS IS" WITHOUT WARRANTY
           OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
           OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT.
        
        5. LIMITATION OF LIABILITY.  IN NO EVENT SHALL TRILLIM BE LIABLE FOR ANY
           INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL, OR PUNITIVE DAMAGES ARISING
           OUT OF OR RELATED TO YOUR USE OF THE CLOSED COMPONENTS, REGARDLESS OF THE
           THEORY OF LIABILITY.
        
        6. TERMINATION.  This license terminates automatically if you violate any of
           its terms.  Upon termination, you must destroy all copies of the closed
           components in your possession.
License-File: LICENSE
License-File: THIRD_PARTY_LICENSES
Keywords: 1-bit,bitnet,cpu,inference,llm,ternary
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.12
Requires-Dist: ddgs==9.10.0
Requires-Dist: fastapi==0.128.0
Requires-Dist: huggingface-hub==0.36.0
Requires-Dist: jinja2==3.1.0
Requires-Dist: prompt-toolkit==3.0.52
Requires-Dist: trafilatura==2.0.0
Requires-Dist: transformers==4.57.1
Requires-Dist: uvicorn[standard]==0.40.0
Provides-Extra: dev
Requires-Dist: ruff==0.15.0; extra == 'dev'
Provides-Extra: voice
Requires-Dist: faster-whisper==1.2.1; extra == 'voice'
Requires-Dist: numpy==2.4.2; extra == 'voice'
Requires-Dist: pocket-tts==1.0.3; extra == 'voice'
Requires-Dist: python-multipart==0.0.22; extra == 'voice'
Description-Content-Type: text/markdown

# Trillim

Trillim is a platform for running AI models locally. DarkNet is the CPU inference engine that powers Trillim.

## Install

- Python 3.12+ required
- Linux also requires glibc 2.27+
- [uv](https://docs.astral.sh/uv/) is the recommended installer

Platform guides:

- [macOS](docs/install-mac.md)
- [Linux](docs/install-linux.md)
- [Windows](docs/install-windows.md)

If you installed with `uv`, prefix the CLI examples below with `uv run`.

## Common Workflows

### Pull a Model

```bash
trillim list
trillim pull Trillim/BitNet-TRNQ
```

### Chat in the Terminal

```bash
trillim chat Trillim/BitNet-TRNQ
```

`trillim chat` keeps multi-turn history, preserves exact token continuity for prior turns, and reuses the KV cache whenever the next turn can safely append to that exact prompt state. Use `/new` to reset the conversation or `q` to quit.

### Search-Augmented Chat

Use the `search` harness with a search-tuned model:

```bash
trillim chat Trillim/BitNet-Search-TRNQ --harness search
```

DuckDuckGo (`ddgs`) is the default provider. To use Brave:

```bash
export SEARCH_API_KEY=<your_api_key>
trillim chat Trillim/BitNet-Search-TRNQ --harness search --search-provider brave
```

### Serve an OpenAI-Compatible API

Start the server:

```bash
trillim serve Trillim/BitNet-TRNQ
```

Main endpoints:

- `POST /v1/chat/completions`
- `POST /v1/completions`
- `GET /v1/models`
- `POST /v1/models/load`

Example with the OpenAI Python client:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
response = client.chat.completions.create(
    model="BitNet-TRNQ",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

To switch a running server to the search harness, call `POST /v1/models/load` with `"harness": "search"` and optional `"search_provider": "ddgs" | "brave"`.
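The harness switch described above boils down to a small JSON body. A minimal sketch of building it (the `"model"` value is a placeholder; `"harness"` and `"search_provider"` are the fields named above — check docs/server.md for the full schema):

```python
import json

# Request body for POST /v1/models/load to switch a running server to the
# search harness. The model ID here is a placeholder.
payload = {
    "model": "Trillim/BitNet-Search-TRNQ",
    "harness": "search",
    "search_provider": "brave",  # or "ddgs" (the default)
}

# Serialize and send with any HTTP client, e.g.
#   curl -X POST http://localhost:8000/v1/models/load \
#        -H "Content-Type: application/json" -d '<body>'
print(json.dumps(payload))
```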

### Quantize a Model or Adapter

If you have a Hugging Face model with safetensors weights (currently only BitNet models are supported):

```bash
# Quantize model weights -> qmodel.tensors + rope.cache
trillim quantize <path-to-model> --model

# Extract a PEFT LoRA adapter -> qmodel.lora
trillim quantize <path-to-model> --adapter <path-to-adapter>
```

### Use a LoRA Adapter

```bash
# Quantize a PEFT adapter into Trillim's format
trillim quantize <path-to-base-model> --adapter <path-to-adapter>

# Run the base model with the adapter
trillim chat Trillim/BitNet-TRNQ --lora <adapter-dir>

# Or pull a pre-quantized adapter and use it by ID
trillim pull Trillim/BitNet-GenZ-LoRA-TRNQ
trillim chat Trillim/BitNet-TRNQ --lora Trillim/BitNet-GenZ-LoRA-TRNQ
```

The same adapter settings can be changed at runtime through `POST /v1/models/load`.
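As a sketch, a runtime adapter swap could look like the body below. The field names mirror the CLI flags (`--lora`, `--lora-quant`) and are assumptions, not a confirmed schema; see docs/server.md for the exact request format:

```python
import json

# Hypothetical request body for POST /v1/models/load to swap the LoRA
# adapter at runtime. Field names mirror the CLI flags and are assumptions.
payload = {
    "model": "Trillim/BitNet-TRNQ",
    "lora": "Trillim/BitNet-GenZ-LoRA-TRNQ",  # adapter ID or local dir
    "lora_quant": "int8",  # optional, mirrors --lora-quant
}

print(json.dumps(payload, indent=2))
```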

### Runtime Quantization

Runtime quantization reduces memory use for selected layers during inference:

- `--lora-quant <type>` for LoRA layers: `none`, `bf16`, `int8`, `q4_0`, `q5_0`, `q6_k`, `q8_0`
- `--unembed-quant <type>` for the unembedding layer: `int8`, `q4_0`, `q5_0`, `q6_k`, `q8_0`

```bash
trillim chat Trillim/BitNet-TRNQ --lora <adapter-dir> --lora-quant int8
trillim chat Trillim/BitNet-TRNQ --unembed-quant q4_0
trillim serve Trillim/BitNet-TRNQ --lora-quant q8_0 --unembed-quant q4_0
```

### Voice Support

Install the optional `voice` extra before using speech endpoints:

```bash
uv add "trillim[voice]"
```

Or with `pip`:

```bash
pip install "trillim[voice]"
```

Then start the server with:

```bash
trillim serve Trillim/BitNet-TRNQ --voice
```

Voice endpoints:

- `POST /v1/audio/transcriptions`
- `POST /v1/audio/speech`
- `GET /v1/voices`
- `POST /v1/voices`

Predefined voices are `alba`, `marius`, `javert`, `jean`, `fantine`, `cosette`, `eponine`, and `azelma`.
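A speech request with one of the predefined voices can be sketched as the JSON body below. The field names follow the OpenAI-style audio API and are assumptions; check docs/server.md for the exact schema:

```python
import json

# Sketch of a request body for POST /v1/audio/speech using a predefined
# voice. Field names are assumed to follow the OpenAI audio API.
payload = {
    "input": "Hello from Trillim!",
    "voice": "alba",  # any of the predefined voices listed above
}

print(json.dumps(payload))
```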

For custom voice registration through `POST /v1/voices`, accept the terms for [kyutai/pocket-tts](https://huggingface.co/kyutai/pocket-tts), create a Hugging Face token with `Read` access, and run:

```bash
hf auth login
```

Custom voice uploads through `POST /v1/voices` are limited to 8 MB per file.

That setup is only required once. Predefined voices work without it.

## Performance Highlights

Benchmark takeaways for DarkNet on consumer CPUs:

- Prefill throughput improvements are most visible when `num_threads >= 4`.
- Decode throughput is broadly comparable to bitnet.cpp on average, while DarkNet reaches higher peaks.
- Results are directional and depend on thermal behavior, boost policy, and memory bandwidth.

Prefill example:

![Prefill benchmark example](docs/imgs/Q4_0A.png)

Decode example:

![Decode benchmark example](docs/imgs/DecodeA.png)

## Supported Architectures

- `BitnetForCausalLM` for ternary BitNet models with ReLU² activation
- `LlamaForCausalLM` for Llama-style models with SiLU activation

## Platform Support

| Platform | Status |
|----------|--------|
| x86_64 (AVX2) | Supported |
| ARM64 (NEON) | Supported |

Thread count defaults to `num_cores - 2`. Override it with `--threads N`.

## Documentation

- [What Is Trillim?](docs/about-trillim.md)
- Install: [macOS](docs/install-mac.md), [Linux](docs/install-linux.md), [Windows](docs/install-windows.md)
- [CLI Reference](docs/cli.md)
- [Interactive Chat](docs/chat.md)
- [Python Components](docs/components.md)
- [API Server](docs/server.md)
- [Benchmarks](docs/benchmarks.md)

## License

The Trillim Python SDK source code is MIT-licensed. The C++ inference engine binaries (`inference`, `trillim-quantize`) bundled in the pip package are **proprietary**. You may use them as part of Trillim, but may not reverse-engineer or redistribute them separately. See [LICENSE](LICENSE) for the full terms.
