Metadata-Version: 2.1
Name: tilert
Version: 0.1.4
Summary: TileRT: Tile-Based Runtime for Ultra-Low-Latency LLM Inference.
Author-Email: TileRT-team <tile-ai@outlook.com>
License: MIT
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries
Project-URL: Homepage, https://github.com/tile-ai/TileRT
Project-URL: Issues, https://github.com/tile-ai/TileRT/issues
Requires-Python: >=3.11
Requires-Dist: einops
Requires-Dist: numpy
Requires-Dist: scipy
Requires-Dist: torch==2.11.0
Requires-Dist: transformers==4.46.3
Provides-Extra: lint
Requires-Dist: black==25.1.0; extra == "lint"
Requires-Dist: isort==6.0.1; extra == "lint"
Requires-Dist: flake8==7.1.2; extra == "lint"
Requires-Dist: flake8-bugbear==24.12.12; extra == "lint"
Requires-Dist: flake8-comprehensions==3.16.0; extra == "lint"
Requires-Dist: flake8-docstrings==1.7.0; extra == "lint"
Requires-Dist: flake8-simplify==0.21.0; extra == "lint"
Requires-Dist: flake8-unused-arguments==0.0.13; extra == "lint"
Requires-Dist: flake8-variables-names==0.0.6; extra == "lint"
Requires-Dist: flake8-return==1.2.0; extra == "lint"
Requires-Dist: flake8-print==5.0.0; extra == "lint"
Requires-Dist: mypy==1.15.0; extra == "lint"
Requires-Dist: tomli==2.2.1; extra == "lint"
Requires-Dist: bandit==1.8.3; extra == "lint"
Requires-Dist: pyupgrade==3.19.1; extra == "lint"
Requires-Dist: clang-format==18.1.5; extra == "lint"
Requires-Dist: codespell==2.4.1; extra == "lint"
Requires-Dist: pre-commit>=3.0.0; extra == "lint"
Requires-Dist: types-setuptools; extra == "lint"
Requires-Dist: types-requests; extra == "lint"
Requires-Dist: types-urllib3; extra == "lint"
Requires-Dist: types-six; extra == "lint"
Requires-Dist: mdformat==0.7.17; extra == "lint"
Requires-Dist: mdformat-gfm==0.4.1; extra == "lint"
Requires-Dist: mdformat-frontmatter==2.0.8; extra == "lint"
Requires-Dist: mdformat-myst==0.2.1; extra == "lint"
Requires-Dist: mdformat-tables==1.0.0; extra == "lint"
Requires-Dist: mdformat-toc==0.3.0; extra == "lint"
Requires-Dist: mdformat-black==0.1.1; extra == "lint"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Requires-Dist: pytest-xdist>=3.0.0; extra == "test"
Requires-Dist: scipy; extra == "test"
Provides-Extra: dev
Requires-Dist: tilert[lint]; extra == "dev"
Requires-Dist: tilert[test]; extra == "dev"
Requires-Dist: commitizen==4.4.1; extra == "dev"
Requires-Dist: openpyxl==3.1.5; extra == "dev"
Requires-Dist: setuptools-scm[toml]; extra == "dev"
Description-Content-Type: text/markdown

<div align="center">
  <img src="https://raw.githubusercontent.com/tile-ai/tilert/main/assets/logo.png" width="120"/>
  <h1>TileRT: Tile-Based Runtime for<br>Ultra-Low-Latency LLM Inference</h1>
  <p>
    <a href="https://github.com/tile-ai/TileRT">
      <img src="https://img.shields.io/badge/GitHub-Repo-1E90FF?logo=github&logoColor=white" alt="GitHub repository" height="20">
    </a>
    <a href="https://pypi.org/project/tilert/"><img src="https://img.shields.io/badge/PyPI-tilert-1E90FF" alt="PyPI version" height="20"></a>
    <a href="https://huggingface.co/Tile-AI/DeepSeek-V3.2-Exp-TileRT"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-1E90FF" alt="HuggingFace" height="20"></a>
  </p>
</div>

**TileRT** serves large language models (LLMs) in ultra-low-latency scenarios — pushing the latency limits of hundred-billion-parameter models to millisecond-level time per output token (TPOT) without compromising model size or quality. Its tile-level runtime engine decomposes LLM operators into fine-grained tile tasks and dynamically overlaps computation, I/O, and communication across multiple GPUs.
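To make "tile tasks" and "overlap" concrete, here is a toy Python model. It is not TileRT's scheduler or API. The function names are hypothetical, a plain matrix product stands in for an LLM operator, and only load/compute overlap is modeled (multi-GPU communication is left out). `tiled_matmul` splits the output into independent tile tasks. `pipelined_time` estimates the finish time when tile loads run on a separate copy engine and staging buffers let the next load proceed while the current tile computes.

```python
"""Toy model of tile tasks and load/compute overlap (not TileRT's scheduler)."""


def tile_tasks(n_rows: int, n_cols: int, tile: int):
    """Yield (row_range, col_range) per output tile; each tile is an independent task."""
    for r in range(0, n_rows, tile):
        for c in range(0, n_cols, tile):
            yield range(r, min(r + tile, n_rows)), range(c, min(c + tile, n_cols))


def matmul_block(a, b, rows, cols):
    """Compute the block of a @ b covered by `rows` x `cols`."""
    inner = range(len(b))
    return [[sum(a[i][k] * b[k][j] for k in inner) for j in cols] for i in rows]


def tiled_matmul(a, b, tile: int):
    """Assemble a @ b by running one block task per output tile."""
    n_rows, n_cols = len(a), len(b[0])
    out = [[0] * n_cols for _ in range(n_rows)]
    for rows, cols in tile_tasks(n_rows, n_cols, tile):
        block = matmul_block(a, b, rows, cols)
        for bi, i in enumerate(rows):
            out[i][cols.start:cols.stop] = block[bi]
    return out


def pipelined_time(load: list[float], compute: list[float], num_buffers: int = 2) -> float:
    """Finish time when tile loads run on a copy engine beside compute.

    A load starts once the copy engine is idle and a staging buffer is free;
    a buffer is freed when the tile that last used it finishes computing.
    Compute for a tile starts after its own load and after the previous compute.
    """
    comp_end: list[float] = []
    copy_free = 0.0
    for k, (t_load, t_compute) in enumerate(zip(load, compute)):
        buffer_free = comp_end[k - num_buffers] if k >= num_buffers else 0.0
        load_end = max(copy_free, buffer_free) + t_load
        copy_free = load_end
        prev_end = comp_end[-1] if comp_end else 0.0
        comp_end.append(max(load_end, prev_end) + t_compute)
    return comp_end[-1] if comp_end else 0.0


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[1, 0, 2], [0, 1, 3]]
    print(tiled_matmul(a, b, tile=2))  # [[1, 2, 8], [3, 4, 18], [5, 6, 28]]
    print(pipelined_time([1.0] * 4, [2.0] * 4, num_buffers=1))  # 12.0 (no overlap)
    print(pipelined_time([1.0] * 4, [2.0] * 4, num_buffers=2))  # 9.0 (double buffering)
```

With one buffer every load waits for the previous compute, so the total equals the serial sum. With two buffers the later loads hide behind compute and only the first load stays exposed.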

The current preview supports **DeepSeek-V3.2** and **GLM-5** on **8× NVIDIA B200**. For full usage, examples, and news, see the [GitHub repository](https://github.com/tile-ai/TileRT).

<p align="center">
  <img src="https://raw.githubusercontent.com/tile-ai/tilert/main/assets/glm5_tilert_mtp.png" width="640" alt="GLM-5.1-FP8 token generation with TileRT v0.1.4"/>
  <br/>
  <sub><em>GLM-5.1-FP8 token generation speed with TileRT v0.1.4. Output length 1K, input length 1K–192K. Bars compare TileRT without MTP, with MTP at average acceptance length 3.2, and the peak under best-case MTP acceptance.</em></sub>
</p>

## Installation

The official `tilert==0.1.4` wheel on PyPI was compiled against the following stack. Treat the pinned versions as exact requirements, not lower bounds; only the driver and glibc rows are minimums.

| Component        | Pinned version                                      |
| ---------------- | --------------------------------------------------- |
| NVIDIA driver    | Supports **CUDA 13.2** runtime                      |
| Operating System | Linux **x86_64**, glibc **≥ 2.28** (manylinux_2_28) |
| Python           | **3.12**                                            |
| PyTorch          | **`torch==2.11.0+cu130`**                           |
| `transformers`   | **`4.46.3`**                                        |
| `tokenizers`     | **`0.20.3`**                                        |
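Before installing on a bare host, the platform rows of this table can be checked with a few standard-library calls. The script below is a hypothetical sketch, not something shipped with `tilert`. It checks CPU architecture, glibc version, and Python version, but not the driver; `nvidia-smi` reports the newest CUDA version the installed driver supports in its header.

```python
"""Hypothetical host check against the platform requirements of tilert 0.1.4."""
import platform
import sys

REQUIRED_ARCH = "x86_64"
MIN_GLIBC = (2, 28)
REQUIRED_PYTHON = (3, 12)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.35' into (2, 35)."""
    return tuple(int(part) for part in text.split("."))


def platform_problems(arch: str, libc: str, libc_version: str,
                      python: tuple[int, int]) -> list[str]:
    """Return a list of human-readable mismatches (empty means OK)."""
    problems = []
    if arch != REQUIRED_ARCH:
        problems.append(f"CPU architecture is {arch!r}; the wheel needs {REQUIRED_ARCH}")
    # `or` short-circuits, so an empty version on non-glibc systems is never parsed.
    if libc != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        problems.append(f"libc is {libc or 'unknown'} {libc_version}; need glibc >= 2.28")
    if tuple(python) != REQUIRED_PYTHON:
        problems.append(f"Python is {python[0]}.{python[1]}; the wheel is built for 3.12")
    return problems


if __name__ == "__main__":
    libc, libc_version = platform.libc_ver()
    issues = platform_problems(platform.machine(), libc, libc_version, sys.version_info[:2])
    print("\n".join(issues) if issues else "host matches the platform requirements")
```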

### Recommended: pre-built Docker image

The pinned environment comes preinstalled in our official image, which is the **recommended way to run TileRT** because it avoids version drift on the host. The image is mirrored to two registries; pull from whichever is reachable:

```bash
docker pull ghcr.io/tile-ai/tilert:cu132-latest   # GitHub Container Registry
docker pull tileai/tilert:cu132-latest            # Docker Hub
```

Launch a container with all 8 GPUs attached, then install the wheel inside:

```bash
docker run --rm -it --gpus all --ipc=host \
    -v "$PWD":/workspace -w /workspace \
    ghcr.io/tile-ai/tilert:cu132-latest

# Then, inside the container, install from PyPI:
pip install tilert==0.1.4

# Or pin the exact wheel from the GitHub Release page (same artifact,
# useful when PyPI is unreachable):
pip install https://github.com/tile-ai/TileRT/releases/download/v0.1.4/tilert-0.1.4-cp312-cp312-manylinux_2_28_x86_64.whl
```
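The Release wheel's filename encodes the same constraints as the table. Under the standard wheel naming convention, `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`, the tag `cp312` means CPython 3.12 and `manylinux_2_28_x86_64` means glibc 2.28 or newer on x86_64, so pip rejects the file as unsupported on any other interpreter or platform. The hypothetical helper below just splits those fields out of a filename or URL.

```python
"""Split a wheel filename (or a URL ending in one) into its naming fields."""
from pathlib import PurePosixPath
from urllib.parse import urlparse


def wheel_tags(url_or_filename: str) -> dict[str, str]:
    """Return distribution, version, and the python/abi/platform tags."""
    filename = PurePosixPath(urlparse(url_or_filename).path).name
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "distribution": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

Comparing the `python` field with `f"cp{sys.version_info.major}{sys.version_info.minor}"` tells you before downloading whether the interpreter can accept the wheel.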

Verify the install:

```bash
python -c "import tilert, torch; print('tilert', tilert.__version__, '/ torch', torch.__version__, '/ cuda', torch.version.cuda)"
# Expected: tilert 0.1.4 / torch 2.11.0+cu130 / cuda 13.0
```

## Documentation

For weight conversion, the generation CLI, the programmatic API, Multi-Token Prediction (MTP), and the latest benchmarks, see the [**TileRT GitHub repository**](https://github.com/tile-ai/TileRT).
