Metadata-Version: 2.4
Name: fastsafetensors
Version: 0.3.2
Summary: High-performance safetensors model loader
Author-email: Takeshi Yoshimura <tyos@jp.ibm.com>
Maintainer-email: Takeshi Yoshimura <tyos@jp.ibm.com>
License: Apache-2.0
Project-URL: Repository, https://github.com/foundation-model-stack/fastsafetensors
Keywords: fastsafetensors,safetensors,GDS
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: <3.15,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typer>=0.9.0
Provides-Extra: progress
Requires-Dist: tqdm>=4.66.3; extra == "progress"
Provides-Extra: test
Requires-Dist: torch>=2.10.0; extra == "test"
Requires-Dist: pytest>=9.0.3; extra == "test"
Requires-Dist: pytest-cov>=5.0.0; extra == "test"
Requires-Dist: transformers>=5.0.0; extra == "test"
Requires-Dist: safetensors>=0.4.0; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit>=3.6.0; extra == "dev"
Requires-Dist: black==26.3.1; extra == "dev"
Requires-Dist: isort==6.1.0; extra == "dev"
Requires-Dist: flake8==7.3.0; extra == "dev"
Requires-Dist: mypy==1.19.1; extra == "dev"
Provides-Extra: threefs
Requires-Dist: fastsafetensor-3fs-reader>=0.3.3; extra == "threefs"
Dynamic: license-file

fastsafetensors
================

fastsafetensors is an efficient safetensors loader. If you write your own code to load large safetensors files, you can use the fastsafetensors APIs (see [docs](./docs/overview.md)). For example, vLLM and SGLang provide a `--load-format fastsafetensors` command-line argument to speed up their initialization.

This library supports Linux/CUDA, ROCm without GDS, Windows, [3FS](https://github.com/deepseek-ai/3fs), and unified-memory systems such as DGX Spark, among others. We welcome further platform- or storage-specific optimizations, which can be contributed as new [copier backends](fastsafetensors/copier/). Our CI tests Python 3.10-3.14 with PyTorch 2.11.0.

# Performance Highlights

Performance highlights from the [CLOUD 2025 paper](https://arxiv.org/abs/2505.23072) and benchmark docs:
- Standalone model loading was **4.8x-7.5x faster** than the default `safetensors` deserializer on Llama, Falcon, and Bloom models, and reached **26.4 GB/s** NVMe read throughput for Llama-70B on four GPUs with GDS.
- In the paper's vLLM integration experiment, startup time dropped from **12.39s to 4.74s** for Llama-2-13B on 4x L40S GPUs, and from **16.04s to 6.88s** on 1x A100.
- On AMD ROCm without GDS, the documented `nogds` path reached **6.02 GB/s** for GPT-2 Medium versus **1.28 GB/s** with `mmap` (**4.7x** throughput), and **2.62 GB/s** for GPT-2 versus **1.01 GB/s** with `mmap` (**2.6x** throughput). See the [report](./docs/amd-perf.md) for more details.
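
The ratios implied by the figures above can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: `REPORTED` and `speedup` are hypothetical names that are not part of fastsafetensors, and the numbers are copied from the bullets above rather than measured.

```python
# Figures copied from the highlights above: (label, baseline, improved, kind).
# kind "time" is in seconds (lower is better);
# kind "throughput" is in GB/s (higher is better).
REPORTED = [
    ("vLLM startup, 4x L40S", 12.39, 4.74, "time"),
    ("vLLM startup, 1x A100", 16.04, 6.88, "time"),
    ("ROCm GPT-2 Medium, nogds vs mmap", 1.28, 6.02, "throughput"),
    ("ROCm GPT-2, nogds vs mmap", 1.01, 2.62, "throughput"),
]


def speedup(baseline: float, improved: float, kind: str) -> float:
    """Return how many times better `improved` is than `baseline`."""
    if kind == "time":
        # Shorter time is better, so divide the baseline by the new time.
        return baseline / improved
    if kind == "throughput":
        # Higher throughput is better, so divide the new rate by the baseline.
        return improved / baseline
    raise ValueError(f"unknown kind: {kind!r}")


for label, baseline, improved, kind in REPORTED:
    print(f"{label}: {speedup(baseline, improved, kind):.1f}x")
```

The two ROCm rows reproduce the 4.7x and 2.6x throughput ratios quoted above; the vLLM rows work out to roughly 2.6x and 2.3x shorter startup times.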

# Quick Start

```bash
pip install fastsafetensors
pip install vllm # for quick demo
vllm serve Qwen/Qwen3-0.6B --load-format fastsafetensors
...
Loading safetensors using Fastsafetensor loader:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors using Fastsafetensor loader: 100% Completed | 1/1 [00:00<00:00,  1.23it/s]
```

# Design Details

See [Overview](./docs/overview.md) for features, basic API usage, and configuration.

# Code of Conduct

Please refer to [Foundation Model Stack Community Code of Conduct](https://github.com/foundation-model-stack/foundation-model-stack/blob/main/code-of-conduct.md).

# Development

See [Development](./docs/development.md).

# Publication

Takeshi Yoshimura, Tatsuhiro Chiba, Manish Sethi, Daniel Waddington, and Swaminathan Sundararaman (2025). Speeding up Model Loading with fastsafetensors. IEEE CLOUD 2025. [arXiv:2505.23072](https://arxiv.org/abs/2505.23072).
