Metadata-Version: 2.4
Name: deathtensors
Version: 0.3.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pytest>=7 ; extra == 'dev'
Requires-Dist: torch>=1.10 ; extra == 'dev'
Requires-Dist: numpy>=1.20 ; extra == 'dev'
Requires-Dist: huggingface-hub>=0.20 ; extra == 'dev'
Requires-Dist: huggingface-hub>=0.20 ; extra == 'huggingface'
Requires-Dist: numpy>=1.20 ; extra == 'numpy'
Requires-Dist: torch>=1.10 ; extra == 'torch'
Requires-Dist: numpy>=1.20 ; extra == 'torch'
Provides-Extra: dev
Provides-Extra: huggingface
Provides-Extra: numpy
Provides-Extra: torch
License-File: LICENSE
Summary: Safe, fast, pickle-free tensor storage for PyTorch. Rust core, Python interface. By Death Legion.
Keywords: pytorch,tensors,safetensors,pickle,machine-learning,deep-learning,model-weights
Home-Page: https://github.com/deathlegion/deathtensors
Author-email: Death Legion <deathlegion@users.noreply.github.com>
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/deathlegion/deathtensors/issues
Project-URL: Documentation, https://github.com/deathlegion/deathtensors#readme
Project-URL: Homepage, https://github.com/deathlegion/deathtensors
Project-URL: Repository, https://github.com/deathlegion/deathtensors

# deathtensors

**Safe, fast, pickle-free tensor storage for PyTorch.** By **Death Legion**.
Version 0.3.0.

`deathtensors` is an alternative to `pickle` for storing model weights.
Files use the **`.deathtensors`** extension and are designed to be opened
safely even when they come from an untrusted source: opening a
deathtensors file **never executes arbitrary code**, because the header
is parsed as JSON (no `eval`, no `__reduce__`, no `torch.load`) and the
tensor data blob is treated as opaque bytes.

## What's new in 0.3.0

- **Hugging Face Hub integration** — push and pull `.deathtensors` files
  directly to/from the HF Hub (single-file and sharded), list
  `.deathtensors` files in any repo, fetch repo metadata, and convert
  `.safetensors` files to `.deathtensors` in-place on the Hub. CLI:
  `python -m deathtensors hub-push|hub-pull|hub-list|hub-info|hub-convert`.
- Optional `huggingface` extra: `pip install deathtensors[huggingface]`.

## What was new in 0.2.0

- **`.deathtensors` is now the canonical file extension** (`.dt` is kept
  as a legacy alias for files written by 0.1.x).
- **zstd compression** of the tensor data blob (`save(..., compress=True)`).
  The reader transparently decompresses; the header stays uncompressed
  so you can still inspect it without decompressing the whole file.
- **Per-tensor statistics**: `f.stats(name)` returns
  `{dtype, shape, nbytes, count, min, max, mean, stddev, nnz}` — computed
  in Rust without materialising the tensor in Python. `f.all_stats()`
  does it for every tensor in one pass.
- **Append mode**: `dt.append(path, {...})` adds tensors to an existing
  file without loading its current tensors back into memory.
- **Sharded save**: `dt.save_sharded(path, tensors, max_shard_bytes=...)`
  auto-splits a huge model across multiple `.deathtensors` files with an
  index. `dt.open_sharded(index_path)` reads them back transparently.
- **Zero-copy mmap tensors**: `f.get_tensor_mmap(name)` returns a
  `torch.Tensor` backed directly by the mmap'd file (no copy;
  uncompressed files only).
- **Tensor slicing**: `f.get_tensor_slice(name, start, count)` loads
  only `count` rows of a huge tensor, beginning at row `start`, for
  streaming over embedding tables.
- **Sparse tensor support**: `dt.save_sparse(path, {"sp": sparse_coo})`
  and `dt.get_sparse(f, "sp")` round-trip torch sparse COO tensors.
- **Schema validation**: `dt.Schema().expect(...).validate(path)` lets
  you declare expected dtypes/shapes and fail fast on mismatches.
- **`diff()`**: compare two `.deathtensors` files and report added /
  removed / shape-changed / dtype-changed / value-changed tensors.
- **Manifest sidecar**: `dt.write_manifest(path)` writes a small
  JSON file with the full header — handy for browsing on Hugging Face
  Hub without downloading the whole file.
- **CLI**: `python -m deathtensors info|list|verify|stats|convert|diff|manifest`.
- **Path expansion**: `~` and `$ENV_VARS` are expanded in every path.
- **Conversion**: `dt.from_safetensors()` and `dt.to_safetensors()`.
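
To make the `diff()` categories listed above concrete, here is a minimal stand-alone sketch of how two sets of tensors could be classified. It does not use `deathtensors` itself: the `Entry` tuple, the `diff_entries` helper, the dtype strings, and the report keys are all hypothetical, and the real `diff()` return value may be structured differently.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a tensor entry: (dtype, shape, raw bytes).
Entry = Tuple[str, Tuple[int, ...], bytes]


def diff_entries(a: Dict[str, Entry], b: Dict[str, Entry]) -> Dict[str, List[str]]:
    """Classify tensor names into the five categories listed for diff()."""
    report: Dict[str, List[str]] = {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "shape_changed": [],
        "dtype_changed": [],
        "value_changed": [],
    }
    for name in sorted(a.keys() & b.keys()):
        (dtype_a, shape_a, data_a), (dtype_b, shape_b, data_b) = a[name], b[name]
        if dtype_a != dtype_b:
            report["dtype_changed"].append(name)
        if shape_a != shape_b:
            report["shape_changed"].append(name)
        # Compare payloads only when the layout matches; otherwise the bytes
        # differ trivially and the layout change is the interesting signal.
        if dtype_a == dtype_b and shape_a == shape_b and data_a != data_b:
            report["value_changed"].append(name)
    return report


old = {
    "weight": ("F32", (2, 2), bytes(16)),
    "bias": ("F32", (2,), bytes(8)),
    "gone": ("F32", (1,), bytes(4)),
}
new = {
    "weight": ("F32", (2, 2), b"\x01" + bytes(15)),
    "bias": ("F16", (2,), bytes(4)),
    "extra": ("I64", (1,), bytes(8)),
}
report = diff_entries(old, new)
print(report)
```

Here `weight` keeps its dtype and shape but changes one byte, `bias` changes dtype, `gone` is removed, and `extra` is added.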

## Install

```bash
pip install deathtensors               # core only
pip install deathtensors[torch]        # pulls in torch + numpy
pip install deathtensors[numpy]        # pulls in numpy
pip install deathtensors[huggingface]  # pulls in huggingface-hub
pip install deathtensors[dev]          # torch + numpy + huggingface-hub + pytest
```

Pre-built wheels are available for CPython 3.8–3.13 on x86_64 Linux.
Other platforms fall back to a source build (requires Rust ≥ 1.74).

## Quickstart

```python
import torch
import deathtensors as dt

# 1. Save with compression + checksum.
tensors = {
    "weight": torch.randn(128, 128),
    "bias":   torch.zeros(128),
}
metadata = {"weight": {"layer": "fc1", "init": "kaiming"}}
global_md = {"model": "mlp-tiny", "license": "MIT"}
dt.save("model.deathtensors", tensors, metadata=metadata,
        global_metadata=global_md, compress=True, checksum=True)

# 2. Open the file lazily — no tensors are read yet.
with dt.open("model.deathtensors", verify=True) as f:
    print(f.keys())                          # ['weight', 'bias']
    print(f.metadata())                      # global metadata dict
    print(f.info("weight"))                  # dtype/shape/offsets/metadata
    print(f.stats("weight"))                 # min/max/mean/stddev/nnz
    w = f.get_tensor("weight")               # only 'weight' is read
    print(w.shape, w.dtype)                  # torch.Size([128, 128]) torch.float32
```
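
The fields returned by `f.stats("weight")` can all be computed in a single pass over the raw bytes. The sketch below shows one way to do that in pure Python with Welford's running-variance update. It is an illustration only: the Rust implementation is not shown in this README, `stream_stats` is a hypothetical helper, and the choice of population (rather than sample) standard deviation is an assumption.

```python
import math
import struct


def stream_stats(raw: bytes, fmt: str = "<f") -> dict:
    """One-pass count/min/max/mean/stddev/nnz over a little-endian buffer."""
    count, nnz = 0, 0
    mean, m2 = 0.0, 0.0
    lo, hi = math.inf, -math.inf
    for (x,) in struct.iter_unpack(fmt, raw):
        count += 1
        nnz += x != 0
        lo, hi = min(lo, x), max(hi, x)
        # Welford's update keeps a running mean and sum of squared deviations.
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    # Assumption: population standard deviation (divide by count).
    stddev = math.sqrt(m2 / count) if count else 0.0
    return {"count": count, "min": lo, "max": hi, "mean": mean,
            "stddev": stddev, "nnz": nnz}


values = [0.0, 1.0, 2.0, 3.0, 4.0]
raw = struct.pack("<5f", *values)   # five little-endian float32 values
stats = stream_stats(raw)
print(stats)
```

Because each value is visited once and only a handful of scalars are kept, memory use stays constant regardless of tensor size.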

## Sharded save for huge models

```python
import torch, deathtensors as dt

# A 100-tensor model that we want to ship in ~10 shards.
tensors = {f"layer.{i}.weight": torch.randn(500, 500) for i in range(100)}
shards = dt.save_sharded(
    "big_model.deathtensors",
    tensors,
    max_shard_bytes=10 * 1024 * 1024,  # 10 MiB per shard
    compress=True,
    checksum=True,
)
print(f"wrote {len(shards)} shards")

# Read it back transparently.
with dt.open_sharded("big_model.deathtensors") as sr:
    print(sr.which_shard("layer.42.weight"))   # e.g. 'big_model-00005-of-00010.deathtensors'
    w = sr.get_tensor("layer.42.weight")        # only opens that one shard
```
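
To see why the example above ends up with about 10 shards, here is a rough sketch of greedy, size-based shard planning. It is not the library's algorithm, which this README does not describe: `plan_shards` and `shard_name` are hypothetical helpers, header overhead and compression are ignored, and the 1-based `-NNNNN-of-NNNNN` numbering is an assumption based on the file name shown in the comment above.

```python
from typing import Dict, List


def plan_shards(sizes: Dict[str, int], max_shard_bytes: int) -> List[List[str]]:
    """Greedily pack tensors, in insertion order, into shards under a byte cap."""
    shards: List[List[str]] = [[]]
    used = 0
    for name, nbytes in sizes.items():
        # Start a new shard when the next tensor would overflow the current one.
        # A tensor larger than the cap still gets a shard of its own.
        if shards[-1] and used + nbytes > max_shard_bytes:
            shards.append([])
            used = 0
        shards[-1].append(name)
        used += nbytes
    return shards


def shard_name(stem: str, index: int, total: int) -> str:
    """Build a 1-based shard file name (numbering scheme is an assumption)."""
    return f"{stem}-{index:05d}-of-{total:05d}.deathtensors"


# 100 float32 tensors of shape (500, 500): 500 * 500 * 4 = 1,000,000 bytes each.
sizes = {f"layer.{i}.weight": 500 * 500 * 4 for i in range(100)}
plan = plan_shards(sizes, max_shard_bytes=10 * 1024 * 1024)

which = next(i for i, names in enumerate(plan, start=1) if "layer.42.weight" in names)
print(len(plan), shard_name("big_model", which, len(plan)))
```

Ten 1,000,000-byte tensors fit under the 10 MiB (10,485,760-byte) cap and an eleventh does not, so this packing yields 10 shards with `layer.42.weight` in the fifth.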

## CLI

```bash
python -m deathtensors info model.deathtensors           # show tensor table
python -m deathtensors list model.deathtensors           # list tensor names
python -m deathtensors verify model.deathtensors         # verify SHA-256 footer
python -m deathtensors stats model.deathtensors          # per-tensor min/max/mean/...
python -m deathtensors convert to-safetensors a.deathtensors b.safetensors
python -m deathtensors convert from-safetensors a.safetensors b.deathtensors --compress
python -m deathtensors diff a.deathtensors b.deathtensors
python -m deathtensors manifest model.deathtensors       # write JSON sidecar
python -m deathtensors --version

# Hugging Face Hub
python -m deathtensors hub-push death-legion/my-model model.deathtensors
python -m deathtensors hub-pull death-legion/my-model model.deathtensors
python -m deathtensors hub-list death-legion/my-model
python -m deathtensors hub-info death-legion/my-model
python -m deathtensors hub-convert death-legion/my-model model.safetensors --compress
```

## Hugging Face Hub

```bash
pip install deathtensors[huggingface]
```

```python
import torch, deathtensors as dt
from deathtensors.huggingface import (
    push_to_hub, load_from_hub,
    push_sharded_to_hub, load_sharded_from_hub,
    list_repo_models, hub_model_info,
    convert_safetensors_on_hub, HubCachedFile,
)

# 1. Save locally with compression + checksum.
model = {"weight": torch.randn(128, 128), "bias": torch.zeros(128)}
dt.save("model.deathtensors", model, compress=True, checksum=True)

# 2. Push to a HF repo (creates the repo if it doesn't exist).
url = push_to_hub(
    repo_id="death-legion/my-model",
    local_path="model.deathtensors",
    path_in_repo="model.deathtensors",
    # token=None picks up HF_TOKEN env var or `huggingface-cli login`.
)
print(url)  # https://huggingface.co/death-legion/my-model/blob/main/model.deathtensors

# 3. Read it back lazily — HF cache (~/.cache/huggingface/hub) dedupes
#    downloads across processes.
with load_from_hub("death-legion/my-model", "model.deathtensors", verify=True) as f:
    w = f.get_tensor("weight")   # only 'weight' is read; 'bias' is not

# 4. Sharded models work too. `huge_state_dict` is a placeholder for your
#    own {name: tensor} mapping, e.g. model.state_dict().
shards = dt.save_sharded("big_model.deathtensors", huge_state_dict,
                         max_shard_bytes=5 * 1024**3)  # 5 GiB per shard
push_sharded_to_hub("death-legion/big-model", "big_model.deathtensors")

# Read shards lazily — only the shard containing the tensor you ask for
# is downloaded.
with load_sharded_from_hub("death-legion/big-model", "big_model.deathtensors") as sr:
    w = sr.get_tensor("layer.42.weight")
    print(sr.which_shard("layer.42.weight"))  # e.g. 'big_model-00005-of-00010.deathtensors'

# 5. Convert an existing .safetensors file on the Hub to .deathtensors,
#    in-place, without downloading it locally first.
convert_safetensors_on_hub(
    repo_id="death-legion/old-checkpoint",
    safetensors_filename="model.safetensors",
    compress=True,
    delete_original=False,  # set True to also delete the .safetensors file
)

# 6. List every .deathtensors file in a repo:
for f in list_repo_models("death-legion/my-model"):
    print(f)

# 7. Get repo metadata:
info = hub_model_info("death-legion/my-model")
print(info["downloads"], info["likes"], info["deathtensors_files"])
```

The token is resolved in this order: the `token=` argument, the
`HF_TOKEN` env var, the `HUGGING_FACE_HUB_TOKEN` env var, then
`huggingface-cli login` credentials.
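
The resolution order above amounts to "first non-empty candidate wins". A minimal sketch, assuming a hypothetical `resolve_token` helper (not part of the `deathtensors` API) and with the stored `huggingface-cli login` credential passed in as a plain argument rather than read from disk:

```python
import os
from typing import Mapping, Optional


def resolve_token(
    token: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
    stored_token: Optional[str] = None,
) -> Optional[str]:
    """Return the first available token, following the order listed above."""
    candidates = (
        token,                              # 1. explicit token= argument
        env.get("HF_TOKEN"),                # 2. HF_TOKEN env var
        env.get("HUGGING_FACE_HUB_TOKEN"),  # 3. HUGGING_FACE_HUB_TOKEN env var
        stored_token,                       # 4. `huggingface-cli login` credentials
    )
    # Assumption: empty strings are treated the same as "not set".
    return next((c for c in candidates if c), None)


fake_env = {"HF_TOKEN": "hf_env", "HUGGING_FACE_HUB_TOKEN": "hf_other"}
print(resolve_token("hf_arg", fake_env))  # explicit argument wins
print(resolve_token(None, fake_env))      # falls back to HF_TOKEN
```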

## File format (v1)

```text
+-----------------------+
| Magic (8 bytes)       |   b"DTLEGION"
+-----------------------+
| Version (4 bytes u32) |   1 (little-endian)
+-----------------------+
| Flags (4 bytes u32)   |   bit0: zstd compression
|                       |   bit1: SHA-256 footer
|                       |   bit2: encryption (reserved)
+-----------------------+
| Header size (8 u64)   |   byte length of JSON header
+-----------------------+
| Header (JSON, UTF-8)  |   see docs/format_spec.md
+-----------------------+
| Padding (0..8 bytes)  |   NUL bytes, 8-byte alignment
+-----------------------+
| Tensor data (blob)    |   raw bytes (or zstd-compressed)
+-----------------------+
| Footer (32 bytes)     |   optional: SHA-256(header + padding + data)
+-----------------------+
```

Full spec: `docs/format_spec.md`.
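
As an illustration of the layout above, the following sketch parses the fixed prefix and JSON header with only the Python standard library, then checks the optional SHA-256 footer. It is not the library's reader and makes several assumptions that the diagram does not state outright: flags and header size are little-endian like the version field, padding rounds the data offset up to a multiple of 8, and the footer hash starts at the first header byte. The header keys in the synthetic file are made up; `docs/format_spec.md` remains the authority.

```python
import hashlib
import json
import struct

MAGIC = b"DTLEGION"
PREFIX = struct.Struct("<8sIIQ")  # magic, version, flags, header size = 24 bytes
FLAG_ZSTD, FLAG_SHA256 = 1 << 0, 1 << 1


def parse_prefix(blob: bytes) -> dict:
    """Parse the fixed prefix and JSON header; never touches pickle or eval."""
    magic, version, flags, header_size = PREFIX.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a deathtensors file")
    header_end = PREFIX.size + header_size
    if header_end > len(blob):
        raise ValueError("truncated header")
    header = json.loads(blob[PREFIX.size:header_end].decode("utf-8"))
    # Assumption: padding rounds the data offset up to a multiple of 8.
    data_start = (header_end + 7) & ~7
    has_footer = bool(flags & FLAG_SHA256)
    data_end = len(blob) - 32 if has_footer else len(blob)
    return {
        "version": version,
        "compressed": bool(flags & FLAG_ZSTD),
        "has_checksum": has_footer,
        "header": header,
        "header_span": (PREFIX.size, header_end),
        "data_span": (data_start, data_end),
    }


def verify_footer(blob: bytes, info: dict) -> bool:
    """Recompute SHA-256 over header + padding + data and compare to the footer."""
    if not info["has_checksum"]:
        return False
    start, end = info["header_span"][0], info["data_span"][1]
    return hashlib.sha256(blob[start:end]).digest() == blob[end:]


# Build a tiny synthetic file to exercise the parser (header keys are made up).
header = json.dumps({"bias": {"dtype": "F32", "shape": [2]}}).encode("utf-8")
padding = b"\x00" * (-(PREFIX.size + len(header)) % 8)
body = header + padding + struct.pack("<2f", 0.0, 0.0)
blob = PREFIX.pack(MAGIC, 1, FLAG_SHA256, len(header)) + body + hashlib.sha256(body).digest()

info = parse_prefix(blob)
print(info["version"], info["has_checksum"], verify_footer(blob, info))
```

Every step is a bounds-checked read or a JSON parse, which is what keeps opening a file free of code execution.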

## Why not just use `pickle` / `torch.save`?

`torch.save` uses `pickle` under the hood, which means **opening a
`.pt` file from an untrusted source can run arbitrary Python code**.
This has been the cause of several real-world supply-chain attacks on
ML model hubs. `deathtensors` files are pure data: a fixed binary
prefix followed by JSON metadata followed by raw tensor bytes. There is
no code path in the reader that calls `eval`, `exec`, `__reduce__`, or
any pickle-style reconstruction.
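
The difference is easy to demonstrate with the standard library alone. In this sketch, the hypothetical `Payload` class uses `__reduce__` to make `pickle.loads` call a function of its choosing, while `json.loads` can only ever produce plain data. The JSON header shown is a made-up example, not the actual `deathtensors` header schema.

```python
import json
import pickle


class Payload:
    """Stand-in for a malicious object embedded in a pickle stream."""

    def __reduce__(self):
        # pickle will call eval("6 * 7") at load time; a real attack would
        # call something far less harmless.
        return (eval, ("6 * 7",))


untrusted = pickle.dumps(Payload())
result = pickle.loads(untrusted)   # runs the attacker-chosen callable
print(result)                      # the value returned by eval, not a Payload

# Parsing JSON only ever builds dicts, lists, strings, numbers, bools and None.
header = json.loads('{"weight": {"dtype": "F32", "shape": [128, 128]}}')
print(type(header).__name__, header["weight"]["shape"])
```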

## Why not just use `safetensors`?

`safetensors` is excellent and we encourage you to use it. `deathtensors`
exists as a separate, independent implementation because:

1. **Format diversity is good for the ecosystem.** A bug or
   vulnerability in a single, universally used tensor-storage library
   would affect everyone; having two independent libraries that can
   convert between each other's formats reduces that risk.
2. **`deathtensors` ships an optional SHA-256 footer for integrity
   verification**, which is useful when files travel through untrusted
   channels.
3. **`deathtensors` ships per-tensor string metadata** in addition to
   global metadata.
4. **`deathtensors` ships optional zstd compression** of the tensor
   data blob.
5. **`deathtensors` ships a built-in CLI** (`python -m deathtensors`).
6. **`deathtensors` ships sharded save/open** out of the box.
7. **`deathtensors` exposes a richer dtype set** including BF16,
   complex64, complex128, and unsigned 16/32/64-bit integers.
8. **`deathtensors` ships a `Schema` class** for declarative validation.
9. **`deathtensors` ships a `diff()` function** to compare two files.
10. **`deathtensors` ships a manifest sidecar writer** for browsing.

We do not try to be a drop-in replacement. The Python API is similar
in spirit (`save`, `open`, `keys`, `get_tensor`) but the file format
is not compatible — a `.deathtensors` file is not a `.safetensors`
file and vice versa. Use `dt.from_safetensors()` / `dt.to_safetensors()`
to convert.

## Public API

```python
# Core save/open
deathtensors.save(path, tensors, metadata=None, global_metadata=None,
                  checksum=False, compress=False, compress_level=3)
deathtensors.open(path, verify=False)        # context manager
deathtensors.save_file(path, tensors, ...)   # lower-level (raw bytes)
deathtensors.append(path, tensors, metadata=None, extra_global_metadata=None)
deathtensors.DtFile(path, verify=False)      # the class returned by open()

f = deathtensors.open("model.deathtensors")
f.keys()                                      # list of tensor names
f.info(name)                                  # dtype, shape, offsets, nbytes, metadata
f.metadata()                                  # global file metadata
f.get_bytes(name)                             # raw bytes
f.get_buffer(name)                            # memoryview
f.get_tensor(name, framework="torch")         # torch.Tensor (default) or numpy.ndarray
f.get_tensor_mmap(name)                       # zero-copy mmap-backed tensor (uncompressed only)
f.get_tensor_slice(name, start, count, ...)   # load only N rows
f.get_tensors(framework="torch")              # dict of all tensors
f.stats(name)                                 # min/max/mean/stddev/nnz
f.all_stats()                                 # stats for every tensor
f.verify()                                    # verify SHA-256 footer
f.has_checksum()                              # was the file written with checksum=True?
f.is_compressed()                             # was the file written with compress=True?

# Sharded
deathtensors.save_sharded(path, tensors, max_shard_bytes=5*GiB, ...)
deathtensors.open_sharded(index_path, verify=False) -> ShardedReader

# Sparse
deathtensors.save_sparse(path, tensors, ...)
deathtensors.get_sparse(f, name) -> torch.Tensor

# Conversion
deathtensors.from_safetensors(src, dst=None, checksum=True, compress=False)
deathtensors.to_safetensors(src, dst=None)

# Schema validation
schema = deathtensors.Schema()
schema.expect(name, dtype=..., shape=..., metadata_keys=...)
schema.allow_extra()
schema.validate(path_or_file)

# Diff
deathtensors.diff(path_a, path_b) -> dict

# Manifest
deathtensors.write_manifest(dt_path, manifest_path=None)
deathtensors.read_manifest(manifest_path) -> dict

# CLI (run from a shell, not from Python)
# python -m deathtensors info|list|verify|stats|convert|diff|manifest <path>
```
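
For intuition about what `f.get_tensor_slice(name, start, count, ...)` saves, the sketch below computes which bytes a row slice would need to read. It assumes an uncompressed, contiguous, row-major tensor whose starting byte offset is known; `row_slice_byte_range` is a hypothetical helper, not part of the `deathtensors` API, and the offset used in the example is arbitrary.

```python
from math import prod
from typing import Sequence, Tuple


def row_slice_byte_range(
    shape: Sequence[int], itemsize: int, data_offset: int, start: int, count: int
) -> Tuple[int, int]:
    """Byte range [begin, end) covering rows start..start+count of a row-major tensor."""
    rows = shape[0]
    if not (0 <= start <= rows and 0 <= count <= rows - start):
        raise IndexError("row slice out of bounds")
    row_bytes = prod(shape[1:]) * itemsize   # bytes in one leading-dimension row
    begin = data_offset + start * row_bytes
    return begin, begin + count * row_bytes


# An embedding table of 50,000 x 768 float32 values stored at byte offset 4096.
begin, end = row_slice_byte_range((50_000, 768), 4, 4096, start=1_000, count=32)
print(begin, end, end - begin)
```

Reading 32 rows touches 98,304 bytes here, compared with roughly 150 MB for the full table.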

## Testing

```bash
pip install deathtensors[dev]
pytest tests/
```

The test suite covers: every dtype round-trip, save/load with and
without compression, save/load with and without checksum, lazy loading,
per-tensor stats correctness, append mode, sharded save/open, sparse
tensors, schema validation (pass and fail), diff (identical / added /
removed / value-changed), manifest round-trip, path expansion, the CLI
(`info`/`list`/`verify`/`stats`/`manifest`/`diff`), backward
compatibility with `.dt` files written by 0.1.x, and safety (no
`eval`/`exec` called on open).

## License

MIT, © Death Legion.

