Metadata-Version: 2.4
Name: deathtensors
Version: 0.2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pytest>=7 ; extra == 'dev'
Requires-Dist: torch>=1.10 ; extra == 'dev'
Requires-Dist: numpy>=1.20 ; extra == 'dev'
Requires-Dist: numpy>=1.20 ; extra == 'numpy'
Requires-Dist: torch>=1.10 ; extra == 'torch'
Requires-Dist: numpy>=1.20 ; extra == 'torch'
Provides-Extra: dev
Provides-Extra: numpy
Provides-Extra: torch
License-File: LICENSE
Summary: Safe, fast, pickle-free tensor storage for PyTorch. Rust core, Python interface. By Death Legion.
Keywords: pytorch,tensors,safetensors,pickle,machine-learning,deep-learning,model-weights
Home-Page: https://github.com/deathlegion/deathtensors
Author-email: Death Legion <deathlegion@users.noreply.github.com>
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/deathlegion/deathtensors/issues
Project-URL: Documentation, https://github.com/deathlegion/deathtensors#readme
Project-URL: Homepage, https://github.com/deathlegion/deathtensors
Project-URL: Repository, https://github.com/deathlegion/deathtensors

# deathtensors

**Safe, fast, pickle-free tensor storage for PyTorch.** By **Death Legion**.
Version 0.2.0.

`deathtensors` is an alternative to `pickle`-based formats for storing model weights.
Files use the **`.deathtensors`** extension and are designed to be opened
safely even when they come from an untrusted source: opening a
deathtensors file **never executes arbitrary code**, because the header
is parsed as JSON (no `eval`, no `__reduce__`, no `torch.load`) and the
tensor data blob is treated as opaque bytes.

## What's new in 0.2.0

- **`.deathtensors` is now the canonical file extension** (`.dt` is kept
  as a legacy alias for files written by 0.1.x).
- **zstd compression** of the tensor data blob (`save(..., compress=True)`).
  The reader transparently decompresses; the header stays uncompressed
  so you can still inspect it without decompressing the whole file.
- **Per-tensor statistics**: `f.stats(name)` returns
  `{dtype, shape, nbytes, count, min, max, mean, stddev, nnz}` — computed
  in Rust without materialising the tensor in Python. `f.all_stats()`
  does it for every tensor in one pass.
- **Append mode**: `dt.append(path, {...})` adds tensors to an existing
  file without loading the existing tensors back into memory.
- **Sharded save**: `dt.save_sharded(path, tensors, max_shard_bytes=...)`
  auto-splits a huge model across multiple `.deathtensors` files with an
  index. `dt.open_sharded(index_path)` reads them back transparently.
- **Zero-copy mmap tensors**: `f.get_tensor_mmap(name)` returns a
  `torch.Tensor` backed directly by the mmap'd file (no copy;
  uncompressed files only).
- **Tensor slicing**: `f.get_tensor_slice(name, start, count)` loads
  only N rows of a huge tensor, for streaming over embedding tables.
- **Sparse tensor support**: `dt.save_sparse(path, {"sp": sparse_coo})`
  and `dt.get_sparse(f, "sp")` round-trip torch sparse COO tensors.
- **Schema validation**: `dt.Schema().expect(...).validate(path)` lets
  you declare expected dtypes/shapes and fail fast on mismatches.
- **`diff()`**: compare two `.deathtensors` files and report added /
  removed / shape-changed / dtype-changed / value-changed tensors.
- **Manifest sidecar**: `dt.write_manifest(path)` writes a small
  JSON file with the full header — handy for browsing on Hugging Face
  Hub without downloading the whole file.
- **CLI**: `python -m deathtensors info|list|verify|stats|convert|diff|manifest`.
- **Path expansion**: `~` and `$ENV_VARS` are expanded in every path.
- **Conversion**: `dt.from_safetensors()` and `dt.to_safetensors()`.
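
To make the fields returned by `f.stats(name)` concrete, here is a pure-Python reference for a small float32 tensor stored as raw bytes. It is only a sketch of what the fields plausibly mean, not the Rust implementation: the helper name `reference_stats`, the little-endian layout, the `"float32"` label, and the use of population (rather than sample) standard deviation are all assumptions.

```python
import math
import struct


def reference_stats(raw: bytes, shape: tuple) -> dict:
    """Hypothetical helper: summarise raw little-endian float32 bytes."""
    count = math.prod(shape)
    values = struct.unpack(f"<{count}f", raw)
    mean = sum(values) / count
    # Assumption: population standard deviation (divide by count, not count - 1).
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / count)
    return {
        "dtype": "float32",  # illustrative label; the library's dtype strings may differ
        "shape": list(shape),
        "nbytes": len(raw),
        "count": count,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "stddev": stddev,
        "nnz": sum(1 for v in values if v != 0.0),  # number of non-zero elements
    }


raw = struct.pack("<4f", 0.0, 1.0, 2.0, 3.0)
stats = reference_stats(raw, (2, 2))
print(stats["mean"], stats["nnz"])  # 1.5 3
```

A reference like this can be handy in tests for cross-checking the values the library reports on small tensors.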

## Install

```bash
pip install deathtensors            # core only
pip install "deathtensors[torch]"   # pulls in torch + numpy
pip install "deathtensors[numpy]"   # pulls in numpy
pip install "deathtensors[dev]"     # torch + numpy + pytest
```

Pre-built wheels are available for CPython 3.8–3.13 on x86_64 Linux.
Other platforms fall back to a source build (requires Rust ≥ 1.74).

## Quickstart

```python
import torch
import deathtensors as dt

# 1. Save with compression + checksum.
tensors = {
    "weight": torch.randn(128, 128),
    "bias":   torch.zeros(128),
}
metadata = {"weight": {"layer": "fc1", "init": "kaiming"}}
global_md = {"model": "mlp-tiny", "license": "MIT"}
dt.save("model.deathtensors", tensors, metadata=metadata,
        global_metadata=global_md, compress=True, checksum=True)

# 2. Open the file lazily — no tensors are read yet.
with dt.open("model.deathtensors", verify=True) as f:
    print(f.keys())                          # ['weight', 'bias']
    print(f.metadata())                      # global metadata dict
    print(f.info("weight"))                  # dtype/shape/offsets/metadata
    print(f.stats("weight"))                 # min/max/mean/stddev/nnz
    w = f.get_tensor("weight")               # only 'weight' is read
    print(w.shape, w.dtype)                  # torch.Size([128, 128]) torch.float32
```

## Sharded save for huge models

```python
import torch, deathtensors as dt

# A 100-tensor model that we want to ship in ~10 shards.
tensors = {f"layer.{i}.weight": torch.randn(500, 500) for i in range(100)}
shards = dt.save_sharded(
    "big_model.deathtensors",
    tensors,
    max_shard_bytes=10 * 1024 * 1024,  # 10 MiB per shard
    compress=True,
    checksum=True,
)
print(f"wrote {len(shards)} shards")

# Read it back transparently.
with dt.open_sharded("big_model.deathtensors") as sr:
    print(sr.which_shard("layer.42.weight"))   # e.g. 'big_model-00005-of-00010.deathtensors'
    w = sr.get_tensor("layer.42.weight")        # only opens that one shard
```
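
To see why the example above ends up with about 10 shards, the sketch below packs tensors greedily in insertion order. This is an assumption about how `save_sharded` might split a model, not its documented algorithm; `plan_shards` is a hypothetical helper, and header overhead and compression are ignored.

```python
def plan_shards(sizes: dict, max_shard_bytes: int) -> list:
    """Hypothetical greedy planner: fill each shard in insertion order."""
    shards, current, used = [], [], 0
    for name, nbytes in sizes.items():
        # Start a new shard when the next tensor would overflow this one.
        if current and used + nbytes > max_shard_bytes:
            shards.append(current)
            current, used = [], 0
        current.append(name)
        used += nbytes
    if current:
        shards.append(current)
    return shards


# Same workload as the example above: 100 float32 tensors of 500 x 500,
# i.e. 1,000,000 bytes each.
sizes = {f"layer.{i}.weight": 500 * 500 * 4 for i in range(100)}
shards = plan_shards(sizes, max_shard_bytes=10 * 1024 * 1024)
print(len(shards))  # 10

# 1-based shard number for each tensor name.
shard_of = {name: idx for idx, names in enumerate(shards, start=1) for name in names}
print(shard_of["layer.42.weight"])  # 5
```

Ten 1,000,000-byte tensors fit under the 10 MiB limit and the eleventh does not, so under this packing `layer.42.weight` lands in the fifth of ten shards.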

## CLI

```bash
python -m deathtensors info model.deathtensors           # show tensor table
python -m deathtensors list model.deathtensors           # list tensor names
python -m deathtensors verify model.deathtensors         # verify SHA-256 footer
python -m deathtensors stats model.deathtensors          # per-tensor min/max/mean/...
python -m deathtensors convert to-safetensors a.deathtensors b.safetensors
python -m deathtensors convert from-safetensors a.safetensors b.deathtensors --compress
python -m deathtensors diff a.deathtensors b.deathtensors
python -m deathtensors manifest model.deathtensors       # write JSON sidecar
python -m deathtensors --version
```

## File format (v1)

```text
+-----------------------+
| Magic (8 bytes)       |   b"DTLEGION"
+-----------------------+
| Version (4 bytes u32) |   1 (little-endian)
+-----------------------+
| Flags (4 bytes u32)   |   bit0: zstd compression
|                       |   bit1: SHA-256 footer
|                       |   bit2: encryption (reserved)
+-----------------------+
| Header size (8 u64)   |   byte length of JSON header
+-----------------------+
| Header (JSON, UTF-8)  |   see docs/format_spec.md
+-----------------------+
| Padding (0..8 bytes)  |   NUL bytes, 8-byte alignment
+-----------------------+
| Tensor data (blob)    |   raw bytes (or zstd-compressed)
+-----------------------+
| Footer (32 bytes)     |   optional: SHA-256(header + padding + data)
+-----------------------+
```

Full spec: `docs/format_spec.md`.
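
As an illustration of the layout above, the fixed 24-byte prefix and the JSON header can be read with the standard library alone. This is a minimal sketch, not the library's reader: `read_prefix` is a hypothetical helper, and it assumes that the flags and header-size fields are little-endian like the version field and that padding is the smallest number of NUL bytes that brings the file offset to a multiple of 8.

```python
import json
import struct

MAGIC = b"DTLEGION"
# Assumption: all three integers are little-endian, like the version field.
PREFIX = struct.Struct("<8sIIQ")  # magic, version, flags, header size = 24 bytes


def read_prefix(buf: bytes) -> dict:
    """Hypothetical reader for the fixed prefix and the JSON header."""
    magic, version, flags, header_size = PREFIX.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a deathtensors file")
    header_end = PREFIX.size + header_size
    if header_end > len(buf):
        raise ValueError("header size exceeds file length")
    # The header is plain JSON: parsing it cannot execute code.
    header = json.loads(buf[PREFIX.size:header_end].decode("utf-8"))
    # Skip NUL padding so the data blob starts on an 8-byte boundary.
    data_start = header_end + (-header_end) % 8
    return {
        "version": version,
        "compressed": bool(flags & 0b001),    # bit0: zstd compression
        "has_checksum": bool(flags & 0b010),  # bit1: SHA-256 footer
        "header": header,
        "data_start": data_start,
    }


# Build a tiny synthetic file in memory; the header content is made up.
header_bytes = json.dumps({"example": True}).encode("utf-8")
blob = PREFIX.pack(MAGIC, 1, 0b010, len(header_bytes)) + header_bytes
blob += b"\x00" * ((-len(blob)) % 8) + b"\x01\x02\x03\x04"
info = read_prefix(blob)
print(info["version"], info["has_checksum"], info["data_start"])  # 1 True 48
```

Only the prefix and header are touched here, which is why the header can be inspected without reading or decompressing the tensor data.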

## Why not just use `pickle` / `torch.save`?

`torch.save` uses `pickle` under the hood, which means **opening a
`.pt` file from an untrusted source can run arbitrary Python code**.
This has been the cause of several real-world supply-chain attacks on
ML model hubs. `deathtensors` files are pure data: a fixed binary
prefix followed by JSON metadata followed by raw tensor bytes. There is
no code path in the reader that calls `eval`, `exec`, `__reduce__`, or
any pickle-style reconstruction.
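
The risk is easy to reproduce with the standard library alone. In this harmless demonstration the class name `Payload` is made up and the payload is just an arithmetic expression; an attacker would put a shell command there instead.

```python
import pickle


class Payload:
    # pickle stores whatever __reduce__ returns: a callable plus its arguments.
    def __reduce__(self):
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading calls eval("6 * 7"); no Payload object is ever reconstructed.
result = pickle.loads(data)
print(result)  # 42
```

Nothing in the calling code asks for `eval` to run; simply loading the bytes is enough.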

## Why not just use `safetensors`?

`safetensors` is excellent and we encourage you to use it. `deathtensors`
exists as a separate, independent implementation because:

1. **Format diversity is good for the ecosystem.** A bug in a single
   tensor-storage library would otherwise affect everyone; two
   independent implementations with different code paths, linked by
   conversion functions, reduce that risk.
2. **`deathtensors` ships an optional SHA-256 footer for integrity
   verification**, which is useful when files travel through untrusted
   channels.
3. **`deathtensors` ships per-tensor string metadata** in addition to
   global metadata.
4. **`deathtensors` ships optional zstd compression** of the tensor
   data blob.
5. **`deathtensors` ships a built-in CLI** (`python -m deathtensors`).
6. **`deathtensors` ships sharded save/open** out of the box.
7. **`deathtensors` exposes a richer dtype set** including BF16,
   complex64, complex128, and unsigned 16/32/64-bit integers.
8. **`deathtensors` ships a `Schema` class** for declarative validation.
9. **`deathtensors` ships a `diff()` function** to compare two files.
10. **`deathtensors` ships a manifest sidecar writer** for browsing.

We do not try to be a drop-in replacement. The Python API is similar
in spirit (`save`, `open`, `keys`, `get_tensor`) but the file format
is not compatible — a `.deathtensors` file is not a `.safetensors`
file and vice versa. Use `dt.from_safetensors()` / `dt.to_safetensors()`
to convert.

## Public API

```python
# Core save/open
deathtensors.save(path, tensors, metadata=None, global_metadata=None,
                  checksum=False, compress=False, compress_level=3)
deathtensors.open(path, verify=False)        # context manager
deathtensors.save_file(path, tensors, ...)   # lower-level (raw bytes)
deathtensors.append(path, tensors, metadata=None, extra_global_metadata=None)
deathtensors.DtFile(path, verify=False)      # the class returned by open()

f = deathtensors.open("model.deathtensors")
f.keys()                                      # list of tensor names
f.info(name)                                  # dtype, shape, offsets, nbytes, metadata
f.metadata()                                  # global file metadata
f.get_bytes(name)                             # raw bytes
f.get_buffer(name)                            # memoryview
f.get_tensor(name, framework="torch")         # torch.Tensor (default) or numpy.ndarray
f.get_tensor_mmap(name)                       # zero-copy mmap-backed tensor (uncompressed only)
f.get_tensor_slice(name, start, count, ...)   # load only N rows
f.get_tensors(framework="torch")              # dict of all tensors
f.stats(name)                                 # min/max/mean/stddev/nnz
f.all_stats()                                 # stats for every tensor
f.verify()                                    # verify SHA-256 footer
f.has_checksum()                              # was the file written with checksum=True?
f.is_compressed()                             # was the file written with compress=True?

# Sharded
deathtensors.save_sharded(path, tensors, max_shard_bytes=5*GiB, ...)
deathtensors.open_sharded(index_path, verify=False) -> ShardedReader

# Sparse
deathtensors.save_sparse(path, tensors, ...)
deathtensors.get_sparse(f, name) -> torch.Tensor

# Conversion
deathtensors.from_safetensors(src, dst=None, checksum=True, compress=False)
deathtensors.to_safetensors(src, dst=None)

# Schema validation
schema = deathtensors.Schema()
schema.expect(name, dtype=..., shape=..., metadata_keys=...)
schema.allow_extra()
schema.validate(path_or_file)

# Diff
deathtensors.diff(path_a, path_b) -> dict

# Manifest
deathtensors.write_manifest(dt_path, manifest_path=None)
deathtensors.read_manifest(manifest_path) -> dict

# CLI (shell command, not Python)
# python -m deathtensors info|list|verify|stats|convert|diff|manifest <path>
```
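
`f.get_tensor_slice(name, start, count)` is described as loading only N rows of a tensor. For an uncompressed, contiguous, row-major tensor, that comes down to computing one byte range, as sketched below. `row_slice_byte_range` is a hypothetical helper, and the row-major layout is an assumption rather than something taken from the format spec.

```python
import math


def row_slice_byte_range(shape, itemsize, start, count):
    """Hypothetical helper: byte range covering rows [start, start + count)."""
    if start < 0 or count < 0 or start + count > shape[0]:
        raise IndexError("row slice out of range")
    # Bytes in one row = product of the trailing dimensions * element size.
    row_bytes = math.prod(shape[1:]) * itemsize
    return start * row_bytes, (start + count) * row_bytes


# A 50,000 x 768 float32 embedding table; read 32 rows starting at row 1000.
begin, end = row_slice_byte_range((50_000, 768), 4, start=1000, count=32)
print(begin, end, end - begin)  # 3072000 3170304 98304
```

The offsets are relative to the start of that tensor's bytes, so a reader would add the tensor's own offset within the data blob. In this example only 98,304 of the table's 153,600,000 bytes need to be read.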

## Testing

```bash
pip install "deathtensors[dev]"
pytest tests/
```

The test suite covers: every dtype round-trip, save/load with and
without compression, save/load with and without checksum, lazy loading,
per-tensor stats correctness, append mode, sharded save/open, sparse
tensors, schema validation (pass and fail), diff (identical / added /
removed / value-changed), manifest round-trip, path expansion, the CLI
(`info`/`list`/`verify`/`stats`/`manifest`/`diff`), backward
compatibility with `.dt` files written by 0.1.x, and safety (no
`eval`/`exec` called on open).

## License

MIT, © Death Legion.

