Metadata-Version: 2.4
Name: deathtensors
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: pytest>=7 ; extra == 'dev'
Requires-Dist: torch>=1.10 ; extra == 'dev'
Requires-Dist: numpy>=1.20 ; extra == 'dev'
Requires-Dist: numpy>=1.20 ; extra == 'numpy'
Requires-Dist: torch>=1.10 ; extra == 'torch'
Requires-Dist: numpy>=1.20 ; extra == 'torch'
Provides-Extra: dev
Provides-Extra: numpy
Provides-Extra: torch
License-File: LICENSE
Summary: Safe, fast, pickle-free tensor storage for PyTorch. Rust core, Python interface. By Death Legion.
Keywords: pytorch,tensors,safetensors,pickle,machine-learning,deep-learning,model-weights
Home-Page: https://github.com/deathlegion/deathtensors
Author-email: Death Legion <deathlegion@users.noreply.github.com>
License: MIT
Requires-Python: >=3.8
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Bug Tracker, https://github.com/deathlegion/deathtensors/issues
Project-URL: Documentation, https://github.com/deathlegion/deathtensors#readme
Project-URL: Homepage, https://github.com/deathlegion/deathtensors
Project-URL: Repository, https://github.com/deathlegion/deathtensors

# deathtensors

**Safe, fast, pickle-free tensor storage for PyTorch.** By **Death Legion**.

`deathtensors` is a pickle-free format for storing model weights.
Files are designed to be opened safely even when they come from an
untrusted source: opening a `deathtensors` file **never executes
arbitrary code**, because the header is parsed as JSON (no `eval`, no
`__reduce__`, no `torch.load`) and the tensor data blob is treated as
opaque bytes.

- **Rust core, Python interface.** The file format and I/O are
  implemented in Rust; PyO3 bindings expose a Pythonic API.
- **Lazy loading.** Opening a file parses only the header, so you can
  list tensors, read metadata, and load just the tensors you need
  without pulling the whole file into memory. The reader is
  memory-mapped, so loading a tensor faults in only the pages you
  actually touch.
- **15 dtypes.** BOOL, U8/I8/U16/I16/F16/BF16/I32/U32/F32/I64/U64/F64,
  and complex C64/C128.
- **Per-tensor and global string metadata.** Tag each tensor with
  layer names, training provenance, licenses, etc., and tag the file
  itself with model name, framework version, etc.
- **Optional SHA-256 footer.** Verify file integrity with
  `deathtensors.open(path, verify=True)`.
- **Atomic writes.** `save()` writes to a temp file and renames into
  place — a crash never leaves a half-written file visible to readers.
- **Deterministic output.** Tensor insertion order is preserved, so two
  saves of the same dict produce byte-identical files (useful for
  reproducible research builds and git-tracked weights).
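
The write-then-rename pattern behind the atomic-writes bullet can be sketched in pure Python. This illustrates the general technique only, not the library's Rust implementation; `atomic_write` is a hypothetical helper name.

```python
import os
import tempfile


def atomic_write(path: str, payload: bytes) -> None:
    """Write payload to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the destination directory so the final
    # rename stays on a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(payload)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic on POSIX; also overwrites on Windows
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


# Usage: the second write replaces the first and leaves no temp file behind.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "model.dt")
    atomic_write(target, b"first version")
    atomic_write(target, b"second version")
    with open(target, "rb") as fh:
        final_bytes = fh.read()
    leftovers = sorted(os.listdir(workdir))
```

A reader that opens `model.dt` at any moment sees either the old file or the new one in full, because the rename swaps the directory entry in a single step.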

## Install

```bash
pip install deathtensors              # core only
pip install "deathtensors[torch]"     # pulls in torch and numpy
pip install "deathtensors[numpy]"     # pulls in numpy
pip install "deathtensors[dev]"       # torch + numpy + pytest
```

Pre-built wheels are available for CPython 3.8–3.13 on x86_64 Linux.
Other platforms fall back to a source build (requires Rust ≥ 1.74).

## Quickstart

```python
import torch
import deathtensors as dt

# 1. Save a couple of tensors to one file.
tensors = {
    "weight": torch.randn(128, 128),
    "bias":   torch.zeros(128),
}
metadata = {
    "weight": {"layer": "fc1", "init": "kaiming"},
}
global_md = {"model": "mlp-tiny", "license": "MIT"}
dt.save("model.dt", tensors, metadata=metadata,
        global_metadata=global_md, checksum=True)

# 2. Open the file lazily — no tensors are read yet.
with dt.open("model.dt", verify=True) as f:
    print(f.keys())                          # ['weight', 'bias']
    print(f.metadata())                      # {'model': 'mlp-tiny', ...}
    print(f.info("weight"))                  # full dtype/shape/offsets/metadata
    w = f.get_tensor("weight")               # only 'weight' is read
    print(w.shape, w.dtype)                  # torch.Size([128, 128]) torch.float32
```

## File format (v1)

```text
+-----------------------+
| Magic (8 bytes)       |   b"DTLEGION"
+-----------------------+
| Version (4 bytes u32) |   1 (little-endian)
+-----------------------+
| Flags (4 bytes u32)   |   bit0: zstd (reserved)
|                       |   bit1: SHA-256 footer
|                       |   bit2: encryption (reserved)
+-----------------------+
| Header size (8 u64)   |   byte length of JSON header
+-----------------------+
| Header (JSON, UTF-8)  |   see below
+-----------------------+
| Padding (0..7 bytes)  |   NUL bytes, 8-byte alignment
+-----------------------+
| Tensor data (blob)    |   raw bytes, offsets are relative to here
+-----------------------+
| Footer (32 bytes)     |   optional: SHA-256(header + padding + data)
+-----------------------+
```
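
As a rough illustration of this layout, the following standard-library sketch assembles a tiny file in memory, parses the 24-byte prefix, and checks the footer. It assumes that the flags and header-size fields are little-endian like the version field, that the padding aligns the data blob to an 8-byte file offset, and that the footer occupies the last 32 bytes. `build_file` and `parse_file` are hypothetical helpers, not part of the `deathtensors` API.

```python
import hashlib
import json
import struct

MAGIC = b"DTLEGION"
FLAG_SHA256 = 1 << 1                  # bit1 in the flags field
# Assumption: every prefix field is little-endian (only the version says so).
PREFIX = struct.Struct("<8sIIQ")      # magic, version, flags, header size


def build_file(header: dict, data: bytes, checksum: bool = True) -> bytes:
    """Assemble an in-memory file following the v1 layout."""
    header_bytes = json.dumps(header).encode("utf-8")
    # Assumption: padding makes the data blob start on an 8-byte file offset.
    pad = b"\x00" * (-(PREFIX.size + len(header_bytes)) % 8)
    body = header_bytes + pad + data
    flags = FLAG_SHA256 if checksum else 0
    out = PREFIX.pack(MAGIC, 1, flags, len(header_bytes)) + body
    if checksum:
        out += hashlib.sha256(body).digest()   # SHA-256(header + padding + data)
    return out


def parse_file(raw: bytes):
    """Return (header, data, checksum_ok); checksum_ok is None without a footer."""
    magic, version, flags, header_size = PREFIX.unpack_from(raw, 0)
    if magic != MAGIC or version != 1:
        raise ValueError("not a v1 deathtensors file")
    header_end = PREFIX.size + header_size
    header = json.loads(raw[PREFIX.size:header_end].decode("utf-8"))
    data_start = header_end + (-header_end % 8)
    has_footer = bool(flags & FLAG_SHA256)
    data_end = len(raw) - 32 if has_footer else len(raw)
    checksum_ok = None
    if has_footer:
        actual = hashlib.sha256(raw[PREFIX.size:data_end]).digest()
        checksum_ok = actual == raw[data_end:]
    return header, raw[data_start:data_end], checksum_ok


header = {"format": "deathtensors", "format_version": 1, "tensors": {}}
raw = build_file(header, b"\x01\x02\x03\x04")
parsed_header, data, ok = parse_file(raw)

# Flip the last data byte: the header still parses, but the digest no longer matches.
tampered = raw[:-33] + bytes([raw[-33] ^ 0xFF]) + raw[-32:]
_, _, tampered_ok = parse_file(tampered)
```

Nothing in the parse path interprets the data blob; it is only sliced and hashed, which is the property the format relies on for safety.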

Header JSON schema:

```json
{
  "format": "deathtensors",
  "format_version": 1,
  "created_by": "deathtensors 0.1.0 (death legion)",
  "global_metadata": {"model": "mlp-tiny", "license": "MIT"},
  "tensors": {
    "weight": {
      "dtype": "F32",
      "shape": [128, 128],
      "data_offsets": [0, 65536],
      "metadata": {"layer": "fc1", "init": "kaiming"}
    },
    "bias": {
      "dtype": "F32",
      "shape": [128],
      "data_offsets": [65536, 66048],
      "metadata": {}
    }
  }
}
```

`data_offsets` are `[start, end)` byte offsets relative to the start of
the tensor data blob (after the alignment padding), not to the start of
the file. This lets the reader memory-map the blob and slice tensors
out without translating offsets.
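
The offset arithmetic can be checked with a short sketch. It assumes tensors are packed back to back in insertion order with no per-tensor padding (which matches the example header), that `BOOL` takes one byte per element, and that elements are stored little-endian. `layout` and `read_f32` are hypothetical helpers for illustration only.

```python
import struct

# Bytes per element for each dtype tag (assumed widths; BOOL as one byte).
DTYPE_SIZE = {
    "BOOL": 1, "U8": 1, "I8": 1,
    "U16": 2, "I16": 2, "F16": 2, "BF16": 2,
    "I32": 4, "U32": 4, "F32": 4,
    "I64": 8, "U64": 8, "F64": 8,
    "C64": 8, "C128": 16,
}


def layout(specs):
    """Assign contiguous [start, end) offsets in insertion order."""
    entries, cursor = {}, 0
    for name, (dtype, shape) in specs.items():
        count = 1
        for dim in shape:
            count *= dim
        nbytes = count * DTYPE_SIZE[dtype]
        entries[name] = {
            "dtype": dtype,
            "shape": list(shape),
            "data_offsets": [cursor, cursor + nbytes],
        }
        cursor += nbytes
    return entries, cursor


def read_f32(blob, entry):
    """Slice one F32 tensor out of the blob without copying the rest."""
    start, end = entry["data_offsets"]
    view = memoryview(blob)[start:end]          # zero-copy slice of the blob
    return list(struct.unpack(f"<{(end - start) // 4}f", view))


entries, total = layout({"weight": ("F32", (128, 128)), "bias": ("F32", (128,))})

# A stand-in blob: zeros for 'weight', then 128 floats of 0.5 for 'bias'.
blob = bytes(65536) + struct.pack("<128f", *([0.5] * 128))
bias = read_f32(blob, entries["bias"])
```

Here `weight` occupies 128 × 128 × 4 = 65536 bytes and `bias` occupies 128 × 4 = 512 bytes, so `bias` spans `[65536, 66048)`. Because the offsets are relative to the blob, the same slice works on a `bytes` object, a `memoryview`, or a memory-mapped region.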

## Why not just use `pickle` / `torch.save`?

`torch.save` uses `pickle` under the hood, which means **opening a
`.pt` file from an untrusted source can run arbitrary Python code**.
This has been the cause of several real-world supply-chain attacks on
ML model hubs. `deathtensors` files are pure data: a fixed binary
prefix followed by JSON metadata followed by raw tensor bytes. There is
no code path in the reader that calls `eval`, `exec`, `__reduce__`, or
any pickle-style reconstruction.
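
The difference is easy to demonstrate with the standard library alone. In this sketch, `Payload` is a hypothetical stand-in for a malicious object; it evaluates a harmless arithmetic expression where an attacker would call something destructive.

```python
import json
import pickle


class Payload:
    """Stand-in for a malicious object hidden in a pickle stream."""

    def __reduce__(self):
        # pickle records this callable and its arguments, then calls it
        # while loading. The expression here is deliberately harmless.
        return (eval, ("6 * 7",))


stream = pickle.dumps(Payload())
result = pickle.loads(stream)        # runs eval("6 * 7") during loading

# Parsing a JSON header only ever builds dicts, lists, strings, and numbers.
header_text = '{"tensors": {"weight": {"dtype": "F32", "shape": [2, 2]}}}'
header = json.loads(header_text)
```

After loading, `result` is `42`: the loader executed code chosen by whoever wrote the stream. The JSON parse yields plain data and has no equivalent hook.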

## Why not just use `safetensors`?

`safetensors` is excellent and we encourage you to use it. `deathtensors`
exists as a separate, independent implementation because:

1. **Format diversity is good for the ecosystem.** A single point of
   failure in any one tensor-storage library would be bad; having two
   independent libraries with different code paths reduces risk.
2. **`deathtensors` ships an optional SHA-256 footer for integrity
   verification**, which is useful when files travel through untrusted
   channels.
3. **`deathtensors` ships per-tensor string metadata** in addition to
   global metadata; the `safetensors` format defines only a file-level
   `__metadata__` map.
4. **`deathtensors` exposes a richer dtype set out of the box**,
   including BF16, complex64, complex128, and unsigned 16/32/64-bit
   integers.

We do not try to be a drop-in replacement. The Python API is similar
in spirit (`save`, `open`, `keys`, `get_tensor`) but the file format
is not compatible — a `.dt` file is not a `.safetensors` file and
vice versa.

## Public API

```python
deathtensors.save(path, tensors, metadata=None, global_metadata=None, checksum=False)
deathtensors.open(path, verify=False)  # context manager
deathtensors.save_file(path, tensors, global_metadata=None, checksum=False)  # lower-level
deathtensors.DtFile(path, verify=False)  # the class returned by open()

f = deathtensors.open("model.dt")
f.keys()                   # list of tensor names
f.info(name)               # dict: dtype, shape, data_offsets, nbytes, metadata
f.metadata()               # global file metadata dict
f.get_bytes(name)          # raw bytes
f.get_buffer(name)         # memoryview of raw bytes
f.get_tensor(name, framework="torch")   # torch.Tensor (default) or numpy.ndarray
f.get_tensors(framework="torch")        # dict of all tensors
f.verify()                 # verify SHA-256 footer (if any); returns bool
f.has_checksum()           # was the file written with checksum=True?
```

## Testing

```bash
pip install "deathtensors[dev]"
pytest tests/
```

## License

MIT, © Death Legion.

