Metadata-Version: 2.4
Name: zpaq
Version: 0.2.7
Summary: Pure in-memory ZPAQ compression for Python (real pybind11 bindings, prebuilt wheels, no C++ toolchain needed to install).
Home-page: https://github.com/zen-ham/zpaq
Author: zh
Author-email: imnotgivingmyemailjustaddmeondiscordmydiscordisz_h_@zh.com
Project-URL: Bug Tracker, https://github.com/zen-ham/zpaq/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: C++
Classifier: License :: Public Domain
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: MacOS
Classifier: Topic :: System :: Archiving :: Compression
Requires-Python: >=3.8
Description-Content-Type: text/markdown

`zpaq`
===

Pure in-memory [ZPAQ](http://mattmahoney.net/dc/zpaq.html) compression for Python: **up to 6.6× faster than the official `zpaq` CLI** at a comparable ratio (measured on 10 MB of text at level 5 on 12 cores; see Performance), with optional fragment-level deduplication, byte-exact CLI interoperability in both directions, multi-threaded compress and decompress, prebuilt wheels for every modern Python on Windows / Linux / macOS, and no C++ toolchain or runtime dependencies to install.

```py
import zpaq

blob = zpaq.compress(b"hello world " * 1_000, level=3)   # bytes -> bytes
assert zpaq.decompress(blob) == b"hello world " * 1_000
```

By default `zpaq.compress(...)` auto-scales across all CPU cores (`threads=0`). If you want the absolute best compression ratio (around 0.5-3 percentage points better) at the cost of throughput, pass `threads=1` to keep the input as a single block:

```py
blob = zpaq.compress(big_data, level=5, threads=1)   # max ratio, single-thread
```

For inputs with repeated content (logs, large text corpora, similar binaries, snapshots), pass `dedup=True` to get fragment-level deduplication: the input is split into ~64 KB content-defined chunks, and identical chunks are stored once. This matches the archive layout the official `zpaq a` CLI produces, so the output can be extracted with `zpaq x`:

```py
blob = zpaq.compress(repetitive_data, level=5, dedup=True)   # JIDAC archive
```
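The idea behind fragment-level deduplication can be sketched in a few lines. This is a generic illustration of content-defined chunking, not the chunker `libzpaq` or this package actually uses: the gear-style rolling hash, the `GEAR` table, and the names `cdc_chunks`, `dedup`, and `restore` are all hypothetical and exist only for this example.

```py
import hashlib
from typing import Dict, Iterator, List, Tuple

# Hypothetical table of 256 pseudo-random 32-bit values, one per byte value.
GEAR = [int.from_bytes(hashlib.sha1(bytes([i])).digest()[:4], "big") for i in range(256)]

def cdc_chunks(data: bytes, avg_bits: int = 16,
               min_size: int = 4096, max_size: int = 1 << 19) -> Iterator[bytes]:
    """Yield chunks whose boundaries depend on content, not on byte offsets."""
    start, h = 0, 0
    shift = 32 - avg_bits
    for i, byte in enumerate(data):
        # Rolling hash: each step shifts older bytes out, so h only
        # reflects the most recent 32 bytes of input.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i + 1 - start
        if (length >= min_size and (h >> shift) == 0) or length >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def dedup(data: bytes, **chunk_args) -> Tuple[Dict[bytes, bytes], List[bytes]]:
    """Store each unique chunk once; the recipe lists chunk hashes in order."""
    store: Dict[bytes, bytes] = {}
    recipe: List[bytes] = []
    for chunk in cdc_chunks(data, **chunk_args):
        key = hashlib.sha1(chunk).digest()
        store.setdefault(key, chunk)
        recipe.append(key)
    return store, recipe

def restore(store: Dict[bytes, bytes], recipe: List[bytes]) -> bytes:
    """Rebuild the original bytes from the chunk store and the recipe."""
    return b"".join(store[key] for key in recipe)
```

Because boundaries are chosen from the content itself, a repeated region tends to be cut at the same places wherever it appears, so its chunks hash identically and are stored only once.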

Why this exists
---

Every other ZPAQ binding on PyPI shells out to the `zpaq` executable, which forces temp files and a subprocess fork. This package is both:

- A real pybind11 binding around `libzpaq` (Matt Mahoney's underlying C++ library — the same one the official `zpaq` CLI is built on top of), wrapping abstract `Reader`/`Writer` adapters that read from and write to `bytes` objects with no filesystem detour.
- Distributed as **prebuilt wheels** for Windows, Linux, and macOS (including Apple Silicon) across Python 3.8 through 3.13. Installing it never compiles anything.

On Windows, the wheel statically links the C and C++ runtimes so users don't need any "Visual C++ Redistributable" installed — if Python runs, `zpaq` works.

Performance
---

![benchmark](docs/benchmark.png)

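Speedup vs the official `zpaq.exe -m5` (Ryzen-class 12-core x86_64, level 5). `mem` denotes this package's in-memory `zpaq.compress` / `zpaq.decompress`, and "best" is the fastest thread setting measured. Both compress and decompress run block-parallel; `mem` wins in both directions at every size from ~1 MB up: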

| workload | CLI comp | best mem comp | CLI decomp | best mem decomp |
| --- | --- | --- | --- | --- |
| 40 KB text | 0.14 s | 0.13 s (**1.1×**) | 0.14 s | 0.12 s (**1.2×**) |
| 1 MB text | 2.28 s | 0.58 s (**3.9×**) | 2.23 s | 0.58 s (**3.8×**) |
| 10 MB text | 24.1 s | 3.83 s (**6.3×**) | 25.1 s | 3.83 s (**6.6×**) |
| 100 MB text | 252.5 s | 74.1 s (**3.4×**) | 250.7 s | 73.2 s (**3.4×**) |

Full benchmark by thread count below. CLI is the official `zpaq.exe` v7.15 invoked with `-m5` (the times shown include its `-t0` default of two worker threads). `zpaq.compress(t=N)` is shorthand for `zpaq.compress(data, level=5, threads=N)`. Times are in seconds; ratio is the percentage of bytes saved relative to the original size, so higher is better.
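For readers who want to reproduce the derived columns from their own measurements, the two figures can be computed as below. The helper names `savings_percent` and `speedup` are hypothetical and not part of the package; the sample inputs are the 10 MB timings from the summary table above.

```py
def savings_percent(original_size: int, compressed_size: int) -> float:
    """Percentage of bytes saved relative to the original (the 'ratio %' column)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 100.0 * (original_size - compressed_size) / original_size

def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds

# 10 MB text, CLI vs best in-memory timings from the summary table.
compress_gain = round(speedup(24.1, 3.83), 1)
decompress_gain = round(speedup(25.1, 3.83), 1)
```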

40 KB text:

| algo | compress | decompress | ratio % |
| --- | --- | --- | --- |
| `zpaq.exe -m5` | 0.14 s | 0.14 s | 71.5 % |
| `zpaq.compress(t=1)` | 0.13 s | 0.13 s | **73.5 %** |
| `zpaq.compress(t=0)` | **0.13 s** | **0.12 s** | 73.5 % |

1 MB text:

| algo | compress | decompress | ratio % |
| --- | --- | --- | --- |
| `zpaq.exe -m5` | 2.28 s | 2.23 s | 80.0 % |
| `zpaq.compress(t=1)` | 2.08 s | 2.12 s | 80.1 % |
| `zpaq.compress(t=4)` | 0.70 s | 0.75 s | 79.3 % |
| `zpaq.compress(t=12)` | **0.58 s** | **0.58 s** | 77.6 % |

10 MB text:

| algo | compress | decompress | ratio % |
| --- | --- | --- | --- |
| `zpaq.exe -m5` | 24.14 s | 25.13 s | 84.2 % |
| `zpaq.compress(t=1)` | 20.92 s | 21.50 s | 84.2 % |
| `zpaq.compress(t=4)` | 6.72 s | 6.92 s | 82.8 % |
| `zpaq.compress(t=12)` | **3.83 s** | **3.83 s** | 81.2 % |

100 MB text:

| algo | compress | decompress | ratio % |
| --- | --- | --- | --- |
| `zpaq.exe -m5` | 252.5 s | 250.7 s | 86.7 % |
| `zpaq.compress(t=1)` | 324.2 s | 85.9 s | 85.0 % |
| `zpaq.compress(t=0)` (12 cores) | **74.1 s** | **73.2 s** | 84.5 % |
| `zpaq.compress(dedup=True)` | 325.4 s | 120 s | **85.06 %** |

**How the speedup is achieved.** `libzpaq`'s reference compiler emits an interpreter for the per-byte context-mixing predictor at compression levels 3-5. The official `zpaq.exe` on x86_64 ships with that interpreter replaced by a JIT that translates the predictor bytecode into native machine code at archive-open time. This package's x86_64 wheels enable the same JIT path **plus**:

- multi-threaded block compression via `threads=N` (the official CLI tops out at 2 cores by default)
- multi-threaded block decompression: we scan the archive for ZPAQ locator-tag block boundaries, dispatch each block to a worker, and concatenate the results. The official `libzpaq` API exposes only sequential decompress; doing it block-parallel makes the 100 MB decompress above about 3.4× faster than `zpaq.exe x`.
- skip-checksum-by-default (`verify=False`), since pure-data workflows rarely need the SHA-1 per block that `zpaq.exe` always computes
- AVX2-enabled compile flags (auto-vectorization on x86_64; CPUs from 2013 onward are covered, older CPUs fall back to the sdist build)
- a libsais-backed suffix array constructor for level-3 BWT mode (Apache 2.0, several times faster than `libzpaq`'s vendored libdivsufsort-lite)
- a faster decompress path for archives produced by `zpaq.compress` (avoids the JIDAC-aware per-segment buffering)

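Compression speeds up substantially with thread count, though less than linearly: on the 10 MB input, `t=4` is about 3.1× and `t=12` about 5.5× faster than `t=1`. Compression ratio drops slightly as threads increase (more block boundaries reduce per-block context size). The ratio for `t=1` matches or beats the CLI on the workloads up to 10 MB; at 100 MB the CLI is ahead (86.7 % vs 85.0 %).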
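The trade-off between block count and ratio can be seen with any block-wise scheme. The sketch below uses the standard library's `zlib` purely as a stand-in codec (it is not ZPAQ and says nothing about this package's internals); `compress_blocks` and `decompress_blocks` are hypothetical helper names.

```py
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List

def compress_blocks(data: bytes, n_blocks: int) -> List[bytes]:
    """Split data into up to n_blocks pieces and compress each one independently."""
    size = max(1, -(-len(data) // n_blocks))  # ceiling division
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        return list(pool.map(lambda piece: zlib.compress(piece, 9), pieces))

def decompress_blocks(blocks: List[bytes]) -> bytes:
    """Decompress every block in a worker pool and concatenate in order."""
    with ThreadPoolExecutor(max_workers=max(1, len(blocks))) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))

text = b"".join(b"record %d: the quick brown fox jumps over the lazy dog\n" % i
                for i in range(20_000))
for n in (1, 4, 12):
    blocks = compress_blocks(text, n)
    total = sum(len(block) for block in blocks)
    print(n, "blocks:", total, "compressed bytes")
```

Each block starts with no knowledge of the data before it, which is why splitting into more blocks usually costs some ratio in exchange for parallelism.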

Pass `dedup=True` to narrow the gap to `zpaq.exe`'s ratio on inputs with repeated content (85.06 % vs 86.7 % on the 100 MB text above). Fragment-level dedup splits the input into ~64 KB content-defined chunks and stores each unique chunk once. The output is a JIDAC archive that both this package's `decompress` and the official `zpaq x` CLI extract byte-exactly. Default behavior remains the raw streaming format (faster, slightly worse ratio on heavily repetitive inputs).

ARM / Apple Silicon wheels disable the x86-only JIT and AVX2 flags but still benefit from threading, libsais, and the fast decompress path.

API
---

```py
zpaq.compress(
    data,                    # bytes-like
    level=5,                 # 0..5 (0=store, 5=strongest)
    threads=0,               # 0 (default) = auto-detect host CPU count, clamped
                             # by input size (64KB minimum chunk per worker).
                             # 1 = single-thread, deterministic, best ratio.
                             # N>1 = pin to exactly N workers.
    hints=False,             # If True, scan input for text/exe signatures and order-1
                             # redundancy, pass them to libzpaq via the method string.
                             # Slight overhead, helps ratio on some mixed/binary data.
    verify=False,            # If True, compute & embed SHA-1 per segment. zpaq.exe
                             # also writes these by default; turning them off makes
                             # both this package and zpaq.exe skip verification on
                             # extract, which is faster but won't catch corruption.
    method=None,             # Optional raw libzpaq method-string override (e.g. "x4,4,1"
                             # for custom predictor specs). Overrides level/hints when set.
    dedup=False,             # If True, emit a JIDAC-format archive with fragment-level
                             # deduplication. Input is content-defined-chunked into ~64KB
                             # fragments, identical fragments are stored once. Output is
                             # zpaq.exe-extractable. Improves ratio on repetitive data;
                             # currently single-threaded encode.
) -> bytes

zpaq.decompress(
    data,                    # bytes-like ZPAQ stream
    verify=False,            # If True, recompute SHA-1 of each segment and compare to
                             # the one stored in the archive. Raises zpaq.Error on
                             # mismatch. Default off for speed.
) -> bytes

zpaq.Error                   # Raised on libzpaq failures (corrupt stream, bad header, etc.)
```

Both `compress` and `decompress` release the GIL while libzpaq runs, so `zpaq` plays well with threaded workloads.
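A minimal sketch of what that enables: compressing several independent payloads from a thread pool. The helper `compress_many` is hypothetical, and `zlib.compress` is used only as a stand-in so the snippet runs without this package installed.

```py
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def compress_many(payloads: Sequence[bytes],
                  compress_fn: Callable[[bytes], bytes],
                  workers: int = 4) -> List[bytes]:
    """Compress independent payloads concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_fn, payloads))

payloads = [bytes([i]) * 50_000 for i in range(8)]
blobs = compress_many(payloads, zlib.compress)  # stand-in codec, not ZPAQ
restored = [zlib.decompress(blob) for blob in blobs]
```

With this package, `compress_fn` could be something like `lambda b: zpaq.compress(b, threads=1)`, so the outer pool rather than the per-call worker count decides how many cores are used.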

Compatibility with the `zpaq` CLI
---

`zpaq.compress()` emits the same on-disk format `libzpaq` itself writes, and `zpaq.decompress()` understands archives produced by the `zpaq a` journaling archiver. For those archives it identifies the JIDAC index/hash/info segments, discards them, and strips each data segment's trailing fragment-size footer so the recovered bytes match the original file exactly.

Tested on ten varied real files (1 KB to 25 MB, text/image/csv/jar/png/jpg/svg/exe/binary, compression levels 1-5):

| Direction | Result |
| --- | --- |
| `zpaq.compress` → `zpaq.decompress` | 10 / 10 byte-exact |
| `zpaq.compress` → official `zpaq x` CLI | 10 / 10 byte-exact |
| official `zpaq a` CLI → `zpaq.decompress` | 10 / 10 byte-exact |

```py
import zpaq

# Write an archive the official CLI can extract (my_bytes is your payload)
with open("out.zpaq", "wb") as f:
    f.write(zpaq.compress(my_bytes, level=5))
# ...later, from any machine with the zpaq executable installed:
#   $ zpaq x out.zpaq

# Read an archive that someone else produced with `zpaq a`
with open("their.zpaq", "rb") as f:
    file_bytes = zpaq.decompress(f.read())
```

When `zpaq.decompress` is fed a multi-file archive it returns the concatenated bytes of every file in the order the CLI stored them. A future release will expose a per-segment iterator API so individual files can be addressed by name.

Future work
---

A few performance levers are still on the table; pull requests welcome:

- **Profile-guided optimization (PGO).** Adding `/GENPROFILE` + `/USEPROFILE` to the MSVC build (and equivalents on gcc/clang) typically gains another 5-15%. Skipped here because cibuildwheel doesn't expose a clean two-stage build hook yet.
- **Hand-written SIMD in the predictor.** `/arch:AVX2` is enabled so the compiler auto-vectorizes where it can. The actual hot loop at compression levels 3-5 is the JIT-emitted predictor, which currently emits one x86 instruction at a time; rewriting the JIT to emit AVX2 mul-add chains for the MIX/ISSE components would be a real gain.
- **Parallel JIDAC encode.** `dedup=True` is currently single-threaded; splitting the fragment-build pass across cores would speed up large dedup compresses.
- **Per-segment archive API.** `zpaq.decompress` currently returns the concatenated bytes of every segment in a multi-file `zpaq a` archive. A future iterator API would let callers address individual files by name.

License
---

This package is released under the same terms as the underlying `libzpaq` sources: public domain. See `src/zpaq/vendor/COPYING`.

The vendored libsais suffix array library is Apache 2.0 (Ilya Grebnov). See `src/zpaq/vendor/LICENSE-libsais`.

---

Not affiliated with Matt Mahoney. `libzpaq` was released into the public domain by its original author; this Python package wraps those sources and is an independent community project.
