Metadata-Version: 2.4
Name: dsi_bitstream
Version: 0.3.0
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
License-File: LICENSE-Apache-2.0
License-File: LICENSE-LGPL-2.1-or-later
Keywords: bitstream,codes,compression
Author: Tommaso Fontana, Sebastiano Vigna
Author-email: tommaso.fontana.96@gmail.com, sebastiano.vigna@unimi.it
Maintainer-email: Tommaso Fontana <tommaso.fontana.96@gmail.com>
Requires-Python: >=3.7
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: changelog, http://github.com/zommiommy/dsi-bitstream-py/blob/master/CHANGELOG.md
Project-URL: repository, http://github.com/zommiommy/dsi-bitstream-py

# dsi-bitstream-py
![GitHub CI](https://github.com/zommiommy/dsi-bitstream-py/actions/workflows/test.yml/badge.svg)
![license](https://img.shields.io/crates/l/dsi-bitstream)
[![](https://tokei.rs/b1/github/zommiommy/dsi-bitstream-py?type=Rust,Python)](https://github.com/zommiommy/dsi-bitstream-py)
[![Supported Python versions](https://img.shields.io/badge/Python-3.7+-blue.svg)](https://pypi.org/project/dsi_bitstream/#history)
[![PyPI total project downloads](https://pepy.tech/badge/dsi_bitstream)](https://pepy.tech/project/dsi_bitstream)
[![PyPI project](https://badge.fury.io/py/dsi_bitstream.svg)](https://pypi.org/project/dsi_bitstream/)

Python bindings for [dsi-bitstream-rs](https://github.com/vigna/dsi-bitstream-rs), a Rust implementation of read/write bit streams supporting several types of instantaneous codes.

## Installation
```shell
pip install dsi_bitstream
```

## Usage

### Reading and writing codes

```python
from dsi_bitstream import BitWriterLittleEndian, BitReaderLittleEndian

writer = BitWriterLittleEndian("./bitstream.bin")

# All write methods return the number of bits written.
writer.write_bits(10, n=5)         # write 10 as 5 raw bits
writer.write_unary(100)
writer.write_gamma(10)
writer.write_delta(2)
writer.write_omega(7)
writer.write_rice(3, k=4)
writer.write_golomb(4, b=10)
writer.write_zeta(10, k=3)
writer.write_pi(42, k=2)
writer.write_exp_golomb(100, k=3)
writer.write_minimal_binary(10, max=100)
writer.flush()

reader = BitReaderLittleEndian("./bitstream.bin")
assert reader.read_bits(n=5) == 10
assert reader.read_unary() == 100
assert reader.read_gamma() == 10
assert reader.read_delta() == 2
assert reader.read_omega() == 7
assert reader.read_rice(k=4) == 3
assert reader.read_golomb(b=10) == 4
assert reader.read_zeta(k=3) == 10
assert reader.read_pi(k=2) == 42
assert reader.read_exp_golomb(k=3) == 100
assert reader.read_minimal_binary(max=100) == 10
```
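To make the codes above less opaque, here is a pure-Python sketch of the gamma code on strings of `'0'`/`'1'` characters. It assumes that gamma encodes n + 1 (so that 0 is representable, consistent with the `len_delta(7) == 8` example in this README) and that the unary prefix is a run of zeros terminated by a one. The real library packs bits into machine words and its layout depends on endianness; `gamma_encode` and `gamma_decode` are hypothetical helpers, not part of `dsi_bitstream`.

```python
from typing import Tuple


def gamma_encode(n: int) -> str:
    """Gamma-encode n >= 0 as a string of '0'/'1' characters (illustration only)."""
    if n < 0:
        raise ValueError("gamma codes are defined for non-negative integers")
    x = n + 1                                   # shift by one so 0 is representable
    payload = format(x, "b")[1:]                # binary of x without its leading 1
    return "0" * len(payload) + "1" + payload   # unary length prefix, then payload


def gamma_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one gamma code starting at bits[pos]; return (value, next_pos)."""
    length = 0
    while bits[pos] == "0":                     # count the unary prefix
        length += 1
        pos += 1
    pos += 1                                    # skip the terminating '1'
    x = 1                                       # restore the implicit leading 1
    for _ in range(length):
        x = (x << 1) | int(bits[pos])
        pos += 1
    return x - 1, pos


# Round trip: codes are concatenated with no separators, as in a bit stream.
values = [0, 1, 2, 10, 42]
stream = "".join(gamma_encode(v) for v in values)
pos, decoded = 0, []
while pos < len(stream):
    value, pos = gamma_decode(stream, pos)
    decoded.append(value)
assert decoded == values
```

Because each code is self-delimiting (instantaneous), the decoder never needs a separator to know where one value ends and the next begins.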

Seeking is supported on the reader:
```python
pos = reader.bit_pos()   # bits from the start of the file
reader.set_bit_pos(pos)  # seek back
```
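Seeking makes random access possible if you remember where each value starts. As a pure-Python sketch, assuming a stream that contains only gamma codes written back to back from bit 0 (and the n + 1 convention for gamma), the start offsets are prefix sums of the code lengths. `gamma_offsets` is a hypothetical helper, not part of `dsi_bitstream`.

```python
from typing import List


def len_gamma(n: int) -> int:
    """Bit length of the gamma code of n >= 0 (encodes n + 1)."""
    return 2 * (n + 1).bit_length() - 1


def gamma_offsets(values: List[int]) -> List[int]:
    """Starting bit position of each value in a gamma-only stream."""
    offsets = []
    pos = 0
    for value in values:
        offsets.append(pos)
        pos += len_gamma(value)
    return offsets
```

With the real reader, passing `offsets[i]` to `reader.set_bit_pos` and then calling `reader.read_gamma()` would decode the i-th value directly. You rarely need to compute lengths yourself, though: every write method returns the number of bits written, so offsets can be accumulated while writing.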

Big-endian variants are available as `BitReaderBigEndian` / `BitWriterBigEndian`.

### Analyzing codes with `CodesStats`

`CodesStats` records a stream of non-negative integers and computes the total
bit cost for every supported code, so you can pick the most compact one:

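```python
from dsi_bitstream import CodesStats

data = [3, 0, 17, 5, 2, 128, 1]  # any iterable of non-negative integers

stats = CodesStats()
for value in data:
    stats.update(value)

# Best code and its total bit cost.
code, bits = stats.best_code()   # e.g. ("Zeta(3)", 48120)

# Full ranking, cheapest first.
for code, bits in stats.get_codes():
    print(f"{code:>20s}: {bits} bits")

# Query a specific code.
bits = stats.bits_for("Delta")   # returns None if out of tracked range

# Merge stats from parallel workers, each filling its own CodesStats.
stats_a, stats_b = CodesStats(), CodesStats()
for value in data[: len(data) // 2]:
    stats_a.update(value)
for value in data[len(data) // 2 :]:
    stats_b.update(value)
combined = stats_a + stats_b
```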

Field-level access is available via properties: `total`, `unary`, `gamma`,
`delta`, `omega`, `vbyte`, `zeta`, `golomb`, `exp_golomb`, `rice`, `pi`.
The array properties (`zeta`, `golomb`, etc.) return a list of bit costs, one
per parameter value.
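The idea behind `CodesStats` is simple enough to sketch in pure Python: for every value, add each code's length to a per-code running total. The best code is the one with the smallest total, and merging two collectors is just summing their totals code by code, which is why adding the stats of parallel workers works. `ToyStats` is hypothetical; it tracks only unary, gamma and delta (the real class also tracks parametric codes over a range of parameters) and assumes gamma and delta encode n + 1.

```python
from typing import Callable, Dict, List, Tuple


def len_unary(n: int) -> int:
    return n + 1                          # n symbols plus a terminator


def len_gamma(n: int) -> int:
    return 2 * (n + 1).bit_length() - 1


def len_delta(n: int) -> int:
    bits = (n + 1).bit_length() - 1       # floor(log2(n + 1))
    return len_gamma(bits) + bits


# Hypothetical table: code name -> length function.
CODES: Dict[str, Callable[[int], int]] = {
    "Unary": len_unary,
    "Gamma": len_gamma,
    "Delta": len_delta,
}


class ToyStats:
    """Hypothetical, simplified stand-in for CodesStats."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = {name: 0 for name in CODES}

    def update(self, value: int) -> None:
        for name, length in CODES.items():
            self.totals[name] += length(value)

    def __add__(self, other: "ToyStats") -> "ToyStats":
        merged = ToyStats()
        for name in CODES:
            merged.totals[name] = self.totals[name] + other.totals[name]
        return merged

    def get_codes(self) -> List[Tuple[str, int]]:
        # Cheapest first; ties keep the table order.
        return sorted(self.totals.items(), key=lambda item: item[1])

    def best_code(self) -> Tuple[str, int]:
        return self.get_codes()[0]
```

On small values unary wins; on a single large value such as 1000, delta (16 bits) beats gamma (19 bits) and unary (1001 bits).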

### Dynamic code dispatch with `Code`

`Code` wraps the Rust `Codes` enum, letting you select a code at runtime
and use it to read, write, or compute bit lengths:

```python
from dsi_bitstream import Code, BitWriterBigEndian, BitReaderBigEndian

code = Code.zeta(3)

w = BitWriterBigEndian("out.bin")
bits = code.write(w, 42)  # returns number of bits written
w.flush()

r = BitReaderBigEndian("out.bin")
val = code.read(r)        # returns 42
```

Available constructors: `Code.unary()`, `Code.gamma()`, `Code.delta()`,
`Code.omega()`, `Code.vbyte_le()`, `Code.vbyte_be()`, `Code.zeta(k)`,
`Code.pi(k)`, `Code.golomb(b)`, `Code.exp_golomb(k)`, `Code.rice(log2_b)`.

Parse from strings: `Code.parse("Zeta(3)")`. Equivalent codes compare equal:
`Code.zeta(1) == Code.gamma()`. Use `code.canonicalize()` to normalize.
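Runtime selection boils down to turning a name such as `"Zeta(3)"` into a code plus a parameter, then normalizing equivalent spellings. The pure-Python sketch below shows one way to do that; `parse_code` and `canonicalize` are hypothetical helpers, the accepted grammar is an assumption, and only the Zeta(1) to Gamma rule stated above is implemented (the library may apply other rules).

```python
import re
from typing import Optional, Tuple

# A name optionally followed by an integer parameter, e.g. "Zeta(3)" or "Gamma".
_CODE_RE = re.compile(r"\s*([A-Za-z]+)\s*(?:\(\s*(\d+)\s*\))?\s*")


def parse_code(text: str) -> Tuple[str, Optional[int]]:
    """Split a string such as "Zeta(3)" or "Gamma" into (name, parameter)."""
    match = _CODE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"cannot parse code {text!r}")
    name, param = match.group(1), match.group(2)
    return name, (int(param) if param is not None else None)


def canonicalize(name: str, param: Optional[int]) -> Tuple[str, Optional[int]]:
    """Map a code to a canonical equivalent (only the Zeta(1) rule is shown)."""
    if name == "Zeta" and param == 1:
        return "Gamma", None
    return name, param
```

With these helpers, `canonicalize(*parse_code("Zeta(1)"))` gives `("Gamma", None)`, so comparing canonical forms treats the two spellings as the same code.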

### Code length functions

Compute the bit length of a code for a given value without writing to a stream:

```python
from dsi_bitstream import len_gamma, len_zeta, len_delta

len_gamma(42)      # 11
len_zeta(100, 3)   # 11
len_delta(7)       # 8
```

Also available: `len_unary`, `len_omega`, `len_pi`, `len_rice`, `len_golomb`,
`len_exp_golomb`, `len_minimal_binary`.

The same is available via `Code.len()`:

```python
code = Code.zeta(3)
code.len(100)  # 11 -- same as len_zeta(100, 3)
```
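The length functions follow directly from the definitions of the codes. Here is a pure-Python sketch (not the library's implementation) that reproduces the three values shown earlier; it assumes gamma, delta and zeta encode n + 1 so that 0 is representable, which is consistent with `len_delta(7) == 8`. The `u` argument of this `len_minimal_binary` is an exclusive upper bound; we do not claim it matches the library's `max` parameter.

```python
def len_gamma(n: int) -> int:
    """Gamma: unary length prefix plus the bits of n + 1 after its leading 1."""
    return 2 * (n + 1).bit_length() - 1


def len_delta(n: int) -> int:
    """Delta: floor(log2(n + 1)) is gamma-coded, then the remaining bits of n + 1."""
    bits = (n + 1).bit_length() - 1
    return len_gamma(bits) + bits


def len_minimal_binary(n: int, u: int) -> int:
    """Minimal (truncated) binary code of n in [0, u), with u >= 1."""
    s = (u - 1).bit_length()              # ceil(log2(u))
    return s - 1 if n < (1 << s) - u else s


def len_zeta(n: int, k: int) -> int:
    """Zeta_k: unary code of h, then x - 2^(hk) in minimal binary."""
    x = n + 1
    h = (x.bit_length() - 1) // k         # x lies in [2^(hk), 2^((h+1)k))
    lower, upper = 1 << (h * k), 1 << ((h + 1) * k)
    return (h + 1) + len_minimal_binary(x - lower, upper - lower)


print(len_gamma(42), len_zeta(100, 3), len_delta(7))  # 11 11 8
```

With k = 1, zeta degenerates to gamma: the lengths agree for every n, matching the README's statement that `Code.zeta(1) == Code.gamma()`.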

## Building

### With Nix (recommended)

The repository includes a `flake.nix` with two package outputs:

```shell
# Native wheel (linux tag, for local use)
nix build .#default.dist

# manylinux2014 wheel (PyPI-uploadable, uses zig as linker)
nix build .#manylinux.dist

# In either case, the wheel ends up in:
ls result-dist/dsi_bitstream-*.whl
```

The manylinux wheel is built with `maturin build --zig`, which links against the
glibc 2.17 headers shipped by zig, and is verified with `auditwheel` during the build.

A dev shell is also available:
```shell
nix develop
maturin develop  # build & install in-place for development
```

### Without Nix

```shell
pip install maturin
maturin develop          # development build
maturin build --release  # release wheel
```

