Metadata-Version: 2.4
Name: fletchr-uintn
Version: 0.0.1rc3
Summary: PyArrow extension array of unsigned integers with arbitrary fixed bit width N in [1, 64].
Project-URL: Homepage, https://github.com/fletchr-labs/fletchr
Project-URL: Repository, https://github.com/fletchr-labs/fletchr
Project-URL: Issues, https://github.com/fletchr-labs/fletchr/issues
Project-URL: Specification, https://github.com/fletchr-labs/fletchr/blob/main/fletchr-uintn/SPEC.md
Author-email: Jonathan Olsten <jolsten@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: arrow,bit-packing,extension-type,pyarrow,uint
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: numpy>=2.0
Requires-Dist: pyarrow>=17
Description-Content-Type: text/markdown

# fletchr-uintn

A PyArrow extension type for unsigned integers of arbitrary fixed bit
width `N ∈ [1, 64]`. The bit width lives **in the Arrow type** rather
than as sidecar schema metadata, so mismatched widths fail loudly on
`concat`, the width survives slice / cast / IPC / Parquet round-trips,
and any column-level operation that wants to know "how many bits does
this hold" reads it off `column.type.bits`.
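As a rough illustration of a column-level operation that needs the width, the pure-Python sketch below derives the largest representable value from a bit width and flags values that don't fit. `max_value` and `out_of_range` are hypothetical helpers written for this example, not part of `fletchr_uintn`. In real code `bits` would be read from `column.type.bits`; here it is a literal so the sketch has no dependencies.

```python
def max_value(bits: int) -> int:
    """Largest unsigned integer representable in `bits` bits."""
    if not 1 <= bits <= 64:
        raise ValueError(f"bits must be in [1, 64], got {bits}")
    return (1 << bits) - 1


def out_of_range(values, bits):
    """Indices of non-null values that do not fit in `bits` bits (nulls are skipped)."""
    limit = max_value(bits)
    return [i for i, v in enumerate(values) if v is not None and not 0 <= v <= limit]


# Stand-in for `column.type.bits` (hypothetical; a literal keeps this self-contained).
bits = 12
print(max_value(bits))                                # 4095
print(out_of_range([0, 4095, None, 4096, -1], bits))  # [3, 4]
```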

## Why?

PyArrow's built-in `uint8` / `uint16` / `uint32` / `uint64` cover only
the four power-of-two widths native to most CPUs. Protocols and binary
formats routinely use other widths (10, 12, 14, 24, 48), and the usual
workarounds fall short: over-allocating (`uint16` for a 12-bit field)
loses the constraint on bitwise ops, and passing the width out-of-band
in schema metadata drops it on the next slice. `fletchr-uintn` puts the
width *in* the type (`UIntNType`) and ships bit-width-safe kernels that
keep padding bits zero across every operation.
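A minimal pure-Python sketch of how over-allocation loses the constraint. Python integers are unbounded, so the sketch emulates fixed-width storage by masking: with a plain `uint16` container, NOT and left shift spill into the four padding bits, while masking to the declared 12 bits keeps results in range. The masks only illustrate the problem; they say nothing about how `fletchr-uintn`'s kernels are actually implemented.

```python
BITS = 12
MASK = (1 << BITS) - 1   # 0x0FFF: the declared 12-bit width
U16_MASK = 0xFFFF        # everything a uint16 container can hold

x = 0
naive_not = ~x & U16_MASK   # NOT over the whole container also flips the 4 padding bits
safe_not = ~x & MASK        # NOT restricted to the declared width

y = 0x800                        # 2048, the top bit of a 12-bit value
naive_shl = (y << 1) & U16_MASK  # the bit carries into padding bit 12
safe_shl = (y << 1) & MASK       # the bit is shifted out of the 12-bit width

print(naive_not, safe_not)  # 65535 4095
print(naive_shl, safe_shl)  # 4096 0
```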

## Features

- Storage in the smallest native `uint8` / `uint16` / `uint32` /
  `uint64` container that fits `N`; padding bits above `N` are kept
  zero across construction and every operation.
- Lossless round-trip through Arrow IPC, Arrow Flight, and Parquet.
  Readers that don't have the extension registered simply see the
  underlying `uint8` / `uint16` / `uint32` / `uint64` storage, so the
  wire format contains no exotic types.
- Full Arrow null support via the standard validity bitmap.
- Bit-width-safe bitwise operations (`~`, `&`, `|`, `^`, shifts,
  popcount, bit reversal) that never leak into the padding bits.
- Cross-language wire format pinned in
  [SPEC.md](https://github.com/fletchr-labs/fletchr/blob/main/fletchr-uintn/SPEC.md)
  so Arrow readers in Java, C++, Go, R, JavaScript, etc. can implement
  compatible deserializers.
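The container rule and two of the width-aware operations can be sketched in plain Python. `container_bits`, `popcount`, and `reverse_bits` are hypothetical helpers written for illustration, not the package's API. The key point is that bit reversal has to operate on the declared `N` bits rather than the container's width, or the result lands in the padding.

```python
def container_bits(bits: int) -> int:
    """Smallest native unsigned width (8, 16, 32, or 64) that can hold `bits` bits."""
    if not 1 <= bits <= 64:
        raise ValueError(f"bits must be in [1, 64], got {bits}")
    return next(width for width in (8, 16, 32, 64) if bits <= width)


def popcount(value: int, bits: int) -> int:
    """Number of set bits among the low `bits` bits of `value`."""
    return bin(value & ((1 << bits) - 1)).count("1")


def reverse_bits(value: int, bits: int) -> int:
    """Reverse the low `bits` bits of `value`: bit 0 swaps with bit bits-1, and so on."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)  # append the current lowest bit
        value >>= 1
    return result


print(container_bits(12))   # 16
print(popcount(4095, 12))   # 12
print(reverse_bits(1, 12))  # 2048: stays inside the 12-bit width
print(reverse_bits(1, 16))  # 32768: reversing the whole uint16 container lands in padding
```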

## Install

```bash
uv add fletchr-uintn        # or: pip install fletchr-uintn
```

Requires Python 3.9+, NumPy 2.0+, and PyArrow 17+.

## Quickstart

```python
import pyarrow as pa
import pyarrow.parquet as pq
from fletchr_uintn import uintn_array

# 12-bit values — fit in a uint16 container, but the type knows it's 12 bits.
a = uintn_array([0, 1, 4095, None, 100], bits=12)
a.type            # UIntNType(bits=12)
a.to_pylist()     # [0, 1, 4095, None, 100]

# Bitwise ops respect the declared width: ~0 is 4095, not 65535.
(~a).to_pylist()  # [4095, 4094, 0, None, 3995]

# Composes as a column inside any pa.Table; round-trips through Parquet.
pq.write_table(pa.table({"x": a}), "out.parquet")
back = pq.read_table("out.parquet").column("x")
assert back.type.bits == 12

# Mismatched bit widths are rejected by the Arrow type system, not silently:
try:
    pa.concat_arrays([a, uintn_array([0, 1], bits=10)])
except pa.ArrowInvalid as exc:
    print("rejected:", exc)
```

## Public API

```python
from fletchr_uintn import (
    UIntNType,        # the pa.ExtensionType
    UIntNArray,       # the pa.ExtensionArray (with bitwise methods)
    uintn_array,      # validated factory; dispatches on input type
    pack_bits,        # inverse of UIntNArray.unpack_bits
)
```
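The README only states that `pack_bits` is the inverse of `UIntNArray.unpack_bits`; it doesn't spell out the bit order or output shape. Purely to illustrate what such an inverse pair means, the sketch below assumes one row of 0/1 integers per value, most significant bit first. `unpack_bits_sketch` and `pack_bits_sketch` are hypothetical stand-ins, and the real functions may well use a different layout.

```python
def unpack_bits_sketch(values, bits):
    """Expand each value into a row of `bits` 0/1 ints, most significant bit first."""
    return [[(v >> (bits - 1 - i)) & 1 for i in range(bits)] for v in values]


def pack_bits_sketch(rows):
    """Inverse of unpack_bits_sketch: fold each MSB-first bit row back into an int."""
    out = []
    for row in rows:
        v = 0
        for b in row:
            v = (v << 1) | b  # shift in the next bit, high to low
        out.append(v)
    return out


vals = [0, 1, 4095, 100]
rows = unpack_bits_sketch(vals, 12)
print(rows[1])                         # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(pack_bits_sketch(rows) == vals)  # True
```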

The extension type registers itself on import, so any `pa.Table`
deserialized after `import fletchr_uintn` will surface `UIntNArray`
columns instead of raw `uintN` storage.

## Links

- Source: <https://github.com/fletchr-labs/fletchr>
- Issues: <https://github.com/fletchr-labs/fletchr/issues>
- Wire format spec: [SPEC.md](https://github.com/fletchr-labs/fletchr/blob/main/fletchr-uintn/SPEC.md)

## License

MIT.
