Metadata-Version: 2.4
Name: fletchr-encodings
Version: 0.0.1rc6
Summary: Arrow-native Level 1 bit-pattern decoders (encodings) for fletchr, with byteforge kernels behind the built-ins.
Project-URL: Homepage, https://github.com/fletchr-labs/fletchr
Project-URL: Repository, https://github.com/fletchr-labs/fletchr
Project-URL: Issues, https://github.com/fletchr-labs/fletchr/issues
Project-URL: Changelog, https://github.com/fletchr-labs/fletchr/blob/main/fletchr-encodings/CHANGELOG.md
Author-email: Jonathan Olsten <jolsten@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: arrow,bit-pattern,decoding,encoding,protocol,pyarrow
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: byteforge>=0.0.6
Requires-Dist: fletchr-uintn>=0.0.1rc2
Requires-Dist: numpy>=2.0
Requires-Dist: pyarrow>=16
Description-Content-Type: text/markdown

# fletchr-encodings

Arrow-native **Level 1 bit-pattern decoders** ("encodings") for the
[fletchr](https://github.com/fletchr-labs/fletchr) ecosystem.

An encoding turns the raw bits of a `fletchr.uintn(bits=N)` column into a
number: unsigned or signed integer, IEEE 754 float, MIL-STD-1750A, BCD, and
so on. This package owns the Arrow boundary (reading the bit width from the
type, preserving the validity bitmap) and provides a plugin system so users
can add their own codecs. The built-in codecs wrap
[byteforge](https://github.com/jolsten/byteforge)'s numpy/C kernels, but
byteforge is an implementation detail *behind* the built-ins; the public
surface is entirely fletchr's own.

This is the Level 1 backend referenced by
[`fletchr-measurand`'s SPEC-decode.md](../fletchr-measurand/SPEC-decode.md);
Level 2 (engineering-unit scaling) lives in `fletchr-measurand`.

## Public API

```python
from fletchr_encodings import create_encoding, decode_array, list_encodings
from fletchr_uintn import uintn_array

raw = uintn_array([0x00, 0x7F, 0x80, 0xFF], bits=8)

# Resolve by canonical name and decode (bits read from the array type):
decode_array(raw, "2c").to_pylist()        # -> [0, 127, -128, -1]

# Or pass a resolved Encoding instance:
enc = create_encoding("2c")
decode_array(raw, enc)
```

### Functions

| Function | Purpose |
|---|---|
| `decode_array(arr, encoding)` | Decode a 1-D `UIntNArray` or 2-D `FixedSizeListArray<UIntN>`. `encoding` is a canonical name or an `Encoding`. Validates storage compatibility, reads the bit width from the type, and preserves the validity bitmap. |
| `create_encoding(name, *, bits=None)` | Resolve a canonical `name` to an `Encoding`. `bits` (optional) validates storage compatibility eagerly. Raises `UnknownEncodingError` on an unknown name. |
| `list_encodings()` | Sorted list of registered canonical names (loads plugins first). |
| `register_plugin_group(group)` | Add an entry-point group for downstream meta-packages. |
| `load_plugins(*paths, strict=False)` | Explicitly import plugin modules (notebooks/tests). |

### Key design points

- **Bits come from the type, not a constructor arg.** An `Encoding` is
  stateless with respect to width; a single instance decodes any width it
  supports. The width is read from the column's `fletchr.uintn` type at
  decode time.
- **Validity bitmap preserved end to end.** Nulls in raw bits stay null in
  the decoded array.
- **One canonical `name` per encoding, no aliases.** The name is both the
  registry key and the value written to `fletchr.encoding` field metadata,
  so the registry is 1:1 with the SPEC-decode table.
- **Level 1 output dtype is natural**: integral encodings stay integral
  (`u`/`bcd`/`gray`/`boolean` → unsigned; `2c`/`1c`/`sm`/`ob` → signed),
  floats decode to their natural width (`ieee16` → float16, `ieee32` →
  float32, `ieee64` → float64, other floats → float64). Level 2 (in
  `fletchr-measurand`) promotes everything to float64.
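
The first two design points can be illustrated with a minimal pure-Python sketch. This is not the package's implementation (the built-ins run byteforge kernels over Arrow buffers); the helper names `decode_2c` and `decode_column` are hypothetical. The sketch only shows that the width arrives as a parameter, standing in for the value read from the column type, and that null slots pass through untouched.

```python
from typing import Callable, List, Optional


def decode_2c(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of `raw` as a two's-complement integer."""
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw


def decode_column(
    values: List[Optional[int]],
    bits: int,
    decode: Callable[[int, int], int],
) -> List[Optional[int]]:
    """Decode every non-null slot; a null (None) stays null in the output."""
    return [None if v is None else decode(v, bits) for v in values]


# The same stateless decoder handles any width; `bits` stands in for the
# width that would be read from the column's type at decode time.
decoded8 = decode_column([0x00, 0x7F, None, 0x80, 0xFF], 8, decode_2c)
decoded12 = decode_column([0x7FF, 0x800, None], 12, decode_2c)

print(decoded8)   # [0, 127, None, -128, -1]
print(decoded12)  # [2047, -2048, None]
```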

## Built-in encodings

| Name | Decodes to | Required storage |
|---|---|---|
| `u` | unsigned int (identity) | `uintn(bits=N)` |
| `sm` | signed int | `uintn(bits=N)`, `N>=2` |
| `1c` | signed int | `uintn(bits=N)`, `N>=2` |
| `2c` | signed int | `uintn(bits=N)`, `N>=2` |
| `ob` | signed int | `uintn(bits=N)`, `N>=2` |
| `ieee16` / `ieee32` / `ieee64` | float16 / float32 / float64 | `uintn(bits=16/32/64)` |
| `1750a32` / `1750a48` | float64 | `uintn(bits=32/48)` |
| `bcd` | unsigned int | `uintn(bits=N)`, `N % 4 == 0` |
| `boolean` | unsigned int (0/1) | `uintn(bits=1)` |
| `gray` | unsigned int | `uintn(bits=N)` |
| `ibm32` / `ibm64` | float64 | `uintn(bits=32/64)` |
| `dec32` / `dec64` / `dec64g` | float64 | `uintn(bits=32/64/64)` |
| `ti32` / `ti40` | float64 | `uintn(bits=32/40)` |

A storage-width mismatch (e.g. `ieee32` on a `bits=16` column) raises
`EncodingError` before any decode happens, per the SPEC's
storage-compatibility rule.
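
For readers unfamiliar with the bit patterns in the table, the following pure-Python sketch shows what a few of them conventionally mean. It rests on assumptions: that `sm`, `1c`, and `ob` denote sign-magnitude, ones' complement, and offset binary (bias `2**(bits-1)`), that `bcd` is packed with the most significant digit first, and that `gray` is the reflected binary Gray code. The function names are hypothetical and the `ValueError`s stand in for the package's `EncodingError`; none of this is the byteforge-backed implementation.

```python
import struct


def decode_sm(raw: int, bits: int) -> int:
    """Sign-magnitude: top bit is the sign, the remaining bits the magnitude."""
    magnitude = raw & ((1 << (bits - 1)) - 1)
    return -magnitude if raw >> (bits - 1) else magnitude


def decode_1c(raw: int, bits: int) -> int:
    """Ones' complement: a negative value is the bitwise inverse of its magnitude."""
    mask = (1 << bits) - 1
    return -(~raw & mask) if raw >> (bits - 1) else raw


def decode_ob(raw: int, bits: int) -> int:
    """Offset binary: subtract a bias of 2**(bits - 1)."""
    return raw - (1 << (bits - 1))


def decode_bcd(raw: int, bits: int) -> int:
    """Packed BCD: each 4-bit nibble is one decimal digit."""
    if bits % 4:
        raise ValueError(f"bcd needs a width divisible by 4, got {bits}")
    value = 0
    for shift in range(bits - 4, -1, -4):
        digit = (raw >> shift) & 0xF
        if digit > 9:  # this sketch rejects non-decimal nibbles
            raise ValueError(f"invalid BCD digit {digit:#x}")
        value = value * 10 + digit
    return value


def decode_gray(raw: int, bits: int) -> int:
    """Reflected Gray code to binary via a prefix XOR."""
    value, shift = raw, 1
    while shift < bits:
        value ^= value >> shift
        shift <<= 1
    return value


def decode_ieee32(raw: int, bits: int) -> float:
    """IEEE 754 binary32; the width check runs before any decoding."""
    if bits != 32:
        raise ValueError(f"ieee32 requires bits=32, got bits={bits}")
    return struct.unpack(">f", raw.to_bytes(4, "big"))[0]


samples = (0x00, 0x7F, 0x80, 0xFF)
print([decode_sm(v, 8) for v in samples])  # [0, 127, 0, -127]
print([decode_1c(v, 8) for v in samples])  # [0, 127, -127, 0]
print([decode_ob(v, 8) for v in samples])  # [-128, -1, 0, 127]
print(decode_bcd(0x1234, 16))              # 1234
print(decode_gray(0b110, 3))               # 4
print(decode_ieee32(0x3FC00000, 32))       # 1.5
```

Calling `decode_ieee32(0x3C00, 16)` in this sketch fails on the width check without touching the data, which mirrors the fail-before-decode rule described above.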

## Custom encodings (plugins)

Subclass `Encoding`, give it a canonical `name`, and implement `_decode_np`.
Subclassing alone registers it; declaring the module in the
`fletchr_encodings.plugins` entry-point group makes it discoverable without
an explicit import. See
[`custom-encodings.md`](../fletchr-core/src/fletchr_core/docs/developers/custom-encodings.md).

```python
from fletchr_encodings import Encoding

class CrcPacked(Encoding):
    name = "crc_packed"
    def _decode_np(self, dns, bits):
        ...   # your own numpy; keep padding bits masked
```

A width-constrained codec declares `fixed_bits` or overrides
`supports_bits`; `_decode_np` must keep padding bits masked (the input is
guaranteed `< 2**bits`).

## License

MIT — see [LICENSE](LICENSE).
