Metadata-Version: 2.4
Name: raif-format
Version: 0.5.0
Summary: A token-efficient, repair-tolerant interchange format for LLM I/O — pure-Python encode/decode/fix/validate.
Project-URL: Repository, https://github.com/skrrt-sh/raif-standard
Project-URL: Homepage, https://github.com/skrrt-sh/raif-standard
Author-email: truehazker <40111175+truehazker@users.noreply.github.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: interchange,json,llm,raif,serialization,tokens
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# raif-format (Python)

Pure-Python implementation of **RAIF** — a token-efficient, repair-tolerant
interchange format for LLM input/output. Stdlib only, no runtime dependencies,
fully typed (PEP 561).

This package matches the canonical TypeScript reference byte for byte; a shared
conformance corpus enforces that parity.

- Spec & monorepo: <https://github.com/skrrt-sh/raif-standard>
- JavaScript/TypeScript package: `raif-format` on npm

## Install

```sh
pip install raif-format        # or: uv add raif-format
```

Installs the `raif-format` distribution; the import package is `raif`.

## Usage

```python
from raif import encode, decode, decode_lenient, fix, validate, parse_schema

# JSON object -> canonical RAIF (byte-identical to the TS encoder)
encode({"to": "a@b.com", "subject": "hi"})
# 'subject=hi\nto=a@b.com'

# Generation profile (what models are trained to emit)
encode({"items": [{"id": 1}, {"id": 2}]}, {"profile": "generation"})

# RAIF -> JSON (with repair reporting)
decode("a=1\nb=hi")
# {'ok': True, 'value': {'a': 1, 'b': 'hi'}, 'repairs': []}

# Per-leaf recovery — never raises, surfaces truncation
decode_lenient("<raif>\ncity=Oslo\nlat")
# {'value': {'city': 'Oslo'}, 'errors': [...], 'repairs': [...], 'truncated': True}

# Canonicalize (decode -> re-encode); idempotent
fix("```\na=1\n```")
# {'ok': True, 'canonical': 'a=1', 'repairs': [...]}

# Read-only canonicality check
validate("a=1")
# {'ok': True}

# Optional schema-typed decode
schema = parse_schema("priority:n\nnote:s?")
decode("priority=2\nnote=hi", schema)
```
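
The result dictionaries shown above can be consumed with a small wrapper. The following is a minimal sketch that assumes only the result shapes documented in this README. `unwrap_decode` and `RaifDecodeError` are hypothetical names, not part of `raif`, and the sample dict stands in for a real `decode(...)` return value. The shape of the `error` value is not documented here, so the sketch treats it as opaque.

```python
from typing import Any


class RaifDecodeError(ValueError):
    """Hypothetical exception type; not part of the raif package."""


def unwrap_decode(result: dict[str, Any]) -> tuple[Any, list[Any]]:
    """Return (value, repairs) from a decode()-style result, or raise on failure."""
    if result.get("ok"):
        return result["value"], result.get("repairs", [])
    # The error payload is opaque here, so it is only stringified.
    raise RaifDecodeError(str(result.get("error")))


# Sample result copied from the Usage section (stands in for a decode() call).
ok_result = {"ok": True, "value": {"a": 1, "b": "hi"}, "repairs": []}
value, repairs = unwrap_decode(ok_result)
print(value)    # {'a': 1, 'b': 'hi'}
print(repairs)  # []
```

Returning `repairs` alongside the value keeps the repair report visible to the caller, so it can be logged or ignored as needed.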

## API

| Function | Returns |
| --- | --- |
| `encode(obj, opts=None)` | `str` (canonical RAIF) |
| `decode(text, schema=None)` | `{"ok", "value"\|"error", "repairs"}` |
| `decode_lenient(text, schema=None)` | `{"value", "errors", "repairs", "truncated"}` |
| `fix(text, schema=None)` | `{"ok", "canonical"\|"error", "repairs"}` |
| `validate(text, schema=None)` | `{"ok"}` or `{"ok": False, "errors"}` |
| `parse_schema(decl)` | `RaifSchema` |

`opts` (for `encode`) is an optional dict of the form `{"profile": "canonical" | "generation", "markers": bool}`.
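
For type-checked call sites, the option and result shapes in the table can be mirrored with `TypedDict`. This is a sketch under stated assumptions: the names `EncodeOptions`, `LenientResult`, and `is_clean` are hypothetical and are not claimed to be exported by `raif`. Both option keys are assumed optional, since the Usage section passes only `profile`. The element types of `errors` and `repairs` are left as `Any` because this README does not specify them.

```python
from typing import Any, Literal, TypedDict


class EncodeOptions(TypedDict, total=False):
    """Hypothetical mirror of the `opts` argument to encode()."""

    profile: Literal["canonical", "generation"]
    markers: bool


class LenientResult(TypedDict):
    """Hypothetical mirror of the decode_lenient() return shape."""

    value: Any
    errors: list[Any]
    repairs: list[Any]
    truncated: bool


def is_clean(result: LenientResult) -> bool:
    """True when nothing was truncated and no per-leaf errors were reported."""
    return not result["truncated"] and not result["errors"]


opts: EncodeOptions = {"profile": "generation"}

# Hand-written sample in the documented shape (not real library output).
sample: LenientResult = {
    "value": {"city": "Oslo"},
    "errors": ["<error details>"],
    "repairs": [],
    "truncated": True,
}
print(is_clean(sample))  # False
```

Because `decode_lenient` is documented as never raising, a check like `is_clean` is one way to decide whether a partially recovered value is safe to use.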

## License

Apache-2.0
