Metadata-Version: 2.4
Name: wick-formatter
Version: 0.1.2
Summary: Compact text formats for feeding structured data into LLM contexts.
Author: wick-formatter contributors
License: MIT License
        
        Copyright (c) 2026 wick-formatter contributors
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
License-File: NOTICE
Keywords: encoding,format,lean,llm,mcp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup
Requires-Python: >=3.11
Requires-Dist: mcp>=1.27.0
Provides-Extra: benchmarks
Requires-Dist: datasets>=2.18; extra == 'benchmarks'
Requires-Dist: httpx>=0.28.1; extra == 'benchmarks'
Provides-Extra: dev
Requires-Dist: hypothesis>=6.151.12; extra == 'dev'
Requires-Dist: pytest-cov>=7.1.0; extra == 'dev'
Requires-Dist: pytest>=9.0.3; extra == 'dev'
Description-Content-Type: text/markdown

# wick-formatter

[![CI](https://github.com/P6rguVyrst/wick-formatter/actions/workflows/ci.yml/badge.svg)](https://github.com/P6rguVyrst/wick-formatter/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/P6rguVyrst/wick-formatter/branch/main/graph/badge.svg)](https://codecov.io/gh/P6rguVyrst/wick-formatter)
[![PyPI version](https://img.shields.io/pypi/v/wick-formatter.svg)](https://pypi.org/project/wick-formatter/)
[![Docker](https://img.shields.io/badge/ghcr.io-wick--formatter-blue)](https://ghcr.io/p6rguvyrst/wick-formatter)

Compact text formats for feeding structured data into LLM contexts.
Python library, CLI, and stdio MCP server.

`wick-formatter`'s primary format is **LEAN** (LLM-Efficient Adaptive
Notation), an independent Python implementation of a format originally
designed by Denys Fiialko. On tabular structured data, LEAN typically
produces substantially fewer characters than uncompressed JSON (45.7%
fewer in the latest benchmark below) while remaining losslessly
round-trippable.

## Guarantees

**100% lossless round-trip.** For any supported value:
```
decode(encode(data)) == data
```
This is verified on every benchmark run. If round-trip fails on any
item, the benchmark hard-fails before measuring anything else.

**No measurable accuracy loss.** LLM task accuracy with LEAN-encoded
data matches JSON-encoded data within measurement noise. The benchmark
enforces a maximum delta of 3 percentage points.
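
The ordering of the two gates can be sketched as follows. This is a
minimal illustration, not the project's benchmark harness: `run_gates`,
`check_round_trip`, `accuracy_delta_pp`, and `MAX_DELTA_PP` are
hypothetical names, and `json` stands in for a real format so the sketch
runs on its own.

```python
import json

# Assumed constant name; the 3 percentage point ceiling comes from the text above.
MAX_DELTA_PP = 3.0


def check_round_trip(items, encode, decode):
    """Raise on the first item that does not survive encode -> decode."""
    for index, item in enumerate(items):
        if decode(encode(item)) != item:
            raise AssertionError(f"round-trip failed on item {index}")


def accuracy_delta_pp(correct_baseline, correct_candidate, total):
    """Absolute accuracy difference, in percentage points."""
    return abs(correct_baseline - correct_candidate) / total * 100


def run_gates(items, encode, decode, correct_baseline, correct_candidate):
    # Gate 1: hard-fail on any lossy item before measuring anything else.
    check_round_trip(items, encode, decode)
    # Gate 2: accuracy must stay within the allowed delta.
    delta = accuracy_delta_pp(correct_baseline, correct_candidate, len(items))
    if delta > MAX_DELTA_PP:
        raise AssertionError(
            f"accuracy delta {delta:.1f}pp exceeds {MAX_DELTA_PP}pp"
        )
    return delta


# json is only a stand-in codec here, not the LEAN implementation.
items = [[{"a": 1, "b": 2}, {"a": 3, "b": 4}], {"k": [1, 2, 3]}]
delta = run_gates(items, json.dumps, json.loads,
                  correct_baseline=2, correct_candidate=2)
print(f"delta: {delta:.1f}pp")
```

Running the round-trip check first means that a lossy encoding can never
produce an accuracy number at all.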

## Latest benchmark

| Metric | Value |
|--------|-------|
| Round-trip | 100% lossless |
| Compression | 45.7% fewer characters |
| LLM accuracy (JSON) | 53.8% |
| LLM accuracy (LEAN) | 51.9% |
| Accuracy delta | 1.9pp |
| Items tested | 52 |
| Model | llama3.1:8b-instruct-q4_K_M |
| Corpus | WikiTableQuestions + custom |

See [`docs/BENCHMARK_RESULTS.md`](docs/BENCHMARK_RESULTS.md) for
methodology, data sources, and replication instructions.

## Status

**v0.1.2, pre-release.** API and on-wire format are still stabilising.
See [`docs/CHANGELOG.md`](docs/CHANGELOG.md) for release notes and
[`docs/ROADMAP.md`](docs/ROADMAP.md) for the v0.1.0 acceptance gates
and deferred items.

## Install

```
pip install wick-formatter
```

Runtime dependency: `mcp`. It is installed with the package but only
used by the MCP server.

## What's in the box

- `wick_formatter` — Python library with a pluggable format registry.
- `wick-formatter` — CLI (`--format=X {encode,decode}`, stdin→stdout).
- `python -m wick_formatter.mcp` — stdio MCP server exposing
  `wf_encode` and `wf_decode`.
- `tests/benchmarks/` — WikiTableQuestions-based harness comparing LEAN to
  a JSON baseline on `llama3.1:8b-instruct-q4_K_M` through an
  OpenAI-compatible endpoint (Ollama by default, API-provider
  swappable).
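
To make "pluggable format registry" concrete, here is a minimal sketch
of the general pattern. It is not the library's internals: only the name
`get` mirrors the documented API, while `register`, `Format`, and
`JsonFormat` are hypothetical and exist purely for illustration.

```python
import json
from typing import Protocol


class Format(Protocol):
    """Anything with a matching encode/decode pair can be registered."""

    def encode(self, data: object) -> str: ...

    def decode(self, text: str) -> object: ...


_REGISTRY: dict[str, Format] = {}


def register(name: str, fmt: Format) -> None:
    """Add a format under a unique name (hypothetical helper)."""
    if name in _REGISTRY:
        raise ValueError(f"format already registered: {name}")
    _REGISTRY[name] = fmt


def get(name: str) -> Format:
    """Look a format up by name, failing loudly on unknown names."""
    try:
        return _REGISTRY[name]
    except KeyError:
        raise KeyError(f"unknown format: {name}") from None


class JsonFormat:
    """Stand-in format so the sketch runs without wick-formatter."""

    def encode(self, data: object) -> str:
        return json.dumps(data, separators=(",", ":"))

    def decode(self, text: str) -> object:
        return json.loads(text)


register("json", JsonFormat())
fmt = get("json")
print(fmt.decode(fmt.encode({"a": 1})))
```

Callers depend only on the name and the `encode`/`decode` pair, so a new
format can be added without touching existing call sites.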

## Quick example

Encode a small array of records into LEAN's tabular form and save it to
`record.lean`:

```
echo '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]' \
    | wick-formatter --format=lean encode > record.lean
```

Decode it back:

```
wick-formatter --format=lean decode < record.lean
```

The full round-trip contract — including the `~`-marker semi-tabular
path, dot-flatten, and block encodings — is documented in
[`docs/SPEC.md`](docs/SPEC.md).

## Python API

```python
from wick_formatter import get, decode

# Get the LEAN format encoder
lean = get("lean")

# Encode tabular data
data = [{"name": "Alice", "score": 95}, {"name": "Bob", "score": 87}]
encoded = lean.encode(data)
print(encoded)
# name|score
# Alice|95
# Bob|87

# Decode back to original
decoded = decode(encoded)
assert decoded == data
```
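
For a sense of scale, the character saving for the example above can be
computed by hand. This sketch copies the LEAN text from the comments in
that example rather than calling the library. It assumes the saving is
measured as `1 - len(lean) / len(json)` against JSON without
insignificant whitespace; the benchmark's actual baseline may differ.

```python
import json

data = [{"name": "Alice", "score": 95}, {"name": "Bob", "score": 87}]

# LEAN text copied from the example output above, not generated here.
lean_text = "name|score\nAlice|95\nBob|87"

# Assumption: the baseline is compact JSON (no spaces after separators).
json_text = json.dumps(data, separators=(",", ":"))

# Fraction of characters saved relative to the JSON baseline.
saving = 1 - len(lean_text) / len(json_text)
print(f"JSON: {len(json_text)} chars, LEAN: {len(lean_text)} chars, "
      f"saving: {saving:.1%}")
```

A two-row table is too small to be representative; the benchmark figure
is the better guide to typical savings.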

## Resource Limits

The decoder enforces configurable limits to prevent denial-of-service
when processing untrusted input:

| Limit | Default | Environment Variable |
|-------|---------|---------------------|
| Input size | 1 GB | `WICK_MAX_INPUT_BYTES` |
| Recursion depth | 100 | `WICK_MAX_RECURSION_DEPTH` |
| Collection size | 10M items | `WICK_MAX_COLLECTION_SIZE` |

### Python API

```python
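from wick_formatter.formats.lean import decode, DecodeLimits

# LEAN-encoded input; here, the text printed by the encode example above
text = "name|score\nAlice|95\nBob|87"

# Custom limits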
result = decode(text, limits=DecodeLimits(
    max_input_bytes=10 * 1024 * 1024,  # 10 MB
    max_recursion_depth=50,
    max_collection_size=100_000,
))

# Disable limits (not recommended for untrusted input)
result = decode(text, limits=DecodeLimits(
    max_input_bytes=None,
    max_recursion_depth=None,
    max_collection_size=None,
))
```

### MCP Server

Set environment variables before starting the server:

```sh
export WICK_MAX_INPUT_BYTES=10485760  # 10 MB
export WICK_MAX_RECURSION_DEPTH=50
wick-formatter-mcp
```

Set a variable to `0` to disable that limit (not recommended for
untrusted input).

## MCP Server Setup

For Claude Code or Codex integration:

```sh
git clone https://github.com/p6rguvyrst/wick-formatter
cd wick-formatter
make client-claude   # or: make client-codex
```

Check status: `make client-status`

Remove: `make client-clean`

The `make client-*` targets require `jq` for JSON manipulation. Install it with `brew install jq` (macOS) or `apt install jq` (Debian/Ubuntu).

## Format specification

The complete LEAN format specification, including the encoder
strategy selection rules and error cases, lives at
[`docs/SPEC.md`](docs/SPEC.md).

## License and attribution

`wick-formatter` is released under the [MIT License](LICENSE).

The LEAN format was originally designed and implemented by Denys
Fiialko; see [`NOTICE`](NOTICE) for attribution, clean-room posture,
and a specific credit for the semi-tabular encoding path first
implemented in his `toon-mcp-server` repository.
