Metadata-Version: 2.4
Name: toonify
Version: 0.0.1
Summary: TOON (Token-Oriented Object Notation) - A compact, human-readable serialization format for LLMs
Project-URL: Homepage, https://github.com/toon-format/toon
Project-URL: Repository, https://github.com/toon-format/toon
Project-URL: Documentation, https://github.com/toon-format/toon#readme
Author: TOON Format Contributors
License: MIT
License-File: LICENSE
Keywords: csv,json,llm,serialization,token-efficient,toon
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Requires-Python: >=3.8
Requires-Dist: tiktoken>=0.5.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# TOON (Token-Oriented Object Notation)

A compact, human-readable serialization format designed for passing structured data to Large Language Models with significantly reduced token usage.

[![Python Version](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Overview

TOON achieves **CSV-like compactness** while adding **explicit structure**, making it ideal for:
- Reducing token costs in LLM API calls
- Improving context window efficiency
- Maintaining human readability
- Preserving data structure and types

### Key Features

- ✅ **Compact**: 30-60% smaller than JSON for structured data
- ✅ **Readable**: Clean, indentation-based syntax
- ✅ **Structured**: Preserves nested objects and arrays
- ✅ **Typed**: Distinguishes strings, numbers, booleans, and null
- ✅ **Flexible**: Multiple delimiter options (comma, tab, pipe)
- ✅ **Smart**: Automatic tabular format for uniform arrays
- ✅ **Efficient**: Key folding for deeply nested objects

## Installation

```bash
pip install toonify
```

For development:
```bash
pip install "toonify[dev]"
```

## Quick Start

### Python API

```python
from toon import encode, decode

# Encode Python dict to TOON
data = {
    'users': [
        {'id': 1, 'name': 'Alice', 'role': 'admin'},
        {'id': 2, 'name': 'Bob', 'role': 'user'}
    ]
}

toon_string = encode(data)
print(toon_string)
# Output:
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user

# Decode TOON back to Python
result = decode(toon_string)
assert result == data
```

### Command Line

```bash
# Encode JSON to TOON
toon input.json -o output.toon

# Decode TOON to JSON
toon input.toon -o output.json

# Use with pipes
cat data.json | toon -e > data.toon

# Show token statistics
toon data.json --stats
```

## TOON Format Specification

### Basic Syntax

```toon
# Simple key-value pairs
name: Alice
age: 30
active: true
```
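
As a rough illustration of how unquoted scalars map to Python types, the sketch below parses flat `key: value` lines using only the standard library. `parse_scalar` and `parse_flat` are hypothetical helpers written for this README, not part of the `toon` API, and they ignore quoting, arrays, and nesting.

```python
import re


def parse_scalar(text):
    """Map an unquoted scalar to a Python value (illustrative only)."""
    literals = {'true': True, 'false': False, 'null': None}
    if text in literals:
        return literals[text]
    if re.fullmatch(r'-?\d+', text):
        return int(text)
    if re.fullmatch(r'-?\d+\.\d+', text):
        return float(text)
    return text  # anything else stays a string


def parse_flat(source):
    """Parse flat `key: value` lines into a dict, skipping blank lines."""
    result = {}
    for line in source.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(':')
        result[key.strip()] = parse_scalar(value.strip())
    return result


print(parse_flat('name: Alice\nage: 30\nactive: true'))
# {'name': 'Alice', 'age': 30, 'active': True}
```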

### Arrays

**Primitive arrays** (inline):
```toon
numbers: [1,2,3,4,5]
tags: [python,serialization,llm]
```

**Tabular arrays** (uniform objects with header):
```toon
users[3]{id,name,email}:
  1,Alice,alice@example.com
  2,Bob,bob@example.com
  3,Charlie,charlie@example.com
```
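
To make the header layout concrete, here is a minimal sketch of how a uniform list of dicts could be written in the tabular form shown above. `encode_table` is a hypothetical helper, not the library's encoder: it assumes a non-empty list, skips quoting, and formats values with plain `str()`.

```python
def encode_table(name, rows, delimiter=','):
    """Render a uniform list of dicts as a tabular array (sketch only)."""
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError('rows must share the same keys in the same order')
    # Header: name, row count in brackets, field names in braces, then a colon
    header = f"{name}[{len(rows)}]{{{delimiter.join(fields)}}}:"
    lines = [header]
    for row in rows:
        lines.append('  ' + delimiter.join(str(row[f]) for f in fields))
    return '\n'.join(lines)


users = [
    {'id': 1, 'name': 'Alice', 'email': 'alice@example.com'},
    {'id': 2, 'name': 'Bob', 'email': 'bob@example.com'},
    {'id': 3, 'name': 'Charlie', 'email': 'charlie@example.com'},
]
print(encode_table('users', users))
```

The field names appear once in the header instead of once per row, which is where most of the savings over JSON come from.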

**List arrays** (non-uniform or nested):
```toon
items[2]:
  value1
  value2
```
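
One way to picture the choice between the three array forms is the sketch below. `array_form` is a hypothetical helper, and its uniformity test (every element is a dict with the same key set and only primitive values) is an assumption about what "uniform" means, not a statement of the encoder's actual rule.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def array_form(values):
    """Classify a list as 'inline', 'tabular', or 'list' (illustrative)."""
    if all(isinstance(v, PRIMITIVES) for v in values):
        return 'inline'
    if all(isinstance(v, dict) for v in values):
        keys = set(values[0])
        same_keys = all(set(v) == keys for v in values)
        flat = all(isinstance(x, PRIMITIVES) for v in values for x in v.values())
        if same_keys and flat:
            return 'tabular'
    return 'list'  # mixed, non-uniform, or nested content


print(array_form([1, 2, 3]))
print(array_form([{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]))
print(array_form([{'id': 1}, {'name': 'Bob'}]))
```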

### Nested Objects

```toon
user:
  name: Alice
  profile:
    age: 30
    city: NYC
```
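
The indentation scheme can be sketched in a few lines. `render` below is a hypothetical helper that handles only dicts of scalars and other dicts, with no arrays or quoting, to show how nesting depth maps to leading spaces.

```python
def render(obj, indent=2, level=0):
    """Render nested dicts of scalars as indented `key: value` lines."""
    pad = ' ' * (indent * level)
    lines = []
    for key, value in obj.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render(value, indent, level + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


doc = {'user': {'name': 'Alice', 'profile': {'age': 30, 'city': 'NYC'}}}
text = '\n'.join(render(doc))
print(text)
```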

### Quoting Rules

A string is quoted only when it:
- Contains special characters (`,`, `:`, `"`, newlines)
- Has leading or trailing whitespace
- Looks like a literal (`true`, `false`, `null`)
- Is empty

```toon
simple: Alice
quoted: "Hello, World"
escaped: "He said \"hello\""
multiline: "Line 1\nLine 2"
```
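
The rules above can be sketched as a small predicate. `quote_if_needed` is a hypothetical helper, and it borrows `json.dumps` for escaping on the assumption that TOON escapes match JSON's for the cases shown (`\"` and `\n`); the library may treat other characters differently.

```python
import json

LITERALS = {'true', 'false', 'null'}
SPECIAL = set(',:"\n')


def quote_if_needed(s):
    """Return s unchanged, or a quoted and escaped form when a rule applies."""
    needs_quotes = (
        s == ''                              # empty string
        or s != s.strip()                    # leading/trailing whitespace
        or s in LITERALS                     # would be read back as a literal
        or any(ch in SPECIAL for ch in s)    # contains a special character
    )
    return json.dumps(s, ensure_ascii=False) if needs_quotes else s


for value in ['Alice', 'Hello, World', 'He said "hello"', 'Line 1\nLine 2']:
    print(quote_if_needed(value))
```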

## API Reference

### `encode(data, options=None)`

Convert Python object to TOON string.

**Parameters:**
- `data`: Python dict or list
- `options`: Optional dict with:
  - `delimiter`: `'comma'` (default), `'tab'`, or `'pipe'`
  - `indent`: Number of spaces per level (default: 2)
  - `key_folding`: `'off'` (default) or `'safe'`
  - `flatten_depth`: Max depth for key folding (default: None)

**Example:**
```python
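from toon import encode

data = {'users': [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]}
toon_string = encode(data, {
    'delimiter': 'tab',
    'indent': 4,
    'key_folding': 'safe'
})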
```

### `decode(toon_string, options=None)`

Convert TOON string to Python object.

**Parameters:**
- `toon_string`: TOON formatted string
- `options`: Optional dict with:
  - `strict`: Validate structure strictly (default: True)
  - `expand_paths`: `'off'` (default) or `'safe'`
  - `default_delimiter`: Default delimiter (default: `','`)

**Example:**
```python
from toon import decode

toon_string = 'user.profile.age: 30'
data = decode(toon_string, {
    'expand_paths': 'safe',
    'strict': False
})
```

## CLI Usage

```
usage: toon [-h] [-o OUTPUT] [-e] [-d] [--delimiter {comma,tab,pipe}]
            [--indent INDENT] [--stats] [--no-strict]
            [--key-folding {off,safe}] [--flatten-depth DEPTH]
            [--expand-paths {off,safe}]
            [input]

TOON (Token-Oriented Object Notation) - Convert between JSON and TOON formats

positional arguments:
  input                 Input file path (or "-" for stdin)

optional arguments:
  -h, --help            show this help message and exit
  -o, --output OUTPUT   Output file path (default: stdout)
  -e, --encode          Force encode mode (JSON to TOON)
  -d, --decode          Force decode mode (TOON to JSON)
  --delimiter {comma,tab,pipe}
                        Array delimiter (default: comma)
  --indent INDENT       Indentation size (default: 2)
  --stats               Show token statistics
  --no-strict           Disable strict validation (decode only)
  --key-folding {off,safe}
                        Key folding mode (encode only)
  --flatten-depth DEPTH Maximum key folding depth (encode only)
  --expand-paths {off,safe}
                        Path expansion mode (decode only)
```

## Advanced Features

### Key Folding

Collapse single-key chains into dotted paths:

```python
from toon import encode

data = {
    'response': {
        'data': {
            'user': {
                'name': 'Alice'
            }
        }
    }
}

# With key_folding='safe'
toon = encode(data, {'key_folding': 'safe'})
# Output: response.data.user.name: Alice
```
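
For intuition, the folding step can be approximated with a short recursive function. `fold_keys` is a hypothetical stand-in that operates on plain dicts; it does not implement whatever checks the `'safe'` mode applies, and it ignores `flatten_depth`.

```python
def fold_keys(obj):
    """Collapse chains of single-key dicts into dotted keys (sketch)."""
    folded = {}
    for key, value in obj.items():
        # Walk down while the value is a dict with exactly one entry
        while isinstance(value, dict) and len(value) == 1:
            (child_key, child_value), = value.items()
            key = f"{key}.{child_key}"
            value = child_value
        folded[key] = fold_keys(value) if isinstance(value, dict) else value
    return folded


nested = {'response': {'data': {'user': {'name': 'Alice'}}}}
print(fold_keys(nested))
# {'response.data.user.name': 'Alice'}
```

Folding stops at the first dict with more than one key, so `{'a': {'b': {'c': 1, 'd': 2}}}` becomes `{'a.b': {'c': 1, 'd': 2}}` in this sketch.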

### Path Expansion

Expand dotted keys into nested objects:

```python
from toon import decode

toon = 'user.profile.age: 30'

# With expand_paths='safe'
data = decode(toon, {'expand_paths': 'safe'})
# Result: {'user': {'profile': {'age': 30}}}
```
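
The reverse direction can be sketched the same way. `expand_paths` below is a hypothetical function working on an already-parsed flat dict; raising on conflicting paths is an assumption about what a "safe" expansion might do, not documented library behavior.

```python
def expand_paths(flat):
    """Expand dotted keys of a flat dict into nested dicts (sketch)."""
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split('.')
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
            if not isinstance(node, dict):
                # e.g. both 'a' and 'a.b' are present
                raise ValueError(f"path conflict at {part!r} in {dotted!r}")
        node[leaf] = value
    return nested


print(expand_paths({'user.profile.age': 30}))
# {'user': {'profile': {'age': 30}}}
```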

### Custom Delimiters

Choose the delimiter that best fits your data:

```python
from toon import encode

data = {'users': [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]}

# Tab delimiter (better for spreadsheet-like data)
toon = encode(data, {'delimiter': 'tab'})

# Pipe delimiter (when data contains commas)
toon = encode(data, {'delimiter': 'pipe'})
```
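
If the data is not known in advance, the delimiter can be picked by inspecting the values. The heuristic below is only a sketch: `pick_delimiter` is a hypothetical helper, and preferring comma, then pipe, then tab is an assumption for illustration. It returns one of the option names accepted by `delimiter`.

```python
def pick_delimiter(rows):
    """Suggest 'comma', 'pipe', or 'tab' based on characters in string cells."""
    cells = [v for row in rows for v in row.values() if isinstance(v, str)]
    for name, char in (('comma', ','), ('pipe', '|'), ('tab', '\t')):
        if not any(char in cell for cell in cells):
            return name  # first delimiter that never appears in the data
    return 'comma'  # every candidate appears; fall back to the default


rows = [
    {'city': 'Paris, France', 'code': 'PAR'},
    {'city': 'Austin, TX', 'code': 'AUS'},
]
print(pick_delimiter(rows))
# pipe
```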

## Format Comparison

### JSON vs TOON

**JSON** (165 bytes as formatted here):
```json
{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "guest"}
  ]
}
```

**TOON** (70 bytes, **about 58% reduction**):
```toon
users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,guest
```
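
Byte counts depend on whitespace and formatting, so it helps to measure the exact text being compared. The sketch below does this for the two snippets as written above, using only the standard library; `json_text` and `toon_text` are literal copies of those snippets, and no TOON library is involved. It also measures minified JSON, which is arguably the fairer baseline.

```python
import json

json_text = """{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
    {"id": 3, "name": "Charlie", "role": "guest"}
  ]
}"""

toon_text = """users[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,guest"""

json_bytes = len(json_text.encode('utf-8'))
toon_bytes = len(toon_text.encode('utf-8'))
reduction = 1 - toon_bytes / json_bytes

# Minified JSON: no indentation, no spaces after separators
minified = json.dumps(json.loads(json_text), separators=(',', ':'))
minified_bytes = len(minified.encode('utf-8'))

print(f"JSON as shown: {json_bytes} bytes")
print(f"JSON minified: {minified_bytes} bytes")
print(f"TOON:          {toon_bytes} bytes ({reduction:.0%} smaller than JSON as shown)")
```

Byte size is only a proxy for token count; the `--stats` flag reports token statistics directly.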

### When to Use TOON

**Use TOON when:**
- ✅ Passing data to LLM APIs (reduce token costs)
- ✅ Working with uniform tabular data
- ✅ Context window is limited
- ✅ Human readability matters

**Use JSON when:**
- ❌ Maximum compatibility is required
- ❌ Data is highly irregular/nested
- ❌ Working with existing JSON-only tools

## Development

### Setup

```bash
git clone https://github.com/VinciGit00/toon.git
cd toon
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
pytest --cov=toon --cov-report=term-missing
```

### Running Examples

```bash
python examples/basic_usage.py
python examples/advanced_features.py
```

## Performance

TOON typically achieves:
- **30-60% size reduction** vs JSON for structured data
- **40-70% token reduction** with tabular data
- **Minimal overhead** in encoding/decoding (<1ms for typical payloads)

## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes with tests
4. Run tests (`pytest`)
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request

## License

MIT License - see the [LICENSE](LICENSE) file for details.

## Credits

Python implementation inspired by the TypeScript TOON library at [toon-format/toon](https://github.com/toon-format/toon).

## Links

- **GitHub**: https://github.com/VinciGit00/toon
- **PyPI**: https://pypi.org/project/toon-format/
- **Documentation**: https://github.com/VinciGit00/toon#readme
- **Format Spec**: https://github.com/toon-format/toon

---

Made with ❤️ by the TOON Format Contributors
