Metadata-Version: 2.4
Name: translet
Version: 0.1.0
Summary: JSON conversion pipelines via LLM-generated JSONata rules with caching.
Project-URL: Homepage, https://github.com/tumikosha/translet
Project-URL: Issues, https://github.com/tumikosha/translet/issues
Project-URL: Source, https://github.com/tumikosha/translet
Author-email: tumi <tumikosha@gmail.com>
License: MIT License
        
        Copyright (c) 2026 tumi
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: json,jsonata,llm,pipeline,transjson,translation,translet
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Requires-Dist: dbset>=1.0.0
Requires-Dist: jsonata-python>=0.6.0
Requires-Dist: jsonschema>=4.0
Provides-Extra: all-llm
Requires-Dist: openai>=1.40; extra == 'all-llm'
Provides-Extra: azure
Requires-Dist: openai>=1.40; extra == 'azure'
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: openai>=1.40; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Provides-Extra: groq
Requires-Dist: openai>=1.40; extra == 'groq'
Provides-Extra: nvidia
Requires-Dist: openai>=1.40; extra == 'nvidia'
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == 'openai'
Description-Content-Type: text/markdown

# translet

JSON conversions via **JSONata** rules generated by an **LLM** and cached in a database (through [`dbset`](https://pypi.org/project/dbset/)).

When `convert` is called, translet:

1. builds a cache key (from an explicit name or from the structural shape of input + target);
2. if no rule is cached — asks the LLM to produce a JSONata expression and stores it;
3. applies the rule via [`jsonata-python`](https://pypi.org/project/jsonata-python/);
4. validates the result against the target spec (JSON Schema or sample);
5. on failure — regenerates the rule with error context and retries (up to 2 times by default).

Subsequent calls with the same shape hit the cache — no LLM round-trip.

## Contents

- [Installation](#installation)
- [Quick start](#quick-start)
- [Three ways to specify a target](#three-ways-to-specify-a-target)
- [Configuration via `.env`](#configuration-via-env)
- [Async](#async)
- [Manual wiring (without `from_env`)](#manual-wiring-without-from_env)
- [Cache management](#cache-management)
- [Statistics](#statistics)
- [Error handling](#error-handling)
- [Extending](#extending)
- [Development](#development)
- [Build and publish](#build-and-publish)
- [License](#license)

## Installation

```bash
pip install "translet[all-llm]"
```

`all-llm` installs the `openai` SDK, which translet uses to talk to OpenAI, Azure OpenAI, Groq, and NVIDIA NIM. If you prefer, install a single provider extra instead: `translet[openai]`, `translet[azure]`, `translet[groq]`, or `translet[nvidia]` (each currently pulls in the same `openai>=1.40` dependency).

## Quick start

```python
from translet import Translet

t = Translet.from_env()  # reads TRANSLET_LLM_*, TRANSLET_DB_*, etc.

result = t.transjson.convert(
    {"user": {"name": "Alice", "age": 30}},
    target_sample={"name": "x", "age": 0},
)
# {"name": "Alice", "age": 30}
```

## Three ways to specify a target

`convert(source, *, target_schema=None, target_sample=None, description=None, name=None)` accepts **one** of three target specifications. The choice affects both the LLM prompt and post-validation.

### 1. JSON Schema — strict validation

```python
schema = {
    "type": "object",
    "properties": {
        "full_name": {"type": "string"},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["full_name", "age"],
}

result = t.transjson.convert(
    {"first_name": "Alice", "last_name": "Smith", "age": 30},
    target_schema=schema,
)
# {"full_name": "Alice Smith", "age": 30}
```

After the rule is applied, the result is validated with `jsonschema`. A mismatch raises `ValidationError`, which triggers regeneration (up to `max_retries` times).

### 2. Sample (`target_sample`) — structural match

The LLM gets a concrete example of the desired shape. Validation checks key/type compatibility against the sample.

```python
result = t.transjson.convert(
    {"first_name": "Alice", "last_name": "Smith"},
    target_sample={"full_name": "Alice Smith"},
)
```
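One plausible reading of "key/type compatibility" is a recursive check like the one below. `matches_sample` is a hypothetical helper, not translet's validator, and its rules are assumptions. Every key of the sample must be present in the result with a compatible type, and extra keys are tolerated. `int` and `float` are interchangeable, `bool` is kept distinct from numbers, and list elements are compared against the first sample element.

```python
from typing import Any


def matches_sample(result: Any, sample: Any) -> bool:
    """Hypothetical structural check of `result` against a target sample."""
    if isinstance(sample, dict):
        # Every sample key must exist and match recursively; extra keys are allowed.
        return isinstance(result, dict) and all(
            key in result and matches_sample(result[key], value)
            for key, value in sample.items()
        )
    if isinstance(sample, list):
        # Compare every result element against the first sample element (if any).
        return isinstance(result, list) and (
            not sample or all(matches_sample(item, sample[0]) for item in result)
        )
    if isinstance(sample, bool) or isinstance(result, bool):
        # bool is a subclass of int, so handle it before the numeric branch.
        return type(result) is type(sample)
    if isinstance(sample, (int, float)):
        return isinstance(result, (int, float))
    return type(result) is type(sample)


print(matches_sample({"full_name": "Alice Smith"}, {"full_name": "x"}))  # True
print(matches_sample({"full_name": 42}, {"full_name": "x"}))             # False
```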

### 3. Description + explicit name

When the target shape is trivial or scalar, describe the task in plain text. Always pass `name=` to give the cache an explicit key.

```python
result = t.transjson.convert(
    {"items": [{"v": 1}, {"v": 2}, {"v": 3}]},
    description="sum of all v values",
    name="sum_v_rule",
)
# 6
```

### Cache keys

- `name="foo"` → key `name:foo` (stable, survives changes to the input shape).
- No `name` → key `hash:<sha>` derived from input shape + target. Any structural change yields a fresh key.

## Configuration via `.env`

`Translet.from_env(env_file=...)` loads a `.env`-style file and then reads its settings from environment variables. Standard `KEY=VALUE` lines, `#` comments, and optional surrounding quotes are supported.

```python
from translet import Translet

t = Translet.from_env(env_file=".env")
```

### Environment variables

| Variable | Purpose |
|----------|---------|
| `TRANSLET_LLM_PROVIDER` | `openai` / `azure` / `groq` / `nvidia` |
| `TRANSLET_LLM_MODEL` | Model name (required) |
| `TRANSLET_LLM_BASE_URL` | Optional base URL override |
| `OPENAI_API_KEY` / `AZURE_OPENAI_API_KEY` / `GROQ_API_KEY` / `NVIDIA_API_KEY` | Provider-specific API key (recommended — matches the SDK convention) |
| `TRANSLET_API_KEY` | Generic fallback when no provider-specific key is set |
| `AZURE_OPENAI_ENDPOINT` / `AZURE_OPENAI_API_VERSION` | Azure only |
| `TRANSLET_DB_PATH` | `dbset` connection string (default: `sqlite:///translet.db`) |
| `TRANSLET_DB_TABLE` | Table name (default: `translet_rules`) |
| `TRANSLET_TTL_SECONDS` | Rule TTL (default: no TTL) |
| `TRANSLET_MAX_RETRIES` | Number of regenerate attempts (default: 2) |

The provider-specific key takes precedence over `TRANSLET_API_KEY`.

### Overriding parameters from code

`Translet.from_env(...)` accepts kwargs that override values from `.env` / `os.environ`:

```python
from pathlib import Path
from translet import Translet

DB_PATH = Path("./cache/rules.db")

t = Translet.from_env(
    env_file=".env",
    db_path=f"sqlite:///{DB_PATH}",   # overrides TRANSLET_DB_PATH
    max_retries=5,                    # overrides TRANSLET_MAX_RETRIES
    api_key="sk-...",                 # routed into the provider-specific env var
)
```

Available kwargs: `provider`, `model`, `base_url`, `api_key`, `db_path`, `db_table`, `ttl_seconds`, `max_retries`. `None` (the default) keeps whatever is already in `.env` / the environment. Pass `override=True` to force `load_dotenv` to overwrite existing `os.environ` values from the file.

### Loading `.env` separately from Translet

```python
from translet import load_dotenv

load_dotenv(".env")                # setdefault semantics
load_dotenv(".env", override=True)  # force overwrite
```
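To make the difference between `setdefault` semantics and `override=True` concrete, here is a stdlib-only sketch of a `.env` reader. It covers the syntax described earlier: `KEY=VALUE` lines, full-line `#` comments, and one pair of surrounding quotes. `parse_dotenv` and `apply_dotenv` are hypothetical names, and this is not translet's `load_dotenv`. Inline comments and `export` prefixes are deliberately not handled.

```python
import os
from collections.abc import MutableMapping


def parse_dotenv(text: str) -> dict[str, str]:
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def apply_dotenv(values: dict[str, str], env: MutableMapping[str, str] = os.environ,
                 override: bool = False) -> None:
    for key, value in values.items():
        if override:
            env[key] = value            # the file wins
        else:
            env.setdefault(key, value)  # the existing environment wins


text = 'TRANSLET_LLM_MODEL="gpt-4o"\n# comment\nTRANSLET_MAX_RETRIES=5\n'
env = {"TRANSLET_MAX_RETRIES": "2"}
apply_dotenv(parse_dotenv(text), env)
print(env)  # {'TRANSLET_MAX_RETRIES': '2', 'TRANSLET_LLM_MODEL': 'gpt-4o'}
apply_dotenv(parse_dotenv(text), env, override=True)
print(env["TRANSLET_MAX_RETRIES"])  # 5
```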

## Async

```python
import asyncio
from translet import AsyncTranslet

async def main():
    t = await AsyncTranslet.from_env(env_file=".env")
    result = await t.transjson.aconvert(
        {"user": {"name": "Alice"}},
        target_sample={"name": "x"},
    )
    print(result)

asyncio.run(main())
```

`AsyncTranslet.from_env` accepts the same overrides as the sync version.

## Manual wiring (without `from_env`)

```python
from dbset import connect
from translet import Translet, TransletConfig
from translet.llm import openai
from translet.store import DbSetStore

db = connect("sqlite:///translet.db")
t = Translet(
    llm=openai("gpt-4o", api_key="..."),
    store=DbSetStore(db, table="my_rules"),
    config=TransletConfig(max_retries=2, ttl_seconds=None),
)
```

LLM factories: `openai`, `azure`, `groq`, `nvidia` (sync) and `aopenai`, `aazure`, `agroq`, `anvidia` (async).

`DbSetStore` does not own the connection — close `db` explicitly (`db.close()`). This lets a single connection back several stores / tables.

## Cache management

```python
# Explicit invalidation by name or by raw key
t.transjson.invalidate("sum_v_rule")
t.transjson.invalidate("name:full_name_v1")
t.transjson.invalidate("hash:abc123...")

# Manual TTL eviction (returns the number of removed rules)
removed = t.transjson.evict_expired(ttl_seconds=86400)
```

If `ttl_seconds` is set on `TransletConfig`, calling `evict_expired()` without an argument uses that as the default.

## Statistics

`compute_stats(rules)` aggregates the rule cache; `format_stats(stats)` renders it as text.

```python
from translet import Translet, compute_stats, format_stats

t = Translet.from_env()
rules = t.store.list(limit=10000)
print(format_stats(compute_stats(rules, top=10)))
```

`RuleStats` fields: `total_rules`, `total_uses`, `total_successes`, `total_failures`, `success_rate`, `by_provider`, `by_model`, `top_by_usage`, `oldest_created`, `newest_created`, `last_used`.
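To make a few of these fields concrete, here is a sketch of how they could be derived from per-rule counters. `RuleRecord` and `summarize` are hypothetical. The real objects returned by `t.store.list()` may differ, as may the exact definitions used by `compute_stats`. For example, it is an assumption here that `by_provider` counts rules rather than uses, and that `success_rate` is successes divided by successes plus failures.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass
class RuleRecord:  # hypothetical per-rule counters
    key: str
    provider: str
    model: str
    uses: int
    successes: int
    failures: int


def summarize(rules: list[RuleRecord], top: int = 3) -> dict[str, Any]:
    successes = sum(r.successes for r in rules)
    failures = sum(r.failures for r in rules)
    attempts = successes + failures
    return {
        "total_rules": len(rules),
        "total_uses": sum(r.uses for r in rules),
        "total_successes": successes,
        "total_failures": failures,
        # Assumption: success rate = successes / (successes + failures).
        "success_rate": successes / attempts if attempts else 0.0,
        "by_provider": Counter(r.provider for r in rules),  # rules per provider
        "by_model": Counter(r.model for r in rules),
        "top_by_usage": [r.key for r in sorted(rules, key=lambda r: r.uses, reverse=True)[:top]],
    }


rules = [
    RuleRecord("name:sum_v_rule", "openai", "gpt-4o", uses=10, successes=9, failures=1),
    RuleRecord("hash:ab12", "groq", "example-model", uses=4, successes=4, failures=0),
]
stats = summarize(rules)
print(stats["total_uses"], round(stats["success_rate"], 3))  # 14 0.929
```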

A CLI helper for quick checks:

```bash
python examples/show_stats.py --env-file .env
python examples/show_stats.py --db-path "sqlite:///./cache/rules.db" --top 10
```

## Error handling

All exceptions inherit from `TransletError`:

```python
from translet import (
    ConversionError,      # all retries exhausted
    RuleGenerationError,  # LLM failed to produce a usable JSONata expression
    JsonataError,         # JSONata compile/eval failure
    ValidationError,      # result didn't match the schema/sample
    StoreError,           # storage backend failure
)

try:
    result = t.transjson.convert(source, target_sample=sample)
except ConversionError as exc:
    print(f"failed for key={exc.key}, last error: {exc.last_error!r}")
```

The `on_failure` field of `TransletConfig` controls what happens when a rule fails: `"regenerate"` (the default; regenerate the rule and retry) or `"raise"` (fail fast).

## Extending

### Custom LLM provider

```python
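from dbset import connect
from translet import Translet
from translet.llm import LLMClient, Message
from translet.store import DbSetStore

class MyLLM:  # no need to subclass LLMClient; matching its shape is enough
    provider = "my-provider"
    model = "my-model"

    def complete(self, messages: list[Message], *, temperature: float = 0.0, max_tokens: int = 2048) -> str:
        ...  # call your model and return the response text

db = connect("sqlite:///translet.db")
store = DbSetStore(db, table="translet_rules")
llm: LLMClient = MyLLM()  # type checkers verify structural compatibility here
t = Translet(llm=llm, store=store)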
```

`LLMClient` is a `runtime_checkable` `Protocol` — explicit inheritance is optional, structural compatibility is enough. The async counterpart is `AsyncLLMClient` with an `acomplete` method.
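The structural-typing behaviour can be seen with plain `typing`. `LLMClientLike` and `Message` below are hypothetical local mirrors of the documented shapes, not translet's definitions. Note that a runtime `isinstance` check against a `runtime_checkable` protocol only verifies that the members exist. It does not check method signatures.

```python
from typing import Protocol, TypedDict, runtime_checkable


class Message(TypedDict):  # hypothetical stand-in for translet.llm.Message
    role: str
    content: str


@runtime_checkable
class LLMClientLike(Protocol):  # hypothetical mirror of the documented protocol
    provider: str
    model: str

    def complete(self, messages: list[Message], *, temperature: float = 0.0,
                 max_tokens: int = 2048) -> str: ...


class EchoLLM:  # does not inherit from LLMClientLike
    provider = "echo"
    model = "echo-1"

    def complete(self, messages: list[Message], *, temperature: float = 0.0,
                 max_tokens: int = 2048) -> str:
        return messages[-1]["content"]  # echo the last message back


class NotAnLLM:  # missing `model` and `complete`
    provider = "none"


print(isinstance(EchoLLM(), LLMClientLike))   # True
print(isinstance(NotAnLLM(), LLMClientLike))  # False
```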

### Custom store

Implement `translet.store.RuleStore` (or `AsyncRuleStore`) — methods `get`, `put`, `touch`, `delete`, `evict_expired`, `list`. `DbSetStore` is a reference implementation.

### Custom system prompt

```python
from translet import TransletConfig

config = TransletConfig(system_prompt="You are a JSONata generator. Output only the expression.")
```

For full prompt control, pass your own `prompt_builder=YourPromptBuilder()` (see `translet.transjson.PromptBuilder`).

## Development

```bash
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,all-llm]"
pytest -q
```

Project layout:

```
src/translet/
  core.py              # Translet / AsyncTranslet, from_env, load_dotenv
  llm/                 # LLMClient Protocol + OpenAI-compatible clients
  store/               # RuleStore Protocol + DbSetStore
  transjson/           # convert pipeline (generate → JSONata → validate → retry)
  stats.py             # cache-statistics aggregation
  exceptions.py
examples/
  simple_nvidia.py        # minimal NVIDIA NIM example
  simple_from_env.py      # universal example driven by .env
  show_stats.py           # CLI: cache statistics
```

## Build and publish

```bash
python -m build
python -m twine check dist/*
python -m twine upload dist/*
```

## License

MIT
