Metadata-Version: 2.4
Name: json-repair-engine
Version: 0.1.6
Summary: LLM-focused JSON repair engine with diagnostics and repair reports.
Project-URL: Homepage, https://github.com/dinis-a/llmjson
Project-URL: Repository, https://github.com/dinis-a/llmjson
Project-URL: Issues, https://github.com/dinis-a/llmjson/issues
Author: Dinis Akulov
License: MIT
License-File: LICENSE
Keywords: ai,json,json repair,llm,openai,parser
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# json-repair-engine

[![CI](https://github.com/dinis-a/json-repair-engine/actions/workflows/test-and-publish.yml/badge.svg)](https://github.com/dinis-a/json-repair-engine/actions)
[![PyPI version](https://img.shields.io/pypi/v/json-repair-engine)](https://pypi.org/project/json-repair-engine/)
[![Python](https://img.shields.io/pypi/pyversions/json-repair-engine)](https://pypi.org/project/json-repair-engine/)

A JSON repair engine built for LLM-generated output. Fixes common syntax errors and produces a structured diagnostic report for every repair.

## Problem

LLMs frequently emit malformed JSON — trailing commas, Python-style literals (`True`, `False`, `None`), unquoted keys, unterminated structures, comments, single-quoted strings, and more. Standard JSON parsers reject these outright, forcing brittle regex workarounds in every integration.

Most repair tools fix silently. `json-repair-engine` explains every change.

## Features

- **Python and JavaScript literals** — `True` → `true`, `False` → `false`, `None` → `null`, `undefined` → `null`
- **Trailing commas** — strips commas before `}` and `]`
- **Unquoted keys** — quotes bare identifiers in objects (`{key: "val"}` → `{"key": "val"}`)
- **Single-quoted strings** — converts `'string'` to `"string"`
- **Comments** — strips `//`, `/* */`, and `#` comments
- **Missing commas** — inserts commas between adjacent values (`[1 2 3]` → `[1, 2, 3]`)
- **Extra commas** — collapses doubled commas (`[1,,2]` → `[1,2]`)
- **Unterminated structures** — closes missing braces, brackets, and strings
- **Trailing garbage** — extracts valid JSON from text with extra content before or after it
- **Markdown fences** — extracts JSON from ` ```json ... ``` ` blocks
- **Hex literals** — `0xFF` → `255`
- **String-aware repair** — all replacements skip quoted string contents
- **Structured diagnostics** — every repair includes type, position, original, replacement, and message
- **Confidence scoring** — each fix type carries a penalty, and the report exposes an aggregate confidence score
- **Zero dependencies** — standard library only
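To illustrate what "string-aware" means in practice, here is a minimal, hypothetical sketch of a single repair (trailing-comma removal). It walks the input character by character and never touches the contents of double-quoted strings. The function name `strip_trailing_commas` is illustrative and not part of the `llmjson` API, and the sketch assumes standard JSON strings (double quotes, backslash escapes); it does not handle the single-quoted strings the engine also accepts.

```python
def strip_trailing_commas(text: str) -> tuple[str, list[int]]:
    """Hypothetical helper: drop commas that are followed (after optional
    whitespace) by '}' or ']', ignoring anything inside double-quoted strings.

    Returns the repaired text and the input offsets of the removed commas.
    """
    out: list[str] = []
    removed: list[int] = []
    in_string = False  # currently inside a "..." string
    escaped = False    # previous character was a backslash inside a string

    for i, ch in enumerate(text):
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == ",":
            # Look ahead past whitespace for a closing brace or bracket.
            j = i + 1
            while j < len(text) and text[j] in " \t\r\n":
                j += 1
            if j < len(text) and text[j] in "}]":
                removed.append(i)  # skip the comma; surrounding whitespace is kept
            else:
                out.append(ch)
        else:
            out.append(ch)

    return "".join(out), removed


fixed, positions = strip_trailing_commas('{"a": [1, 2,], "s": "x,]"}')
print(fixed)      # {"a": [1, 2], "s": "x,]"}
print(positions)  # [11]
```

Tracking whether the scanner is inside a string, and whether the previous character was a backslash, is what keeps the `,]` inside `"x,]"` from being mistaken for a trailing comma.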

## Installation

```bash
pip install json-repair-engine
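```

The distribution is published as `json-repair-engine`; the package it installs is imported as `llmjson`, and the CLI command is also `llmjson`.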

## Usage

### Python API

```python
from llmjson import loads

data, report = loads('{"ok": True, "items": [1, 2,],}')

print(data)              # {'ok': True, 'items': [1, 2]}
print(report.valid)      # True
print(report.repaired)   # True
print(report.confidence) # e.g. 0.97
print(report.repair_count)  # 3

for fix in report.fixes:
    print(f"[{fix.type}] at position {fix.position}: {fix.message}")
```

### CLI

```bash
$ llmjson response.json
{
  "ok": true,
  "items": [1, 2]
}

--- Repair Report ---
{
  "valid": true,
  "repaired": true,
  "confidence": 0.97,
  "repairs": 3
}
```

## API Reference

### `loads(text: str) -> tuple[dict | list, RepairReport]`

Parse and repair a JSON string. Returns the parsed data (object or array) and a repair report.

### `RepairReport`

| Field          | Type         | Description                                      |
|----------------|--------------|--------------------------------------------------|
| `valid`        | `bool`       | Whether the result is valid JSON                 |
| `repaired`     | `bool`       | Whether any repairs were applied                 |
| `confidence`   | `float`      | Aggregate confidence (starts at 1.0, penalized per fix) |
| `fixes`        | `list[Fix]`  | List of individual repairs applied               |
| `repair_count` | `int`        | Total number of fixes                            |
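When repaired output feeds a downstream system, a caller may want to reject results that needed too much repair. Below is a minimal sketch of such a gate using only the documented `valid` and `confidence` fields. The `should_accept` name and the 0.9 default threshold are hypothetical choices, and `SimpleNamespace` stands in for a real `RepairReport` so the snippet runs without the library installed.

```python
from types import SimpleNamespace
from typing import Protocol


class ReportLike(Protocol):
    """Structural type for the two documented RepairReport fields used here."""

    valid: bool
    confidence: float


def should_accept(report: ReportLike, min_confidence: float = 0.9) -> bool:
    """Hypothetical policy: accept only valid results at or above a threshold."""
    return report.valid and report.confidence >= min_confidence


# SimpleNamespace stands in for a real RepairReport so this runs standalone.
print(should_accept(SimpleNamespace(valid=True, confidence=0.96)))  # True
print(should_accept(SimpleNamespace(valid=True, confidence=0.85)))  # False
print(should_accept(SimpleNamespace(valid=False, confidence=1.0)))  # False
```

With the library, the same function would be called on the second value returned by `loads`. The right threshold depends on how much silent correction an application can tolerate.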

### `Fix`

| Field         | Type   | Description                          |
|---------------|--------|--------------------------------------|
| `type`        | `str`  | Category of the fix                  |
| `position`    | `int`  | Character offset where the fix was applied |
| `original`    | `str`  | Original malformed text              |
| `replacement` | `str`  | Corrected text                       |
| `message`     | `str`  | Human-readable description           |
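A report can contain many fixes. For compact logging, a caller might collapse them into counts per `type`. The following is a minimal sketch that assumes only the documented `type` field. `summarize_fixes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins with illustrative values, not output captured from the library.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_fixes(fixes) -> str:
    """Count fixes by their `type` field, most frequent first."""
    counts = Counter(fix.type for fix in fixes)
    # most_common() keeps first-seen order among types with equal counts.
    return ", ".join(f"{kind} x{n}" for kind, n in counts.most_common())


# Stand-in Fix objects carrying only the fields this sketch reads (illustrative).
fixes = [
    SimpleNamespace(type="InvalidLiteral", position=7),
    SimpleNamespace(type="TrailingComma", position=27),
    SimpleNamespace(type="TrailingComma", position=29),
]
print(summarize_fixes(fixes))  # TrailingComma x2, InvalidLiteral x1
```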

### Fix types

| Type                | Penalty | Example                              |
|---------------------|---------|--------------------------------------|
| `TrailingComma`     | 0.01    | `[1, 2,]` → `[1, 2]`               |
| `MarkdownFence`     | 0.01    | ` ```json ... ``` ` → inner JSON    |
| `CommentStripped`   | 0.01    | `// comment` → removed              |
| `InvalidLiteral`    | 0.02    | `True` → `true`                     |
| `UndefinedLiteral`  | 0.02    | `undefined` → `null`                |
| `UnquotedKey`       | 0.03    | `{key: "v"}` → `{"key": "v"}`      |
| `HexLiteral`        | 0.03    | `0xFF` → `255`                      |
| `SingleQuoteString` | 0.05    | `'val'` → `"val"`                   |
| `ExtraComma`        | 0.05    | `[1,,2]` → `[1,2]`                 |
| `TrailingGarbage`   | 0.05    | `{"a":1} extra` → `{"a":1}`        |
| `MissingComma`      | 0.08    | `[1 2]` → `[1, 2]`                 |
| `StructuralClosure` | 0.15    | `{"a":1` → `{"a":1}`               |
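This README states that confidence starts at 1.0 and is penalized per fix, but it does not give the exact aggregation formula. The sketch below assumes one plausible model: linear subtraction of the penalties above, floored at 0.0. `PENALTIES` and `aggregate_confidence` are hypothetical names, and the engine may combine penalties differently (for example, multiplicatively).

```python
# Penalties copied from the Fix types table.
PENALTIES: dict[str, float] = {
    "TrailingComma": 0.01,
    "MarkdownFence": 0.01,
    "CommentStripped": 0.01,
    "InvalidLiteral": 0.02,
    "UndefinedLiteral": 0.02,
    "UnquotedKey": 0.03,
    "HexLiteral": 0.03,
    "SingleQuoteString": 0.05,
    "ExtraComma": 0.05,
    "TrailingGarbage": 0.05,
    "MissingComma": 0.08,
    "StructuralClosure": 0.15,
}


def aggregate_confidence(fix_types: list[str]) -> float:
    """Assumed model: start at 1.0, subtract each fix's penalty, floor at 0.0."""
    total = sum(PENALTIES[t] for t in fix_types)
    return max(0.0, 1.0 - total)


score = aggregate_confidence(["UnquotedKey", "MissingComma"])
print(round(score, 2))                                  # 0.89
print(aggregate_confidence(["StructuralClosure"] * 8))  # 0.0
```

Under this model, the floor keeps heavily repaired input from producing a negative score. Structural closures dominate the penalty because guessing where an unterminated object or array was meant to end is the least certain repair.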

## License

MIT
