Metadata-Version: 2.4
Name: fast_json_repair
Version: 0.2.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: General
Classifier: Operating System :: OS Independent
Requires-Dist: orjson>=3.10.0
License-File: LICENSE
Summary: High-performance JSON repair library using Rust
Keywords: json,repair,parser,rust,performance,llm,ai
Author-email: dvideby0 <>
License: MIT
Requires-Python: >=3.11, <3.15
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/dvideby0/fast_json_repair
Project-URL: Documentation, https://github.com/dvideby0/fast_json_repair#readme
Project-URL: Repository, https://github.com/dvideby0/fast_json_repair
Project-URL: Issues, https://github.com/dvideby0/fast_json_repair/issues

# fast_json_repair

[![PyPI version](https://badge.fury.io/py/fast-json-repair.svg)](https://pypi.org/project/fast-json-repair/)
[![Python 3.11-3.14](https://img.shields.io/badge/python-3.11--3.14-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A high-performance JSON repair library for Python, powered by Rust. This is a drop-in replacement for [json_repair](https://github.com/mangiucugna/json_repair) with significant performance improvements.

## 🙏 Attribution

This library is a **Rust port** of the excellent [json_repair](https://github.com/mangiucugna/json_repair) library created by [Stefano Baccianella](https://github.com/mangiucugna). The original Python implementation is a brilliant solution for fixing malformed JSON from Large Language Models (LLMs), and this port aims to bring the same functionality with improved performance.

**All credit for the original concept, logic, and implementation goes to Stefano Baccianella.** This Rust port maintains API compatibility with the original library while leveraging Rust's performance benefits.

If you find this library useful, please also consider starring the [original json_repair repository](https://github.com/mangiucugna/json_repair).

## Features

- 📦 **Available on PyPI**: `pip install fast-json-repair`
- 🚀 **Rust Performance**: Core repair logic implemented in Rust for maximum speed
- 🔧 **Automatic Repair**: Fixes common JSON errors automatically
- 🐍 **Python Compatible**: Works with Python 3.11-3.14
- 🔄 **Drop-in Replacement**: Compatible API with the original json_repair library
- ⚡ **Fast JSON Parsing**: Uses orjson for JSON parsing operations

## Compatibility with Original json_repair

This is a **drop-in replacement** for the original `json_repair` library with the same API:

**✅ Included:**
- `repair_json()` - Main repair function with `return_objects`, `skip_json_loads`, `ensure_ascii`, `indent` parameters
- `loads()` - Convenience function for loading broken JSON directly to Python objects
- All repair capabilities: quotes, literals, commas, brackets, escape sequences, Unicode

**❌ Not Included:**
- File operations (`load()`, `from_file()`) - Use Python's built-in file handling + `repair_json()`
- CLI tool - Library-only implementation
- Streaming support - Not yet implemented

**Key Differences:**
- 🚀 **20x faster average**, up to 110x for large objects with long strings
- 🔢 Unquoted numbers parsed as numbers (not strings)
- 📦 Uses `orjson` for high-performance JSON operations

## Installation

### Quick Install

```bash
pip install fast-json-repair
```

### Build from Source

<details>
<summary>Click to expand build instructions</summary>

#### Prerequisites
- Python 3.11-3.14
- Rust toolchain (`curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`)
- [uv](https://docs.astral.sh/uv/) (recommended) or pip

#### Quick Start with uv (Recommended)

```bash
# Clone the repository
git clone https://github.com/dvideby0/fast_json_repair.git
cd fast_json_repair

# Run the automated setup script
./setup.sh
```

The setup script will:
- ✅ Install `uv` and Rust if needed
- ✅ Create a virtual environment (`.venv`)
- ✅ Install all dependencies
- ✅ Build the Rust extension
- ✅ Verify the installation

#### Manual Build Steps

```bash
# Clone the repository
git clone https://github.com/dvideby0/fast_json_repair.git
cd fast_json_repair

# Option 1: Using uv (fast!)
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv sync
maturin develop --release

# Option 2: Using pip
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install maturin orjson
maturin develop --release
```

</details>

## Usage

```python
from fast_json_repair import repair_json, loads

# Fix broken JSON
broken = "{'name': 'John', 'age': 30}"  # Single quotes
fixed = repair_json(broken)
print(fixed)  # {"age":30,"name":"John"}

# Parse directly to Python object
data = loads("{'key': 'value'}")
print(data)  # {'key': 'value'}

# Handle Unicode properly
text = "{'message': '你好世界'}"
result = repair_json(text, ensure_ascii=False)
print(result)  # {"message":"你好世界"}

# Format with indentation
formatted = repair_json("{'a': 1}", indent=2)
```

## What It Repairs

Automatically fixes common JSON formatting issues:

| Issue | Fix |
|-------|-----|
| Single quotes | → Double quotes |
| Unquoted keys | → Quoted keys |
| Python literals (True/False/None) | → JSON (true/false/null) |
| Trailing commas | Removed |
| Missing commas | Added |
| Extra commas | Removed |
| Unclosed brackets/braces | Auto-closed |
| Invalid escape sequences | Fixed |
| Unicode characters | Preserved or escaped (configurable) |

## API Reference

### `repair_json(json_string, **kwargs)`

Repairs invalid JSON and returns valid JSON string.

**Parameters:**
- `json_string` (str): The potentially invalid JSON string to repair
- `return_objects` (bool): If True, return parsed Python object instead of JSON string
- `skip_json_loads` (bool): If True, skip initial validation for better performance
- `ensure_ascii` (bool): If True, escape non-ASCII characters in output
- `indent` (int): Number of spaces for indentation (None for compact output)

**Returns:** 
- str or object: Repaired JSON string or parsed Python object

### `loads(json_string, **kwargs)`

Repairs and parses invalid JSON string to Python object.

**Parameters:**
- `json_string` (str): The potentially invalid JSON string to repair and parse
- `**kwargs`: Additional arguments passed to repair_json

**Returns:**
- object: The parsed Python object

## Performance

This Rust-based implementation provides significant performance improvements over the pure Python original.

### Fast Path Optimization

The library automatically uses the fastest path when possible:

**Fast Path (uses `orjson` for serialization):**
- Valid JSON input
- `ensure_ascii=False` 
- `indent` is either `None` (compact) or `2`

**Fallback Path (uses stdlib `json`):**
- Valid JSON input with `ensure_ascii=True`
- Valid JSON input with `indent` values other than `None` or `2`

**Repair Path (uses Rust implementation):**
- Any invalid JSON that needs repair
- Always respects `ensure_ascii` and `indent` settings

For maximum performance with valid JSON:
```python
# Fastest - uses orjson throughout
result = repair_json(valid_json, ensure_ascii=False, indent=2)

# Slower - falls back to json.dumps for formatting
result = repair_json(valid_json, ensure_ascii=True)  # ASCII escaping
result = repair_json(valid_json, indent=4)  # Custom indentation
```

### Benchmark Results

Comprehensive comparison of fast_json_repair vs json_repair across 20 test cases (10 invalid JSON, 10 valid JSON) with both `ensure_ascii` settings:

| Test Case | fast_json_repair (ms) | json_repair (ms) | Speedup |
|-----------|----------------------|------------------|---------|
| **Invalid JSON (needs repair)** | | | |
| Simple quotes (ascii=T) | 0.007 | 0.032 | 🚀 4.7x |
| Simple quotes (ascii=F) | 0.006 | 0.037 | 🚀 5.7x |
| Medium nested (ascii=T) | 0.020 | 0.192 | 🚀 9.6x |
| Medium nested (ascii=F) | 0.019 | 0.197 | 🚀 10.5x |
| Large array 1000 (ascii=T) | 0.246 | 2.273 | 🚀 9.3x |
| Large array 1000 (ascii=F) | 0.237 | 2.162 | 🚀 9.1x |
| Deep nesting 50 (ascii=T) | 0.055 | 0.410 | 🚀 7.5x |
| Deep nesting 50 (ascii=F) | 0.050 | 0.420 | 🚀 8.4x |
| Large object 500 (ascii=T) | 0.404 | 27.339 | 🚀 **67.7x** |
| Large object 500 (ascii=F) | 0.408 | 26.436 | 🚀 **64.8x** |
| Complex mixed (ascii=T) | 0.033 | 0.408 | 🚀 12.2x |
| Complex mixed (ascii=F) | 0.035 | 0.401 | 🚀 11.4x |
| Very large 5000 (ascii=T) | 29.531 | 580.959 | 🚀 **19.7x** |
| Very large 5000 (ascii=F) | 28.526 | 581.489 | 🚀 **20.4x** |
| Long strings 10K (ascii=T) | 0.040 | 4.403 | 🚀 **110.2x** |
| Long strings 10K (ascii=F) | 0.040 | 4.360 | 🚀 **108.7x** |
| **Valid JSON (fast path)** | | | |
| Small ASCII (ascii=T) | 0.003 | 0.004 | 🚀 1.3x |
| Small ASCII (ascii=F) | 0.002 | 0.005 | 🚀 2.9x |
| Nested structure (ascii=T) | 0.007 | 0.008 | 🚀 1.2x |
| Nested structure (ascii=F) | 0.003 | 0.008 | 🚀 2.4x |
| Large array 1000 (ascii=T) | 0.799 | 0.907 | 🚀 1.1x |
| Large array 1000 (ascii=F) | 0.421 | 0.903 | 🚀 2.1x |
| Large object 500 (ascii=T) | 0.506 | 0.590 | 🚀 1.2x |
| Large object 500 (ascii=F) | 0.281 | 0.571 | 🚀 2.0x |

**Overall: 19.7x faster** across all test cases

**Key Insights:**
- 🚀 = fast_json_repair is faster (all test cases)
- **Invalid JSON repair**: 5-110x faster
- **Valid JSON with ensure_ascii=False**: 2-3x faster (uses orjson fast path)
- **Valid JSON with ensure_ascii=True**: 1.1-1.3x faster
- **Best performance gains**: Long strings (110x), large objects (68x), very large arrays (20x)

### Performance Advantages

- **Large JSON documents**: 10-70x faster for documents with many keys/values
- **Long strings**: Up to 110x faster for documents with large string values
- **Very large arrays**: 20x faster for arrays with thousands of elements
- **Deeply nested structures**: 7-10x faster with consistent performance
- **Memory efficiency**: Lower memory footprint due to Rust's zero-cost abstractions and optimized allocations

Run `python benchmark.py` to test performance on your system. See [PERFORMANCE.md](PERFORMANCE.md) for detailed analysis.

## AWS Deployment

Works seamlessly on AWS with pre-built wheels for all architectures:
- **x86_64** - Standard EC2 instances (t2, t3, m5, c5, etc.)
- **ARM64/aarch64** - Graviton instances (t4g, m6g, c6g, etc.)

```bash
# Install on any AWS instance - pip auto-selects the correct wheel
pip install fast-json-repair
```

For Lambda layers and cross-compilation, see [DEPLOYMENT.md](DEPLOYMENT.md).

## Development

### Quick Reference

| Task | Command | VS Code Task |
|------|---------|--------------|
| **Setup** | `./setup.sh` | - |
| **Build (debug)** | `maturin develop` | 🔧 Build: Development |
| **Build (release)** | `maturin develop --release` | 🚀 Build: Development (Release) |
| **Run tests** | `pytest tests/ -v` | 🧪 Test: Python (All) |
| **Run benchmarks** | `python benchmark.py` | ⚡ Benchmark: Run Full Suite |
| **Format code** | `cargo fmt && black . && isort .` | ✨ Format: All (Rust + Python) |
| **Lint Rust** | `cargo clippy` | 🦀 Rust: Clippy |
| **Lint Python** | `ruff check .` | 🐍 Python: Lint (Ruff) |
| **Full check** | `maturin develop && pytest && python benchmark.py` | ✅ Full Check: Build + Test + Benchmark |

### Quick Setup

```bash
# Automated setup (recommended)
./setup.sh

# Or manually with uv
uv venv && source .venv/bin/activate
uv sync
maturin develop
```

### VS Code Integration

This project includes a complete VS Code workspace configuration:

**Getting Started:**
1. Open the project folder in VS Code
2. Install recommended extensions (you'll see a prompt)
3. The Python interpreter will auto-detect `.venv`
4. Press `Cmd+Shift+P` → "Tasks: Run Task" to see all available commands

**Available Tasks:**
- 🔧 **Build Tasks**: Debug build, release build, wheels, cross-platform builds
- 🧪 **Test Tasks**: Run all tests, quick tests, coverage reports
- ⚡ **Benchmark Tasks**: Full benchmarks, quick benchmarks, save results
- 🦀 **Rust Tasks**: Check, clippy, format, clean
- 🐍 **Python Tasks**: Format (black), sort imports (isort), lint (ruff)
- 🚢 **Workflows**: Full check (build+test+benchmark), release prep, quality checks

**Debugging:**
- Press `F5` to debug Python tests
- Set breakpoints in Python code
- Use "Debug: Select and Start Debugging" for specific configs

### Common Commands

See the Quick Reference table above for the most common tasks. Additional commands:

```bash
# Code quality
black .                # Format Python code
isort .                # Sort Python imports
ruff check .           # Lint Python code

# Cross-platform builds (requires zig)
maturin build --release --target x86_64-unknown-linux-gnu --zig
maturin build --release --target aarch64-unknown-linux-gnu --zig
maturin build --release --target universal2-apple-darwin
```

### Project Structure

```
fast_json_repair/
├── src/
│   └── lib.rs              # Rust implementation (core repair logic)
├── python/
│   └── fast_json_repair/
│       └── __init__.py     # Python API wrapper
├── tests/
│   └── test_all.py         # Python test suite
├── benchmark.py            # Performance benchmarks
├── pyproject.toml          # Python package configuration
├── Cargo.toml              # Rust package configuration
└── .vscode/                # VS Code workspace settings (local)
    ├── settings.json       # Python/Rust interpreter & formatting
    ├── tasks.json          # Build/test/benchmark tasks
    ├── launch.json         # Debug configurations
    └── extensions.json     # Recommended extensions
```

### Typical Workflow

1. **Make Changes** - Edit Rust (`src/`) or Python (`python/`) code
2. **Rebuild** - `maturin develop` or VS Code task `🔧 Build: Development`
3. **Test** - `pytest tests/ -v` or VS Code task `🧪 Test: Python (All)`
4. **Benchmark** - `python benchmark.py` or VS Code task `⚡ Benchmark: Run Full Suite`
5. **Release** - `maturin build --release` when ready to publish

## License

MIT License (same as original json_repair)

## Credits & Acknowledgments

### Original Author
- **[Stefano Baccianella](https://github.com/mangiucugna)** - Creator of the original [json_repair](https://github.com/mangiucugna/json_repair) library
  - Original concept and algorithm design
  - Python implementation that this library is based on
  - Comprehensive test cases and edge case handling

### This Rust Port
- Performance optimization through Rust implementation
- Maintains full API compatibility with the original
- Uses [PyO3](https://pyo3.rs/) for Python bindings
- Uses [orjson](https://github.com/ijl/orjson) for fast JSON parsing

### Special Thanks
A huge thank you to Stefano Baccianella for creating json_repair and making it open source. This library wouldn't exist without the original brilliant implementation that has helped countless developers handle malformed JSON from LLMs.

If you appreciate this performance-focused port, please also show support for the [original json_repair project](https://github.com/mangiucugna/json_repair) that made it all possible.

