Metadata-Version: 2.4
Name: elemental-xenon
Version: 1.1.0
Summary: Secure-by-default XML repair library for LLM-generated XML with trust levels
Project-URL: Homepage, https://github.com/MarsZDF/xenon
Project-URL: Documentation, https://github.com/MarsZDF/xenon
Project-URL: Repository, https://github.com/MarsZDF/xenon
Project-URL: Bug Tracker, https://github.com/MarsZDF/xenon/issues
Author: Marco Zaccaria Di Fraia, Xenon Contributors
License-Expression: MIT
License-File: LICENSE
Keywords: ai,ai-agents,async,audit,generative-ai,hallucination-repair,llm,malformed,nlp,parser,rag,repair,security,streaming,trust-levels,xml
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Markup :: XML
Classifier: Typing :: Typed
Requires-Python: >=3.8
Provides-Extra: dev
Requires-Dist: bandit>=1.7.0; extra == 'dev'
Requires-Dist: commitizen>=3.0; extra == 'dev'
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pre-commit>=3.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest-benchmark>=4.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: python-semantic-release>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: types-lxml>=2024.0.0; extra == 'dev'
Provides-Extra: lxml
Requires-Dist: lxml>=4.9.0; extra == 'lxml'
Description-Content-Type: text/markdown

# Xenon

[![CI](https://github.com/MarsZDF/xenon/actions/workflows/ci.yml/badge.svg)](https://github.com/MarsZDF/xenon/actions/workflows/ci.yml)
[![PyPI version](https://badge.fury.io/py/elemental-xenon.svg)](https://badge.fury.io/py/elemental-xenon)

**Xenon** is a robust, zero-dependency Python library designed to clean up, repair, and secure malformed XML generated by Large Language Models (LLMs).

In the era of RAG and AI agents, applications increasingly rely on structured data outputs. However, LLMs often generate messy XML: missing closing tags, conversational fluff, hallucinated tags, or markup that opens XSS vulnerabilities. **Xenon** bridges the gap between raw LLM output and reliable application code.

## 🚀 Key Features

- **LLM-Focused Repair**: Specifically handles truncation, hallucinations, and conversational text ("Here is your XML: ...").
- **Secure by Default**: Explicit `TrustLevel` system to prevent XSS and injection attacks from untrusted sources.
- **Real-Time Streaming**: Repair XML token-by-token as it arrives from the LLM (compatible with OpenAI/Anthropic streams).
- **Zero Runtime Dependencies**: Core functionality uses only the Python Standard Library. `lxml` is an optional extra for schema validation.
- **Smart Matching**: Uses Levenshtein distance to fix misspelled tags (e.g., `</usre>` → `</user>`).
- **Formatting & Diffs**: Built-in pretty-printing and diff generation to visualize repairs.
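
To make the Smart Matching idea concrete, here is a minimal standard-library sketch of edit-distance tag matching. It is not Xenon's implementation: the helper names and the `max_distance=2` threshold are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def closest_open_tag(closing_name, open_tags, max_distance=2):
    """Return the open tag nearest to closing_name, or None if nothing is close enough."""
    best, best_dist = None, max_distance + 1
    for name in reversed(open_tags):  # check the innermost open tag first
        dist = levenshtein(closing_name, name)
        if dist < best_dist:
            best, best_dist = name, dist
    return best


print(closest_open_tag("usre", ["root", "user"]))  # user
```

A small threshold keeps unrelated names from being matched by accident; a real repair engine would likely scale it with tag length.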

## 🤝 Complementary to Standard Parsers

Xenon is designed to work **alongside** established XML parsers like `lxml` or `BeautifulSoup`, not replace them.

Standard parsers expect well-formed input and will rightly raise errors on the chaotic output often generated by LLMs (hallucinations, cut-off strings, missing tags). **Xenon acts as a stabilization layer**: it accepts raw, potentially malformed LLM output and produces a valid, secure XML string that standard parsers can then consume safely and reliably.

Think of Xenon as the **pre-processor** that ensures your data pipeline remains robust, even when the LLM output isn't.
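
The short sketch below shows why a stabilization layer helps, using only the standard library's `xml.etree.ElementTree`. The `repaired` string is copied by hand from the Quick Start output rather than produced by calling Xenon, and `try_parse` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

raw = "Sure! <root><user name=john>Hello"                  # raw LLM output
repaired = '<root><user name="john">Hello</user></root>'   # repaired form from the Quick Start


def try_parse(text):
    """Return the parsed root element, or None if the strict parser rejects the input."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        return None


print(try_parse(raw))               # None: the strict parser refuses the raw text
user = try_parse(repaired).find("user")
print(user.get("name"), user.text)  # john Hello
```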

## Installation

```bash
pip install elemental-xenon

# Optional: include lxml for schema validation
pip install "elemental-xenon[lxml]"
```

For development:
```bash
git clone https://github.com/MarsZDF/xenon.git
cd xenon
pip install -e ".[dev]"
```

## ⚡ Quick Start

Xenon requires you to specify a **Trust Level** for your input. This ensures you don't accidentally expose your application to security threats from untrusted LLM outputs.

```python
from xenon import repair_xml_safe, parse_xml_safe, TrustLevel

# 1. Repair malformed LLM output
llm_output = 'Sure! <root><user name=john>Hello'
repaired = repair_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(repaired)
# Output: <root><user name="john">Hello</user></root>

# 2. Parse directly to a dictionary
data = parse_xml_safe(llm_output, trust=TrustLevel.UNTRUSTED)
print(data)
# Output: {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
```
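
For readers curious about the dictionary shape above, the following sketch rebuilds it from already-valid XML with `xml.etree.ElementTree`. It is inferred from the single example output only: `element_to_dict` is a hypothetical helper, and collecting repeated child tags into a list is an assumption, not documented Xenon behavior.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element into {tag: value} using '@attributes' and '#text' keys."""
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    for child in elem:
        value = element_to_dict(child)[child.tag]
        if child.tag in node:
            # Assumption: repeated child tags are gathered into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if text:
        if node:
            node["#text"] = text
        else:
            node = text  # a leaf with only text collapses to a plain string
    return {elem.tag: node}


root = ET.fromstring('<root><user name="john">Hello</user></root>')
print(element_to_dict(root))
# {'root': {'user': {'@attributes': {'name': 'john'}, '#text': 'Hello'}}}
```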

### 📚 API Reference

| Function | Description | Key Arguments |
|----------|-------------|---------------|
| `repair_xml_safe` | Core function. Returns repaired XML string. | `xml_input`, `trust`, `format_output` |
| `parse_xml_safe` | Repairs and converts to Python dict. | `xml_input`, `trust` |
| `StreamingXMLRepair` | Context manager for streaming processing. | `trust` |
| `format_xml` | Utility to pretty-print or minify XML. | `xml_string`, `style` |

### Trust Levels

| Level | Use Case | Security Features |
|-------|----------|-------------------|
| `UNTRUSTED` | LLM output, user uploads | 🔒 Strict escaping, strips dangerous tags (script/iframe), prevents XXE & DoS. |
| `INTERNAL` | Internal microservices | 🔐 Balanced protection, higher depth limits. |
| `TRUSTED` | Hardcoded strings, tests | ⚡ No overhead, fastest performance. |
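
As a rough picture of what "strips dangerous tags" can mean, the sketch below removes `script` and `iframe` elements from a tree that is already well formed. It uses only the standard library and is not Xenon's code; the `DANGEROUS` set contains just the two tags named in the table, and `strip_dangerous` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DANGEROUS = {"script", "iframe"}  # only the tags named in the table above


def strip_dangerous(parent):
    """Remove dangerous child elements in place, keeping the text that follows them."""
    prev = None
    for child in list(parent):
        if child.tag.lower() in DANGEROUS:
            if child.tail:
                # Re-attach trailing text so it is not lost with the element.
                if prev is None:
                    parent.text = (parent.text or "") + child.tail
                else:
                    prev.tail = (prev.tail or "") + child.tail
            parent.remove(child)
        else:
            strip_dangerous(child)
            prev = child


root = ET.fromstring("<root>a<script>alert(1)</script>b<p>ok</p></root>")
strip_dangerous(root)
print(ET.tostring(root, encoding="unicode"))  # <root>ab<p>ok</p></root>
```

This sketch ignores namespaces, attributes such as `onclick`, and entity-based attacks, all of which a real `UNTRUSTED` policy would need to consider.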

## 🌊 Real-Time Streaming

For RAG pipelines and chat interfaces, you can repair XML **as it is being generated**, so downstream code can consume safe fragments without waiting for the full response. Xenon handles the chunking logic for you.

```python
from xenon.streaming import StreamingXMLRepair
from xenon import TrustLevel

# Works with any iterator (sync or async)
def llm_stream():
    yield "Here is the data:\n<use"
    yield "r id=1>Al"
    yield "ice</user>"

with StreamingXMLRepair(trust=TrustLevel.UNTRUSTED) as repairer:
    for chunk in llm_stream():
        # Yields safe, valid XML fragments immediately
        for safe_fragment in repairer.feed(chunk):
            print(safe_fragment, end="")

# Output: <user id="1">Alice</user>
```
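
One part of the chunking problem is that a tag can be split across two chunks, as `<use` and `r id=1>` are above. The sketch below shows one way to hold back a partial tag until it completes. `TagBoundaryBuffer` is a hypothetical class, not part of Xenon, and it only re-chunks the text; it performs no repair or escaping.

```python
class TagBoundaryBuffer:
    """Hold back a trailing partial tag until the rest of it arrives."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        data = self._pending + chunk
        cut = data.rfind("<")
        # If the last '<' has no '>' after it, that tag is still incomplete.
        if cut != -1 and ">" not in data[cut:]:
            self._pending = data[cut:]
            return data[:cut]
        self._pending = ""
        return data

    def flush(self):
        """Return whatever is still buffered once the stream has ended."""
        data, self._pending = self._pending, ""
        return data


buf = TagBoundaryBuffer()
pieces = [buf.feed(c) for c in ["Here is the data:\n<use", "r id=1>Al", "ice</user>"]]
print(pieces)  # ['Here is the data:\n', '<user id=1>Al', 'ice</user>']
```

Each emitted piece ends on a tag boundary, so a later repair step never sees half a tag.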

## 🔗 Integrations

### LangChain
Xenon provides a drop-in `XenonXMLOutputParser` for LangChain pipelines.

```python
from xenon.integrations.langchain import XenonXMLOutputParser
from xenon import TrustLevel

# Create the parser
parser = XenonXMLOutputParser(
    trust=TrustLevel.UNTRUSTED,
    return_dict=True  # Returns dict, set False for string
)

# Use in your chain
chain = prompt | llm | parser
result = chain.invoke({"query": "Generate user XML"})
```

## 🛠️ Common Repair Scenarios

Xenon automatically handles the most common LLM failure modes:

**1. Truncation / Cut-off**
```python
# Input:  <root><list><item>Item 1
# Output: <root><list><item>Item 1</item></list></root>
```
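
Truncation repair is commonly done with a stack of open tags. The following is a simplified sketch of that technique, not Xenon's implementation: `close_open_tags` is a hypothetical helper, and it does not handle input cut off in the middle of a tag, comments, or CDATA.

```python
import re

# Matches opening, closing, and self-closing tags; captures the '/' markers and the name.
TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)[^<>]*?(/?)>")


def close_open_tags(xml_text):
    """Append closing tags for every element still open at the end of the text."""
    stack = []
    for match in TAG_RE.finditer(xml_text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue                      # <br/> opens nothing
        if closing:
            if stack and stack[-1] == name:
                stack.pop()               # a matching close ends the innermost element
        else:
            stack.append(name)
    return xml_text + "".join(f"</{name}>" for name in reversed(stack))


print(close_open_tags("<root><list><item>Item 1"))
# <root><list><item>Item 1</item></list></root>
```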

**2. Conversational Fluff**
```python
# Input:  Here is the XML: <data>value</data> Hope that helps!
# Output: <data>value</data>
```
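
A simple heuristic for this case keeps everything between the first `<` and the last `>`. The sketch below assumes the XML itself is complete; `strip_conversational_text` is a hypothetical helper, and on truncated output it would also drop trailing element text, so a real repair engine needs more care than this.

```python
def strip_conversational_text(text):
    """Keep only the span from the first '<' to the last '>'."""
    start = text.find("<")
    end = text.rfind(">")
    if start == -1 or end < start:
        return ""  # no tag-like content found
    return text[start:end + 1]


print(strip_conversational_text("Here is the XML: <data>value</data> Hope that helps!"))
# <data>value</data>
```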

**3. Malformed Attributes**
```python
# Input:  <user name=john age=25>
# Output: <user name="john" age="25"></user>
```
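
Quoting bare attribute values can be sketched with a regular expression that skips strings that are already quoted. This is an illustration under simplifying assumptions (ASCII attribute names, values without spaces), not Xenon's implementation. `quote_attributes` is a hypothetical helper, and it leaves the missing closing tag to a separate step.

```python
import re

# Alternatives 1 and 2 consume already-quoted strings so they are left alone;
# alternative 3 captures an unquoted name=value pair.
ATTR_RE = re.compile(
    r'''"[^"]*"|'[^']*'|(\s[A-Za-z_:][\w:.\-]*)=([^\s"'<>=]+?)(?=\s|/?>)'''
)


def _quote(match):
    if match.group(1) is None:
        return match.group(0)  # quoted string: keep as is
    return f'{match.group(1)}="{match.group(2)}"'


def quote_attributes(xml_text):
    """Wrap unquoted attribute values in double quotes, inside tags only."""
    return re.sub(r"<[^<>]+>", lambda m: ATTR_RE.sub(_quote, m.group(0)), xml_text)


print(quote_attributes("<user name=john age=25>"))
# <user name="john" age="25">
```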

**4. Unescaped Entities**
```python
# Input:  <text>Barnes & Noble</text>
# Output: <text>Barnes &amp; Noble</text>
```
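
Escaping stray ampersands without double-escaping existing references is usually done with a negative lookahead. A minimal sketch, with `escape_bare_ampersands` as a hypothetical helper: it treats any `&name;` as an existing reference, although XML itself predefines only five named entities.

```python
import re

# An '&' is "bare" unless it starts a named, decimal, or hexadecimal reference.
BARE_AMP_RE = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace bare '&' with '&amp;' and leave existing references untouched."""
    return BARE_AMP_RE.sub("&amp;", text)


print(escape_bare_ampersands("<text>Barnes & Noble</text>"))
# <text>Barnes &amp; Noble</text>
```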

**5. Hallucinated/Invalid Tags**
```python
# Input:  <123tag>data</123tag>
# Config: sanitize_invalid_tags=True
# Output: <tag_123tag>data</tag_123tag>
```
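
The renaming in this example can be pictured as follows. The sketch is an ASCII-only simplification of XML name rules; `sanitize_tag_name` is a hypothetical helper, and the `tag_` prefix is inferred from the example output above rather than from Xenon's source.

```python
import re


def sanitize_tag_name(name, prefix="tag_"):
    """Make a tag name start with a letter or underscore and contain only safe characters."""
    cleaned = re.sub(r"[^\w.\-]", "_", name)   # replace characters not allowed in names
    if not cleaned or not re.match(r"[A-Za-z_]", cleaned):
        cleaned = prefix + cleaned             # names may not start with a digit, '.' or '-'
    return cleaned


print(sanitize_tag_name("123tag"))  # tag_123tag
```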

## 📊 Analysis & Formatting

### Diff Reporting
See exactly what the repair engine changed.

```python
from xenon import repair_xml_with_report, TrustLevel

xml = "<root><item>test"
repaired, report = repair_xml_with_report(xml, trust=TrustLevel.UNTRUSTED)

print(report.summary())
# Performed 1 repair(s):
#   - [truncation] Added closing tags for: item, root
```
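
If you only have the input and output strings, the standard library's `difflib` can show what was inserted. This sketch is independent of Xenon's report object; `inserted_spans` is a hypothetical helper.

```python
import difflib


def inserted_spans(original, repaired):
    """List the pieces of text present in repaired but not in original."""
    matcher = difflib.SequenceMatcher(None, original, repaired, autojunk=False)
    return [repaired[j1:j2]
            for op, _i1, _i2, j1, j2 in matcher.get_opcodes()
            if op == "insert"]


print(inserted_spans("<root><item>test", "<root><item>test</item></root>"))
# ['</item></root>']
```

A character-level comparison is used because repaired XML often sits on a single line, where a line-based diff would flag the whole line.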

### Formatting
Pretty-print or minify your XML.

```python
from xenon import format_xml

xml = "<root><item>val</item></root>"
print(format_xml(xml, style="pretty"))
# <root>
#   <item>val</item>
# </root>
```
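
For comparison, the same pretty-printed layout can be produced from valid XML with the standard library's `xml.dom.minidom`. This is only a sketch of the idea and says nothing about how `format_xml` works internally; `pretty_xml` is a hypothetical helper.

```python
from xml.dom import minidom


def pretty_xml(xml_text, indent="  "):
    """Indent well-formed XML; raises an error if the input is not valid XML."""
    dom = minidom.parseString(xml_text)
    # Format the root element directly so no XML declaration is added.
    return dom.documentElement.toprettyxml(indent=indent).strip()


print(pretty_xml("<root><item>val</item></root>"))
# <root>
#   <item>val</item>
# </root>
```

Note that `minidom` requires well-formed input, which is why formatting belongs after repair.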

## Advanced Configuration

For specific needs, you can override security or repair settings individually:

```python
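from xenon import repair_xml_safe, TrustLevel

# Placeholder: load your own XSD schema content (the file name is hypothetical).
# Schema validation relies on the optional lxml extra.
with open("schema.xsd", encoding="utf-8") as f:
    my_xsd_schema = f.read()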

repaired = repair_xml_safe(
    "<root><script>alert(1)</script></root>",
    trust=TrustLevel.UNTRUSTED,
    # Overrides:
    strip_dangerous_tags=False,  # Keep <script> tags (Use with caution!)
    format_output="pretty",      # Auto-format result
    schema_content=my_xsd_schema # Validate against XSD schema
)
```

### Audit Logging

For enterprise use cases requiring traceability (SOC 2, ISO 27001), Xenon can log security events:

```python
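from xenon import repair_xml_safe, TrustLevel
from xenon.audit import AuditLogger

# Placeholder: raw LLM output or a user upload
untrusted_input = "..."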

# Configure logger
logger = AuditLogger()

# Usage
repair_xml_safe(
    untrusted_input,
    trust=TrustLevel.UNTRUSTED,
    audit_logger=logger
)

# Export logs
logs = logger.to_json()
# [
#   {
#     "timestamp": "2023-10-27T...",
#     "threats_detected": ["dangerous_pi"],
#     "actions_taken": ["DANGEROUS_PI_STRIPPED: Removed..."],
#     ...
#   }
# ]
```

## Interactive Demo

Try Xenon in your browser with our Google Colab notebook:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/MarsZDF/xenon/blob/main/xenon_demo.ipynb)

## License

MIT License. See [LICENSE](LICENSE) for details.