Metadata-Version: 2.4
Name: llm-guardrails-kit
Version: 0.1.0
Summary: A compliance control layer between your app and any LLM: PII/PCI redaction, cost tracking, and privacy-safe audit logging. GCC-focused.
Project-URL: Homepage, https://github.com/GBMUAE/llm-guardrails-kit
Project-URL: Repository, https://github.com/GBMUAE/llm-guardrails-kit
Project-URL: Issues, https://github.com/GBMUAE/llm-guardrails-kit/issues
Project-URL: Changelog, https://github.com/GBMUAE/llm-guardrails-kit/blob/main/CHANGELOG.md
Author-email: Hasan Odeh <hodeh@gbm.net>
Maintainer-email: Hasan Odeh <hodeh@gbm.net>
License: MIT
License-File: LICENSE
Keywords: audit,compliance,cost-tracking,gcc,guardrails,llm,pci,pii,redaction,uae
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Financial and Insurance Industry
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Provides-Extra: litellm
Requires-Dist: litellm>=1.0; extra == 'litellm'
Description-Content-Type: text/markdown

# llm-guardrails-kit

**A compliance control layer that sits between your application and any LLM API — redacting PII/PCI, tracking cost, and writing privacy-safe audit logs.**

[![PyPI](https://img.shields.io/pypi/v/llm-guardrails-kit.svg)](https://pypi.org/project/llm-guardrails-kit/)
[![Python versions](https://img.shields.io/pypi/pyversions/llm-guardrails-kit.svg)](https://pypi.org/project/llm-guardrails-kit/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/GBMUAE/llm-guardrails-kit/actions/workflows/ci.yml/badge.svg)](https://github.com/GBMUAE/llm-guardrails-kit/actions/workflows/ci.yml)

Built by **Gulf Business Machines (GBM)** with a focus on regulated GCC sectors — finance, healthcare, and government.

---

## Why this exists

Sending raw customer data — names, card numbers, Emirates IDs, IBANs — to a
third-party LLM provider is a **data-protection and compliance risk** for
organizations in regulated sectors. Once data leaves your perimeter, you lose
control over where it is stored and logged, and whether it is used for training.

`llm-guardrails-kit` gives you a thin, auditable **control layer** you own:

- 🛡️ **Redaction** — detect and mask PII/PCI *before* the prompt leaves your
  network, and transparently restore the originals in the response.
- 💰 **Cost governance** — meter every request and roll up spend per model.
- 📝 **Auditability** — append a structured, privacy-safe JSON-Lines record for
  every call, proving *what* was sent without ever recording the raw values.

It is **provider-agnostic** (works with any LLM via a callable you supply) and
the core install has **zero third-party dependencies** — easy to vet, easy to
run in locked-down environments.

> Data-protection regimes across the GCC (for example the UAE PDPL, DIFC and ADGM
> data-protection regulations, Saudi PDPL, and sector rules from regulators such
> as central banks and health authorities) place obligations on how personal
> data is processed and shared. This library is a **control to help you meet
> those obligations** — see the [disclaimer](#disclaimer).

---

## Install

```bash
# Core — zero third-party dependencies
pip install llm-guardrails-kit

# Optional: LiteLLM integration (one client for 100+ providers)
pip install "llm-guardrails-kit[litellm]"

# Development (pytest, build, ruff)
pip install "llm-guardrails-kit[dev]"
```

Requires Python 3.11+.

---

## Quickstart

Redact a prompt containing a card number, Emirates ID, and email; send it
through a `GuardedClient` with a dummy completion function; and inspect the
audit record — all without an API key.

```python
from llm_guardrails_kit import GuardedClient, AuditLogger, CostTracker

# 1. A stand-in for your real LLM call. It receives the ALREADY-REDACTED prompt.
#    Swap this for OpenAI, Anthropic, LiteLLM, Bedrock — anything.
def my_llm(prompt: str, model: str, **kwargs) -> dict:
    # A real LLM never sees the raw PII — only placeholders like [CARD_1].
    print("LLM received:", prompt)
    return {
        "text": "Thanks! I've noted the details for [EMAIL_1].",
        "input_tokens": 42,
        "output_tokens": 12,
    }

# 2. Wire up the control layer.
audit = AuditLogger(path="audit.jsonl")
client = GuardedClient(my_llm, cost_tracker=CostTracker(), audit_logger=audit)

# 3. Call it with sensitive data.
result = client.complete(
    "Charge card 5555 5555 5555 4444 for Emirates ID 784-1984-1234567-4 "
    "and email the receipt to jane.doe@example.com",
    model="gpt-4o",
)

print("Returned to app:", result.text)
print("Redaction counts:", result.redaction_counts)
print("Cost (USD):", result.cost.total_cost)
audit.close()
```

Output (abridged):

```
LLM received: Charge card [CARD_1] for Emirates ID [EMIRATES_ID_1] and email the receipt to [EMAIL_1]
Returned to app: Thanks! I've noted the details for jane.doe@example.com.
Redaction counts: {'credit_card': 1, 'emirates_id': 1, 'email': 1}
Cost (USD): 0.000225
```

The `audit.jsonl` record — **note there is no raw PII, only counts**:

```json
{"input_tokens": 42, "latency_ms": 0.05, "metadata": {}, "model": "gpt-4o", "output_tokens": 12, "redaction_counts": {"credit_card": 1, "email": 1, "emirates_id": 1}, "request_id": "d3b0...", "timestamp": "2026-07-03T09:15:22.104531Z", "total_cost": 0.000225}
```
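
One way to sanity-check the "no raw PII" claim on your own logs is to scan each audit line for the values you sent. The sketch below uses only the standard library; `find_leaks` is a hypothetical helper, not part of the library, and the sample line is copied from the record above.

```python
import json

# Sample audit line, copied from the record shown above.
AUDIT_LINE = (
    '{"input_tokens": 42, "latency_ms": 0.05, "metadata": {}, "model": "gpt-4o", '
    '"output_tokens": 12, "redaction_counts": {"credit_card": 1, "email": 1, '
    '"emirates_id": 1}, "request_id": "d3b0...", '
    '"timestamp": "2026-07-03T09:15:22.104531Z", "total_cost": 0.000225}'
)

# Raw values from the Quickstart prompt that must never appear in the log.
RAW_VALUES = [
    "5555 5555 5555 4444",
    "784-1984-1234567-4",
    "jane.doe@example.com",
]


def find_leaks(jsonl_text: str, raw_values: list[str]) -> list[str]:
    """Return every raw value that appears verbatim in any audit line."""
    leaks = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        json.loads(line)  # raises ValueError if a line is not valid JSON
        leaks.extend(value for value in raw_values if value in line)
    return leaks


record = json.loads(AUDIT_LINE)
print(record["redaction_counts"])          # per-detector counts only
print(find_leaks(AUDIT_LINE, RAW_VALUES))  # []
```

A verbatim substring scan only catches values logged exactly as sent, so treat an empty result as a smoke test rather than proof.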

### Redaction on its own

```python
from llm_guardrails_kit import Redactor

r = Redactor()
res = r.redact("Wire to AE07 0331 2345 6789 0123 456, call +971 50 123 4567")
print(res.redacted_text)  # Wire to [IBAN_1], call [PHONE_1]
print(res.counts)         # {'iban': 1, 'phone': 1}

# Fully reversible for the caller:
print(r.restore("Confirmed [IBAN_1]", res.mapping))  # Confirmed AE07 0331 2345 6789 0123 456
```
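
The round trip works because the caller keeps a placeholder-to-original mapping that never leaves the process. The sketch below illustrates that idea for emails only. `redact_emails` and `restore` are hypothetical stand-ins written for this example, the email pattern is deliberately loose, and the library's own implementation may differ (for instance in how it numbers repeated values).

```python
import re

# Deliberately loose email pattern, for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with [EMAIL_n]; return text and mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original value
    seen: dict[str, str] = {}     # original value -> placeholder

    def substitute(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return EMAIL_RE.sub(substitute, text), mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap placeholders in a model response back to the original values."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


redacted, mapping = redact_emails("Email jane.doe@example.com and bob@example.org")
print(redacted)                               # Email [EMAIL_1] and [EMAIL_2]
print(restore("Sent to [EMAIL_1]", mapping))  # Sent to jane.doe@example.com
```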

### Partial masking (show only the last 4 of a card)

```python
from llm_guardrails_kit import Redactor, RedactionPolicy, MaskStyle

policy = RedactionPolicy(mask_style=MaskStyle.PARTIAL)
print(Redactor(policy).redact("Card 5555 5555 5555 4444").redacted_text)
# Card **** **** **** 4444
```
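
For intuition, last-4 masking can be sketched in a few lines of plain Python. `mask_all_but_last4` is a hypothetical function written for this example, not the library's code; it assumes separators should be kept and only digits masked.

```python
def mask_all_but_last4(card: str, mask_char: str = "*") -> str:
    """Mask every digit except the final four, keeping separators in place."""
    total_digits = sum(ch.isdigit() for ch in card)
    digits_to_mask = max(total_digits - 4, 0)
    out = []
    for ch in card:
        if ch.isdigit() and digits_to_mask > 0:
            out.append(mask_char)  # still inside the leading digits
            digits_to_mask -= 1
        else:
            out.append(ch)         # separator or one of the last four digits
    return "".join(out)


print(mask_all_but_last4("5555 5555 5555 4444"))  # **** **** **** 4444
```

Unlike placeholders, a masked value cannot be restored from the text alone, so this style suits cases where the response does not need the original number.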

### Using it with LiteLLM (optional extra)

```python
from llm_guardrails_kit import GuardedClient, AuditLogger

client = GuardedClient.from_litellm(audit_logger=AuditLogger(path="audit.jsonl"))
result = client.complete("Email jane.doe@example.com", model="gpt-4o")
```

---

## What gets detected

| Detector       | Placeholder        | Validation                                   |
| -------------- | ------------------ | -------------------------------------------- |
| Email          | `[EMAIL_n]`        | RFC-ish pattern                              |
| Phone (intl + UAE/GCC) | `[PHONE_n]` | E.164 digit-count (7–15)                     |
| Credit card    | `[CARD_n]`         | **Luhn checksum** + length 13–19             |
| Generic PCI    | `[PCI_n]`          | 4×4 card-shaped grouping (defensive)         |
| IBAN           | `[IBAN_n]`         | **ISO 13616 mod-97** + per-country length    |
| Emirates ID    | `[EMIRATES_ID_n]`  | `784-YYYY-NNNNNNN-C` format + **Luhn**       |
| IPv4 address   | `[IP_n]`           | Octet range 0–255                            |
| Custom         | `[NAME_n]`         | Your regex                                   |

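The card, IBAN, and Emirates ID detectors apply **checksum validation** on top of
pattern matching, so `1234 5678 9012 3456` is *not* mistaken for a valid card, and
a corrupted IBAN is rejected by its checksum. Phone numbers and IPv4 addresses are
checked by digit count and octet range respectively.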
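
The checksum rules named in the table can be sketched with the standard library alone. The functions below (`luhn_ok`, `iban_mod97_ok`, `emirates_id_ok`) are hypothetical illustrations of the general algorithms, not the library's `luhn_valid`, `iban_valid`, or `emirates_id_valid`. The IBAN sketch skips the per-country length check, and the Emirates ID sketch assumes the Luhn check runs over all 15 digits.

```python
import re


def luhn_ok(number: str) -> bool:
    """Luhn: double every second digit from the right; the sum must end in 0."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


def iban_mod97_ok(iban: str) -> bool:
    """ISO 13616: move the first 4 chars to the end, map A=10..Z=35, mod 97 == 1."""
    compact = re.sub(r"\s+", "", iban).upper()
    if not re.fullmatch(r"[A-Z0-9]{5,34}", compact):
        return False
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


EMIRATES_ID_RE = re.compile(r"784-\d{4}-\d{7}-\d")


def emirates_id_ok(value: str) -> bool:
    """Check the 784-YYYY-NNNNNNN-C shape, then Luhn over all 15 digits."""
    return bool(EMIRATES_ID_RE.fullmatch(value)) and luhn_ok(value)


print(luhn_ok("5555 5555 5555 4444"))                 # True
print(luhn_ok("1234 5678 9012 3456"))                 # False
print(iban_mod97_ok("AE07 0331 2345 6789 0123 456"))  # True
print(iban_mod97_ok("AE07 0331 2345 6789 0123 457"))  # False
print(emirates_id_ok("784-1984-1234567-4"))           # True
```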

---

## API overview

| Symbol | Purpose |
| ------ | ------- |
| `Redactor(policy=None)` | `.redact(text) -> RedactionResult`; `.restore(text, mapping) -> str` |
| `RedactionPolicy` | Toggle detectors, add `custom_patterns`, pick `mask_style` |
| `RedactionResult` | `.redacted_text`, `.mapping`, `.counts`, `.total_redactions` |
| `MaskStyle` | `PLACEHOLDER` (default, reversible), `PARTIAL` (last 4), `FULL` |
| `CostTracker(prices=None, default_price=None)` | `.record(model, in, out)`, `.total()`, `.reset()`, `.set_price()` |
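| `DEFAULT_PRICES` | Editable per-1M-token price table (verify before production!) |
| `ModelPrice(input_per_1m, output_per_1m)` | Price in USD per 1M tokens for one model; pass to `.set_price()` |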
| `AuditLogger(path=None, sink=None)` | `.log(...) -> AuditRecord`; JSON-Lines, never logs raw PII |
| `GuardedClient(completion_fn, ...)` | `.complete(prompt, model)`; `.from_litellm(...)` optional helper |
| `luhn_valid`, `iban_valid`, `emirates_id_valid` | Standalone validators |

---

## Cost prices — verify before you trust them

`DEFAULT_PRICES` in [`cost.py`](src/llm_guardrails_kit/cost.py) ships a small,
**illustrative** table of per-1M-token prices for a few common OpenAI and
Anthropic models. **Prices change frequently** — always confirm against your
provider's current pricing and override:

```python
from llm_guardrails_kit import CostTracker, ModelPrice

tracker = CostTracker()
tracker.set_price("gpt-4o", ModelPrice(input_per_1m=2.5, output_per_1m=10.0))
```
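
As a worked check, the Quickstart figure of `0.000225` is consistent with these two prices if cost scales linearly with token count. The function below is a hypothetical illustration of that arithmetic, not the tracker's code, and whether the shipped `DEFAULT_PRICES` uses these exact values should be confirmed in `cost.py`.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_per_1m: float,
    output_per_1m: float,
) -> float:
    """Cost in USD, assuming prices are quoted per one million tokens."""
    input_cost = input_tokens / 1_000_000 * input_per_1m
    output_cost = output_tokens / 1_000_000 * output_per_1m
    return input_cost + output_cost


# 42 input tokens and 12 output tokens, as in the Quickstart stub.
cost = estimate_cost(42, 12, input_per_1m=2.5, output_per_1m=10.0)
print(f"{cost:.6f}")  # 0.000225  (0.000105 input + 0.000120 output)
```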

---

## Development

```bash
pip install -e ".[dev]"
ruff check .
pytest
python -m build
```

---

## Disclaimer

`llm-guardrails-kit` is a **helper control layer**, not a guarantee of
regulatory compliance. It reduces the risk of leaking sensitive data to
third-party LLMs, but:

- No automated redactor detects 100% of sensitive data in free-form text.
- It does **not** claim any specific certification or regulatory approval.
- You remain responsible for validating it against your own legal, regulatory,
  and contractual obligations before relying on it in production.

Test it thoroughly with your own data and threat model. Use of this software is
subject to the [MIT License](LICENSE) and provided "as is", without warranty.

---

*Built by [Gulf Business Machines (GBM)](https://www.gbmme.com). Maintainer: Hasan Odeh.*
