Metadata-Version: 2.4
Name: pii-veil
Version: 0.1.0
Summary: Reversible PII anonymization for LLM workflows. Round-trip with persisted mapping; CLI included.
Project-URL: Homepage, https://github.com/pii-toolkit/pii-veil
Project-URL: Repository, https://github.com/pii-toolkit/pii-veil
Project-URL: Issues, https://github.com/pii-toolkit/pii-veil/issues
Author-email: Michal Piotrowski <piotrowskimichalwfis@gmail.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: anonymization,gdpr,llm,pii,privacy,reversible,tokenization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: pii-core<0.2,>=0.1
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Description-Content-Type: text/markdown

# pii-veil

Reversible PII anonymization for LLM workflows. Replace PII with stable tokens, send to an LLM, then deanonymize the response using the persisted mapping.

Built on [`pii-core`](https://github.com/pii-toolkit/pii-core) for detection. Detector-agnostic: any `pii_core.Detector` plugs in.

## Install

```bash
pip install pii-veil
```

## Quick usage

```python
from pii_veil import Shield

shield = Shield()
# Polish sample text: "My PESEL: 44051401358, contact: jan@example.pl."
result = shield.anonymize("Mój PESEL: 44051401358, kontakt: jan@example.pl.")
# result.text   -> "Mój PESEL: [PL_PESEL_001], kontakt: [EMAIL_001]."
# result.mapping holds the reversible mapping

# ... send result.text to an LLM, get a response back ...
llm_response = result.text  # placeholder standing in for the LLM's reply
restored = shield.deanonymize(llm_response)
```

The same value gets the same token within a `Shield`'s lifetime, so an LLM that quotes a token back gets resolved to the original. Persist the mapping JSON if you need round-trips across processes:

```python
mapping_json = result.mapping.to_json()  # `result` comes from shield.anonymize(...) above

# later, in a different process:
from pii_veil import Mapping, Shield
loaded = Shield(mapping=Mapping.from_json(mapping_json))
restored = loaded.deanonymize(text_from_llm)  # text_from_llm: the LLM's reply
```
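
To make the "same value, same token" behaviour concrete, here is a minimal standard-library sketch of a reversible token table. It is not pii-veil's implementation: `TokenMap`, `token_for`, and `restore` are hypothetical names, and the `[KIND_NNN]` pattern is only inferred from the example output above.

```python
import re
from collections import defaultdict


class TokenMap:
    """Hypothetical stand-in for a reversible mapping; not pii-veil's Mapping."""

    def __init__(self) -> None:
        self._token_by_value: dict[tuple[str, str], str] = {}
        self._value_by_token: dict[str, str] = {}
        self._counters: defaultdict[str, int] = defaultdict(int)

    def token_for(self, kind: str, value: str) -> str:
        """Return the existing token for (kind, value), or mint the next one."""
        key = (kind, value)
        if key not in self._token_by_value:
            self._counters[kind] += 1
            token = f"[{kind}_{self._counters[kind]:03d}]"
            self._token_by_value[key] = token
            self._value_by_token[token] = value
        return self._token_by_value[key]

    def restore(self, text: str) -> str:
        """Swap known tokens back to their values; unknown tokens stay as-is."""
        return re.sub(
            r"\[[A-Z_]+_\d{3,}\]",
            lambda m: self._value_by_token.get(m.group(0), m.group(0)),
            text,
        )


tokens = TokenMap()
first = tokens.token_for("EMAIL", "jan@example.pl")
again = tokens.token_for("EMAIL", "jan@example.pl")  # same value, same token
other = tokens.token_for("EMAIL", "ola@example.pl")  # new value, next counter
reply = tokens.restore(f"Please write to {first} and {other}; [EMAIL_999] is unknown.")
```

Because the table is keyed by value, a token quoted back by the LLM any number of times resolves to the same original. Serializing both dictionaries is what would make the round-trip work across processes.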

## CLI

```bash
pii-veil anonymize input.txt -o anon.txt -m mapping.json
pii-veil deanonymize anon.txt -m mapping.json -o restored.txt
pii-veil detect input.txt --format json
```

`-` as the input path means stdin. `deanonymize -o -` (or omitting `-o`) writes to stdout. UTF-8 (with or without BOM) and UTF-16 (with BOM) are accepted on read; output is always UTF-8 without BOM.
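
The encoding rules above can be sketched as a small BOM sniffer. This only illustrates the stated behaviour with Python's standard `codecs` module; `decode_input` and `encode_output` are hypothetical names, not part of the pii-veil CLI.

```python
import codecs


def decode_input(raw: bytes) -> str:
    """Decode bytes as UTF-8 (BOM optional) or UTF-16 (BOM required)."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order, then drops it.
        return raw.decode("utf-16")
    # No BOM: treat the input as plain UTF-8.
    return raw.decode("utf-8")


def encode_output(text: str) -> bytes:
    """Encode as UTF-8; the plain "utf-8" codec never writes a BOM."""
    return text.encode("utf-8")
```

UTF-16 without a BOM is deliberately not guessed at here: such input falls through to the UTF-8 branch and will usually fail to decode.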

## Custom detectors

```python
from pii_core import PlPeselDetector, EmailDetector
from pii_veil import Shield

# Only PESEL and email; everything else passes through.
shield = Shield(detectors=[PlPeselDetector(), EmailDetector()])
```

Detector order is the tiebreak for overlap resolution: when two detectors emit identical spans, the one earlier in the list wins. Overlapping spans of different lengths are resolved by "longest match wins".
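
The two rules can be sketched as a greedy pass over candidate spans. This is one plausible reading, not pii-veil's actual algorithm: `Span` and `resolve_overlaps` are hypothetical, and using detector order to break ties between equal-length but non-identical spans is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int       # exclusive
    kind: str
    priority: int  # index of the emitting detector in the detectors list


def resolve_overlaps(spans: list[Span]) -> list[Span]:
    """Keep a non-overlapping subset: longest first, then earliest detector."""
    ranked = sorted(spans, key=lambda s: (-(s.end - s.start), s.priority, s.start))
    kept: list[Span] = []
    for cand in ranked:
        # Accept a candidate only if it is disjoint from everything already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


candidates = [
    Span(0, 11, "PL_PESEL", priority=0),
    Span(0, 11, "PHONE", priority=1),   # identical span, later detector: loses
    Span(20, 34, "EMAIL", priority=2),
    Span(24, 34, "URL", priority=0),    # shorter overlap: loses to EMAIL
]
resolved = resolve_overlaps(candidates)
```

Here `resolved` keeps the `PL_PESEL` and `EMAIL` spans: the first wins on detector order, the second on length even though its detector is listed last.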

## Hardening for untrusted input

```python
from pii_veil import Shield

shield = Shield(max_input_bytes=1_000_000)  # 1 MB cap (10^6 bytes); larger input raises InputSizeError
shield.reset()  # clear the accumulated mapping between unrelated documents
```

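`Shield.anonymize` runs in O(n) time in the input size and is not thread-safe. Use one `Shield` per request, and call `reset()` between unrelated documents to prevent token-shape collisions across users: the mapping accumulates over a `Shield`'s lifetime, so without a reset a token such as `[EMAIL_001]` issued for one user's document is still resolvable when another user's text is processed.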
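
A byte cap differs from a character cap for non-ASCII text, which the sketch below illustrates. It assumes the limit is measured on the UTF-8 encoding, which the source does not state; `InputTooLarge` and `check_size` are hypothetical stand-ins, not pii-veil's `InputSizeError` or its internals.

```python
class InputTooLarge(ValueError):
    """Hypothetical stand-in for pii-veil's InputSizeError."""


def check_size(text: str, max_input_bytes: int) -> int:
    """Return the UTF-8 size of text, raising if it exceeds the cap."""
    size = len(text.encode("utf-8"))
    if size > max_input_bytes:
        raise InputTooLarge(f"input is {size} bytes; limit is {max_input_bytes}")
    return size


# "ó" takes two bytes in UTF-8, so four characters already fill an 8-byte cap.
exact_fit = check_size("óóóó", max_input_bytes=8)
try:
    check_size("óóóóó", max_input_bytes=8)
    rejected = False
except InputTooLarge:
    rejected = True
```

Encoding the whole string just to measure it costs an extra copy; a production check would more likely measure the raw bytes before decoding.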

## API stability

The public surface (`Shield`, `Mapping`, `AnonymizeResult`, `Match`, `PIIType`, the four exception classes) is SemVer-stable. Mapping JSON has a `schema_version` field; the loader rejects unknown versions rather than guessing.
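
The "reject rather than guess" policy might look like the loader sketched below. Only the `schema_version` field name comes from the text above. The `entries` key, the supported version number, and the names `load_mapping` and `UnsupportedSchemaError` are assumptions made for illustration.

```python
import json

SUPPORTED_SCHEMA_VERSIONS = {1}  # assumption: the real version numbers may differ


class UnsupportedSchemaError(ValueError):
    """Hypothetical error for a mapping file with an unknown schema version."""


def load_mapping(payload: str) -> dict[str, str]:
    """Parse mapping JSON, refusing any schema version it does not know."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise UnsupportedSchemaError("mapping JSON must be an object")
    version = data.get("schema_version")
    if not isinstance(version, int) or version not in SUPPORTED_SCHEMA_VERSIONS:
        # Fail loudly: a misread mapping could restore the wrong values.
        raise UnsupportedSchemaError(f"unsupported schema_version: {version!r}")
    return dict(data["entries"])  # "entries" is a hypothetical field name


good = json.dumps({"schema_version": 1, "entries": {"[EMAIL_001]": "jan@example.pl"}})
mapping = load_mapping(good)
```

Failing on an unknown version keeps a newer file format from being silently misinterpreted by an older reader.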

## Sibling packages

- [`pii-core`](https://github.com/pii-toolkit/pii-core) — multi-language detection primitives.
- [`pii-presidio`](https://github.com/pii-toolkit/pii-presidio) — Microsoft Presidio plugin with its own optional reversible operator.

## License

Apache-2.0. See `LICENSE`.
