Metadata-Version: 2.4
Name: pii-core
Version: 0.1.0
Summary: Multi-language PII detection with regex and checksum validation. Pure Python, zero runtime dependencies.
Project-URL: Homepage, https://github.com/pii-toolkit/pii-core
Project-URL: Repository, https://github.com/pii-toolkit/pii-core
Project-URL: Issues, https://github.com/pii-toolkit/pii-core/issues
Author-email: Michal Piotrowski <piotrowskimichalwfis@gmail.com>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: anonymization,checksum,gdpr,pii,privacy
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Typing :: Typed
Requires-Python: >=3.10
Provides-Extra: dev
Requires-Dist: hypothesis>=6.100; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.6; extra == 'dev'
Description-Content-Type: text/markdown

# pii-core

Multi-language PII detection with regex and checksum validation. Pure Python, zero runtime dependencies.

This is the foundation library that [`pii-veil`](https://github.com/pii-toolkit/pii-veil) (reversible anonymization) and [`pii-presidio`](https://github.com/pii-toolkit/pii-presidio) (Microsoft Presidio plugin) build on.

## Install

```bash
pip install pii-core
```

## What's in v0.1.0

**Polish identifiers** with regex + checksum validation:

- PESEL (national ID), NIP (tax ID), REGON (business registry — 9 and 14 digit) — weighted-sum checksums.
- Polish IBAN (`PL` prefix, mod-97 validated).
- Polish ID card (dowód osobisty), passport — regex only (official checksums not yet implemented).
- Polish mobile phone (`+48`, optional separators).
- Opt-in: KRS court register number (10 digits, no checksum), postal code (`XX-XXX`). Excluded from `DEFAULT_DETECTORS` because their raw patterns collide with ordinary text — pair them with a context-word filter.
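
To illustrate what a weighted-sum checksum involves, here is a minimal standalone sketch of the publicly documented PESEL control-digit rule. The name `pesel_checksum_ok` is hypothetical and this is not the library's own implementation; it checks only the control digit and makes no attempt to validate the encoded birth date.

```python
# Standalone sketch; `pesel_checksum_ok` is a hypothetical helper, not part of pii-core.
PESEL_WEIGHTS = (1, 3, 7, 9, 1, 3, 7, 9, 1, 3)


def pesel_checksum_ok(pesel: str) -> bool:
    """Check only the control digit of an 11-digit PESEL string."""
    if len(pesel) != 11 or not (pesel.isascii() and pesel.isdigit()):
        return False
    digits = [int(ch) for ch in pesel]
    # Weight the first ten digits; zip() stops before the control digit.
    weighted = sum(w * d for w, d in zip(PESEL_WEIGHTS, digits))
    # The control digit brings the weighted sum up to a multiple of 10.
    return (10 - weighted % 10) % 10 == digits[10]
```

NIP and REGON follow the same pattern with different weights and a modulus of 11.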

**Cross-language detectors:**

- Email addresses (practical subset, not strict RFC 5322).
- Credit-card numbers (Luhn-validated; bare, dashed, and spaced shapes for Visa / MC / Amex / Discover / Diners).
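
For readers unfamiliar with Luhn validation, the following is a minimal standalone sketch of the algorithm. The name `luhn_ok` is hypothetical and the code is independent of the library's `is_valid_luhn`; it accepts spaces and dashes as separators but does not check card length or issuer prefixes.

```python
# Standalone sketch; `luhn_ok` is a hypothetical helper, not part of pii-core.
def luhn_ok(number: str) -> bool:
    """Return True if the digits of `number` pass the Luhn check."""
    compact = number.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```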

**Multi-country IBAN validator** (`is_valid_iban`) covering ~80 countries via the published SWIFT registry. The `PlIbanDetector` regex matches only Polish IBANs, but the checksum function works for any covered country.
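
The mod-97 check itself is country-independent, which is why one function can serve many countries. Below is a minimal standalone sketch of that check. The name `iban_mod97_ok` is hypothetical, and unlike a registry-backed validator it does not verify the country code or the country-specific length.

```python
# Standalone sketch; `iban_mod97_ok` is a hypothetical helper, not part of pii-core.
def iban_mod97_ok(iban: str) -> bool:
    """Return True if `iban` passes the ISO 13616 mod-97 check."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Map letters to numbers (A=10 ... Z=35); digits map to themselves.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```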

## Quick usage

```python
from pii_core import DEFAULT_DETECTORS

text = "Mój PESEL: 44051401359, kontakt: jan@example.pl."

for detector in DEFAULT_DETECTORS:
    for match in detector.detect(text):
        print(f"{match.detector}: {match.value!r} at {match.start}-{match.end}")
```

```python
from pii_core import is_valid_pesel, is_valid_iban, is_valid_luhn

is_valid_pesel("44051401359")            # True
is_valid_iban("DE89370400440532013000")  # True
is_valid_luhn("4111111111111111")        # True
```

Opt-in detectors (KRS, postal code) live in `pii_core.pl`:

```python
from pii_core import DEFAULT_DETECTORS
from pii_core.pl import PlKrsDetector, PlPostalCodeDetector

# Add them only when you have context-word filtering elsewhere in your pipeline.
my_detectors = [*DEFAULT_DETECTORS, PlKrsDetector(), PlPostalCodeDetector()]
```
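
A context-word filter can be as simple as checking for trigger words near each match. The sketch below is one possible approach, not something the library provides: `has_context`, `CONTEXT_WORDS`, and the 30-character window are all assumptions, and a plain regex stands in for the postal-code detector so the example runs without `pii-core` installed. In a real pipeline the `start` and `end` offsets would come from the matches returned by `detector.detect(text)`.

```python
import re

# Stand-in for the postal-code detector so this sketch is self-contained.
POSTAL_CODE = re.compile(r"\b\d{2}-\d{3}\b")

# Hypothetical trigger words; tune these for your own documents.
CONTEXT_WORDS = ("kod pocztowy", "adres", "ul.")


def has_context(text: str, start: int, end: int,
                words=CONTEXT_WORDS, window: int = 30) -> bool:
    """Return True if any context word appears within `window` chars of the span."""
    nearby = text[max(0, start - window):end + window].casefold()
    return any(word in nearby for word in words)


def postal_codes_with_context(text: str) -> list[str]:
    """Keep only postal-code-shaped matches that have a context word nearby."""
    return [
        m.group()
        for m in POSTAL_CODE.finditer(text)
        if has_context(text, m.start(), m.end())
    ]
```

Substring matching is deliberately crude here; word-boundary matching or lemmatization would reduce false hits at the cost of more code.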

## API stability

`PIIType` value strings, detector `.name` strings, and the order of `DEFAULT_DETECTORS` are SemVer-stable, because downstream consumers persist them in serialized mappings or use them as overlap-resolution priority keys. Internal details (regex tweaks, helper renames) may change between minor versions.
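
To show why detector order matters to consumers, here is a minimal sketch of order-based overlap resolution. Everything in it is an assumption made for illustration: the `Span` class, the `resolve_overlaps` function, and the detector names `"pesel"` and `"phone"` are hypothetical and do not reflect how `pii-veil` or `pii-presidio` actually resolve overlaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a detector match."""
    detector: str
    start: int
    end: int


def resolve_overlaps(spans: list[Span], priority: list[str]) -> list[Span]:
    """Keep the highest-priority span wherever spans overlap.

    `priority` lists detector names, earliest first, e.g. the names of a
    detector list in its declared order.
    """
    rank = {name: i for i, name in enumerate(priority)}
    kept: list[Span] = []
    # Visit spans from highest to lowest priority; keep those that fit.
    for span in sorted(spans, key=lambda s: (rank[s.detector], s.start)):
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)
```

If the order of the priority list changed between releases, the same input could resolve to different spans, which is the kind of silent behavior change the stability guarantee is meant to prevent.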

## Sibling packages

- [`pii-veil`](https://github.com/pii-toolkit/pii-veil) — reversible anonymization with persisted mapping and CLI, built on `pii-core`.
- [`pii-presidio`](https://github.com/pii-toolkit/pii-presidio) — Microsoft Presidio plugin wrapping `pii-core` recognizers with optional reversible anonymization.

## License

Apache-2.0. See `LICENSE`.
