Metadata-Version: 2.4
Name: ghost-pii-pydantic
Version: 0.1.3
Summary: Automatic PII redaction for Pydantic v2 — masks sensitive data in logs and print statements. GDPR/HIPAA-friendly.
Project-URL: Homepage, https://github.com/STHITAPRAJNAS/ghost-pii-pydantic
Project-URL: Repository, https://github.com/STHITAPRAJNAS/ghost-pii-pydantic
Project-URL: Documentation, https://github.com/STHITAPRAJNAS/ghost-pii-pydantic#readme
Project-URL: Issues, https://github.com/STHITAPRAJNAS/ghost-pii-pydantic/issues
Author-email: Sthitaprajna Sahoo <papu.sahoo@gmail.com>
License: Apache-2.0
License-File: LICENSE
Keywords: anonymisation,compliance,data-privacy,data-protection,gdpr,hipaa,logging,masking,pii,privacy,pydantic,pydantic-v2,redaction,security
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: Log Analysis
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: pydantic>=2.0.0
Provides-Extra: dev
Requires-Dist: email-validator>=2.0.0; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# GhostPII 👻

**Automatic PII redaction for Pydantic v2 — zero-config, GDPR/HIPAA-friendly.**

[![PyPI version](https://img.shields.io/pypi/v/ghost-pii-pydantic.svg)](https://pypi.org/project/ghost-pii-pydantic/)
[![Python](https://img.shields.io/pypi/pyversions/ghost-pii-pydantic.svg)](https://pypi.org/project/ghost-pii-pydantic/)
[![CI](https://github.com/STHITAPRAJNAS/ghost-pii-pydantic/actions/workflows/ci.yml/badge.svg)](https://github.com/STHITAPRAJNAS/ghost-pii-pydantic/actions/workflows/ci.yml)
[![License](https://img.shields.io/badge/license-Apache%202.0-green)](LICENSE)
[![Typed](https://img.shields.io/badge/typing-py.typed-informational)](src/ghost_pii/py.typed)

> **Note:** This project is published on PyPI as [`ghost-pii-pydantic`](https://pypi.org/project/ghost-pii-pydantic/).

GhostPII solves the **"Logged Secret"** problem: sensitive fields (emails, SSNs, credit card numbers, API keys) leaking into logs and tracebacks. It provides a smart string proxy that automatically redacts itself in unsafe contexts (`logging`, `print`, tracebacks) while remaining fully functional for business logic, databases, and APIs.

- Drop-in Pydantic v2 `Annotated` type — no middleware, no post-processing
- Tainted memory propagation — concatenated strings stay redacted
- Strict mode for FinTech / HealthTech / high-compliance environments
- Works with sync and async Python services


## Features

| Feature | Description |
|---------|-------------|
| **Auto-Magical Redaction** | Automatically detects `print()` and `logging` calls to mask PII. |
| **Pydantic Native** | First-class support for Pydantic v2 `Annotated` types. |
| **Strict Mode** | Opt-in mode that redacts everywhere unless explicitly unmasked. |
| **Tainted Memory** | Operations on PII (like concatenation) stay PII. No accidental leaks. |
| **Context Aware** | Use `unmask_pii()` context manager for explicit, safe data access. |
| **Low Overhead** | Stack inspection is optimized with fast-fail logic to keep the runtime cost small. |

## Installation

```bash
pip install ghost-pii-pydantic
```

## Quick Start

```python
from pydantic import BaseModel, EmailStr
from ghost_pii import PII, unmask_pii

class User(BaseModel):
    name: PII[str]
    email: PII[EmailStr]  # Validates as email (EmailStr needs the email-validator package), redacts in logs

user = User(name="John Doe", email="john@example.com")

# 1. Safe by Default: Redacts in logs/prints
print(user)
# Output: name=GhostString('[REDACTED]') email=GhostString('[REDACTED]')

# 2. Functional: Works in business logic/DBs
# (Internal calls to user.email return the real string)
# `db` stands in for your own database connection or cursor.
db.execute("INSERT INTO users VALUES (?)", [user.email])
# Inserts "john@example.com"

# 3. Explicit: Use context manager for sensitive tasks
with unmask_pii():
    print(user) 
    # Output: name=GhostString('John Doe') email=GhostString('john@example.com')
```

## Advanced Scenarios

### Nested Models and Collections
PII fields are also redacted inside nested Pydantic models and inside lists of PII values.

```python
from typing import List

from pydantic import BaseModel, EmailStr
from ghost_pii import PII

class Address(BaseModel):
    street: PII[str]
    city: str

class Organization(BaseModel):
    name: str
    admin_emails: List[PII[EmailStr]]
    headquarters: Address

org = Organization(
    name="Acme Corp",
    admin_emails=["admin@acme.com", "sec@acme.com"],
    headquarters=Address(street="123 Secret Lane", city="New York")
)

print(org.model_dump())
# Output: {
#   'name': 'Acme Corp', 
#   'admin_emails': ['[REDACTED]', '[REDACTED]'], 
#   'headquarters': {'street': '[REDACTED]', 'city': 'New York'}
# }
```

### Tainted Memory (Concatenation)
PII "infects" any string it touches. If you combine a PII field with a normal string, the result is a new `GhostString` that is also redacted by default.

```python
# `user` is the User instance from the Quick Start
labeled_name = "User: " + user.name
print(labeled_name) # Output: [REDACTED]

with unmask_pii():
    print(labeled_name) # Output: User: John Doe
```
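
As a rough illustration of how this kind of taint propagation can work in plain Python, the sketch below uses a hypothetical `TaintedStr` class (a `str` subclass with a made-up `reveal()` method). It is not GhostPII's `GhostString` and makes no claim about the library's internals; it only shows that overriding `__add__` and `__radd__` keeps the result redacted whichever side of `+` the PII is on.

```python
class TaintedStr(str):
    """Hypothetical redacting string; a stand-in, not GhostPII's GhostString."""

    def reveal(self) -> str:
        # Return the underlying value as a plain str.
        return str.__str__(self)

    def __str__(self) -> str:
        # print() and "%s" formatting go through __str__, so they see the mask.
        return "[REDACTED]"

    def __repr__(self) -> str:
        return "TaintedStr('[REDACTED]')"

    def __format__(self, spec: str) -> str:
        # f-strings and format() call __format__, so mask there as well.
        return format("[REDACTED]", spec)

    def __add__(self, other):
        # tainted + anything string-like -> tainted
        if not isinstance(other, str):
            return NotImplemented
        # str.__add__ joins the raw character data and bypasses the mask.
        return TaintedStr(str.__add__(self, other))

    def __radd__(self, other):
        # plain + tainted -> tainted (Python falls back to the right operand)
        if not isinstance(other, str):
            return NotImplemented
        return TaintedStr(str.__add__(other, self))


labeled = "User: " + TaintedStr("John Doe")
print(labeled)           # prints [REDACTED]
print(labeled.reveal())  # prints User: John Doe
```

The design point is that the result of every operation is wrapped again, so redaction follows the data rather than the variable name.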

## Async Support

GhostPII works in async services without extra setup. `unmask_pii()` is a regular synchronous context manager, so use it with a plain `with` statement (not `async with`) inside `async` functions:

```python
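import logging

from pydantic import BaseModel
from ghost_pii import PII, unmask_pii

logger = logging.getLogger(__name__)
# `smtp_client` below stands in for your own async mail client.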

class UserEvent(BaseModel):
    user_id: str
    email: PII[str]

async def send_confirmation(event: UserEvent):
    # Logging is safe — email is auto-redacted
    logger.info("Sending confirmation to %s", event.email)

    with unmask_pii():
        await smtp_client.send(to=str(event.email), subject="Confirm your account")
```
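
One way a synchronous context manager can stay safe across concurrent tasks is to keep its "unmasked" flag in a `contextvars.ContextVar`, which asyncio copies into each task. The sketch below assumes that design purely for illustration; whether GhostPII does this internally is not stated here, and `unmask`, `render`, and the two task functions are hypothetical names.

```python
import asyncio
import contextvars
from contextlib import contextmanager

# Hypothetical per-context flag; each asyncio task gets its own copy.
_unmasked = contextvars.ContextVar("unmasked", default=False)


@contextmanager
def unmask():
    """Reveal values inside the block, for the current task only."""
    token = _unmasked.set(True)
    try:
        yield
    finally:
        _unmasked.reset(token)


def render(value: str) -> str:
    # Show the real value only while the current context is unmasked.
    return value if _unmasked.get() else "[REDACTED]"


async def revealing_task(value: str) -> str:
    with unmask():
        await asyncio.sleep(0)  # let the other task run while unmasked
        return render(value)


async def logging_task(value: str) -> str:
    await asyncio.sleep(0)
    return render(value)


async def main():
    # gather() wraps each coroutine in its own task, each with its own context.
    return await asyncio.gather(
        revealing_task("a@example.com"),
        logging_task("b@example.com"),
    )


print(asyncio.run(main()))  # ['a@example.com', '[REDACTED]']
```

A plain module-level boolean would not give this isolation: unmasking in one task would also unmask every other task that runs before the block exits.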

## Enterprise Strategy

GhostPII is designed to adapt to different compliance levels:

| Mode | Recommended For | Mechanism |
|------|-----------------|-----------|
| **Auto-Magical** | General microservices, high developer velocity. | Uses stack inspection to detect `logging`, `print`, etc. |
| **Strict Mode** | FinTech, HealthTech, High-Compliance environments. | Redacts **everywhere**. Requires explicit `unmask_pii()` to access data. |

### Enabling Strict Mode
```python
from ghost_pii import set_strict_mode

set_strict_mode(True) # Best practice for production PII handling
```
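
To make the difference between the two modes concrete, here is a minimal sketch under stated assumptions. `ModeAwareStr`, `set_strict`, and `reveal()` are hypothetical names, not GhostPII's API, and the detection covers only the standard `logging` package (a builtin such as `print` has no Python frame to find, so it would need a different technique). It relies on `sys._getframe`, which is CPython-specific.

```python
import io
import logging
import sys

_strict = False  # hypothetical module-level switch


def set_strict(enabled: bool) -> None:
    global _strict
    _strict = enabled


def _called_from_logging() -> bool:
    # Walk up the call stack looking for a frame from the stdlib logging package.
    frame = sys._getframe(1)
    while frame is not None:
        name = frame.f_globals.get("__name__") or ""
        if name == "logging" or name.startswith("logging."):
            return True
        frame = frame.f_back
    return False


class ModeAwareStr(str):
    def reveal(self) -> str:
        # Explicit access to the raw value (stand-in for an unmask mechanism).
        return str.__str__(self)

    def __str__(self) -> str:
        if _strict or _called_from_logging():
            return "[REDACTED]"
        return self.reveal()


# Capture log output in memory so the effect is visible.
stream = io.StringIO()
log = logging.getLogger("mode_sketch")
log.addHandler(logging.StreamHandler(stream))
log.setLevel(logging.INFO)
log.propagate = False

email = ModeAwareStr("john@example.com")

log.info("Sending to %s", email)
auto_logged = stream.getvalue()   # "Sending to [REDACTED]\n"
auto_direct = str(email)          # "john@example.com"

set_strict(True)
strict_direct = str(email)        # "[REDACTED]"
strict_revealed = email.reveal()  # "john@example.com"
```

In the automatic mode the value is usable anywhere except inside a logging call; in strict mode every implicit conversion is masked and the raw value is reachable only through the explicit accessor.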

## Why GhostPII vs Alternatives

| | GhostPII | [presidio](https://github.com/microsoft/presidio) | [scrubadub](https://github.com/LeapBeyond/scrubadub) | Manual field redaction |
|---|---|---|---|---|
| **Integration model** | Pydantic `Annotated` type | NLP pipeline / scrubber | String scrubber | Ad-hoc |
| **Auto-redacts in logs** | Yes — zero config | No | No | No |
| **Preserves value for DB/API** | Yes | No (destructive) | No (destructive) | Depends |
| **Tainted memory propagation** | Yes | No | No | No |
| **Strict / audit mode** | Yes | No | No | Manual |
| **Setup overhead** | `pip install` + type annotation | NER models, language packs | Pattern config | High |
| **Best for** | Pydantic services, FastAPI, microservices | Bulk text anonymisation | Legacy string scrubbing | Simple one-off cases |

**TL;DR:** presidio and scrubadub are great for scrubbing free-text blobs. GhostPII is purpose-built for Pydantic models where you need the real value to flow through your app but never appear in logs.

## Contributing

Please run the test suite, linter, and type checker before submitting a PR:

```bash
pip install -e ".[dev]"
pytest                        # run test suite
ruff check src/ghost_pii      # lint
mypy src/ghost_pii            # type-check
```

## License

This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

Copyright (c) 2026 Sthitaprajna Sahoo and contributors.
