Metadata-Version: 2.4
Name: pii-aware-mixin
Version: 0.2.0
Summary: Metadata-driven sensitive-data masking mixin for Python frozen dataclasses
Project-URL: Homepage, https://github.com/jekhator/pii-aware-mixin
Project-URL: Repository, https://github.com/jekhator/pii-aware-mixin.git
Project-URL: Issues, https://github.com/jekhator/pii-aware-mixin/issues
Author: James Ekhator
License: Apache-2.0
License-File: LICENSE
Keywords: dataclass,masking,repr,sensitive-data,serialization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown

# pii-aware-mixin

**Decorator-based PHI/PII masking for Python frozen dataclasses.**

Accidentally logging sensitive data (API tokens, passwords, session IDs, PII, healthcare data/PHI) is a common source of security incidents and compliance violations. `pii-aware-mixin` addresses this with a lightweight `@phi_aware` decorator and module-level helpers. Marked fields are masked automatically in `repr()` output and in the dicts returned by `mask_for_logging()`, while `to_dict()` still returns the full data for serialization.

## Why?

When you log a dataclass instance or its repr, sensitive fields leak unless you explicitly redact them everywhere:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass(frozen=True)
class APICredentials:
    user_id: int
    api_token: str

creds = APICredentials(user_id=1, api_token="sk-abc123xyz")
logger.info("Creds: %s", creds)  # logs: "Creds: APICredentials(user_id=1, api_token='sk-abc123xyz')"
                                  # OOPS! Token is exposed.
```

With this library, the fix is one line per field: mark sensitive fields with metadata and let the decorator handle the rest.

```python
import logging
from dataclasses import dataclass, field
from pii_aware_mixin import phi_aware, mask_for_logging, to_dict

logger = logging.getLogger(__name__)

@phi_aware
@dataclass(frozen=True, slots=True)
class APICredentials:
    user_id: int
    api_token: str = field(metadata={"phi": True})

creds = APICredentials(user_id=1, api_token="sk-abc123xyz")

# Safe for logging:
logger.info("Creds: %s", mask_for_logging(creds))
# → {"user_id": 1, "api_token": "<phi:redacted>"}

# Safe for reprs / tracebacks:
logger.error("Error: %s", repr(creds))
# → "APICredentials(user_id=1, api_token=<phi:redacted>)"

# Full data when needed (serialization, storage):
api_payload = to_dict(creds)
# → {"user_id": 1, "api_token": "sk-abc123xyz"}
```

## Installation

```bash
pip install pii-aware-mixin
```

Requires Python 3.10+.

## Quick Start

### 1. Import the decorator and helpers

```python
from dataclasses import dataclass, field
from pii_aware_mixin import phi_aware, mask_for_logging, to_dict
```

### 2. Decorate and mark sensitive fields

Apply `@phi_aware` above `@dataclass(frozen=True, slots=True)` and mark each sensitive field with `field(metadata={"phi": True})`:

```python
@phi_aware
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"phi": True})
    email: str = field(metadata={"phi": True})
    name: str
```

**Key difference from v0.1.0:** No `repr=False` required. The decorator works on plain `@dataclass(frozen=True, slots=True)`.

### 3. Use in your code

```python
user = User(
    id=1,
    api_token="sk-123456",
    email="alice@example.com",
    name="Alice"
)

# Masked for logging:
print(mask_for_logging(user))
# → {'id': 1, 'api_token': '<phi:redacted>', 'email': '<phi:redacted>', 'name': 'Alice'}

# Masked for repr (safe in tracebacks, error messages):
print(repr(user))
# → User(id=1, api_token=<phi:redacted>, email=<phi:redacted>, name='Alice')

# Unredacted for storage/transmission:
print(to_dict(user))
# → {'id': 1, 'api_token': 'sk-123456', 'email': 'alice@example.com', 'name': 'Alice'}
```

## API Reference

### @phi_aware decorator

Adds a PHI-masking `__repr__()` to a frozen dataclass. Fields marked with `metadata={"phi": True}` are redacted in repr output.

**Usage:**
```python
@phi_aware
@dataclass(frozen=True, slots=True)
class Patient:
    name: str
    ssn: str = field(metadata={"phi": True})

repr(Patient(name="Alice", ssn="123"))
# → "Patient(name='Alice', ssn=<phi:redacted>)"
```

### mask_for_logging(instance) → dict

Returns a dict representation with PHI fields masked as `<phi:redacted>`.

**Use case:** Structured logging (JSON, CloudWatch, logging frameworks)

```python
@phi_aware
@dataclass(frozen=True, slots=True)
class Credentials:
    username: str
    password: str = field(metadata={"phi": True})

creds = Credentials(username="alice", password="secret")
print(mask_for_logging(creds))
# → {'username': 'alice', 'password': '<phi:redacted>'}
```

### to_dict(instance) → dict

Returns a full dict representation with all values unmasked. Alias for `dataclasses.asdict()`.

**Use case:** API serialization, storage, transmission. Returns **unredacted** data.

```python
@phi_aware
@dataclass(frozen=True, slots=True)
class APIKey:
    id: int
    key: str = field(metadata={"phi": True})

key = APIKey(id=1, key="sk-abc123")
print(to_dict(key))
# → {'id': 1, 'key': 'sk-abc123'}
```

## Field Metadata

Mark a field sensitive by adding `metadata={"phi": True}` to `field()`:

```python
from dataclasses import dataclass, field
from pii_aware_mixin import phi_aware

@phi_aware
@dataclass(frozen=True, slots=True)
class Credentials:
    username: str = field(metadata={"phi": True})
    password: str = field(metadata={"phi": True})
    created_at: str  # not PHI — no metadata needed
```

Any field **without** `metadata` or with `metadata={"phi": False}` is treated as non-sensitive and passes through unmasked.

## Logging Integration

Pair with standard library `logging` for clean, safe logs:

```python
import logging
from dataclasses import dataclass, field
from pii_aware_mixin import phi_aware, mask_for_logging

logger = logging.getLogger(__name__)

@phi_aware
@dataclass(frozen=True, slots=True)
class LoginAttempt:
    username: str
    password: str = field(metadata={"phi": True})
    ip_address: str

def handle_login(username, password, ip):
    attempt = LoginAttempt(username=username, password=password, ip_address=ip)
    logger.info("Login attempt: %s", mask_for_logging(attempt))
    # Logs: {'username': 'alice', 'password': '<phi:redacted>', 'ip_address': '192.168.1.1'}
```

## Mask Strategies

Currently, `pii-aware-mixin` uses a simple full-mask strategy: all sensitive fields become `<phi:redacted>`.

This is by design: full masking is safe, readable, and appropriate for most use cases. Custom partial masking (e.g., showing only the last 4 digits) can be layered on top of the masked dict when needed:

```python
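# Example: custom partial masking before logging.
# Assumes `payment` is a @phi_aware dataclass instance whose `credit_card`
# field is marked with metadata={"phi": True}, and `logger` is configured.
masked = mask_for_logging(payment)  # credit_card is "<phi:redacted>" here
if payment.credit_card:
    # Reveal only the last 4 digits; the full value stays on `payment`
    masked["credit_card"] = f"****-****-****-{payment.credit_card[-4:]}"
logger.info("Payment: %s", masked)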
```
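A reusable version of the partial mask might look like the helper below. `mask_last_four` is a hypothetical name, and the masked dict is written out by hand in the shape `mask_for_logging()` returns, so the sketch runs without the package installed.

```python
def mask_last_four(value: str, mask_char: str = "*") -> str:
    """Hypothetical partial-mask helper: reveal only the last four characters."""
    if len(value) <= 4:
        # Too short to reveal a suffix without exposing most of the value
        return mask_char * len(value)
    return mask_char * (len(value) - 4) + value[-4:]

# Layer it over an already-masked dict (hand-written stand-in for
# mask_for_logging() output); `raw_card` stands in for the instance field.
raw_card = "4111111111111111"
masked = {"customer_id": 42, "credit_card": "<phi:redacted>"}
masked["credit_card"] = mask_last_four(raw_card)
print(masked)
# → {'customer_id': 42, 'credit_card': '************1111'}
```

Unlike the fixed `****-****-****-` prefix above, this helper preserves the value's length, which can itself leak a little information; choose whichever fits your masking policy. Values of four characters or fewer are masked completely rather than partially revealed.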

## Migration from v0.1.0

v0.2.0 is a **breaking change**. The mixin-inheritance API has been replaced with a cleaner decorator API.

### v0.1.0 (deprecated)

```python
from dataclasses import dataclass, field
from pii_aware_mixin import PiiAwareMixin, ReprMixin, ToDictMixin

@dataclass(frozen=True, slots=True, repr=False)
class User(PiiAwareMixin, ReprMixin, ToDictMixin):
    id: int
    api_token: str = field(metadata={"sensitive": True})

user = User(id=1, api_token="sk-123456")
user.mask_for_logging()    # instance method
repr(user)                 # mixin-provided __repr__
user.to_dict()             # instance method
```

### v0.2.0 (current)

```python
from dataclasses import dataclass, field
from pii_aware_mixin import phi_aware, mask_for_logging, to_dict

@phi_aware
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"phi": True})

user = User(id=1, api_token="sk-123456")
mask_for_logging(user)  # module-level helper
repr(user)              # decorator-provided __repr__
to_dict(user)           # module-level helper
```

**Key benefits:**
- No mixin inheritance, so it fits codebases that keep data DTOs free of mixins
- No `repr=False` required
- Simpler, more explicit API
- Healthcare-aligned metadata key: `"phi"` instead of `"sensitive"`

## Design Principles

- **Decorator-based:** Simple, non-intrusive. Works on plain frozen dataclasses.
- **Metadata-driven:** No boilerplate. Mark fields, the decorator handles the rest.
- **Type-safe:** Works with frozen dataclasses, slots, type hints.
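- **Low overhead:** Masking work happens only when `repr()` or `mask_for_logging()` is called; construction and attribute access remain plain dataclass operations.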
- **Canonical:** Compatible with "no mixin inheritance on data DTOs" pattern.

## License

Apache 2.0 — see LICENSE file.

## Contributing

This library is extracted from real-world healthcare compliance work and maintained by [James Ekhator](https://github.com/jekhator). Contributions welcome via pull requests.

## See Also

- [dataclasses](https://docs.python.org/3/library/dataclasses.html) — Python standard library
- [frozen dataclasses](https://docs.python.org/3/library/dataclasses.html#frozen-instances) — immutable, hashable
- [slots](https://docs.python.org/3/library/dataclasses.html#slots) — memory-efficient (Python 3.10+)
