Metadata-Version: 2.4
Name: sensitivity-mixin
Version: 0.3.1
Summary: Sensitive-data classification and masking for Python frozen dataclasses
Project-URL: Homepage, https://github.com/jekhator/sensitivity-mixin
Project-URL: Repository, https://github.com/jekhator/sensitivity-mixin.git
Project-URL: Issues, https://github.com/jekhator/sensitivity-mixin/issues
Project-URL: Changelog, https://github.com/jekhator/sensitivity-mixin/blob/main/CHANGELOG.md
Author: James Ekhator
License: Apache-2.0
License-File: LICENSE
Keywords: dataclass,masking,repr,sensitive-data,serialization
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown

# sensitivity-mixin

[![PyPI Version](https://img.shields.io/pypi/v/sensitivity-mixin.svg)](https://pypi.org/project/sensitivity-mixin/)
[![CI](https://github.com/jekhator/sensitivity-mixin/actions/workflows/ci.yml/badge.svg)](https://github.com/jekhator/sensitivity-mixin/actions/workflows/ci.yml)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Versions](https://img.shields.io/pypi/pyversions/sensitivity-mixin.svg)](https://pypi.org/project/sensitivity-mixin/)

**Decorator-based sensitivity classification and masking for Python frozen dataclasses.**

Accidentally logging sensitive data is a common source of security incidents and compliance violations. Examples include API tokens, passwords, session IDs, PII, healthcare data (PHI), credit card numbers, and other secrets. `sensitivity-mixin` addresses this with a lightweight `@sensitive` class decorator and taxonomy-based classification. Fields you tag as sensitive are masked in the object's repr, and therefore also in logs and tracebacks that format the object.

## Why?

When you log a dataclass instance or its repr, sensitive fields leak unless you explicitly redact them everywhere:

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)

@dataclass(frozen=True)
class APICredentials:
    user_id: int
    api_token: str

creds = APICredentials(user_id=1, api_token="sk-abc123xyz")
logger.info("Creds: %s", creds)  # logs: "Creds: APICredentials(user_id=1, api_token='sk-abc123xyz')"
                                  # OOPS! Token is exposed.
```

With this library, protection takes one line per field. Tag each sensitive field through `field()` metadata and decorate the class with `@sensitive`. The classifier then introspects the tags, and the generated repr masks the tagged values.

```python
import logging
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive, classify

logger = logging.getLogger(__name__)

@sensitive
@dataclass(frozen=True, slots=True)
class APICredentials:
    user_id: int
    api_token: str = field(metadata={"sensitivity": "secret"})

creds = APICredentials(user_id=1, api_token="sk-abc123xyz")

# Introspect sensitivity:
profile = classify(creds)
# → SensitivityProfile(classes=(('api_token', Sensitivity.SECRET),))

# Safe for reprs / tracebacks:
logger.error("Error: %s", repr(creds))
# → "APICredentials(user_id=1, api_token=***)"
```

## Installation

```bash
pip install sensitivity-mixin
```

or with uv:

```bash
uv add sensitivity-mixin
```

Requires Python 3.11+.

## Quick Start

### 1. Import the decorator and classifier

```python
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive, classify, Sensitivity
```

### 2. Decorate and mark sensitive fields

Use the `@sensitive` decorator on a frozen dataclass and tag fields with a sensitivity taxonomy:

```python
@sensitive
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})
    ssn: str = field(metadata={"sensitivity": "phi"})
    name: str
```

Supported sensitivity tags are `"phi"` (healthcare data), `"pii"` (personal info), `"pci"` (payment card data), and `"secret"` (credentials/tokens). Omit the tag for non-sensitive fields. (Alternatively, use the `Sensitivity` enum: `Sensitivity.PHI`, `Sensitivity.PII`, `Sensitivity.PCI`, `Sensitivity.SECRET`.)

### 3. Use in your code

```python
user = User(
    id=1,
    api_token="sk-123456",
    email="alice@example.com",
    ssn="123-45-6789",
    name="Alice"
)

# Introspect sensitivity:
profile = classify(user)
print(profile.has(Sensitivity.SECRET))  # → True
print(profile.fields_of(Sensitivity.PII))  # → ('email',)

# Masked for repr (safe in tracebacks, error messages):
print(repr(user))
# → User(id=1, api_token=***, email=***, ssn=***, name='Alice')
```

### 4. Use policy-driven masking (optional)

To customize the masking placeholder, pass one policy per sensitivity class to `SensitiveDecorator`:

```python
from sensitivity_mixin import SensitiveDecorator
from sensitivity_mixin.decorators.classes.secret_aware import SecretPolicyAware
from sensitivity_mixin.decorators.classes.compliance import Compliance

secret_policy = SecretPolicyAware(
    compliance=Compliance.NONE,
    detection_hints=("api_token", "secret", "token", "password"),
    placeholder="[REDACTED]"
)

decorator = SensitiveDecorator(policies=((Sensitivity.SECRET, secret_policy),))

@decorator
@dataclass(frozen=True, slots=True)
class ApiClient:
    client_id: str
    api_token: str = field(metadata={"sensitivity": "secret"})

client = ApiClient(client_id="c1", api_token="sk-secret")
print(repr(client))
# → ApiClient(client_id='c1', api_token=[REDACTED])
```

## API Reference

### @sensitive decorator

Adds a sensitivity-aware `__repr__()` to a frozen dataclass. Fields marked with a sensitivity tag in metadata are redacted in repr output using a default placeholder (`***`).

**Usage:**
```python
@sensitive
@dataclass(frozen=True, slots=True)
class Patient:
    name: str
    ssn: str = field(metadata={"sensitivity": "phi"})

repr(Patient(name="Alice", ssn="123"))
# → "Patient(name='Alice', ssn=***)"
```

**With policies:**
```python
from sensitivity_mixin import Sensitivity, SensitiveDecorator
from sensitivity_mixin.decorators.classes.phi_aware import PhiPolicyAware
from sensitivity_mixin.decorators.classes.compliance import Compliance

phi_policy = PhiPolicyAware(
    compliance=Compliance.HIPAA,
    detection_hints=("ssn", "name"),
    placeholder="[REDACTED]"
)

decorator = SensitiveDecorator(policies=((Sensitivity.PHI, phi_policy),))

@decorator
@dataclass(frozen=True, slots=True)
class Patient:
    name: str
    ssn: str = field(metadata={"sensitivity": "phi"})

repr(Patient(name="Alice", ssn="123"))
# → "Patient(name=[REDACTED], ssn=[REDACTED])"
```

### classify(instance) → SensitivityProfile

Introspects a dataclass and returns a `SensitivityProfile` documenting all sensitivity-tagged fields.

**Use case:** Compliance auditing, field-level sensitivity introspection

```python
@sensitive
@dataclass(frozen=True, slots=True)
class Credentials:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    api_key: str = field(metadata={"sensitivity": "secret"})

creds = Credentials(username="alice", password="secret", api_key="sk-123")
profile = classify(creds)
# → SensitivityProfile(classes=(('password', Sensitivity.SECRET), ('api_key', Sensitivity.SECRET)))

# Query the profile:
print(profile.has(Sensitivity.SECRET))  # → True
print(profile.fields_of(Sensitivity.SECRET))  # → ('password', 'api_key')
print(profile.sensitivity_of('username'))  # → None (unclassified)
```

**SensitivityProfile** provides:
- `classes: tuple[tuple[str, Sensitivity], ...]` — field name → sensitivity mapping
- `has(kind: Sensitivity) → bool` — check for a sensitivity class
- `fields_of(kind: Sensitivity) → tuple[str, ...]` — get field names of a class
- `sensitivity_of(name: str) → Sensitivity | None` — get the class of a field
- `is_empty → bool` — `True` when no fields are tagged

## Field Metadata

Mark a field sensitive by adding `metadata={"sensitivity": "<TAG>"}` to `field()`:

```python
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive

@sensitive
@dataclass(frozen=True, slots=True)
class Credentials:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})
    created_at: str  # not sensitive — no metadata needed
```

Supported tags:
- `"phi"` — Protected Health Information (healthcare/medical records)
- `"pii"` — Personally Identifiable Information (names, emails, SSNs)
- `"pci"` — Payment Card Industry data (credit card numbers)
- `"secret"` — API tokens, passwords, secrets
- Omitted — non-sensitive (passes through unmasked)

Any field **without** a `"sensitivity"` metadata entry, or with `metadata={"sensitivity": None}`, is treated as non-sensitive and passes through unmasked.

## Security Boundary: What This Does and Does NOT Protect

`@sensitive` is a **repr-layer masking tool**, not a complete confidentiality boundary. It masks sensitive fields **when you log or print the object itself**, but does **not** protect against direct field access or serialization bypass.

### Protected (Repr Layer Only)
- ✓ `repr(obj)` — sensitive fields masked
- ✓ `str(obj)` / `print(obj)` — uses masked repr
- ✓ Logging the object: `logger.info("Object: %s", obj)` — masked
- ✓ F-string with object: `f"Object: {obj}"` — masked

### NOT Protected (Bypass Methods)
- ✗ Direct field access: `obj.api_token` returns the **full unmasked value**
- ✗ `dataclasses.asdict(obj)` returns a dict with **full unmasked values**
- ✗ `json.dumps(asdict(obj))` contains **full unmasked values in JSON**
- ✗ Logging a field directly: `logger.info(f"Token: {obj.api_token}")` exposes the **full value**
- ✗ Attribute introspection: `getattr(obj, 'api_token')` returns **full unmasked value**
- ✗ Untagged fields are **not masked** — classification is explicit/opt-in

### Example: Correct and Incorrect Usage

```python
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive
import logging

logger = logging.getLogger(__name__)

@sensitive
@dataclass(frozen=True, slots=True)
class APIKey:
    name: str
    secret: str = field(metadata={"sensitivity": "secret"})

key = APIKey(name="prod-key", secret="sk-abc123xyz")

# ✓ SAFE: logging the object uses masked repr
logger.info("API Key: %s", key)
# Output: "API Key: APIKey(name='prod-key', secret=***)"

# ✗ UNSAFE: logging a field directly bypasses the decorator
logger.warning("Secret: %s", key.secret)
# Output: "Secret: sk-abc123xyz"  ← FULL VALUE EXPOSED!

# ✗ UNSAFE: serializing with asdict() bypasses the decorator
from dataclasses import asdict
logger.debug("Data: %s", asdict(key))
# Output: "Data: {'name': 'prod-key', 'secret': 'sk-abc123xyz'}"  ← FULL VALUES EXPOSED!
```

**Use case**: `@sensitive` is ideal for DTOs at the logging boundary. Keep sensitive fields wrapped in the dataclass; avoid field-level logging. For applications requiring stronger confidentiality guarantees, apply field-level masking at the serialization boundary or use dedicated encryption libraries.

## Logging Integration

Pair with standard library `logging` for clean, safe logs:

```python
import logging
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive

logger = logging.getLogger(__name__)

@sensitive
@dataclass(frozen=True, slots=True)
class LoginAttempt:
    username: str
    password: str = field(metadata={"sensitivity": "secret"})
    ip_address: str

def handle_login(username, password, ip):
    attempt = LoginAttempt(username=username, password=password, ip_address=ip)
    logger.info("Login attempt: %s", repr(attempt))
    # Logs: "Login attempt: LoginAttempt(username='alice', password=***, ip_address='192.168.1.1')"
```

## Mask Strategies

By default, `@sensitive` masks every tagged field with `***` (`DEFAULT_PLACEHOLDER`).

For customized masking, instantiate policy value objects and wire them into `SensitiveDecorator`:

```python
from dataclasses import dataclass, field
from sensitivity_mixin import Sensitivity, SensitiveDecorator
from sensitivity_mixin.decorators.classes.secret_aware import SecretPolicyAware
from sensitivity_mixin.decorators.classes.compliance import Compliance

secret_policy = SecretPolicyAware(
    compliance=Compliance.NONE,
    detection_hints=("api_key", "secret", "token"),
    placeholder="***REDACTED***"
)

decorator = SensitiveDecorator(policies=((Sensitivity.SECRET, secret_policy),))

@decorator
@dataclass(frozen=True, slots=True)
class Config:
    api_key: str = field(metadata={"sensitivity": "secret"})

repr(Config(api_key="sk-123"))
# → "Config(api_key=***REDACTED***)"
```

See `docs/apps/decorators/policies.md` for policy customization details.

## Migration from Earlier Versions

v0.3.0 introduces a **taxonomy-driven architecture** with broadened sensitivity classification.

### Earlier versions (v0.1, v0.2)

```python
from dataclasses import dataclass, field
from pii_aware_mixin import phi_aware

@phi_aware
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"phi": True})
```

### v0.3.0 (current)

```python
from dataclasses import dataclass, field
from sensitivity_mixin import sensitive, classify

@sensitive
@dataclass(frozen=True, slots=True)
class User:
    id: int
    api_token: str = field(metadata={"sensitivity": "secret"})
    email: str = field(metadata={"sensitivity": "pii"})

user = User(id=1, api_token="sk-123", email="alice@example.com")
profile = classify(user)  # introspect sensitivity
```

**Key improvements:**
- Broadened taxonomy: `PHI`, `PII`, `PCI`, `SECRET` (not just `phi`)
- Classification introspection: `classify()` returns a `SensitivityProfile`
- Per-class policy value objects for specialized masking customization
- Foundation for compliance-aware field governance

## Design Principles

- **Decorator-based:** Simple, non-intrusive. Works on plain frozen dataclasses.
- **Taxonomy-driven:** Classify sensitivity at the field level: PHI, PII, PCI, or SECRET.
- **Introspectable:** `classify()` exposes field-level sensitivity for compliance audits.
- **Type-safe:** Works with frozen dataclasses, `slots=True`, and type hints.
- **Low overhead:** Field introspection is lightweight and happens at decoration time.
- **Canonical:** Compatible with the "no mixin inheritance on data DTOs" pattern, since no base class is required.

## License

Apache 2.0 — see LICENSE file.

## Contributing

This library is maintained by [James Ekhator](https://github.com/jekhator). Contributions welcome via pull requests.

## See Also

- [dataclasses](https://docs.python.org/3/library/dataclasses.html) — Python standard library
- [frozen dataclasses](https://docs.python.org/3/library/dataclasses.html#frozen-instances) — immutable, hashable
- [slots](https://docs.python.org/3/library/dataclasses.html#slots) — memory-efficient (Python 3.10+)
