Metadata-Version: 2.4
Name: pydantic_encryption
Version: 0.9.6
Summary: Encryption, hashing, and blind indexing for Pydantic
License-File: LICENSE
Author: Julien Kmec
Author-email: me@julien.dev
Requires-Python: >=3.11,<3.15
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Provides-Extra: all
Provides-Extra: aws
Provides-Extra: dev
Provides-Extra: sqlalchemy
Requires-Dist: argon2-cffi (>=23.1.0)
Requires-Dist: aws-encryption-sdk[mpl] (>=4.0.1) ; extra == "all"
Requires-Dist: aws-encryption-sdk[mpl] (>=4.0.1) ; extra == "aws"
Requires-Dist: boto3 (>=1.38.8) ; extra == "all"
Requires-Dist: boto3 (>=1.38.8) ; extra == "aws"
Requires-Dist: cachetools (>=5.3.0) ; extra == "all"
Requires-Dist: cachetools (>=5.3.0) ; extra == "aws"
Requires-Dist: coverage (>=7.8.0) ; extra == "dev"
Requires-Dist: cryptography (>=44.0.0)
Requires-Dist: polyfactory (>=2.0.0) ; extra == "dev"
Requires-Dist: psycopg2-binary (>=2.9.10) ; extra == "dev"
Requires-Dist: pydantic (>=2.10.6)
Requires-Dist: pydantic-settings (>=2.9.1)
Requires-Dist: pydantic-super-model (>=2.0.0)
Requires-Dist: pytest (>=8.3.5) ; extra == "dev"
Requires-Dist: pytest-asyncio (>=0.26.0) ; extra == "dev"
Requires-Dist: pytest-cov (>=6.1.1) ; extra == "dev"
Requires-Dist: pytest-docker (>=3.2.1) ; extra == "dev"
Requires-Dist: pytest-env (>=1.1.5) ; extra == "dev"
Requires-Dist: pytest-sqlalchemy (>=0.3.0) ; extra == "dev"
Requires-Dist: sqlalchemy (>=2.0.40) ; extra == "all"
Requires-Dist: sqlalchemy (>=2.0.40) ; extra == "sqlalchemy"
Requires-Dist: sqlalchemy-utils (>=0.41.2) ; extra == "dev"
Project-URL: Repository, https://github.com/julien777z/pydantic-encryption
Description-Content-Type: text/markdown

# pydantic-encryption

Field-level encryption, hashing, and blind indexing for Pydantic models with SQLAlchemy integration.

## Installation

```bash
pip install pydantic-encryption
```

### Optional Extras

```bash
pip install "pydantic-encryption[sqlalchemy]"  # SQLAlchemy integration
pip install "pydantic-encryption[aws]"         # AWS KMS encryption
pip install "pydantic-encryption[all]"         # All optional dependencies
```

## Quick Start

Mix `DeferredDecryptMixin` into any model with encrypted columns. The first time you read an encrypted attribute on any loaded row, the column is batch-decrypted across every sibling instance in the session — columns you never read stay encrypted and cost nothing:

```python
import asyncio

from sqlalchemy import select
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from pydantic_encryption import DeferredDecryptMixin, SQLAlchemyEncryptedValue


class Base(DeclarativeBase):
    pass


class User(Base, DeferredDecryptMixin):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[bytes] = mapped_column(SQLAlchemyEncryptedValue())


engine = create_async_engine("sqlite+aiosqlite:///:memory:")
Session = async_sessionmaker(engine, expire_on_commit=False)


async def main() -> None:
    # Create the table before the first insert.
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

    async with Session() as session:
        session.add(User(email="john@example.com"))
        await session.commit()

        result = await session.execute(select(User))
        user = result.scalar_one()
        print(user.email)  # "john@example.com", decrypted on first read


asyncio.run(main())
```

## SQLAlchemy Integration

Install with `pip install "pydantic-encryption[sqlalchemy]"`.

```python
from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session

from pydantic_encryption import (
    SQLAlchemyEncryptedValue,
    SQLAlchemyHashedValue,
    SQLAlchemyBlindIndexValue,
    BlindIndexMethod,
)


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str]
    email: Mapped[bytes] = mapped_column(SQLAlchemyEncryptedValue())
    password: Mapped[bytes] = mapped_column(SQLAlchemyHashedValue())
    blind_index_email: Mapped[bytes] = mapped_column(
        SQLAlchemyBlindIndexValue(BlindIndexMethod.HMAC_SHA256)
    )


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    user = User(
        username="john",
        email="john@example.com",
        password="secret123",
        blind_index_email="john@example.com",
    )
    session.add(user)
    session.commit()

    # Query by blind index — automatically hashed
    found = session.query(User).filter(
        User.blind_index_email == "john@example.com"
    ).first()
    print(found.email)  # decrypted
```

### Supported Types

`SQLAlchemyEncryptedValue` preserves the Python type of your data:

`str`, `bytes`, `bool`, `int`, `float`, `Decimal`, `UUID`, `date`, `datetime`, `time`, `timedelta`

### Array Support (PostgreSQL)

```python
from pydantic_encryption import SQLAlchemyPGEncryptedArray

tags: Mapped[list[str] | None] = mapped_column(SQLAlchemyPGEncryptedArray(), nullable=True)
```

Each element is individually encrypted. Requires PostgreSQL.

### Async Decryption

SQLAlchemy's `TypeDecorator` is synchronous by contract, so slow backends such as AWS KMS can block the event loop. There are two ways around this:

- **Default.** Under `AsyncSession`, decryption uses SQLAlchemy's greenlet bridge so each call yields the event loop. Argon2 hashing and blind-indexing use the same bridge.
- **On-access batch decrypt.** `DeferredDecryptMixin` defers each encrypted column until the first read, then batch-decrypts that column across every sibling instance loaded into the same session via a single `asyncio.gather`. Columns the caller never reads stay encrypted and cost nothing.

Mix the helper into any model with encrypted columns and read as usual:

```python
from pydantic_encryption import DeferredDecryptMixin, SQLAlchemyEncryptedValue


class User(Base, DeferredDecryptMixin):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[bytes] = mapped_column(SQLAlchemyEncryptedValue())


Session = async_sessionmaker(engine, expire_on_commit=False)

async with Session() as session:
    result = await session.execute(select(User))
    users = result.scalars().all()

    # First read of `email` batch-decrypts it across every user in the session.
    for user in users:
        print(user.email)
```

`decrypt_pending_fields(session)` is an optional escape hatch when you need to pre-warm every encrypted column on every loaded row before leaving the session context (e.g. serializing outside a greenlet spawn):

```python
from pydantic_encryption import decrypt_pending_fields

async with Session() as session:
    users = (await session.execute(select(User))).scalars().all()

    # Decrypt every encrypted column on every row loaded so far.
    await decrypt_pending_fields(session)

    payload = [{"id": u.id, "email": u.email} for u in users]
```

`finalize_session(session)` combines the above with a `commit()`, returning the pooled connection before response construction. This is useful on read endpoints that would otherwise hold a DB connection through descriptor-driven KMS decryption:

```python
from pydantic_encryption import finalize_session


async def list_users() -> list[dict]:
    async with Session() as session:
        users = (await session.execute(select(User))).scalars().all()
        await finalize_session(session)  # decrypt pending + commit; connection released
        return [{"id": u.id, "email": u.email} for u in users]
```

**Manual helpers** for rows loaded outside a session or flat ciphertext lists:

```python
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from pydantic_encryption import decrypt_rows, decrypt_values


async with AsyncSession(engine) as session:
    users = (await session.execute(select(User))).scalars().all()
    ciphertexts = [u.email for u in users]

    await users[0].decrypt()                              # one mixin instance
    await User.decrypt_many(users)                        # batch of one class
    await decrypt_rows(users, User.email, concurrency=8)  # InstrumentedAttribute or column names
    await decrypt_values(ciphertexts, concurrency=8)      # flat ciphertexts; preserves None positions
```
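
The `concurrency` argument and the "preserves None positions" behaviour can be pictured with a small standard-library sketch. This is not the library's implementation: `decrypt_all` and `fake_decrypt` are hypothetical names, and the "decryption" here only reverses bytes. It shows one common way to bound in-flight calls with a semaphore while keeping results aligned with their inputs.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def decrypt_all(
    ciphertexts: list[Optional[bytes]],
    decrypt_one: Callable[[bytes], Awaitable[str]],
    concurrency: int = 8,
) -> list[Optional[str]]:
    """Decrypt a flat list concurrently, leaving None entries in place."""
    limit = asyncio.Semaphore(concurrency)

    async def guarded(item: Optional[bytes]) -> Optional[str]:
        if item is None:
            return None  # nothing to decrypt; keep the slot
        async with limit:  # at most `concurrency` calls in flight
            return await decrypt_one(item)

    # gather returns results in input order, so positions line up.
    return await asyncio.gather(*(guarded(c) for c in ciphertexts))


async def fake_decrypt(ciphertext: bytes) -> str:
    """Hypothetical backend: reverses the bytes after a short wait."""
    await asyncio.sleep(0.01)
    return ciphertext[::-1].decode()
```

Calling `asyncio.run(decrypt_all([b"cba", None, b"zyx"], fake_decrypt, concurrency=2))` yields a list of the same length, with `None` still in the middle slot.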

### Safety: Catching Accidental Ciphertext Access

Reads go through the on-access descriptor. When the underlying cell is still an `EncryptedValue`, the descriptor prefers an async batch decrypt over the session's pending siblings (via SQLAlchemy's greenlet bridge), and transparently falls back to a synchronous decrypt either when the read happens outside a greenlet or when the instance is detached from any session.

An `EncryptedValue` only reaches user code if something bypasses the descriptor entirely (raw `state.dict[col]`, a logged row). Coercing it via `str(value)` / `f"{value}"` / `"%s" % value` raises `EncryptedValueAccessError`. `repr(value)` is a safe `<EncryptedValue: N bytes>` marker, and `bytes(value)` returns the raw ciphertext. Use `is_encrypted(value)` to guard at a boundary.
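
The guard behaviour described above can be illustrated with a minimal stand-in class. The names `CiphertextMarker`, `CiphertextAccessError`, and `is_marker` are hypothetical and exist only for this sketch; the library's own `EncryptedValue`, `EncryptedValueAccessError`, and `is_encrypted` may be implemented differently.

```python
class CiphertextAccessError(TypeError):
    """Hypothetical stand-in for the library's EncryptedValueAccessError."""


class CiphertextMarker:
    """Wraps raw ciphertext and refuses to be rendered as text."""

    def __init__(self, raw: bytes) -> None:
        self._raw = raw

    def __str__(self) -> str:
        # str(), f-strings, and "%s" formatting all end up here.
        raise CiphertextAccessError("ciphertext was coerced to text before decryption")

    def __repr__(self) -> str:
        # Safe to log: reveals only the length, never the bytes.
        return f"<CiphertextMarker: {len(self._raw)} bytes>"

    def __bytes__(self) -> bytes:
        # Explicit opt-in to the raw ciphertext.
        return self._raw


def is_marker(value: object) -> bool:
    """Boundary check, analogous in spirit to is_encrypted()."""
    return isinstance(value, CiphertextMarker)
```

Raising from `__str__` covers all three coercions because the default `__format__` and `%s` formatting both delegate to `str()`.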

## Manual Encryption or Hashing

Fields annotated with `Encrypted` are encrypted and fields annotated with `Hashed` are hashed during model initialization:

```python
from typing import Annotated
from pydantic_encryption import BaseModel, Encrypted, Hashed

class User(BaseModel):
    name: str
    address: Annotated[bytes, Encrypted]
    password: Annotated[str, Hashed]

user = User(name="John Doe", address="123 Main St", password="secret123")

print(user.name)      # "John Doe"
print(user.address)   # encrypted bytes
print(user.password)  # argon2 hash bytes
```

### Decrypting

Call `decrypt_data()` to decrypt all `Encrypted` fields in-place. It returns `self`, so it can be chained:

```python
user = User(name="John", address="123 Main St", password="secret")
user.decrypt_data()
print(user.address)  # "123 Main St"
```

### Async Support

Use `async_init()` to construct models with async encryption, hashing, and blind indexing, and `async_decrypt_data()` for async decryption:

```python
user = await User.async_init(name="John", address="123 Main St", password="secret")
await user.async_decrypt_data()
```

All phases (encrypt, hash, blind-index) run concurrently via `asyncio.gather`, and nested `BaseModel` instances — including those inside `list`, `tuple`, `dict`, and `set` containers — are processed recursively.
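
As a rough picture of the recursive walk over containers, the sketch below collects every nested object reachable through `list`, `tuple`, `dict`, and `set` values. `Node` and `iter_nested` are hypothetical names used only for illustration; they are not part of `pydantic_encryption`, and the library's traversal may differ in order and detail.

```python
from typing import Any, Iterator


class Node:
    """Hypothetical stand-in for a model; not the library's BaseModel."""

    def __init__(self, name: str, **fields: Any) -> None:
        self.name = name
        self.fields = fields


def iter_nested(value: Any) -> Iterator[Node]:
    """Yield every Node reachable from value, including value itself."""
    if isinstance(value, Node):
        yield value
        for child in value.fields.values():
            yield from iter_nested(child)
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_nested(child)
    elif isinstance(value, (list, tuple, set, frozenset)):
        for child in value:
            yield from iter_nested(child)
    # Plain values (str, int, bytes, ...) contain no nested models.
```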

## Encryption Methods

Set the encryption method via environment variable:

```bash
ENCRYPTION_METHOD=fernet   # Fernet symmetric encryption (requires ENCRYPTION_KEY)
ENCRYPTION_METHOD=aws      # AWS KMS (requires AWS_KMS_KEY_ARN, AWS_KMS_REGION, etc.)
```

There is no default — you must explicitly set `ENCRYPTION_METHOD` if using `Encrypted` fields.

### Fernet Setup

```bash
# Generate a key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Set environment variables
ENCRYPTION_METHOD=fernet
ENCRYPTION_KEY=your_generated_key
```

### AWS KMS Setup

```bash
ENCRYPTION_METHOD=aws
AWS_KMS_KEY_ARN=arn:aws:kms:us-east-1:123456789012:key/your-key-id
AWS_KMS_REGION=us-east-1
AWS_KMS_ACCESS_KEY_ID=your_access_key
AWS_KMS_SECRET_ACCESS_KEY=your_secret_key
```

As an alternative to `AWS_KMS_KEY_ARN`, separate encrypt/decrypt keys are supported for key rotation or read-only scenarios:

```bash
AWS_KMS_ENCRYPT_KEY_ARN=arn:aws:kms:...encrypt-key
AWS_KMS_DECRYPT_KEY_ARN=arn:aws:kms:...decrypt-key
```

Use one mode or the other — combining `AWS_KMS_KEY_ARN` with either split variant raises a validation error. A decrypt-only key alone is allowed (read-only workloads).
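
The rule can be summarised in a few lines of plain Python. This is only a sketch of the stated constraint: `resolve_kms_mode` is a hypothetical helper, the library performs its own validation, and cases the text does not cover (such as nothing being set) are simply reported here rather than rejected.

```python
from typing import Optional


def resolve_kms_mode(
    key_arn: Optional[str] = None,
    encrypt_key_arn: Optional[str] = None,
    decrypt_key_arn: Optional[str] = None,
) -> str:
    """Classify a key configuration, rejecting mixed single/split modes."""
    split = encrypt_key_arn is not None or decrypt_key_arn is not None
    if key_arn is not None and split:
        # Combining the single key with either split variant is an error.
        raise ValueError("set AWS_KMS_KEY_ARN or the split variants, not both")
    if key_arn is not None:
        return "single-key"
    if encrypt_key_arn is None and decrypt_key_arn is not None:
        return "decrypt-only"  # read-only workloads
    if split:
        return "split-keys"
    return "unconfigured"  # assumption: not addressed by the text above
```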

#### Plaintext Cache (Opt-In)

For read-heavy workloads that repeatedly decrypt the same ciphertexts, AWS KMS round-trips dominate. An in-process LRU of ciphertext → plaintext is available as opt-in:

```bash
AWS_KMS_PLAINTEXT_CACHE_ENABLED=true      # default: false
AWS_KMS_PLAINTEXT_CACHE_CAPACITY=2048     # default: 2048 entries
```

Disabled by default because cache entries hold decrypted sensitive data in a process-wide `cachetools.LRUCache` for the lifetime of the process. Enable it only when the performance gain outweighs the risk of keeping plaintext resident in memory.
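
To make the trade-off concrete, here is a minimal ciphertext-to-plaintext LRU built on `collections.OrderedDict`. It is a stand-in for illustration, not the library's cache (which uses `cachetools.LRUCache`); `PlaintextLRU` and its `misses` counter are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class PlaintextLRU:
    """Caches decrypt results, evicting the least recently used entry."""

    def __init__(self, capacity: int, decrypt: Callable[[bytes], str]) -> None:
        self._capacity = capacity
        self._decrypt = decrypt
        self._entries: OrderedDict[bytes, str] = OrderedDict()
        self.misses = 0  # counts calls that reached the backend

    def get(self, ciphertext: bytes) -> str:
        if ciphertext in self._entries:
            self._entries.move_to_end(ciphertext)  # mark as most recently used
            return self._entries[ciphertext]
        self.misses += 1
        plaintext = self._decrypt(ciphertext)  # the expensive round-trip
        self._entries[ciphertext] = plaintext
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return plaintext
```

Every entry in `_entries` is plaintext that stays in memory until it is evicted, which is the reason the real cache is opt-in.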

### Model-Level Config

Override encryption settings per model instead of relying on environment variables:

```python
from pydantic_encryption import BaseModel, Encrypted, EncryptionMethod
from typing import Annotated

class SpecialUser(BaseModel, encryption_method=EncryptionMethod.FERNET, encryption_key="my-key"):
    email: Annotated[bytes, Encrypted]
```

Supported kwargs: `encryption_method`, `encryption_key`, `blind_index_key`. Any kwarg left unset falls back to the corresponding environment variable.

## Blind Indexes

Blind indexes enable equality searches on encrypted data by storing a deterministic keyed hash alongside the ciphertext.

**Configuration:** Set `BLIND_INDEX_SECRET_KEY` via environment variable.
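
The idea of a deterministic keyed hash can be shown with the standard library alone. The function below is a sketch of plain HMAC-SHA256, not the library's code path; `blind_index` is a hypothetical name, and the key here is a throwaway value rather than a real `BLIND_INDEX_SECRET_KEY`.

```python
import hashlib
import hmac


def blind_index(value: str, secret_key: bytes) -> bytes:
    """Return a deterministic 32-byte keyed hash of value."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).digest()


key = b"example-key-do-not-use"
stored = blind_index("john@example.com", key)   # saved next to the ciphertext
lookup = blind_index("john@example.com", key)   # computed at query time

# Same key and same input give the same bytes, so equality search works.
print(hmac.compare_digest(stored, lookup))
```

Without the key, an attacker cannot recompute the index for a guessed value, which is what separates a blind index from an unkeyed hash.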

### Pydantic Models

```python
from typing import Annotated
from pydantic_encryption import BaseModel, BlindIndex, BlindIndexMethod

class User(BaseModel):
    email_index: Annotated[bytes, BlindIndex(BlindIndexMethod.HMAC_SHA256)]
```

### Normalization

Normalize values before hashing to ensure consistent lookups:

```python
email_index: Annotated[bytes, BlindIndex(
    BlindIndexMethod.HMAC_SHA256,
    normalize_to_lowercase=True,
    strip_whitespace=True,
)]
```

Available options:

| Option | Effect |
|--------|--------|
| `strip_whitespace` | Strip leading/trailing whitespace, collapse internal whitespace |
| `strip_non_characters` | Remove all non-letter characters (keep only a-zA-Z) |
| `strip_non_digits` | Remove all non-digit characters (keep only 0-9) |
| `normalize_to_lowercase` | Convert to lowercase |
| `normalize_to_uppercase` | Convert to uppercase |
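
The options in the table can be approximated in plain Python as follows. This is a sketch under assumptions: `normalize` is a hypothetical function, the order in which the library applies the options is not stated here, and "collapse internal whitespace" is taken to mean collapsing runs to a single space.

```python
import re


def normalize(
    value: str,
    *,
    strip_whitespace: bool = False,
    strip_non_characters: bool = False,
    strip_non_digits: bool = False,
    normalize_to_lowercase: bool = False,
    normalize_to_uppercase: bool = False,
) -> str:
    """Apply the selected normalization steps to value (order is assumed)."""
    if strip_whitespace:
        value = " ".join(value.split())  # trim ends, collapse inner runs
    if strip_non_characters:
        value = re.sub(r"[^a-zA-Z]", "", value)  # keep letters only
    if strip_non_digits:
        value = re.sub(r"[^0-9]", "", value)  # keep digits only
    if normalize_to_lowercase:
        value = value.lower()
    if normalize_to_uppercase:
        value = value.upper()
    return value
```

With `strip_whitespace=True` and `normalize_to_lowercase=True`, the inputs `"  John@Example.COM "` and `"john@example.com"` normalize to the same string and therefore produce the same blind index.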

### Methods

| Method | Description |
|--------|-------------|
| `BlindIndexMethod.HMAC_SHA256` | Fast HMAC-SHA256 keyed hash. Standard choice. |
| `BlindIndexMethod.ARGON2` | Memory-hard Argon2 hash with deterministic salt. Better brute-force resistance. |

## Custom Encryption or Hashing

Subclass `BaseModel` and override any of `encrypt_data`, `hash_data`, `blind_index_data` (or their async variants) to plug in your own logic. The post-init hook runs automatically:

```python
from pydantic_encryption import BaseModel

class MyModel(BaseModel):
    def encrypt_data(self) -> None:
        # your encryption logic (mutate self in-place)
        ...
```

To implement a new backend instead of replacing the per-model path, subclass one of the adapter ABCs (`EncryptionAdapter`, `HashingAdapter`, `BlindIndexAdapter`) and register it via `register_encryption_backend` / `register_blind_index_backend`. Async variants are inherited by default — override `async_encrypt` / `async_decrypt` only for natively-async backends.

## Run Tests

```bash
pip install -e ".[dev]"
pytest -v
```

