Metadata-Version: 2.4
Name: cassandra-rekey
Version: 0.1.0
Summary: Zero-downtime, resumable, parallel re-encryption tool for Apache Cassandra
Project-URL: Homepage, https://github.com/ankit-dub/cassandra-rekey
Project-URL: Repository, https://github.com/ankit-dub/cassandra-rekey
Project-URL: Issues, https://github.com/ankit-dub/cassandra-rekey/issues
Author-email: Ankit Dubey <ankit-dub@users.noreply.github.com>
License: Apache-2.0
License-File: LICENSE
Keywords: cassandra,encryption,kms,migration,rekey
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Security :: Cryptography
Requires-Python: >=3.12
Requires-Dist: cassandra-driver>=3.29.0
Requires-Dist: cryptography>=42.0
Requires-Dist: pydantic>=2.7
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.7
Requires-Dist: typer>=0.12.0
Provides-Extra: dev
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=5.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: integration
Requires-Dist: testcontainers[cassandra]>=4.0; extra == 'integration'
Provides-Extra: kms
Requires-Dist: boto3>=1.34; extra == 'kms'
Description-Content-Type: text/markdown

# cassandra-rekey

Zero-downtime, resumable, parallel re-encryption tool for Apache Cassandra.

[![CI](https://github.com/ankit-dub/cassandra-rekey/actions/workflows/ci.yml/badge.svg)](https://github.com/ankit-dub/cassandra-rekey/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/cassandra-rekey.svg)](https://pypi.org/project/cassandra-rekey/)
[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.12+-blue)](pyproject.toml)

## Why

Encryption keys for data stored in Cassandra must be rotated, either on a schedule (compliance, KMS lifecycle) or in response to a leaked-key incident. Re-encrypting petabytes of column-level encrypted blobs is hard:

- Cannot afford downtime
- Must resume after pod restarts, network blips, or operator pauses
- Must not double-encrypt or skip rows under retries
- Must throttle to protect read-path SLOs
- Must support **multiple tables** under a single rotation job

`cassandra-rekey` does this generically. Plug in any encrypt/decrypt provider (Fernet, AWS KMS, GCP KMS, custom IDPS-style services). Run from a CLI. Resume any time.

## How it works

```mermaid
flowchart LR
    A[CLI plan] --> B[Token-Range Planner]
    B --> C[(rekey_jobs / rekey_chunks<br/>meta tables)]
    A2[CLI run] --> D[Async Executor]
    D --> C
    D --> E[Worker pool<br/>asyncio + semaphore]
    E --> F[Crypto Provider<br/>decrypt → re-encrypt]
    F --> G[(Target tables)]
    E --> H[Backpressure monitor<br/>read p99]
    H --> E
```

1. **Plan**: split the Murmur3 token ring into N chunks per table, persist to `rekey_chunks`.
2. **Run**: workers claim PENDING chunks, paginate with `token(pk) >= ? AND token(pk) < ?`, decrypt with the *old* key, re-encrypt with the *new* key, write back with a key-version column. Idempotent on retry.
3. **State**: every chunk transition (`PENDING → RUNNING → DONE`) is written to the meta table. Pausing stops the executor; resuming re-runs it, and the executor picks up only chunks that are not yet `DONE`.
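
The planning step can be illustrated with a minimal sketch. It assumes the Murmur3 partitioner's token range of -2^63 to 2^63 - 1 and splits it into roughly equal contiguous ranges. The function name `split_token_ring` is hypothetical and is not taken from the tool's code.

```python
MIN_TOKEN = -(2**63)     # smallest Murmur3Partitioner token
MAX_TOKEN = 2**63 - 1    # largest Murmur3Partitioner token


def split_token_ring(chunks: int) -> list[tuple[int, int]]:
    """Split the token ring into `chunks` contiguous (start, end) ranges.

    Hypothetical helper: each range is meant for a scan of the form
    token(pk) >= start AND token(pk) < end.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN  # distance between the two ends of the ring
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [MIN_TOKEN + (span * i) // chunks for i in range(chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = split_token_ring(256)
print(len(ranges), ranges[0][0], ranges[-1][1])
```

With a strict `<` upper bound, the last range would exclude the single token `MAX_TOKEN`, so an implementation along these lines would need to treat the final chunk's upper bound as inclusive.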

## Install

```bash
pip install cassandra-rekey
# or with KMS support
pip install 'cassandra-rekey[kms]'
```

## Quick start

```bash
cassandra-rekey init-state --config config.yaml
cassandra-rekey doctor    --config config.yaml          # read-only pre-flight
cassandra-rekey plan      --config config.yaml
cassandra-rekey run       --config config.yaml --job-id <uuid> --workers 16
cassandra-rekey status    --config config.yaml --job-id <uuid>
cassandra-rekey pause     --config config.yaml --job-id <uuid>
cassandra-rekey resume    --config config.yaml --job-id <uuid>
```

### Config (`config.yaml`)

```yaml
cluster:
  contact_points: [cass-1, cass-2, cass-3]
  port: 9042
  local_dc: us-west-2
  keyspace: app_data

state:
  keyspace: rekey_meta

provider:
  type: fernet
  old_key_env: REKEY_OLD_KEY
  new_key_env: REKEY_NEW_KEY

tables:
  - name: users
    partition_key: [tenant_id, user_id]   # composite PK supported
    clustering_key: [event_time]          # used in UPDATE WHERE
    encrypted_columns: [ssn, email]
    chunks: 256
    preserve_ttl: true                    # read TTL(col), write USING TTL ?
    preserve_writetime: true              # read WRITETIME(col), write USING TIMESTAMP ?
    use_lwt_guard: true                   # IF key_version = old_version
    consistency:
      read: LOCAL_QUORUM
      write: LOCAL_QUORUM
  - name: accounts
    partition_key: [account_id]
    encrypted_columns: [account_number]
    chunks: 1024

execution:
  workers: 16
  read_p99_threshold_ms: 50
  pause_on_backpressure: true
```

## Crypto providers

Built-in:
- `fernet` — AES-128-CBC + HMAC, symmetric key from env var
- `aws_kms` — envelope encryption with KMS data keys *(optional, install with `[kms]`)*

Custom: implement `EncryptProvider`:

```python
from cassandra_rekey.crypto.base import EncryptProvider

class MyProvider(EncryptProvider):
    def decrypt(self, ciphertext: bytes) -> bytes: ...
    def encrypt(self, plaintext: bytes) -> bytes: ...
    @property
    def key_version(self) -> str: ...
```

Register via entry point or pass as `provider.module`.

## Cassandra concepts handled

| Concept | Handling |
|---|---|
| Composite partition key | Full tuple passed to `token(...)` for chunk scans |
| Clustering key | Included in UPDATE WHERE so wide partitions update precise rows |
| TTL | `preserve_ttl: true` reads `TTL(col)` and writes `USING TTL ?` (the minimum TTL across the row's encrypted columns is used) |
| WRITETIME | `preserve_writetime: true` reads `WRITETIME(col)` and writes `USING TIMESTAMP ?` |
| Concurrent app writes | `use_lwt_guard: true` adds `IF key_version = <old>`; a row changed by a concurrent write fails the condition and is reported instead of being overwritten |
| Counters | Rejected at `doctor`/`plan` time — cannot be SET |
| Encrypted partition/clustering columns | Rejected — requires copy-and-swap, out of scope |
| Encrypted columns with secondary indexes | Rejected — index would have stale ciphertext after rekey |
| Tombstones / null rows | Skipped before decrypt attempt |
| Schema drift mid-rekey | `schema_fingerprint` snapshotted at plan, verified at run; aborts on drift |
| Consistency level | Per-table `consistency.read` / `consistency.write` (defaults to `LOCAL_QUORUM`) |
| Multi-DC | Run from one DC; use `LOCAL_QUORUM` reads/writes; do not run repair during the rekey window |

## Out of scope (intentionally)

- Wide-partition cursoring (clustering-key paged scan) — file an issue if you hit OOM on huge partitions
- Map / set / list / UDT element-wise re-encryption
- Per-node permit pool
- Janitor for stuck `RUNNING` chunks (planned)

## Safety guarantees

- **Idempotent** — each row carries a `key_version` column; workers skip rows already at the new version.
- **Resumable** — chunk state is the source of truth, not in-memory progress.
- **Bounded blast radius** — a worker crash only loses the in-flight chunk, which is retried.
- **Backpressure** — read p99 monitor pauses workers when the cluster is hot.
- **Dry run** — `--dry-run` reads + decrypts but does not write, surfaces decode errors safely.
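
The idempotency and dry-run guarantees can be sketched in a few lines. This is an in-memory illustration, not the tool's executor: rows are plain dicts, `plan_updates` is a hypothetical name, and the `decrypt` / `encrypt` callables stand in for a crypto provider.

```python
from typing import Callable


def plan_updates(
    rows: list[dict],
    decrypt: Callable[[bytes], bytes],
    encrypt: Callable[[bytes], bytes],
    new_version: str,
    dry_run: bool = False,
) -> list[dict]:
    """Return the writes needed to bring `rows` to `new_version`."""
    updates = []
    for row in rows:
        # Null cells and rows already at the new version need no work,
        # which is what makes a retried chunk safe to re-process.
        if row["value"] is None or row["key_version"] == new_version:
            continue
        plaintext = decrypt(row["value"])  # raises if the ciphertext is bad
        if dry_run:
            continue  # the decrypt was exercised, but nothing is written
        updates.append(
            {"id": row["id"], "value": encrypt(plaintext), "key_version": new_version}
        )
    return updates


# Toy stand-ins for a provider: a prefix marks which key "encrypted" the value.
def toy_decrypt(ciphertext: bytes) -> bytes:
    return ciphertext.removeprefix(b"old:")


def toy_encrypt(plaintext: bytes) -> bytes:
    return b"new:" + plaintext


rows = [
    {"id": 1, "value": b"old:alice", "key_version": "v1"},
    {"id": 2, "value": b"new:bob", "key_version": "v2"},
    {"id": 3, "value": None, "key_version": "v1"},
]
print(plan_updates(rows, toy_decrypt, toy_encrypt, "v2"))
```

Running the same function again after the updates have been applied yields an empty list, which is the property a retried chunk relies on.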

## Architecture

See [docs/architecture.md](docs/architecture.md).

## Roadmap

- [x] Core plan/run/resume
- [x] Multi-table fan-out
- [x] Cassandra-backed state store
- [x] Pause / resume CLI commands
- [x] Backpressure monitor (EWMA over read latency)
- [x] AWS KMS provider (envelope encryption)
- [x] Janitor for stuck `RUNNING` chunks
- [x] testcontainers integration test
- [ ] Prometheus metrics endpoint
- [ ] Web dashboard

## Releasing

See [docs/release.md](docs/release.md). Releases are triggered by pushing a
`vX.Y.Z` tag; the workflow uses PyPI Trusted Publisher OIDC, so no API token
is stored in GitHub.

## License

Apache 2.0. See [LICENSE](LICENSE).
