Metadata-Version: 2.4
Name: dpdpstack-python-sdk
Version: 0.6.0
Summary: DPDP-compliant data erasure for Indian apps: legal-hold-aware deletion, PII anonymization, and tamper-evident Certificates of Erasure. Zero-egress - runs inside your app.
Author: getdpdp.net
License: MIT
Project-URL: Homepage, https://getdpdp.net
Project-URL: Documentation, https://getdpdp.net/docs
Project-URL: Repository, https://github.com/getdpdp/dpdpstack-python-sdk
Project-URL: Issues, https://github.com/getdpdp/dpdpstack-python-sdk/issues
Keywords: dpdp,dpdpa,privacy,data-deletion,erasure,rbi,pmla,compliance,india
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: django
Requires-Dist: Django>=4.2; extra == "django"
Provides-Extra: sqlalchemy
Requires-Dist: SQLAlchemy>=2.0; extra == "sqlalchemy"
Provides-Extra: crypto
Requires-Dist: pyjwt[crypto]>=2.8; extra == "crypto"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Dynamic: license-file

# dpdpstack

**DPDP-compliant data erasure for Indian apps - handled in your code.**

Indian developers keep hitting the same wall: DPDP says *erase the user's data when
consent is withdrawn*, but RBI (KYC, 5 yrs), PMLA, CERT-In (logs, 180 days) and the
Companies Act say *keep it*. So teams hand-delete data across tables and can't prove
they did, while enterprise tools "cost more than a month's revenue."

`dpdpstack` is a small, **zero-egress** library that handles the hard part:

- **Legal-hold-aware erasure** - delete now, or *defer* under RBI/PMLA/CERT-In holds (with the basis recorded), then erase when the hold lapses.
- **PII anonymization** - irreversibly null/hash PII while keeping the ledger row (referential integrity), the way teams actually solve this.
- **Certificate of Erasure** - a verifiable, tamper-evident proof you erased (or are lawfully holding) a user's data.
- **Zero-egress** - *you* perform the mutation in your own DB; the library only decides and records. Personal data never leaves your systems.

> Not a cookie banner. Not a consultant. A deletion/retention engine for developers.

**[Documentation](https://getdpdp.net/docs)** · **[Source](https://github.com/getdpdp/dpdpstack-python-sdk)** · **[Hosted platform](https://getdpdp.net)**

## Install
```bash
pip install dpdpstack-python-sdk                # core, no dependencies
pip install "dpdpstack-python-sdk[django]"      # + Django adapter
pip install "dpdpstack-python-sdk[sqlalchemy]"  # + SQLAlchemy adapter (FastAPI/Flask/…)
pip install "dpdpstack-python-sdk[crypto]"      # + signed certs & crypto-shred (PyJWT + cryptography)
```

## Quickstart (framework-agnostic)
```python
from dpdpstack import ErasureEngine, AuditLog, RetentionPolicy, Action, rbi_kyc, issue_certificate

engine = ErasureEngine(AuditLog())

# Normal purpose: hard-delete on withdrawal. Your delete runs in `executor`.
engine.request_erasure(
    subject="user_42",
    policy=RetentionPolicy(purpose="marketing", action=Action.DELETE),
    reason="consent_withdrawn",
    executor=lambda action: my_delete_user(42),
)

# KYC: RBI mandates 5y retention -> erasure is DEFERRED, not refused.
res = engine.request_erasure(subject="user_42", policy=rbi_kyc("kyc"), reason="consent_withdrawn")
print(res.status, res.legal_basis, res.erase_after)   # deferred  RBI KYC...  2031-...

cert = issue_certificate(engine.audit, "user_42", "marketing")  # verifiable proof
```
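The deferral in the KYC call above comes down to a date comparison. The sketch below is a stdlib-only illustration of that idea, not dpdpstack's implementation: `HoldPolicy`, `Decision`, `resolve`, and the choice to count the hold from a `collected_on` date are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class HoldPolicy:
    """Hypothetical stand-in for a retention policy (not the library's class)."""
    purpose: str
    legal_hold_days: int = 0          # 0 means no statutory hold
    legal_basis: Optional[str] = None


@dataclass(frozen=True)
class Decision:
    status: str                       # "erased" or "deferred"
    erase_after: Optional[date]
    legal_basis: Optional[str]


def resolve(policy: HoldPolicy, collected_on: date, today: date) -> Decision:
    """Erase now unless a legal hold is still running; otherwise defer."""
    # Assumption: the hold runs from the day the data was collected.
    hold_ends = collected_on + timedelta(days=policy.legal_hold_days)
    if today < hold_ends:
        return Decision("deferred", hold_ends, policy.legal_basis)
    return Decision("erased", None, None)


kyc = HoldPolicy("kyc", legal_hold_days=1825, legal_basis="RBI KYC (5 years)")
marketing = HoldPolicy("marketing")

held = resolve(kyc, collected_on=date(2026, 1, 1), today=date(2026, 6, 1))
free = resolve(marketing, collected_on=date(2026, 1, 1), today=date(2026, 6, 1))
print(held.status, held.erase_after)   # deferred 2030-12-31
print(free.status)                     # erased
```

A deferred decision keeps the legal basis alongside the date, so the same record that explains why nothing was deleted also says when erasure becomes due.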

## Django (zero-egress, runs against your models)
```python
# settings.py
INSTALLED_APPS += ["dpdpstack.contrib.django"]
# python manage.py migrate

from django.db import models
from dpdpstack import RetentionPolicy, Action, null, redact, rbi_kyc
from dpdpstack.contrib.django.service import erase_instance, pii

# Declare a model's PII fields once with @pii - no pii_fields= on every call.
@pii(name=null, email=null, phone=redact(keep_last=4))
class User(models.Model):
    ...

# Hard delete + audit
erase_instance(user, policy=RetentionPolicy(purpose="marketing", action=Action.DELETE),
               subject=user.external_ref)

# Anonymize PII, keep the (regulated) row - uses the @pii declaration above
erase_instance(user, policy=RetentionPolicy(purpose="profile", action=Action.ANONYMIZE),
               subject=user.external_ref)

# KYC withdrawal -> deferred under RBI hold, nothing deleted, basis recorded
erase_instance(user, policy=rbi_kyc("kyc"), subject=user.external_ref)
```
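To make the field strategies concrete, here is a rough stdlib-only sketch of what nulling, redacting, and hashing a field could look like on a plain dict. It is not the library's code: `null_strategy`, `redact_strategy`, `hashed_strategy`, `apply_strategies`, the `*` mask character, and the salted SHA-256 digest are all assumptions chosen for illustration.

```python
import hashlib
from typing import Any, Callable, Dict

Strategy = Callable[[Any], Any]


def null_strategy(value: Any) -> None:
    """Drop the value entirely."""
    return None


def redact_strategy(keep_last: int = 0, mask: str = "*") -> Strategy:
    """Mask every character except the last `keep_last`."""
    def apply(value: Any) -> Any:
        if value is None:
            return None
        text = str(value)
        visible = text[-keep_last:] if keep_last > 0 else ""
        return mask * (len(text) - len(visible)) + visible
    return apply


def hashed_strategy(salt: bytes) -> Strategy:
    """Replace the value with a salted SHA-256 hex digest (one-way)."""
    def apply(value: Any) -> Any:
        if value is None:
            return None
        return hashlib.sha256(salt + str(value).encode("utf-8")).hexdigest()
    return apply


def apply_strategies(row: Dict[str, Any], strategies: Dict[str, Strategy]) -> Dict[str, Any]:
    """Return a copy of `row` with declared fields transformed and the rest untouched."""
    return {k: strategies[k](v) if k in strategies else v for k, v in row.items()}


row = {"id": 42, "name": "Asha", "phone": "9876543210", "ledger_balance": 500}
clean = apply_strategies(row, {"name": null_strategy, "phone": redact_strategy(keep_last=4)})
print(clean)
# {'id': 42, 'name': None, 'phone': '******3210', 'ledger_balance': 500}
```

The row and its non-PII columns survive, which is what keeps foreign keys and ledger totals intact. One caveat on hashing: a salted hash of a low-entropy value such as a phone number can be brute-forced if the salt leaks, so a keyed hash (HMAC with a secret key) is a common hardening.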

## FastAPI / Flask / any SQLAlchemy app (`[sqlalchemy]`)
The same engine + DB-backed audit chain, against a SQLAlchemy `Session`. You map the
audit entry once (you own the `Base`); the `@pii` declaration is shared with Django.
```python
from sqlalchemy.orm import DeclarativeBase
from dpdpstack import RetentionPolicy, Action, null, redact, rbi_kyc
from dpdpstack.contrib.sqlalchemy.models import DpdpAuditEntryMixin
from dpdpstack.contrib.sqlalchemy.service import erase_instance, pii

class Base(DeclarativeBase): ...

class DpdpAuditEntry(Base, DpdpAuditEntryMixin):   # the hash-chained audit store
    __tablename__ = "dpdp_audit_entries"

@pii(name=null, email=null, phone=redact(keep_last=4))
class User(Base):
    __tablename__ = "users"
    ...

# Anonymize PII, keep the (regulated) row; your session, your transaction.
erase_instance(session, user, audit_model=DpdpAuditEntry, subject=user.external_ref,
               policy=RetentionPolicy(purpose="profile", action=Action.ANONYMIZE))

# KYC withdrawal -> deferred under RBI hold, nothing deleted, basis recorded
erase_instance(session, user, audit_model=DpdpAuditEntry, subject=user.external_ref,
               policy=rbi_kyc("kyc"))
session.commit()
```

## Find your PII fields (`scan`)
You declare PII once with `@pii(...)` - but which fields *are* PII? `scan` finds them
for you. It reads **field names and types only** (never a single row), matches them
against an India-first catalog (Aadhaar, PAN, GST, UPI, phone, email, special-category…),
and suggests an anonymize strategy for each. Output is **advisory** - you review it, then
paste. Zero-egress and zero-dependency.

**Django** - scan your models and get pasteable `@pii(...)` blocks:
```bash
python manage.py dpdp_scan --format python        # or: text (default) | json
# or, without a manage.py:
dpdpstack scan --django --settings myproject.settings --app accounts --format python
```
```python
# accounts.User
@pii(
    name=null,
    email=null,
    phone=redact(keep_last=4),
    aadhaar_number=hashed(),
)
class User(models.Model):
    ...
```
Re-running tags each field `new` (PII, not declared), `covered` (already declared), or
`drift` (declared, but no longer looks like PII) - so it doubles as an ongoing audit.

**Anything else** - a sample dict, an API payload, a column list:
```python
from dpdpstack import anonymize_fields
from dpdpstack.detect import scan_mapping, suggest_strategies

suggest_strategies(["email", "phone", "pan", "ledger_balance"])
# {'email': <null>, 'phone': <redact>, 'pan': <hashed>}   # 'ledger_balance' ignored

record = {"email": "a@b.com", "phone": "9876543210", "ledger_balance": 500}
clean = anonymize_fields(record, suggest_strategies(record.keys()))
```
```bash
dpdpstack scan --keys email,phone,pan --format python      # comma-separated names
dpdpstack scan --dict sample.json --format python          # keys of a JSON object ('-' = stdin)
```
Bring your own catalog by passing a JSON file with the same shape as the built-in one to `load_catalog(path=...)`.
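
As a rough picture of how name-only matching can work, the sketch below normalizes field names and tests them against a tiny pattern catalog. The catalog layout, the patterns, and the `normalize` / `suggest` helpers are hypothetical and far smaller than a real catalog; the strategy labels simply mirror the `suggest_strategies` output shown above.

```python
import re
from typing import Dict, Iterable

# Hypothetical mini-catalog: PII type -> (field-name pattern, suggested strategy label).
CATALOG = {
    "email":   (re.compile(r"(^|_)e?mail($|_)"), "null"),
    "phone":   (re.compile(r"(^|_)(phone|mobile|msisdn)($|_)"), "redact"),
    "pan":     (re.compile(r"(^|_)pan($|_)"), "hashed"),
    "aadhaar": (re.compile(r"(^|_)aadhaa?r($|_)"), "hashed"),
}


def normalize(name: str) -> str:
    """Convert camelCase and kebab-case to snake_case so one pattern covers all spellings."""
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return snake.replace("-", "_").lower()


def suggest(names: Iterable[str]) -> Dict[str, str]:
    """Map each PII-looking field name to a strategy label; other names are skipped."""
    out: Dict[str, str] = {}
    for name in names:
        key = normalize(name)
        for pattern, strategy in CATALOG.values():
            if pattern.search(key):
                out[name] = strategy
                break
    return out


print(suggest(["email", "phone", "pan", "ledger_balance"]))
# {'email': 'null', 'phone': 'redact', 'pan': 'hashed'}
print(suggest(["panNumber", "company", "aadhar-no"]))
# {'panNumber': 'hashed', 'aadhar-no': 'hashed'}
```

Anchoring each pattern on `_` or the string boundary is what stops `company` from matching `pan`. Because only names are inspected, a match is a hint to review, not proof that the column holds PII.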

### Detect PII in values (and classify a breach)
The scanner above reads field *names*; `detect_values` reads *values* / free text -
useful to confirm a column really holds PII, or to fill a breach report's `nature`
field. Aadhaar is checked with the **Verhoeff** checksum and cards with **Luhn**, so
most random 12- or 16-digit numbers are rejected rather than flagged. Runs locally with zero dependencies.
```python
from dpdpstack import detect_values, classify_breach_nature

detect_values("PAN ABCDE1234F, card 4111 1111 1111 1111")
# [ValueMatch(type='PAN', ...), ValueMatch(type='Payment Card', ...)]

classify_breach_nature("leaked rows: asha@bank.in, Aadhaar 2341 2341 2346, plus medical records")
# ['Email Address', 'Aadhaar Number', 'Health Data']   # for a Rule 7 breach report
```
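For readers unfamiliar with checksum validation, this is a minimal standalone Luhn check of the kind used for card numbers. It is written for illustration and is not taken from dpdpstack; `luhn_valid` is a hypothetical name. Verhoeff, used for Aadhaar, is a different table-driven algorithm and is not shown here.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the ASCII digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isascii() and ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9        # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A single check digit rejects roughly nine in ten random digit strings, so it cuts noise sharply but does not prove a number is a real card.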

## Lint your retention policies (DPDP)
`lint` statically checks a `RetentionPolicy` for compliance smells - a legal hold with
no recorded basis, a hold that will hard-delete a regulated row, a basis cited without a
hold period, retention far past what's justified - each tied to a DPDP citation. Offline
and advisory.
```python
from dpdpstack import RetentionPolicy, Action, lint_policy

lint_policy(RetentionPolicy(purpose="kyc", legal_hold_days=1825, action=Action.DELETE))
# [ERROR E001: ... no legal_basis recorded ...  [DPDP Rules, 2025 - Rule 8],
#  WARNING W001: ... action=delete will hard-delete ... consider action=anonymize ...]
```
From the shell (exit code is non-zero if any **error** is found, so it drops into CI):
```bash
dpdpstack lint --presets                                   # the built-in presets are clean
dpdpstack lint --purpose kyc --legal-hold-days 1825 --action delete   # E001 + W001
```
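The shape of such a linter can be sketched in a few lines. The example below is an illustration only: `Policy`, `Finding`, `lint`, and `exit_code` are hypothetical names, and the two conditions are a guess at rules that would produce findings like the E001 and W001 shown above, not the library's actual checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a retention policy."""
    purpose: str
    action: str = "delete"            # "delete" or "anonymize"
    legal_hold_days: int = 0
    legal_basis: Optional[str] = None


@dataclass(frozen=True)
class Finding:
    severity: str                     # "ERROR" or "WARNING"
    code: str
    message: str


def lint(policy: Policy) -> List[Finding]:
    """Statically inspect one policy; no database or network access involved."""
    findings: List[Finding] = []
    if policy.legal_hold_days > 0 and not policy.legal_basis:
        findings.append(Finding("ERROR", "E001", "legal hold set but no legal_basis recorded"))
    if policy.legal_hold_days > 0 and policy.action == "delete":
        findings.append(Finding("WARNING", "W001", "action=delete will hard-delete a held row; consider anonymize"))
    return findings


def exit_code(findings: List[Finding]) -> int:
    """Non-zero only when at least one ERROR is present, so warnings alone don't fail CI."""
    return 1 if any(f.severity == "ERROR" for f in findings) else 0


results = lint(Policy("kyc", legal_hold_days=1825, action="delete"))
print([f.code for f in results], exit_code(results))   # ['E001', 'W001'] 1
```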
`dpdpstack.rules` also exposes `DPDP_RULES` and `STATUTORY_HOLDS` (RBI/PMLA/CERT-In/
Companies Act) as a citable reference.

`score_policies(...)` rolls the findings into a graded **readiness report** - a deterministic
0-100 score, letter grade, and tier across all your policies (great for a dashboard or an
onboarding report):
```python
from dpdpstack import score_policies, rbi_kyc, pmla

score_policies([rbi_kyc(), pmla()]).summary
# '100/100 (A+, exemplary) across 2 policies: 2 clean, 0 errors, 0 warnings.'
```
```bash
dpdpstack lint --presets --score      # ... Readiness: 100/100 (A+, exemplary) across 5 policies …
```

## Retention-safe audit + offline verification
The audit log is hash-chained, so any change breaks `verify()`. But a *retention* log
must be prunable - and a pruned chain no longer starts at sequence 1, which would break
verification. **Checkpoints** fix that: snapshot a run of entries into an immutable,
self-chaining `Checkpoint`, then prune; verification anchors to the checkpoint instead
of the genesis.
```python
log = AuditLog(JsonlAuditStore("audit.jsonl"))
# ... record events ...
cp = log.checkpoint(through_sequence=1000)   # immutable snapshot (persist it)
log.prune_through(1000)                       # drop the archived entries

log.verify_report([cp])      # VerifyResult(ok=True, checked=…, anchored_at=1000)
log.verify_report()          # ok=False, first_error_sequence pinpoints any tampering
```
An auditor can verify a chain straight from storage - **no backend, no API to trust**:
```bash
dpdpstack verify-chain audit.jsonl --checkpoints cp.jsonl
# OK - verified 2400 entries (anchored at #1000).
#  (exits non-zero and names the broken entry if the chain was tampered with)
```
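The hash-chaining and checkpoint-anchoring idea in this section can be sketched with the standard library alone. This is a simplified illustration, not dpdpstack's storage format: the entry fields, the `GENESIS` value, and the `entry_hash` / `append` / `verify` helpers are assumptions, and the checkpoint here is a bare `(sequence, hash)` pair rather than the self-chaining `Checkpoint` described above.

```python
import hashlib
import json
from typing import Dict, List, Optional

GENESIS = "0" * 64   # assumed placeholder for "no previous entry"


def entry_hash(sequence: int, event: str, subject: str, prev_hash: str) -> str:
    """Hash the entry's content together with the previous entry's hash."""
    body = json.dumps([sequence, event, subject, prev_hash], separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(chain: List[Dict], event: str, subject: str) -> Dict:
    prev = chain[-1] if chain else None
    seq = prev["sequence"] + 1 if prev else 1
    prev_hash = prev["hash"] if prev else GENESIS
    entry = {"sequence": seq, "event": event, "subject": subject,
             "prev_hash": prev_hash, "hash": entry_hash(seq, event, subject, prev_hash)}
    chain.append(entry)
    return entry


def verify(chain: List[Dict], checkpoint: Optional[Dict] = None) -> bool:
    """Walk the chain from genesis, or from a checkpoint if earlier entries were pruned."""
    expected_seq = checkpoint["sequence"] + 1 if checkpoint else 1
    expected_prev = checkpoint["hash"] if checkpoint else GENESIS
    for e in chain:
        if e["sequence"] != expected_seq or e["prev_hash"] != expected_prev:
            return False
        if e["hash"] != entry_hash(e["sequence"], e["event"], e["subject"], e["prev_hash"]):
            return False
        expected_seq, expected_prev = e["sequence"] + 1, e["hash"]
    return True


chain: List[Dict] = []
for i in range(5):
    append(chain, "erasure_requested", f"user_{i}")

checkpoint = {"sequence": 3, "hash": chain[2]["hash"]}   # snapshot through #3
pruned = chain[3:]                                       # keep only #4 and #5

print(verify(chain))                 # True
print(verify(pruned))                # False: no longer starts at sequence 1
print(verify(pruned, checkpoint))    # True: anchored at the checkpoint
pruned[0]["subject"] = "user_x"
print(verify(pruned, checkpoint))    # False: tampering changes the hash
```

Because each hash covers the previous one, verifying the retained tail against the checkpoint's hash is enough; the pruned entries themselves are no longer needed.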

## Crypto-shred PII in the audit log (optional, `[crypto]`)
The chain normally holds no PII (`subject` is an opaque ref). When you must record PII
*inside* an entry, `seal` it: the PII is encrypted into an opaque token that the entry
hash covers. Verification runs on the ciphertext, so you can later **destroy the key**
(right-to-erasure) - the payload becomes unreadable while the chain still verifies.
```python
from dpdpstack.sealing import generate_seal_key

key = generate_seal_key()                       # keep secret; deleting it shreds the data
e = log.record("evidence", subject="user_42",
               private={"aadhaar": "2341 2341 2346"}, seal_key=key)
AuditLog.open_sealed(e, key)                     # -> {"aadhaar": "…"}  (with the key)
log.verify()                                     # True - even after the key is destroyed
```
**Key rotation** (zero-downtime): pass a *list* of keys, newest first. New entries seal with
the first key; unsealing tries each key in turn, so entries sealed with an older key still
open. The ciphertext is part of the entry hash, so chain entries are never re-encrypted. Keep
an old key around to read old entries, and retire it once they've been pruned or shredded.
```python
new = generate_seal_key()
log.record("evidence", subject="user_43", private={…}, seal_key=[new, key])  # seals with `new`
AuditLog.open_sealed(e, [new, key])              # still opens the old-key entry
```

## Push evidence to the hosted vault (optional)
Keep everything local, or push your tamper-evident chain to a vault (e.g. getdpdp.net)
for an independent, server-timestamped, counter-signed copy. The push carries
**evidence only** - opaque refs, event types, and hashes (plus any sealed ciphertext) -
**never PII**, so it stays zero-egress. It's zero-dependency (stdlib), idempotent at the
vault (re-pushing is a no-op), and the fire-and-forget variant never blocks or raises in
your request path.
```python
from dpdpstack import EvidenceClient

vault = EvidenceClient("https://getdpdp.net/api/v1", api_key="dpdp_sk_…", source="api")

vault.push(log)                # synchronous: -> {"stored": N, "chain_verified": True, …}
vault.push_background(log)     # fire-and-forget: returns immediately, errors swallowed
```
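The fire-and-forget behaviour described above (return immediately, never raise into the request path) is a common pattern that can be sketched with a daemon thread. The helper below is a generic illustration, not the `EvidenceClient` implementation; `push_in_background`, `flaky_send`, and the payload shape are hypothetical.

```python
import threading
from typing import Any, Callable, List


def push_in_background(send: Callable[[Any], None], payload: Any) -> threading.Thread:
    """Run `send(payload)` on a daemon thread; failures never reach the caller."""
    def worker() -> None:
        try:
            send(payload)
        except Exception:
            # Deliberately swallowed: a failed push must not break the request.
            pass

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


delivered: List[Any] = []


def flaky_send(payload: Any) -> None:
    raise ConnectionError("vault unreachable")


t1 = push_in_background(delivered.append, {"sequence": 1, "hash": "abc"})
t2 = push_in_background(flaky_send, {"sequence": 2, "hash": "def"})
t1.join()
t2.join()   # joined here only so the demo output is deterministic
print(delivered)   # [{'sequence': 1, 'hash': 'abc'}]
```

Swallowing errors means a failed push is silent, which is why idempotency at the receiving end matters: a later synchronous push can resend the same entries safely.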

## Signed certificates (optional, `[crypto]`)
The hash-chained Certificate of Erasure is tamper-evident on its own; add an RS256
signature so anyone can verify it with your public key (and no one without your private key can forge it):
```python
from dpdpstack import issue_certificate
from dpdpstack.signing import generate_keypair, issue_signed_certificate, verify_certificate

private_pem, public_pem = generate_keypair()      # keep private secret; publish public
cert = issue_certificate(engine.audit, "user_42", "marketing")
token = issue_signed_certificate(cert, private_pem)   # compact JWT
verify_certificate(token, public_pem)                 # -> {"valid": True, ...}
```
This is the basis for the **hosted, counter-signed** certificate at getdpdp.net - a
regulator/auditor verifies it independently, and the issuer cannot fake it.

### CLI (verify a certificate offline)
With the `[crypto]` extra installed, an auditor can verify a Certificate of Erasure
from the shell - no code, just the cert and your public key:
```bash
dpdpstack keygen --out-dir ./keys                       # one-time: make a signing keypair
dpdpstack verify cert.jwt --public-key ./keys/cert_public.pem
# VALID - signature verified.
#   subject: user_42 · status: erased (delete) · chain ok: True
```
(`python -m dpdpstack verify ...` works too.)

## Presets for the common conflicts
`rbi_kyc()` (5-yr hold, anonymize) · `pmla()` · `cert_in_logs()` (180-day log hold) · `companies_act()` (8-yr books of account) · `third_schedule()` (DPDP specified period). Or build your own `RetentionPolicy(retention_days=…, legal_hold_days=…, legal_basis="…", action=…)`.

## What's in the box
| Module | What |
|---|---|
| `policies` | `RetentionPolicy` + RBI/PMLA/CERT-In/Companies-Act/Third-Schedule presets |
| `anonymize` | `null` / `hashed` / `redact` / `constant` field strategies |
| `audit` | hash-chained log + checkpoints/pruning + `verify_report`; stores (in-memory, JSONL, Django, SQLAlchemy) |
| `erasure` | `ErasureEngine` - legal-hold-aware resolve + your `executor` |
| `certificate` | `issue_certificate()` → verifiable Certificate of Erasure |
| `detect` | PII discovery - schema (`scan_mapping`) + values (`detect_values`, `classify_breach_nature`) |
| `rules` | DPDP knowledge pack + `lint_policy()` / `dpdpstack lint` |
| `vault` | `EvidenceClient` - push the chain to a hosted vault (evidence only, fire-and-forget) |
| `sealing` *(extra)* | crypto-shred PII in the chain - `seal` / `unseal` / `AuditLog.open_sealed` |
| `signing` *(extra)* | RS256-sign/verify a certificate - `pip install "dpdpstack-python-sdk[crypto]"` |
| `contrib.django` | model-backed audit store + `erase_instance()` + `@pii(...)` + `dpdp_scan` |
| `contrib.sqlalchemy` *(extra)* | the same for any SQLAlchemy app (FastAPI/Flask/…) |

CLI: `dpdpstack scan` · `lint` · `verify-chain` · `verify` · `keygen` (`python -m dpdpstack …`).

## Status & scope
Alpha (0.6). The core is dependency-free and framework-agnostic; Django and SQLAlchemy adapters ship today. Hosted/managed version (dashboard, cross-system fan-out, certificate vault): [**getdpdp.net**](https://getdpdp.net/).

dpdpstack is tooling, not legal advice; you remain the Data Fiduciary. MIT licensed.
