Metadata-Version: 2.4
Name: protegrity-anonymization-sdk
Version: 2.0.0
Summary: Python SDK for Anonymization REST API
Project-URL: Homepage, https://www.protegrity.com/capabilities/data-anonymization-for-ai
Project-URL: Documentation, https://docs.protegrity.com/
Author-email: Protegrity <info@protegrity.com>
License: MIT
Keywords: anonymization,api-client,privacy,sdk
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.14,>=3.11
Requires-Dist: httpx>=0.28.1
Requires-Dist: pandas>=2.2.0
Provides-Extra: parquet
Requires-Dist: pyarrow>=14.0; extra == 'parquet'
Description-Content-Type: text/markdown

# Anonymization SDK

Python client library for the Protegrity Anonymization server.

## Requirements

**Running Anonymization Server**: The SDK is only a client and needs a reachable Anonymization server to send requests to.

**Python**: 3.11, 3.12, or 3.13.

## Installation

```bash
# Using pip (editable install from a source checkout)
pip install -e .

# Using uv (installs the project and its dependencies from a source checkout)
uv sync
```

## Quick Start

```python
from anonymization_sdk import AnonymizationClient, PrivacyModel

with AnonymizationClient(base_url="http://localhost:8000") as client:
    result = client.auto_anonymize(
        data=[
            {"patient_id": "P001", "age": 25, "zipcode": "12345", "disease": "flu"},
            {"patient_id": "P002", "age": 30, "zipcode": "23456", "disease": "cold"},
        ],
        privacy_model=PrivacyModel.K_ANONYMITY,
        k=2,
    )
    print(f"Anonymized {result.anonymization.row_count} records")
```

## Features

- **Simple API**: Easy-to-use Python client for the Anonymization server
- **Privacy Models**: k-anonymity, l-diversity, t-closeness, differential privacy
- **Auto-Detection**: Automatic detection of quasi-identifiers (QI), direct identifiers (DI), and sensitive attributes (SA)
- **Solution Reuse**: `apply_anon()` re-applies a saved solution to new data batches without recomputing
- **Lattice Search**: Opt-in optimal generalization level search (`use_lattice_search=True`)
- **Risk Metrics**: Calculate re-identification risk

#### Detection

- `detect_qi()` — Auto-detect quasi-identifiers
- `generate_config()` — Auto-generate anonymization config

#### Anonymization

- `anonymize()` — Synchronous anonymization (supports `use_lattice_search`, `mlops_config`)
- `apply_anon()` — Apply a saved solution to new data without recomputing
- `submit_job()` — Submit async anonymization job
- `get_job_status()` — Check job status
- `cancel_job()` — Cancel running job
- `auto_anonymize()` — One-step detection + anonymization
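
For the asynchronous path (`submit_job()` followed by `get_job_status()`), a caller usually polls until the job finishes. The helper below is a minimal sketch of such a loop. It is not part of the SDK: `wait_for`, its parameters, and the status strings in the demo are hypothetical, and the shape of the value returned by `get_job_status()` is not assumed, which is why the caller supplies the `is_finished` predicate.

```python
import time


def wait_for(fetch_status, is_finished, interval=2.0, timeout=300.0):
    """Call fetch_status() until is_finished(status) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish within the timeout")
        time.sleep(interval)


# Demo with a fake status source standing in for the server (hypothetical values).
fake_statuses = iter(["queued", "running", "completed"])
final = wait_for(
    fetch_status=lambda: next(fake_statuses),
    is_finished=lambda status: status in {"completed", "failed"},
    interval=0.0,
)
print(final)
```

With the real client, `fetch_status` would wrap a call to `client.get_job_status()` for the submitted job, and `is_finished` would inspect whatever status field that call returns.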

#### Risk & Validation

- `calculate_risk()` — Calculate re-identification risk (supports `mlops_config`)
- `validate()` — Validate privacy guarantees
- `measure()` — Measure anonymization quality
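
To make "re-identification risk" concrete, here is a small local sketch of one common definition, the prosecutor model, in which a record's risk is one divided by the size of its equivalence class (the set of records sharing the same quasi-identifier values). This is an illustration only: the function name and sample data are hypothetical, and the metrics that `calculate_risk()` actually returns may differ.

```python
from collections import Counter


def reidentification_risk(rows, quasi_identifiers):
    """Return (max_risk, average_risk) under the prosecutor model."""
    if not rows:
        return 0.0, 0.0
    # Count how many rows share each combination of quasi-identifier values.
    sizes = Counter(tuple(row[qi] for qi in quasi_identifiers) for row in rows)
    # The smallest equivalence class carries the highest per-record risk.
    max_risk = 1.0 / min(sizes.values())
    # Summing 1/size over all records gives the number of classes,
    # so the average risk is classes / records.
    average_risk = len(sizes) / len(rows)
    return max_risk, average_risk


# Already-generalized sample records (illustrative data only).
sample = [
    {"age": "20-29", "zipcode": "123**"},
    {"age": "20-29", "zipcode": "123**"},
    {"age": "20-29", "zipcode": "123**"},
    {"age": "30-39", "zipcode": "234**"},
    {"age": "30-39", "zipcode": "234**"},
]
max_risk, average_risk = reidentification_risk(sample, ["age", "zipcode"])
print(max_risk, average_risk)
```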

#### Differential Privacy

- `dp_compute()` — Compute DP-protected aggregate (mean, sum, variance, histogram)
- `dp_stream_update()` — feed a batch into a streaming session (creates on first call)
- `dp_stream_delete()` — close and delete a streaming session
- `dp_stream_list_sessions()` — list active streaming sessions
- `dp_budget_create()` / `dp_budget_status()` / `dp_budget_delete()` (create, check, and delete a privacy budget)
- `dp_advise_composition()` — advise on epsilon/delta budget for a composition plan
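
As background for the aggregate functions above, the following is a minimal local sketch of the Laplace mechanism applied to a mean. It does not use the SDK and is not the server's implementation: `laplace_noise` and `dp_mean` are hypothetical names, the record count is assumed to be public, and the clipping bounds are chosen by the caller. Plain floating-point sampling like this is fine for illustration but is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Epsilon-DP mean of values clipped to [lower, upper], with n treated as public."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must not be empty")
    if rng is None:
        rng = random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record changes the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


ages = [25, 30, 35, 40]
rng = random.Random(7)  # seeded only to make the demo repeatable
print(dp_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy and a larger noise scale. Tight clipping bounds reduce the sensitivity and therefore the noise.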

## Privacy Models

| Model | Protection Level | Use Case | Key Feature |
|-------|-----------------|----------|-------------|
| **k-anonymity** | Basic | Hide in groups of k | Each record indistinguishable from at least k-1 others |
| **l-diversity** | Enhanced | Diverse sensitive values | Prevents homogeneity attacks |
| **t-closeness** | Advanced | Distribution matching | Prevents skewness attacks |
| **Differential Privacy** | Mathematical | Aggregate queries, streaming | Provable ε-privacy guarantees via calibrated noise |
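
The first two rows of the table can be checked by hand on a small dataset. The sketch below is a local illustration that does not call the SDK: it groups records by their quasi-identifier values and tests k-anonymity and distinct l-diversity. The function names and sample records are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group rows that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[qi] for qi in quasi_identifiers)
        classes[key].append(row)
    return classes


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class holds at least k rows."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(len(group) >= k for group in classes.values())


def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """True if every class contains at least l distinct sensitive values."""
    classes = equivalence_classes(rows, quasi_identifiers)
    return all(
        len({row[sensitive] for row in group}) >= l
        for group in classes.values()
    )


# Already-generalized sample records (illustrative data only).
records = [
    {"age": "20-29", "zipcode": "123**", "disease": "flu"},
    {"age": "20-29", "zipcode": "123**", "disease": "cold"},
    {"age": "30-39", "zipcode": "234**", "disease": "flu"},
    {"age": "30-39", "zipcode": "234**", "disease": "flu"},
]

print(is_k_anonymous(records, ["age", "zipcode"], k=2))
print(is_l_diverse(records, ["age", "zipcode"], "disease", l=2))
```

The sample is 2-anonymous but not 2-diverse: everyone in the second group has the same disease, which is the homogeneity attack that l-diversity is meant to prevent.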

## Example Use Cases

- **Healthcare**: Anonymize patient records for research (HIPAA compliance)
- **Finance**: Share transaction data for analysis (PCI DSS compliance)
- **Marketing**: Publish customer analytics datasets (GDPR compliance)
- **Research**: Share study data with collaborators (IRB approval)

## Lattice Search

By default, the server applies **level 1** of every configured hierarchy. Pass `use_lattice_search=True` to search instead for the *shallowest* combination of generalization levels that satisfies k-anonymity, which yields lower information loss:

```python
result = client.anonymize(
    data=data,
    privacy_model="k-anonymity",
    k=10,
    max_suppression=0.05,
    attributes=[...],
    use_lattice_search=True,
    lattice_strategy="basic",  # basic | with_importance | with_deviation | full
)
```

The `generalization_levels` field in the result shows the actual levels chosen per QI.
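
The idea behind the search can be sketched locally. The example below is an illustration under stated assumptions, not the server's algorithm: the two hierarchies, their level counts, and all function names are hypothetical, suppression is ignored, and the search is a plain exhaustive scan that visits level combinations by total height, lowest first.

```python
from collections import Counter
from itertools import product


def generalize_age(age, level):
    """Level 0 keeps the value, level 1 maps to a decade band, level 2 hides it."""
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zipcode, level):
    """Mask the last `level` characters of the zip code."""
    keep = max(len(zipcode) - level, 0)
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


# Hypothetical hierarchies: quasi-identifier -> (function, maximum level).
HIERARCHIES = {"age": (generalize_age, 2), "zipcode": (generalize_zip, 5)}


def satisfies_k(rows, levels, k):
    """True if every equivalence class has at least k rows at these levels."""
    classes = Counter(
        tuple(HIERARCHIES[qi][0](row[qi], level) for qi, level in levels.items())
        for row in rows
    )
    return all(size >= k for size in classes.values())


def shallowest_levels(rows, k):
    """Return the first level combination, by total height, that is k-anonymous."""
    qis = list(HIERARCHIES)
    ranges = [range(HIERARCHIES[qi][1] + 1) for qi in qis]
    for combo in sorted(product(*ranges), key=sum):
        levels = dict(zip(qis, combo))
        if satisfies_k(rows, levels, k):
            return levels
    return None  # no combination reaches k without suppression


rows = [
    {"age": 25, "zipcode": "12345"},
    {"age": 27, "zipcode": "12349"},
    {"age": 31, "zipcode": "23456"},
    {"age": 38, "zipcode": "23451"},
]
print(shallowest_levels(rows, k=2))
```

For this sample, the scan stops at level 1 for both attributes, since decade bands plus one masked zip digit already form two groups of two records.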

## Solution Reuse (`apply_anon`)

Re-apply the exact same solution to new data batches without recomputing hierarchies:

```python
# Step 1: anonymize training batch → solution stored server-side
anon = client.anonymize(data=training_data, privacy_model="k-anonymity", k=5, ...)
job_id = anon.job_id

# Step 2: apply to any new batch instantly
apply_result = client.apply_anon(job_id=job_id, data=new_batch)
print(f"Applied: {apply_result.row_count} rows | Suppressed: {apply_result.suppressed_count}")
```

To enable MLOps integration, pass `mlops_config` when creating the client. The example below sets an experiment prefix, a model name, and automatic promotion on `combined_loss`, then calls `list_models()`:

```python
client = AnonymizationClient(
    base_url="http://localhost:8000",
    mlops_config={
        "postgres_dsn": "postgresql://mlopsuser:mlopspsw@localhost:5432/mlopsdb",
        "experiment_prefix": "my-project",
        "model_name": "patient-records",
        "auto_promote": True,
        "promotion_metric": "combined_loss",
        "promotion_direction": "lower_better",
    },
)
result = client.anonymize(data=data, privacy_model="k-anonymity", k=5, ...)
models = client.list_models()
```

## Documentation

The API reference is available through the docstrings in `src/anonymization_sdk/`.
For more information, see the [online documentation](https://docs.protegrity.com/).

## Support

- **Issues**: Report bugs and request features via the issue tracker
- **Email**: info@protegrity.com

## License

MIT License
