Metadata-Version: 2.4
Name: expunct
Version: 0.1.0
Summary: Python SDK for the Expunct API
Project-URL: repository, https://github.com/expunct/python-sdk
Author: Expunct
License-Expression: MIT
Requires-Python: >=3.10
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: mypy>=1.11; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest-httpx>=0.35; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Description-Content-Type: text/markdown

# Expunct Python SDK

Privacy infrastructure for modern applications. Detect and redact PII, secrets, and sensitive data before it reaches AI, logs, or external APIs.

[![PyPI version](https://badge.fury.io/py/expunct.svg)](https://pypi.org/project/expunct/)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

## Installation

```bash
pip install expunct
```

Get your API key at [expunct.ai](https://expunct.ai) — free tier includes 1M tokens/month, no credit card required.

## Quick Start

```python
from expunct import Expunct

client = Expunct(api_key="your-api-key")
redacted = client.sanitize_text("Alice Johnson's email is alice@example.com and SSN is 219-09-9999.")
print(redacted)
# Output: PERSON_1's email is EMAIL_ADDRESS_1 and SSN is US_SSN_1.
```

## Usage

### Text redaction (sync)

```python
from expunct import Expunct

client = Expunct(api_key="your-api-key")

redacted = client.sanitize_text("Call Bob at 415-555-0100 or bob@example.com")
print(redacted)
# Call PERSON_1 at PHONE_NUMBER_1 or EMAIL_ADDRESS_1
```

### Text redaction (async)

```python
import asyncio
from expunct import AsyncExpunct

async def main():
    async with AsyncExpunct(api_key="your-api-key") as client:
        redacted = await client.sanitize_text("Call Bob at 415-555-0100 or bob@example.com")
        print(redacted)

asyncio.run(main())
```
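
When many strings need redacting, the async client can be driven concurrently. The sketch below shows one way to do that, assuming only that `sanitize` is an awaitable callable such as `client.sanitize_text`. The helper name `sanitize_many` and the `max_concurrency` parameter are illustrative and not part of the SDK.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable


async def sanitize_many(
    sanitize: Callable[[str], Awaitable[str]],
    texts: Iterable[str],
    max_concurrency: int = 5,
) -> list[str]:
    """Redact several texts concurrently, preserving input order."""
    # The semaphore caps in-flight requests; 5 is an arbitrary default,
    # not a value taken from the SDK.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def _one(text: str) -> str:
        async with semaphore:
            return await sanitize(text)

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(_one(t) for t in texts))
```

Inside `async with AsyncExpunct(...) as client:`, this would be called as `await sanitize_many(client.sanitize_text, texts)`.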

### File redaction (PDF, DOCX, images, audio)

```python
from expunct import Expunct

client = Expunct(api_key="your-api-key")

# Pass a file path — returns redacted bytes
redacted_bytes = client.sanitize_file("contract.pdf")

# Save directly to disk
client.sanitize_file("contract.pdf", dest="contract_redacted.pdf")

# Pass a file-like object
with open("invoice.docx", "rb") as f:
    redacted_bytes = client.sanitize_file(f)
```
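
To redact a whole folder, the `dest` form of `sanitize_file` can be called in a loop. The following is a minimal sketch, assuming `client` is any object with the `sanitize_file(path, dest=...)` call shown above. The helper `redact_directory` and its `_redacted` naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Any


def redact_directory(
    client: Any, src_dir: str, out_dir: str, pattern: str = "*.pdf"
) -> list[Path]:
    """Redact every file matching `pattern` in src_dir into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in sorted(Path(src_dir).glob(pattern)):
        # Hypothetical naming scheme: contract.pdf -> contract_redacted.pdf
        dest = out / f"{src.stem}_redacted{src.suffix}"
        # Uses the documented `dest` keyword to save straight to disk.
        client.sanitize_file(str(src), dest=str(dest))
        written.append(dest)
    return written
```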

### URI redaction (cloud storage)

Submit a file hosted in cloud storage (S3, GCS, Azure Blob) for redaction. The optional `output_uri` controls where the redacted file is written; if omitted, the result is available via `jobs.download()`.

```python
from expunct import Expunct

client = Expunct(api_key="your-api-key")

job = client.sanitize_uri(
    "s3://my-bucket/reports/q1.pdf",
    output_uri="s3://my-bucket/reports/q1_redacted.pdf",
)
print(job.status)           # "completed"
print(job.findings_count)   # number of PII items found
```
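
If output locations follow the input name, as in the example above, the `output_uri` can be derived rather than typed by hand. This is a small sketch using only the standard library. The function name `redacted_uri` and the `_redacted` suffix are illustrative choices, not SDK behaviour.

```python
import posixpath


def redacted_uri(input_uri: str, suffix: str = "_redacted") -> str:
    """Insert `suffix` before the file extension of a storage URI."""
    # posixpath.splitext only looks for a dot in the final path component,
    # so dots in bucket or folder names are left untouched.
    root, ext = posixpath.splitext(input_uri)
    return f"{root}{suffix}{ext}"
```

For instance, `redacted_uri("s3://my-bucket/reports/q1.pdf")` yields `s3://my-bucket/reports/q1_redacted.pdf`, which could then be passed as `output_uri`.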

### Batch URI redaction

Enqueue multiple files in one call via the lower-level `redact.batch()` method, then poll the batch status:

```python
from expunct import Expunct

client = Expunct(api_key="your-api-key")

batch = client.redact.batch(
    input_uris=[
        "s3://my-bucket/docs/file1.pdf",
        "s3://my-bucket/docs/file2.pdf",
    ],
    language="en",
)
print(batch.id, batch.total_jobs)

# Poll progress
status = client.batch.get(batch.id)
print(status.completed_jobs, status.failed_jobs)
```
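
The snippet above checks progress once. A polling loop could look like the sketch below, which assumes the status object exposes `total_jobs`, `completed_jobs`, and `failed_jobs` as shown. The helper `wait_for_batch` and its `interval` and `timeout` defaults are hypothetical, not part of the SDK.

```python
import time
from collections.abc import Callable
from typing import Any


def wait_for_batch(
    get_batch: Callable[[str], Any],
    batch_id: str,
    interval: float = 2.0,
    timeout: float = 300.0,
) -> Any:
    """Poll until every job in the batch has completed or failed."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_batch(batch_id)
        # A batch is done once each job is accounted for, success or not.
        finished = status.completed_jobs + status.failed_jobs
        if finished >= status.total_jobs:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch {batch_id} not finished after {timeout}s")
        time.sleep(interval)
```

It would be called as `wait_for_batch(client.batch.get, batch.id)`. Passing the getter as a callable keeps the helper independent of the client class.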

### Environment variable

Set `EXPUNCT_API_KEY` to keep the key out of your code. The client reads it automatically when no `api_key` argument is provided. To read it explicitly instead:

```python
import os
from expunct import Expunct

client = Expunct(api_key=os.environ["EXPUNCT_API_KEY"])
```
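
Indexing `os.environ` directly raises a bare `KeyError` when the variable is missing. A slightly friendlier lookup might look like this sketch. The helper `load_api_key` is hypothetical and not part of the SDK.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the key from EXPUNCT_API_KEY or raise a descriptive error."""
    # Treat an empty or whitespace-only value the same as a missing one.
    key = env.get("EXPUNCT_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "EXPUNCT_API_KEY is not set; export it before creating a client"
        )
    return key
```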

### Custom policy

Policies let you control which entity types are detected, the redaction method, confidence thresholds, and more. Create a policy once and reference it by ID on every job.

```python
from expunct import Expunct, PolicyCreate

client = Expunct(api_key="your-api-key")

# Create a policy that only redacts PII and uses pseudonymization
policy = client.policies.create(PolicyCreate(
    name="pii-only-pseudonymize",
    pii_categories=["PII"],
    redaction_method="pseudonymization",
    confidence_threshold=0.7,
))

# Use the policy when uploading a file
job = client.redact.file("report.pdf", policy_id=policy.id)
completed = client.wait_for_job(job.id)
redacted_bytes = client.jobs.download(completed.id)
```
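
Because a policy is meant to be created once and then reused, a script that runs repeatedly may want to look the policy up before creating it. The sketch below assumes each item returned by `policies.list()` has a `name` attribute, which this README does not confirm. The helper `get_or_create_policy` is hypothetical.

```python
from collections.abc import Callable
from typing import Any


def get_or_create_policy(client: Any, name: str, build: Callable[[], Any]) -> Any:
    """Return the existing policy called `name`, creating it only if missing."""
    for policy in client.policies.list():
        # Assumption: PolicyResponse exposes the policy name as `.name`.
        if policy.name == name:
            return policy
    # `build` returns the PolicyCreate payload; it is only called on a miss.
    return client.policies.create(build())
```

Here `build` would be something like `lambda: PolicyCreate(name="pii-only-pseudonymize", ...)`.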

### Inspecting findings

Every completed job exposes the PII entities that were found:

```python
from expunct import Expunct

client = Expunct(api_key="your-api-key")

# Enqueue the job explicitly so its ID is known, then wait for it to finish
job = client.redact.file("form.pdf")
detail = client.wait_for_job(job.id)

for finding in detail.findings:
    print(finding.entity_type, finding.confidence, finding.entity_value)
```
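
For a quick overview, findings can be tallied by entity type. This sketch relies only on the `entity_type` and `confidence` attributes used above. The helper `summarize_findings` and its `min_confidence` parameter are illustrative.

```python
from collections import Counter
from collections.abc import Iterable
from typing import Any


def summarize_findings(
    findings: Iterable[Any], min_confidence: float = 0.0
) -> Counter:
    """Count findings per entity type, skipping low-confidence ones."""
    return Counter(
        f.entity_type for f in findings if f.confidence >= min_confidence
    )
```

Calling `summarize_findings(detail.findings, min_confidence=0.8).most_common()` would list the most frequent entity types first.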

### Error handling

```python
from expunct import Expunct, AuthenticationError, RateLimitError, PollingTimeoutError

client = Expunct(api_key="your-api-key")

try:
    redacted = client.sanitize_text("Alice, SSN 219-09-9999")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except PollingTimeoutError as e:
    print(f"Job {e.job_id} timed out after {e.timeout}s")
```
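
The client already retries transient errors up to `max_retries`, so a `RateLimitError` reaching your code means those retries were exhausted. If an application wants one more layer of waiting on top, a generic wrapper could look like the sketch below. `call_with_retry` is a hypothetical helper. It takes the exception class as a parameter and reads `retry_after` with `getattr`, so it makes no assumptions beyond the attribute shown above.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: type[Exception],
    max_attempts: int = 3,
    default_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying when `retry_on` is raised."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            # Prefer the server-suggested delay when the exception has one.
            delay = getattr(exc, "retry_after", None) or default_delay
            sleep(delay)
    raise AssertionError("unreachable")
```

Usage would be along the lines of `call_with_retry(lambda: client.sanitize_text(text), RateLimitError)`.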

### Context manager (sync)

```python
from expunct import Expunct

with Expunct(api_key="your-api-key") as client:
    redacted = client.sanitize_text("John Smith, DOB 01/01/1980")
```

## Client reference

### `Expunct` / `AsyncExpunct`

| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | required | Your Expunct API key; read from `EXPUNCT_API_KEY` when not passed |
| `base_url` | `str` | `https://api.expunct.ai` | Override for self-hosted or staging |
| `tenant_id` | `str \| None` | `None` | Multi-tenant isolation header |
| `timeout` | `float` | `30.0` | Per-request timeout in seconds |
| `max_retries` | `int` | `3` | Automatic retries on transient errors |

### Convenience methods

| Method | Returns | Description |
|---|---|---|
| `sanitize_text(text, *, language)` | `str` | Redact text in one call (upload → poll → decode) |
| `sanitize_file(file, *, language, dest)` | `bytes` | Upload a file, poll, return redacted bytes |
| `sanitize_uri(input_uri, *, language, output_uri)` | `JobDetailResponse` | Submit a URI, poll, return completed job |
| `wait_for_job(job_id, *, interval, timeout)` | `JobDetailResponse` | Poll a job until it completes or times out |

### Resource methods

#### `client.redact`

| Method | Returns | Description |
|---|---|---|
| `redact.file(file, *, config, language, policy_id)` | `JobResponse` | Upload a file and enqueue a redaction job |
| `redact.uri(input_uri, *, output_uri, config, language, metadata)` | `JobResponse` | Submit a cloud URI for redaction |
| `redact.batch(input_uris, *, config, language, metadata)` | `BatchJobResponse` | Submit multiple URIs as a batch |

#### `client.jobs`

| Method | Returns | Description |
|---|---|---|
| `jobs.list(*, page, page_size, status)` | `JobListResponse` | List jobs with optional status filter |
| `jobs.get(job_id)` | `JobDetailResponse` | Get job detail including findings |
| `jobs.report(job_id)` | `dict` | Get full structured report for a job |
| `jobs.download(job_id, *, dest)` | `bytes` | Download redacted output; optionally save to `dest` |

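Since `jobs.list` is paginated, walking every job means requesting page after page. The generator below is a sketch under two assumptions that this README does not confirm: `JobListResponse` exposes the current page's items as `.jobs`, and a page shorter than `page_size` signals the end. The helper `iter_jobs` is hypothetical.

```python
from collections.abc import Iterator
from typing import Any


def iter_jobs(client: Any, page_size: int = 50, **filters: Any) -> Iterator[Any]:
    """Yield every job across all pages of client.jobs.list()."""
    page = 1
    while True:
        response = client.jobs.list(page=page, page_size=page_size, **filters)
        # Assumption: `.jobs` holds the items for this page.
        yield from response.jobs
        # Assumption: a short page means there is nothing left to fetch.
        if len(response.jobs) < page_size:
            return
        page += 1
```

For example, `for job in iter_jobs(client, status="completed"): ...` would pass the `status` filter through to each page request.
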
#### `client.policies`

| Method | Returns | Description |
|---|---|---|
| `policies.list()` | `list[PolicyResponse]` | List all policies |
| `policies.create(policy)` | `PolicyResponse` | Create a new policy |
| `policies.get(policy_id)` | `PolicyResponse` | Fetch a policy by ID |
| `policies.update(policy_id, policy)` | `PolicyResponse` | Update a policy |
| `policies.delete(policy_id)` | `None` | Delete a policy |

#### `client.batch`

| Method | Returns | Description |
|---|---|---|
| `batch.get(batch_id)` | `BatchJobResponse` | Get status of a batch job |

#### `client.api_keys`

| Method | Returns | Description |
|---|---|---|
| `api_keys.list()` | `list[ApiKeyResponse]` | List API keys for your account |
| `api_keys.create(key)` | `ApiKeyCreateResponse` | Create a new API key |
| `api_keys.revoke(key_id)` | `dict` | Revoke an API key |

#### `client.audit`

| Method | Returns | Description |
|---|---|---|
| `audit.list(*, page, page_size, event_type)` | `AuditListResponse` | List audit log entries |

## Detected Entity Types

Expunct detects the following entity types by default (all categories enabled):

**PII (Personally Identifiable Information)**

| Type | Example |
|---|---|
| `PERSON` | John Smith |
| `EMAIL_ADDRESS` | john@example.com |
| `PHONE_NUMBER` | 415-555-0100 |
| `LOCATION` | San Francisco, CA |
| `DATE_TIME` | January 1, 1990 |
| `NRP` | American, French (nationalities, religions, political groups) |
| `ORGANIZATION` | Acme Corp |
| `URL` | https://example.com |
| `IP_ADDRESS` | 192.168.1.1 |
| `US_DRIVER_LICENSE` | D1234567 |
| `US_PASSPORT` | 123456789 |
| `US_ITIN` | 900-70-0000 |

**PCI (Payment Card Industry)**

| Type | Example |
|---|---|
| `CREDIT_CARD` | 4111 1111 1111 1111 |
| `US_BANK_NUMBER` | 123456789 |
| `IBAN_CODE` | GB29NWBK60161331926819 |
| `CRYPTO` | 1BoatSLRHtKNngkdXEeobR76b53LETtpyT |
| `CVV` | 123 |
| `EXPIRY_DATE` | 12/26 |
| `CARD_HOLDER_NAME` | J. Smith |
| `PIN_NUMBER` | 1234 |
| `ACCOUNT_NUMBER` | 000123456789 |

**PHI (Protected Health Information)**

| Type | Example |
|---|---|
| `US_SSN` | 219-09-9999 |
| `MEDICAL_LICENSE` | A1234567 |

You can restrict detection to specific types using a `RedactConfig` or by setting `pii_types` on a policy:

```python
from expunct import Expunct, RedactConfig

client = Expunct(api_key="your-api-key")

config = RedactConfig(
    pii_types=["PERSON", "EMAIL_ADDRESS", "US_SSN"],
    redaction_method="blur",
    confidence_threshold=0.6,
)
job = client.redact.file("document.pdf", config=config.model_dump())
```
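
A typo in `pii_types` is easy to miss, so it can help to check the list locally before sending it. The sketch below copies the entity types from the tables above into a set, which may lag behind what the API actually supports. The names `KNOWN_ENTITY_TYPES` and `check_entity_types` are hypothetical and not part of the SDK.

```python
from collections.abc import Iterable

# Copied from the tables in this README; not fetched from the API.
KNOWN_ENTITY_TYPES = frozenset({
    "PERSON", "EMAIL_ADDRESS", "PHONE_NUMBER", "LOCATION", "DATE_TIME",
    "NRP", "ORGANIZATION", "URL", "IP_ADDRESS", "US_DRIVER_LICENSE",
    "US_PASSPORT", "US_ITIN", "CREDIT_CARD", "US_BANK_NUMBER", "IBAN_CODE",
    "CRYPTO", "CVV", "EXPIRY_DATE", "CARD_HOLDER_NAME", "PIN_NUMBER",
    "ACCOUNT_NUMBER", "US_SSN", "MEDICAL_LICENSE",
})


def check_entity_types(requested: Iterable[str]) -> list[str]:
    """Return the requested types unchanged, or raise on unknown names."""
    requested = list(requested)
    unknown = sorted(set(requested) - KNOWN_ENTITY_TYPES)
    if unknown:
        raise ValueError(f"unknown entity types: {', '.join(unknown)}")
    return requested
```

The validated list could then be passed as `RedactConfig(pii_types=check_entity_types([...]))`.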

## Exceptions

| Exception | Raised when |
|---|---|
| `AuthenticationError` | API key is invalid or expired (401/403) |
| `NotFoundError` | Job or resource not found (404) |
| `ValidationError` | Request payload is invalid (422) |
| `RateLimitError` | Rate limit exceeded after retries (429) |
| `PollingTimeoutError` | `wait_for_job` exceeded the timeout |
| `ApiError` | Base class for all SDK errors |

## Links

- [Documentation](https://docs.expunct.ai)
- [API Reference](https://docs.expunct.ai/api-reference)
- [Sign up free](https://expunct.ai)
- [GitHub](https://github.com/expunct/python-sdk)

## License

MIT
