Metadata-Version: 2.4
Name: traceredact
Version: 0.2.2
Summary: Redact PII and secrets from AI prompts, traces and tool-call arguments before they reach your loggers.
Project-URL: Homepage, https://traceredact.com
Project-URL: Documentation, https://traceredact.com
Project-URL: Repository, https://github.com/traceredact/traceredact
Project-URL: Issues, https://github.com/traceredact/traceredact/issues
Project-URL: Changelog, https://github.com/traceredact/traceredact/blob/main/CHANGELOG.md
Author: traceredact contributors
License: Apache-2.0
License-File: LICENSE
Keywords: dlp,llm,observability,pii,privacy,redaction,secrets
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Logging
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.6
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.12
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.25; extra == 'anthropic'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.2; extra == 'langchain'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Description-Content-Type: text/markdown

<p align="center">
  <a href="https://traceredact.com"><img src="https://traceredact.com/social-preview.png" alt="traceredact" width="640"></a>
</p>

<p align="center">
  <a href="https://pypi.org/project/traceredact/"><img src="https://img.shields.io/pypi/v/traceredact.svg?cache=3" alt="PyPI"></a>
  <a href="https://pypi.org/project/traceredact/"><img src="https://img.shields.io/pypi/pyversions/traceredact.svg?cache=3" alt="Python"></a>
  <a href="https://github.com/traceredact/traceredact/actions/workflows/ci.yml"><img src="https://github.com/traceredact/traceredact/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="./LICENSE"><img src="https://img.shields.io/badge/License-Apache_2.0-blue.svg" alt="License: Apache-2.0"></a>
  <a href="https://traceredact.com"><img src="https://img.shields.io/badge/website-traceredact.com-7c83ff" alt="Website"></a>
</p>

# traceredact

**Redact PII and secrets from AI prompts, agent traces and tool-call arguments
*before* they reach your loggers / observability backend.** — [traceredact.com](https://traceredact.com)

LLM apps log everything — prompts, agent traces, tool-call arguments — into
Langfuse / Helicone / Datadog / your own DB. Customer PII and API keys leak into
those traces. `traceredact` is a small, dependency-light library that detects
and redacts that data deterministically, in-process, before it leaves your application.

It is **content-based**: it catches a `sk-…` key or a credit-card number even
when it sits under an innocuous JSON key — not just well-known field names.

> A missed secret is a real incident, so detection is treated as
> safety-critical: bounded (ReDoS-safe) patterns, entropy fallback, Luhn/IBAN
> validation, and adversarial evasion fixtures.

## Install

```bash
pip install traceredact          # or: uv add traceredact
```

## Usage (3 lines)

```python
from traceredact import redact

result = redact({"args": {"email": "a@b.com", "key": "sk-1234567890abcdefABCDEFGH"}})
print(result.value)      # {'args': {'email': '[REDACTED:pii]', 'key': '[REDACTED:secret]'}}
print(result.findings)   # [Finding(detector_id='pii.email', json_path='args.email', ...), ...]
```

`redact()` accepts a string, dict, list, or any nested mix. The input is never
mutated; `result.value` is a redacted copy and `result.findings` lists every hit
with its `detector_id`, `category`, `confidence`, `json_path` and `span`.
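
The copy-and-report behaviour described above can be pictured with a small stand-alone sketch. It is not traceredact's code: `redact_copy`, the single `EMAIL` pattern, the `[i]` list-index path syntax and the `max_depth` default of 32 are assumptions made for illustration only.

```python
import re
from typing import Any

# Hypothetical bounded email pattern, standing in for a real detector.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]{1,64}@[A-Za-z0-9.-]{1,253}\.[A-Za-z]{2,24}")


def redact_copy(value: Any, max_depth: int = 32) -> tuple[Any, list[str]]:
    """Return a redacted copy of `value` plus the paths where hits occurred."""
    paths: list[str] = []

    def walk(node: Any, path: str, depth: int) -> Any:
        if depth > max_depth:
            # Fail closed: refuse to return partially scanned data.
            raise ValueError(f"max_depth {max_depth} exceeded at {path!r}")
        if isinstance(node, dict):
            return {
                k: walk(v, f"{path}.{k}" if path else str(k), depth + 1)
                for k, v in node.items()
            }
        if isinstance(node, list):
            return [walk(v, f"{path}[{i}]", depth + 1) for i, v in enumerate(node)]
        if isinstance(node, str):
            redacted, hits = EMAIL.subn("[REDACTED:pii]", node)
            if hits:
                paths.append(path)
            return redacted
        return node  # numbers, booleans, None pass through unchanged

    return walk(value, "", 0), paths
```

Because every container is rebuilt rather than edited in place, the caller's original object stays intact while the returned copy is safe to log.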

### CLI (CI-gateable)

```bash
traceredact scan ./logs/            # report findings as a table; exit 1 if any
traceredact scan trace.json -f json # machine-readable output for CI
traceredact redact trace.json -o redacted.json
```

`scan` exits non-zero when anything is found, so you can gate a CI job on it.

### SDK integrations

```python
from openai import OpenAI
from traceredact.integrations.openai import wrap_openai

client = wrap_openai(OpenAI())   # prompts + completions now redacted in-flight
```

Also: `traceredact.integrations.anthropic.wrap_anthropic(client)` and
`traceredact.integrations.langchain.RedactingCallbackHandler()`.

**Async** clients are supported via `wrap_async_openai` / `wrap_async_anthropic`.

### Streaming

Redact a stream of text deltas without buffering the whole response — a secret
spanning chunk boundaries is still caught (carry-over window):

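```python
from traceredact import redact_stream

# An iterable of text deltas; this one splits a key across two chunks.
token_deltas = ["my key is sk-1234567890", "abcdefABCDEFGH, keep it safe"]
for piece in redact_stream(token_deltas):   # also: redact_stream_async(...)
    print(piece)

# OpenAI async streams (run inside an async function; `client` is an async OpenAI client):
from traceredact.integrations.openai import redact_content_stream
async for safe_text in redact_content_stream(await client.chat.completions.create(..., stream=True)):
    ...
```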
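
For intuition, the carry-over idea can be sketched in plain Python. This is not the library's implementation: `KEY_RE`, `MAX_MATCH`, `PLACEHOLDER` and `redact_chunks` are hypothetical names, and the sketch assumes a single pattern with a known maximum match length and no lookbehind.

```python
import re
from collections.abc import Iterable, Iterator

# Hypothetical bounded pattern; its longest possible match is 3 + 32 characters.
KEY_RE = re.compile(r"sk-[A-Za-z0-9]{20,32}")
MAX_MATCH = 3 + 32
PLACEHOLDER = "[REDACTED:secret]"


def redact_chunks(chunks: Iterable[str]) -> Iterator[str]:
    """Redact a chunked stream while holding back only a short tail."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Any match starting before `cut` has its full possible extent in `buf`,
        # so later chunks cannot change it. Text from `cut` onward is held back.
        cut = len(buf) - (MAX_MATCH - 1)
        if cut <= 0:
            continue
        out: list[str] = []
        pos = 0
        for m in KEY_RE.finditer(buf):
            if m.start() >= cut:
                break  # may still grow with the next chunk
            out.append(buf[pos:m.start()])
            out.append(PLACEHOLDER)
            pos = m.end()
        if pos < cut:
            out.append(buf[pos:cut])
            pos = cut
        buf = buf[pos:]  # carry the undecided tail into the next round
        piece = "".join(out)
        if piece:
            yield piece
    if buf:
        # End of stream: nothing more can arrive, so flush the tail.
        yield KEY_RE.sub(PLACEHOLDER, buf)
```

The held-back tail is at most `MAX_MATCH - 1` characters, so memory stays bounded no matter how long the response is, and the joined output matches what a single pass over the full text would produce.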

### Structured objects

Pydantic models, dataclasses and attrs instances are traversed automatically
and returned as redacted dicts. Disable with `Policy(traverse_objects=False)`.

### Encoded payloads (opt-in)

`Policy(decode_payloads=True)` decodes one layer of base64 from encoded blobs and,
if the decoded text contains a high-confidence secret, redacts the whole blob.
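
A rough stand-alone sketch of the one-layer decode step follows. The blob regex, the stand-in `SECRET` pattern and the `redact_encoded` helper are assumptions for illustration, not the library's internals.

```python
import base64
import re

# Hypothetical heuristics: what counts as a "blob" and as a high-confidence secret.
B64_BLOB = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")
SECRET = re.compile(r"sk-[A-Za-z0-9]{20,48}")


def redact_encoded(text: str, placeholder: str = "[REDACTED:secret]") -> str:
    """Replace base64 blobs whose decoded text contains a secret."""

    def replace(match: re.Match[str]) -> str:
        blob = match.group(0)
        try:
            # validate=True rejects non-alphabet characters and bad padding.
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except ValueError:
            # binascii.Error and UnicodeDecodeError are both ValueError subclasses:
            # not valid base64 text, so leave the blob untouched.
            return blob
        # Decode exactly one layer; the whole blob is replaced on a hit.
        return placeholder if SECRET.search(decoded) else blob

    return B64_BLOB.sub(replace, text)
```

Replacing the whole blob, rather than re-encoding a partially redacted payload, avoids leaving fragments of the secret in the surrounding base64 text.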

## Policy file (`traceredact.yml`)

Drop a `traceredact.yml` in your repo root (auto-discovered) or pass `--policy`:

```yaml
entropy_threshold: 4.0
min_entropy_len: 20
disabled_detectors:
  - pii.phone
allowlist:
  - "noreply@example.com"
allow_patterns:
  - ".*@example\\.com"
placeholder: "[REDACTED:{category}]"
hash_correlation: false        # set true + hash_key to emit correlation tags
custom_patterns:
  - id: custom.internal_user_id
    category: pii
    regex: "ACME-USR-[0-9]{8}"
    confidence: 0.95
```

See [`traceredact.yml`](./traceredact.yml) in this repo for a fully-commented example.
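
To sanity-check patterns before putting them in the policy file, it can help to try them with Python's `re` module. The sketch below assumes that allow patterns must match the whole value and that custom patterns are searched anywhere in the text; both are assumptions, as is the `is_allowed` helper. Note that the YAML double-quoted string `".*@example\\.com"` yields the regex `.*@example\.com`.

```python
import re

# Values copied from the example policy above.
ALLOWLIST = {"noreply@example.com"}
ALLOW_PATTERNS = [re.compile(r".*@example\.com")]
CUSTOM_USER_ID = re.compile(r"ACME-USR-[0-9]{8}")


def is_allowed(value: str) -> bool:
    """True if the value is exempt from redaction (assumed full-match semantics)."""
    if value in ALLOWLIST:
        return True
    return any(p.fullmatch(value) for p in ALLOW_PATTERNS)


def find_custom_ids(text: str) -> list[str]:
    """Return every substring that the custom pattern would flag."""
    return CUSTOM_USER_ID.findall(text)
```

For example, `find_custom_ids("user ACME-USR-12345678 logged in")` returns the single ID, while a seven-digit ID is not flagged because the quantifier requires exactly eight digits.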

## Detectors

**Secrets:** `secrets.openai_key`, `secrets.anthropic_key`,
`secrets.aws_access_key`, `secrets.github_token`, `secrets.slack_token`,
`secrets.slack_webhook`, `secrets.discord_webhook`, `secrets.google_api_key`,
`secrets.stripe_key`, `secrets.sendgrid_key`, `secrets.twilio_key`,
`secrets.huggingface_token`, `secrets.npm_token`, `secrets.pypi_token`,
`secrets.azure_storage_key`, `secrets.jwt`, `secrets.private_key`,
`secrets.pgp_private_key`, `secrets.basic_auth_url`, `secrets.bearer_token`,
`secrets.env_assignment`, `secrets.high_entropy`.

**PII:** `pii.email`, `pii.credit_card` (Luhn), `pii.iban` (mod-97),
`pii.ipv4`, `pii.phone`, `pii.us_ssn`.
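
The two checksum validations named above are standard algorithms and can be sketched as follows. The function names and the length bounds are illustrative choices, not the library's API.

```python
def luhn_ok(number: str) -> bool:
    """Luhn check: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # illustrative lower bound for card-like numbers
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


def iban_ok(iban: str) -> bool:
    """IBAN mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and test remainder == 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34) or not (s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    numeric = "".join(str(int(c, 36)) for c in rearranged)
    return int(numeric) % 97 == 1
```

Validating the checksum after a pattern match cuts false positives: a random 16-digit number passes Luhn only about one time in ten.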

Secret *pattern* hits are deterministic (confidence `1.0`); fuzzy heuristics
(entropy, phone, IP) carry lower confidence so policy thresholds can gate them.
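
As an illustration of the entropy heuristic, here is a minimal Shannon-entropy check. The defaults mirror `entropy_threshold: 4.0` and `min_entropy_len: 20` from the example policy; the function names and the use of `>=` for the comparison are assumptions.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Entropy in bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_high_entropy(token: str, threshold: float = 4.0, min_len: int = 20) -> bool:
    """Flag long tokens whose characters look close to random."""
    return len(token) >= min_len and shannon_entropy(token) >= threshold
```

A string of one repeated character scores 0 bits, while 32 distinct characters score 5 bits, which is why ordinary words and padded identifiers tend to fall below a threshold of 4.0.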

## Design & safety

- **Deterministic, no data retained.** Pure functions; nothing is stored.
- **Copy, never mutate.** Your objects are untouched.
- **ReDoS-safe.** Cheap literal prefilters gate bounded regexes; no nested
  quantifiers; input length is capped.
- **Fail-closed.** Hash correlation without a key, or exceeding `max_depth`,
  raises rather than silently leaking.
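
The prefilter-plus-bounded-regex approach can be sketched like this. The detector table, the `MAX_INPUT_LEN` value and the choice to raise on oversized input are assumptions for illustration; the library's actual patterns and cap may differ.

```python
import re

MAX_INPUT_LEN = 100_000  # hypothetical cap

# (cheap literal prefilter, bounded regex with no nested quantifiers)
DETECTORS = [
    ("sk-", re.compile(r"sk-[A-Za-z0-9]{20,48}")),
    ("AKIA", re.compile(r"AKIA[0-9A-Z]{16}")),
]


def scan(text: str) -> list[tuple[int, int]]:
    """Return sorted (start, end) spans of every detector hit."""
    if len(text) > MAX_INPUT_LEN:
        # Fail closed instead of scanning only part of the input.
        raise ValueError("input exceeds length cap")
    hits: list[tuple[int, int]] = []
    for literal, pattern in DETECTORS:
        if literal not in text:
            continue  # substring test is linear and skips the regex entirely
        hits.extend(m.span() for m in pattern.finditer(text))
    return sorted(hits)
```

Each quantifier has an explicit upper bound and none is nested inside another, so the work per starting position is bounded and worst-case time grows linearly with input length.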

Detectors were hardened against adversarial evasion cases (see
`tests/test_evasion.py`).

## License

Apache-2.0.
