Metadata-Version: 2.4
Name: traceredact
Version: 0.1.2
Summary: Redact PII and secrets from AI prompts, traces and tool-call arguments before they reach your loggers.
Project-URL: Homepage, https://github.com/traceredact/traceredact
Project-URL: Issues, https://github.com/traceredact/traceredact/issues
Author: traceredact contributors
License: Apache-2.0
License-File: LICENSE
Keywords: dlp,llm,observability,pii,privacy,redaction,secrets
Requires-Python: >=3.11
Requires-Dist: pydantic>=2.6
Requires-Dist: pyyaml>=6.0
Requires-Dist: typer>=0.12
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.25; extra == 'anthropic'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.2; extra == 'langchain'
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == 'openai'
Description-Content-Type: text/markdown

# traceredact

[![PyPI](https://img.shields.io/pypi/v/traceredact.svg)](https://pypi.org/project/traceredact/)
[![Python](https://img.shields.io/pypi/pyversions/traceredact.svg)](https://pypi.org/project/traceredact/)
[![CI](https://github.com/traceredact/traceredact/actions/workflows/ci.yml/badge.svg)](https://github.com/traceredact/traceredact/actions/workflows/ci.yml)
[![License: Apache-2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](./LICENSE)

**Redact PII and secrets from AI prompts, agent traces and tool-call arguments
*before* they reach your loggers / observability backend.**

LLM apps log everything — prompts, agent traces, tool-call arguments — into
Langfuse / Helicone / Datadog / your own DB. Customer PII and API keys leak into
those traces. `traceredact` is a small, dependency-light library that detects
and redacts that data deterministically, in-process, before it leaves your
application.

It is **content-based**: it catches a `sk-…` key or a credit-card number even
when it sits under an innocuous JSON key — not just well-known field names.

> A missed secret is a real incident, so detection is treated as
> safety-critical: bounded (ReDoS-safe) patterns, entropy fallback, Luhn/IBAN
> validation, and adversarial evasion fixtures.

## Install

```bash
pip install traceredact          # or: uv add traceredact
```

## Usage (3 lines)

```python
from traceredact import redact

result = redact({"args": {"email": "a@b.com", "key": "sk-1234567890abcdefABCDEFGH"}})
print(result.value)      # {'args': {'email': '[REDACTED:pii]', 'key': '[REDACTED:secret]'}}
print(result.findings)   # [Finding(detector_id='pii.email', json_path='args.email', ...), ...]
```

`redact()` accepts a string, dict, list, or any nested mix. The input is never
mutated; `result.value` is a redacted copy and `result.findings` lists every hit
with its `detector_id`, `category`, `confidence`, `json_path` and `span`.

### CLI (CI-gateable)

```bash
traceredact scan ./logs/            # report findings as a table; exit 1 if any
traceredact scan trace.json -f json # machine-readable output for CI
traceredact redact trace.json -o redacted.json
```

`scan` exits non-zero when anything is found, so you can gate a CI job on it.

### SDK integrations

```python
from openai import OpenAI
from traceredact.integrations.openai import wrap_openai

client = wrap_openai(OpenAI())   # prompts + completions now redacted in-flight
```

Also: `traceredact.integrations.anthropic.wrap_anthropic(client)` and
`traceredact.integrations.langchain.RedactingCallbackHandler()`.

> **Limitation (MVP):** the wrappers patch the **synchronous, non-streaming**
> `create` call. Outbound prompts are always redacted, but response *content*
> is not yet redacted for **streamed** (`stream=True`) calls or **async**
> clients (`AsyncOpenAI`). Don't rely on response redaction in those cases
> until support lands.

## Policy file (`traceredact.yml`)

Drop a `traceredact.yml` in your repo root (auto-discovered) or pass `--policy`:

```yaml
entropy_threshold: 4.0
min_entropy_len: 20
disabled_detectors:
  - pii.phone
allowlist:
  - "noreply@example.com"
allow_patterns:
  - ".*@example\\.com"
placeholder: "[REDACTED:{category}]"
hash_correlation: false        # set true + hash_key to emit correlation tags
custom_patterns:
  - id: custom.internal_user_id
    category: pii
    regex: "ACME-USR-[0-9]{8}"
    confidence: 0.95
```

See [`traceredact.yml`](./traceredact.yml) in this repo for a fully-commented example.

## Detectors

**Secrets:** `secrets.openai_key`, `secrets.aws_access_key`,
`secrets.github_token`, `secrets.slack_token`, `secrets.slack_webhook`,
`secrets.google_api_key`, `secrets.stripe_key`, `secrets.sendgrid_key`,
`secrets.twilio_key`, `secrets.jwt`, `secrets.private_key`,
`secrets.basic_auth_url`, `secrets.env_assignment`, `secrets.high_entropy`.

**PII:** `pii.email`, `pii.credit_card` (Luhn), `pii.iban` (mod-97),
`pii.ipv4`, `pii.phone`.
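
The Luhn and mod-97 checks are standard checksum algorithms, and they are what lets a detector reject digit runs that merely look like card numbers or IBANs. A minimal sketch of both follows; the function names and length bounds are illustrative and not taken from traceredact.

```python
def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    compact = number.replace(" ", "").replace("-", "")
    if not (compact.isascii() and compact.isdigit()) or len(compact) < 12:
        return False
    total = 0
    for index, char in enumerate(reversed(compact)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(iban: str) -> bool:
    """ISO 13616 mod-97: move the first four characters to the end,
    map letters to 10..35, and require the resulting number mod 97 == 1."""
    compact = iban.replace(" ", "").upper()
    if not (compact.isascii() and compact.isalnum()) or not 15 <= len(compact) <= 34:
        return False
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(char, 36)) for char in rearranged)
    return int(numeric) % 97 == 1
```

For example, the widely used test card number `4111 1111 1111 1111` passes `luhn_valid`, and changing its last digit makes it fail.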

Secret *pattern* hits are deterministic (confidence `1.0`); fuzzy heuristics
(entropy, phone, IP) carry lower confidence so policy thresholds can gate them.
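
For intuition about the entropy heuristic, the sketch below computes Shannon entropy in bits per character and applies the `entropy_threshold: 4.0` and `min_entropy_len: 20` values from the example policy. That traceredact uses exactly this measure is an assumption, and `looks_high_entropy` is a hypothetical name.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 4.0  # from the example policy
MIN_ENTROPY_LEN = 20     # from the example policy


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_high_entropy(token: str) -> bool:
    """Flag tokens that are both long enough and random-looking enough."""
    return len(token) >= MIN_ENTROPY_LEN and shannon_entropy(token) >= ENTROPY_THRESHOLD
```

Ordinary prose repeats a small set of characters and scores well under 4 bits, while a random-looking key of 20 or more mostly distinct characters scores above it. That gap is why such hits are plausible but not certain, and why they carry a lower confidence than pattern hits.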

## Design & safety

- **Deterministic, no data retained.** Pure functions; nothing is stored.
- **Copy, never mutate.** Your objects are untouched.
- **ReDoS-safe.** Cheap literal prefilters gate bounded regexes; no nested
  quantifiers; input length is capped.
- **Fail-closed.** Enabling hash correlation without a key, or exceeding
  `max_depth`, raises an error rather than silently leaking.

Detectors were hardened against adversarial evasion cases (see
`tests/test_evasion.py`).

## License

Apache-2.0.
