Metadata-Version: 2.4
Name: pleno-dlp
Version: 0.7.0
Summary: Unified DLP scanner for SaaS sources — secret detection (trufflehog, gitleaks, native regex) plus PII detection (pleno-anonymize). Bundles saas-retriever for API-driven content collection: GitHub, GitLab, Bitbucket, Slack, Notion, Confluence, Jira.
Project-URL: Homepage, https://github.com/plenoai/pleno-dlp
Project-URL: Repository, https://github.com/plenoai/pleno-dlp
Project-URL: Issues, https://github.com/plenoai/pleno-dlp/issues
Author-email: pleno <ai@egahika.dev>
License-Expression: AGPL-3.0-or-later
Keywords: anonymize,dlp,gitleaks,pii,saas,scanner,secrets,trufflehog
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Requires-Python: >=3.12
Requires-Dist: httpx>=0.27
Requires-Dist: rich>=13.9
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: mypy>=1.13; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24; extra == 'dev'
Requires-Dist: pytest>=8.3; extra == 'dev'
Requires-Dist: ruff>=0.7; extra == 'dev'
Provides-Extra: pii
Description-Content-Type: text/markdown

# pleno-dlp (Python)

Unified DLP scanner for SaaS content: **secrets** (trufflehog /
gitleaks / native regex) and **PII** (delegated to
[pleno-anonymize](https://github.com/plenoai/pleno-anonymize)). As of
0.7.0 the SaaS source layer, formerly the standalone
[saas-retriever](https://github.com/plenoai/saas-retriever) package, is
**vendored in-tree**: `pip install pleno-dlp` installs a single wheel
that provides both the `pleno-dlp` and `saas-retriever` console scripts
and lets you `from saas_retriever import …` with no extra dependency.

The Go binary in this repo (`cmd/pleno-dlp`) remains for filesystem-only
scans; the Python package is the path forward for SaaS.

## Install

```sh
uv tool install pleno-dlp
# or
pipx install pleno-dlp

# Add the PII backend (pulls pleno-anonymize):
uv tool install 'pleno-dlp[pii]'
```
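The trufflehog and gitleaks backends shell out to CLIs that `pip` does not install, and the `pii` backend needs pleno-anonymize. The following is a minimal preflight sketch that reports which backends look usable. `available_backends` is a hypothetical helper, not part of pleno-dlp, and the import name `pleno_anonymize` is an assumption.

```python
import importlib.util
import shutil

# Hypothetical preflight helper; not part of pleno-dlp's API.
# Backend names mirror the Backends table in this README.
CLI_BACKENDS = {"trufflehog": "trufflehog", "gitleaks": "gitleaks"}
# Assumption: pleno-anonymize is importable as `pleno_anonymize`.
PII_MODULE = "pleno_anonymize"


def available_backends() -> dict[str, bool]:
    """Report which scan backends look usable in this environment."""
    # CLI backends only need their executable somewhere on PATH.
    status = {name: shutil.which(exe) is not None for name, exe in CLI_BACKENDS.items()}
    status["native"] = True  # bundled regex, no system dependency
    # find_spec returns None (without importing) when the module is absent.
    status["pii"] = importlib.util.find_spec(PII_MODULE) is not None
    return status


if __name__ == "__main__":
    for name, ok in available_backends().items():
        print(f"{name:12} {'ok' if ok else 'missing'}")
```

The `pii` check only sees the interpreter it runs in. `uv tool install` and `pipx` create isolated environments, so run it with that tool's Python, not a system one.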

## Usage

```sh
# Secret scan over an entire GitHub org (code + issues + PRs across every repo)
GITHUB_TOKEN=ghp_... pleno-dlp scan github --owner plenoai

# Scan a single repo, only code, with trufflehog verification
pleno-dlp scan github --owner plenoai --repo saas-retriever \
    --resource code --backend trufflehog

# Issue + PR conversations only, PII detection (requires pleno-anonymize)
pleno-dlp scan github --owner plenoai \
    --resource issues --resource prs --backend pii

# SARIF output for GitHub code-scanning ingestion
pleno-dlp scan github --owner plenoai \
    --format sarif > findings.sarif
```
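To triage a SARIF file locally before uploading it, a short script can tally results per rule. This sketch assumes only the standard SARIF 2.1.0 layout (`runs[].results[].ruleId`). Whether pleno-dlp fills `ruleId` for every finding is an assumption, so missing IDs are bucketed separately. `summarize_sarif` is a hypothetical name.

```python
import json
import sys
from collections import Counter
from pathlib import Path


def summarize_sarif(path: Path) -> Counter[str]:
    """Count SARIF results per ruleId across all runs (SARIF 2.1.0 layout)."""
    doc = json.loads(path.read_text(encoding="utf-8"))
    counts: Counter[str] = Counter()
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            # ruleId is optional in SARIF, so keep rule-less results visible.
            counts[result.get("ruleId", "<no ruleId>")] += 1
    return counts


if __name__ == "__main__":
    target = Path(sys.argv[1] if len(sys.argv) > 1 else "findings.sarif")
    for rule, n in summarize_sarif(target).most_common():
        print(f"{n:5}  {rule}")
```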

Auth resolution order: `--token` → `GITHUB_TOKEN` env var → `gh auth token`.
With no token, public content can still be scanned, but GitHub limits
unauthenticated API calls to 60 requests per hour.

## Backends

| Backend | Detects | Verifies | System dep |
|---|---|---|---|
| trufflehog | secrets | yes (per-detector) | `trufflehog` CLI on PATH |
| gitleaks | secrets | no | `gitleaks` CLI on PATH |
| native | secrets | no | none (bundled regex: AWS, GitHub PAT, Slack bot, OpenAI, Anthropic) |
| pii | PII | n/a | `pleno-anonymize` (installed via the `pleno-dlp[pii]` extra) |
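To illustrate how a regex backend like `native` works, here is a minimal sketch covering the same five token families. The patterns are illustrative approximations of public token formats, **not** pleno-dlp's bundled rules. `PATTERNS`, `Finding`, and `scan_text` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Illustrative patterns only; NOT pleno-dlp's bundled rule set.
# Real token formats change over time, so treat these as approximations.
PATTERNS: dict[str, re.Pattern[str]] = {
    "aws-access-key-id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "github-pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "slack-bot-token": re.compile(r"\bxoxb-[0-9A-Za-z-]{10,}"),
    # Anthropic keys start with sk-ant-; the OpenAI rule excludes them.
    "anthropic-api-key": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    "openai-api-key": re.compile(r"\bsk-(?!ant-)[A-Za-z0-9_-]{20,}"),
}


@dataclass(frozen=True)
class Finding:
    rule: str
    start: int
    end: int
    match: str


def scan_text(text: str) -> list[Finding]:
    """Run every pattern over `text` and return findings in document order."""
    findings = [
        Finding(rule, m.start(), m.end(), m.group(0))
        for rule, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(findings, key=lambda f: (f.start, f.rule))
```

Regex-only detection is fast and dependency-free but cannot tell live credentials from revoked or example ones. That is the gap trufflehog's per-detector verification fills.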

## Connectors

Connectors come from the vendored `saas-retriever` layer. Today that
means **github**: org-wide repository enumeration plus per-repo code,
issues, and PRs, including comments and PR unified diffs. Slack, Jira,
Confluence, Notion, GitLab, and Bitbucket are planned as standalone API
connectors in subsequent releases of the saas-retriever layer.

## Release

Pushing a tag of the form `py-vX.Y.Z` triggers a GitHub Actions workflow
that publishes to PyPI via trusted publishing.
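A common safeguard for tag-triggered releases is to confirm that the tag agrees with the version being built. The following is a hypothetical sketch of such a check. This repository's actual workflow may not include one, and `version_from_tag` / `check_tag_matches` are illustrative names.

```python
import re

# Hypothetical CI guard; the repository's real release workflow may differ.
TAG_RE = re.compile(r"py-v(\d+\.\d+\.\d+)")


def version_from_tag(tag: str) -> str:
    """Extract X.Y.Z from a py-vX.Y.Z tag, rejecting anything else."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a Python release tag: {tag!r}")
    return m.group(1)


def check_tag_matches(tag: str, package_version: str) -> None:
    """Fail the release if the tag and the package version disagree."""
    if version_from_tag(tag) != package_version:
        raise ValueError(
            f"tag {tag!r} does not match package version {package_version!r}"
        )
```

In a workflow step this could be called as `check_tag_matches(os.environ["GITHUB_REF_NAME"], importlib.metadata.version("pleno-dlp"))` after installing the built wheel.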
