Metadata-Version: 2.1
Name: agentscrub
Version: 1.1.25
Summary: Scrub secrets and credentials from AI coding assistant session logs
License: Apache-2.0
Project-URL: Homepage, https://github.com/ppravdin/agentscrub
Project-URL: Repository, https://github.com/ppravdin/agentscrub
Project-URL: Issues, https://github.com/ppravdin/agentscrub/issues
Keywords: security,privacy,ai,claude,codex,cursor,redact,secrets
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Security
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cryptography >=46.0.7
Requires-Dist: rich >=13.0.0
Requires-Dist: markdown-it-py >=3.0.0
Requires-Dist: mdurl >=0.1.0
Requires-Dist: pygments >=2.15.0
Requires-Dist: cffi >=1.15.0 ; platform_python_implementation != "PyPy"
Requires-Dist: pycparser >=2.21 ; platform_python_implementation != "PyPy"
Requires-Dist: typing-extensions >=4.0.0 ; python_version < "3.11"

# agentscrub

![agentscrub scans local AI agent logs for leaked secrets](assets/cover.png)

**Find and redact leaked secrets in local AI coding-agent logs.**

agentscrub is an open-source CLI that runs locally and scans AI coding-agent histories, transcripts, tool-call logs, command traces, caches, and local state files.

AI tools like Claude Code, Codex CLI, Cursor, Gemini CLI, Windsurf, Cline, Continue, and others can store sensitive data locally: pasted API keys, `.env` contents, database URLs, JWTs, OAuth tokens, cloud keys, and shell output. Malware, rogue extensions, compromised packages, or anyone with local machine access can scan those logs for secrets. agentscrub reports masked findings, creates backups, and redacts leaked copies after confirmation.

Example Claude Code history before cleanup (demo secrets):

```text
user: I pasted the staging env for the deploy:
      DATABASE_URL=postgres://app:xK9mP2nL5qR8@db.internal:5432/app
      NPM_TOKEN=npm_A4bC8dEfG2hIjKlMnOpQrSt5UvWxYz3456

Claude Code: The failing request used:
             Authorization: Bearer eyJhbGciOiJIUzI1NiJ9.eyJpZCI6MX0.SflKxwRJSMeK
```

After `agentscrub run`:

```text
user: I pasted the staging env for the deploy:
      DATABASE_URL=[REDACTED]
      NPM_TOKEN=[REDACTED]

Claude Code: The failing request used:
             Authorization: Bearer [REDACTED]
```

https://github.com/user-attachments/assets/9a770a0c-aaa1-42cd-aca8-5e7421c163ea

Schedule a daily cleanup so secrets you have shared with your AI agent don't linger in its local logs:

```bash
pipx install agentscrub
agentscrub schedule install   # daily cron job at 03:00, backup before every run
```

## Quick start

```bash
# Install
pip install agentscrub

# Or, recommended for CLI tools
pipx install agentscrub

# Read-only audit. Writes nothing.
# If scanners are missing, agentscrub offers to install them automatically.
agentscrub scan

# Redact in place. Asks for confirmation and takes a backup first.
agentscrub run

# Optional: backup + redact daily at 03:00
agentscrub schedule install
agentscrub schedule status
```

## Safety model

- `scan` is **read-only**. It never modifies a file. Use it to see what's exposed.
- `run` writes an **encrypted timestamped backup** of the files it may change before touching anything. Restore with `agentscrub rollback`.
- **Live login and config files are preserved by design.** Files like `~/.claude/.credentials.json`, `~/.codex/auth.json`, `~/.gemini/oauth_creds.json`, and agent config files are scanned and reported but never modified. [Full list below](#live-login--config-files-preserved-scanned-reported-never-modified).
- **Raw secrets are never printed in reports.** Each match gets a stable proof hash so you can correlate the same secret across files without exposing it.
- All scanners run **locally**. Nothing leaves your machine.

## How it works

```mermaid
flowchart TD
    A[Agent dirs<br/>~/.claude, ~/.codex, ~/.cursor, ...] --> B[3 scanners find secrets]
    B --> C{Live login<br/>or config file?}
    C -->|Yes| D[Reported<br/>Never modified]
    C -->|No| E[Backup +<br/>Redact in place]
```

## What it covers

agentscrub detects supported tools automatically. No config file is required.
Each row lists every folder agentscrub recognizes across Linux, macOS, and Windows; where a row lists
alternative locations, the first one that exists on your machine is scanned. Plain-text logs, JSONL sessions, and
JSON state files are scrubbed in place; SQLite databases (`.sqlite`, `.db`,
`.vscdb`) are scrubbed via SQL UPDATE on text columns containing detected
secrets.

| Tool | Where session/log data lives | Notes |
|---|---|---|
| Claude Code | `~/.claude/` | JSONL sessions, file-history snapshots, project trees |
| OpenAI Codex CLI | `~/.codex/` | `sessions/`, `history.jsonl`, `logs_*.sqlite`, `state_*.sqlite` |
| Cursor (CLI/IDE) | `~/.cursor/` | `projects/` (IDE chats), `chats/` (CLI), `acp-sessions/`, `logs/` |
| Cursor (server) | `~/.cursor-server/` | remote-dev / SSH server-side trees |
| Cursor (desktop) | `~/Library/Application Support/Cursor/User/workspaceStorage/`, `~/.config/Cursor/User/workspaceStorage/`, `~/AppData/Roaming/Cursor/User/workspaceStorage/` | chats live in `state.vscdb` (SQLite) |
| Google Antigravity | `~/.antigravity-server/` | server-side IDE state |
| Windsurf | `~/.codeium/windsurf/` (XDG canonical), `~/.config/Codeium/Windsurf/`, `~/AppData/Roaming/Codeium/Windsurf/`, `~/.windsurf/` | Cascade conversation history |
| Windsurf (server) | `~/.windsurf-server/` | remote-dev / SSH server-side trees |
| Windsurf (desktop) | `~/Library/Application Support/Windsurf/User/workspaceStorage/`, `~/.config/Windsurf/User/workspaceStorage/`, `~/AppData/Roaming/Windsurf/User/workspaceStorage/` | desktop IDE workspaceStorage |
| Gemini CLI | `~/.gemini/` | `tmp/<project_hash>/chats/`, plus the Antigravity `brain/`, `skills/`, `commands/` trees |
| Zed AI | `~/.local/share/zed/`, `~/Library/Application Support/Zed/`, `~/AppData/Roaming/Zed/`, legacy `~/.config/zed/conversations/` | conversation history in `threads/threads.db` (SQLite) |
| OpenCode | `~/.local/share/opencode/` (state, sessions) and `~/.config/opencode/` (config) | state/session data plus global config |
| Crush (Charm) | `~/.local/share/crush/` (state, logs) and `~/.config/crush/` (config) | per-workspace `.crush/` state, `crush.log` |
| Cline | VS Code `globalStorage/saoudrizwan.claude-dev/` (cross-OS) **or** `~/.cline/data/` (CLI mode) | `tasks/<id>/`, `state/`, `checkpoints/` |
| GitHub Copilot Chat | `Code/User/workspaceStorage/*/GitHub.copilot-chat/` (cross-OS, scoped to the Copilot extension only) | `chatSessions/`, `transcripts/`, plus `state.vscdb` chat data |
| Aider | `~/.aider/` | repo-local `.aider.input.history` and `.aider.chat.history.md` are not auto-detected; pass them with `--also <path>` |
| Continue | `~/.continue/` | CLI sessions in `~/.continue/sessions/` |

### Live login & config files preserved (scanned, reported, **never modified**)

`agentscrub run` will not write to any of the following. They're the live
credentials your agent needs to keep working. They're still scanned and any
matched patterns are reported, so you can review them by hand if needed.

| Tool | Preserved file(s) |
|---|---|
| Claude Code | `~/.claude/.credentials.json`, `~/.claude/settings.json`, `~/.claude.json` |
| Codex CLI | `~/.codex/auth.json`, `~/.codex/.credentials.json`, `~/.codex/config.toml` |
| Cursor | `~/.cursor/mcp.json` |
| Windsurf | `~/.codeium/windsurf/mcp_config.json`, `~/.codeium/mcp_config.json`, `~/.config/Codeium/Windsurf/mcp_config.json`, `~/.windsurf/mcp.json`, `~/.windsurf/mcp_config.json` |
| Gemini CLI | `~/.gemini/oauth_creds.json`, `~/.gemini/mcp-oauth-tokens.json`, `~/.gemini/settings.json`, `~/.gemini/google_accounts.json`, `~/.gemini/trustedFolders.json`, `~/.gemini/installation_id`, `~/.gemini/user_id`, `~/.gemini/antigravity/mcp_config.json` |
| OpenCode | `~/.local/share/opencode/auth.json`, `~/.local/share/opencode/mcp-auth.json`, `~/.config/opencode/opencode.{json,jsonc}`, `~/.config/opencode/tui.{json,jsonc}` |
| Crush | `~/.local/share/crush/mcp.json`, `~/.local/share/crush/crush.json`, `~/.config/crush/crush.json` |
| Aider | `~/.aider.conf.yml` |
| Continue | `~/.continue/config.yaml`, `~/.continue/config.json`, `~/.continue/config.ts`, `~/.continue/.env` |
| Cline (VS Code) | `<globalStorage>/saoudrizwan.claude-dev/settings/cline_mcp_settings.json`, `…/secrets.json` |
| Cline (CLI) | `~/.cline/data/settings/cline_mcp_settings.json`, `~/.cline/data/secrets.json`, `~/.cline/data/globalState.json` |
| Generic | everything under `~/.mcp-auth/` |

The goal is to remove leaked copies from logs, histories, and caches without
breaking agent logins or runtime configuration.
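
A minimal sketch of the "scan and report, never write" guard, assuming it checks exact file paths plus whole trees such as `~/.mcp-auth/`. The constants hold only a few entries from the table above, and `is_preserved` is a hypothetical name; `home` is passed explicitly so the check is easy to test.

```python
import os

# Illustrative subset of the preserved list; a real check would carry the full table.
PRESERVED_FILES = {
    "~/.claude/.credentials.json",
    "~/.codex/auth.json",
    "~/.gemini/oauth_creds.json",
}
PRESERVED_TREES = {"~/.mcp-auth"}  # everything underneath is preserved


def expand(path: str, home: str) -> str:
    """Expand a leading '~/' against `home` and normalise the result."""
    if path.startswith("~/"):
        path = os.path.join(home, path[2:])
    return os.path.normpath(path)


def is_preserved(path: str, home: str) -> bool:
    """True if `path` may be scanned and reported but must never be rewritten."""
    target = expand(path, home)
    if target in {expand(p, home) for p in PRESERVED_FILES}:
        return True
    for tree in PRESERVED_TREES:
        root = expand(tree, home)
        # Compare against root + separator so "~/.mcp-authx" does not match.
        if target == root or target.startswith(root + os.sep):
            return True
    return False
```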

Each scan or run writes one masked audit to `~/.agentscrub/logs/`:

- `scan-YYYYMMDD-HHMMSS.txt`: complete file-by-file audit with detected pattern type,
  hit count, and proof hash for every affected file.

Old reports are rotated automatically: agentscrub keeps the newest 30 scan
audits and newest 30 cron stdout logs, and removes legacy summary reports.

Raw credentials are never printed in reports. Proof hashes let you recognize the
same secret across files without exposing the secret itself.
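
How agentscrub computes its proof hash is not specified here. A minimal sketch, assuming a keyed HMAC-SHA256 truncated to 12 hex characters: the same secret and key always give the same value, so findings correlate across files. Keying is the important design choice, because a plain unkeyed SHA-256 of a short or guessable secret could be brute-forced back out of a report. `proof_hash` is a hypothetical name.

```python
import hashlib
import hmac


def proof_hash(secret: str, key: bytes, length: int = 12) -> str:
    """Short, stable fingerprint of `secret` (assumed scheme: HMAC-SHA256)."""
    digest = hmac.new(key, secret.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]  # truncated for readability in reports
```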

## How it finds secrets

Three open-source scanners run locally, in parallel; agentscrub merges and deduplicates their findings:

| Tool | Finds |
|---|---|
| **[gitleaks](https://github.com/gitleaks/gitleaks)** | JWTs, generic API keys, npm/GitHub tokens |
| **[TruffleHog](https://github.com/trufflesecurity/trufflehog)** | Postgres URIs, GCP/AWS keys, Dockerhub, OAuth, Stripe, Groq, and dozens more |
| **[Titus](https://github.com/praetorian-inc/titus)** (NoseyParker successor) | Username/password pairs, connection URIs, PostHog, LinkedIn, hundreds of generic rules |
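
agentscrub's actual merge key is not documented here. A sketch of one plausible approach, assuming each scanner's output has already been normalised into a common `Finding` shape (hypothetical) and that two scanners flagging the same string at the same location found the same secret:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One hit reported by a scanner (hypothetical normalised shape)."""
    scanner: str
    path: str
    line: int
    secret: str
    rule: str


def merge_findings(*batches: list[Finding]) -> dict[tuple[str, int, str], list[Finding]]:
    """Group findings from several scanners by (path, line, secret).

    Duplicates collapse into one key, but every contributing report is
    kept so the audit can still say which scanners agreed.
    """
    merged: dict[tuple[str, int, str], list[Finding]] = {}
    for batch in batches:
        for f in batch:
            merged.setdefault((f.path, f.line, f.secret), []).append(f)
    return merged
```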

JSON lines are parsed and secrets are replaced inside string values only, preserving file structure even when secrets contain `"` or `{}` characters. SQLite databases are updated via SQL on text columns containing detected secrets.
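
Both write paths can be sketched with the standard library, assuming the list of secret strings comes from the merged scanner findings. The function names are hypothetical. Parsing a JSONL record before replacing matters because a secret such as `a"b{c}` appears escaped (`a\"b{c}`) in the raw line, so a raw text replace would miss it or corrupt the JSON.

```python
import json
import sqlite3

PLACEHOLDER = "[REDACTED]"


def redact_text(text: str, secrets: list[str]) -> str:
    """Replace each known secret, longest first, so no partial tail survives."""
    for s in sorted({s for s in secrets if s}, key=len, reverse=True):
        text = text.replace(s, PLACEHOLDER)
    return text


def _walk(value, secrets: list[str]):
    """Recurse through parsed JSON; only str values are rewritten."""
    if isinstance(value, str):
        return redact_text(value, secrets)
    if isinstance(value, list):
        return [_walk(v, secrets) for v in value]
    if isinstance(value, dict):
        return {k: _walk(v, secrets) for k, v in value.items()}
    return value  # numbers, booleans and None pass through unchanged


def redact_json_line(line: str, secrets: list[str]) -> str:
    """Redact one JSONL record, matching secrets in their decoded form."""
    return json.dumps(_walk(json.loads(line), secrets), ensure_ascii=False)


def _ident(name: str) -> str:
    """Quote an SQLite identifier; table and column names cannot be bound as ?."""
    return '"' + name.replace('"', '""') + '"'


def redact_sqlite_column(db: sqlite3.Connection, table: str, column: str,
                         secrets: list[str]) -> int:
    """UPDATE only rows whose text column contains a secret; return row updates."""
    t, c = _ident(table), _ident(column)
    updates = 0
    for s in sorted({s for s in secrets if s}, key=len, reverse=True):
        cur = db.execute(
            f"UPDATE {t} SET {c} = replace({c}, ?, ?) WHERE instr({c}, ?) > 0",
            (s, PLACEHOLDER, s),
        )
        updates += cur.rowcount
    db.commit()
    return updates
```

`json.dumps` uses its own separators, so whitespace inside a record can change even though keys, nesting, and non-string values do not; a byte-faithful version would pass `separators` matching the source file. Table and column names should come from the database schema, never from log content.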

## Install

**Requirements:** Python ≥ 3.10 and `rsync`

```bash
# Familiar Python install
pip install agentscrub

# Recommended CLI install: isolated environment, clean upgrades/uninstall
pipx install agentscrub
```

On Linux, `pip install --user agentscrub` may put the command in
`~/.local/bin`; add that directory to `PATH` if your shell cannot find
`agentscrub`.

### Detection tools

agentscrub uses three detector binaries: gitleaks, TruffleHog, and Titus.

You do **not** install them by hand. On first `agentscrub scan` or
`agentscrub run`, missing detectors are offered automatically:

```text
2 detector(s) missing: TruffleHog, Titus
Install official release binaries to ~/.agentscrub/bin?
Continue? [Y/n]
```

Downloaded binaries are SHA256-verified against each tool's official release
checksums and stored in:

```text
~/.agentscrub/bin/
```

If a detector is missing, `agentscrub scan` and `agentscrub run` show the
installer prompt before scanning.

## Usage

```bash
# See what's exposed. No writes.
agentscrub scan

# Redact (asks for confirmation, creates backup first)
agentscrub run

# Non-interactive (for cron / CI)
agentscrub run --yes

# Restore a previous backup
agentscrub rollback

# Set up daily 3am cron job, then verify the installed user crontab entry
agentscrub schedule install
agentscrub schedule status
crontab -l | grep agentscrub

# Remove the scheduled job
agentscrub schedule uninstall

# Scan an extra directory not in the auto-detect list
agentscrub run --also ~/my-other-ai-tool

# Limit the run to specific tools. Repeatable or comma-separated.
agentscrub run --only claude
agentscrub run --only claude,codex
agentscrub --list-tools             # show every known tool ID

# Keep more backups (default: 3)
agentscrub run --max-backups 10
```
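
`--only` accepts both repeated and comma-separated values. One conventional way to get that behavior with `argparse` is `action="append"` followed by a split. `parse_only` is a hypothetical helper for illustration, not agentscrub's own parser.

```python
import argparse


def parse_only(argv: list[str]) -> list[str]:
    """Accept '--only a --only b' and '--only a,b'; return unique IDs in order."""
    parser = argparse.ArgumentParser(prog="agentscrub-sketch")
    parser.add_argument("--only", action="append", default=[],
                        help="tool ID(s); repeatable or comma-separated")
    args = parser.parse_args(argv)
    tools: list[str] = []
    for chunk in args.only:
        for tool in chunk.split(","):
            tool = tool.strip()
            if tool and tool not in tools:  # drop blanks and duplicates
                tools.append(tool)
    return tools
```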

## Backup & rollback

Every live run creates an encrypted backup of the files it may change before
touching anything. The encryption key is generated once at `~/.agentscrub/key`
and stored with `0600` permissions so scheduled runs work without prompts.
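
A sketch of the create-once key file, assuming the key is 32 random bytes (the real key format is not documented here). `load_or_create_key` is a hypothetical name. Opening with `O_EXCL`, rather than checking for existence first, means an existing key is never overwritten.

```python
import os
import secrets
from pathlib import Path


def load_or_create_key(path: Path) -> bytes:
    """Return the backup key, generating it on first use with mode 0600."""
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    try:
        # O_EXCL: fail instead of overwriting a key that already exists.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return path.read_bytes()
    with os.fdopen(fd, "wb") as fh:
        fh.write(secrets.token_bytes(32))  # assumed format: 32 raw random bytes
    os.chmod(path, 0o600)  # the mode given to os.open is still filtered by umask
    return path.read_bytes()
```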

```
~/.agentscrub/backups/
  claude/
    20260429-030000.partial.tar.gz.enc    ← newest
    20260428-030000.partial.tar.gz.enc
    20260427-030000.partial.tar.gz.enc
  codex/
    20260429-030000.partial.tar.gz.enc
    ...
```

Oldest backups are rotated out automatically (default: keep 3 per tool). Old
plaintext backup folders from earlier versions remain restorable and are rotated
out normally as newer encrypted backups are created.
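
Because backup names start with a `YYYYMMDD-HHMMSS` stamp, sorting names lexicographically also sorts them chronologically, which makes rotation a sort and a slice. A sketch, assuming that naming holds; it handles only `.tar.gz.enc` files, not the legacy plaintext folders, and `rotate_backups` is a hypothetical name.

```python
from pathlib import Path


def rotate_backups(tool_dir: Path, keep: int = 3) -> list[Path]:
    """Delete all but the newest `keep` encrypted backups; return what was removed."""
    backups = sorted(tool_dir.glob("*.tar.gz.enc"), key=lambda p: p.name, reverse=True)
    removed = backups[keep:]  # everything past the newest `keep`
    for old in removed:
        old.unlink()
    return removed
```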

To restore:

```bash
agentscrub rollback

# Available restore points
#   1  2026-04-29 03:00   3 tools  (today)  47M
#      Claude Code, OpenAI Codex CLI, OpenCode config
#   2  2026-04-28 03:00   9 tools  (yesterday)  348M
#      Claude Code, Cursor, Cursor (desktop), Gemini CLI, +5 more
#
# Restore point # (or q to quit): 1
```

Use `agentscrub rollback --by-tool` for advanced single-tool restores.

## Scheduled cleanup

`agentscrub schedule install` adds one entry to the current user's crontab:

```text
0 3 * * * agentscrub run --yes ...
```

The command verifies the entry after writing it, so a green success message means
the entry is visible to `crontab -l`. Scheduled runs write stdout/stderr logs to
`~/.agentscrub/logs/YYYYMMDD.log`; old scan audits and cron logs are rotated
automatically.
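
Re-running `schedule install` should not stack duplicate entries. The installed command above is elided, so this sketch takes the entry as an argument; it assumes a dedicated marker comment line (hypothetical) identifies the agentscrub entry on the line after it, and leaves every other user line untouched.

```python
MARKER = "# agentscrub: daily scrub"  # hypothetical tag line


def merge_crontab(existing: str, entry: str) -> str:
    """Return crontab text containing exactly one tagged agentscrub entry."""
    out: list[str] = []
    skip_next = False
    for line in existing.splitlines():
        if skip_next:  # the old entry that followed the marker
            skip_next = False
            continue
        if line.strip() == MARKER:
            skip_next = True
            continue
        out.append(line)
    out += [MARKER, entry]
    return "\n".join(out) + "\n"
```

The input would be the output of `crontab -l` (which typically exits non-zero when the user has no crontab yet), and the result would be written back with `crontab -`.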

## What it does NOT catch

| Gap | Why |
|---|---|
| Plain prose passwords (`my password is hunter2`) | No pattern; indistinguishable from normal text |
| Short secrets < 8 chars | Below minimum length for all three tools |
| Secrets in binary files | Skipped by design |
| PII (names, phones, addresses) | Out of scope; agentscrub targets credentials and secret-like patterns |

## Adding a new AI tool

Edit `src/agentscrub/discover.py` → `_REGISTRY`:

```python
dict(
    tool="my-tool",
    display="My AI Tool",
    dirs=["~/.my-tool/sessions"],
    exclude_dirs={"cache"},
    exclude_files={"credentials.json"},
),
```
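
How the registry fields are consumed is not documented here. Assuming `exclude_dirs` and `exclude_files` hold bare names to skip at any depth, a walk over one of the `dirs` entries could look like this; `candidate_files` is a hypothetical helper.

```python
import os
from collections.abc import Iterator


def candidate_files(root: str, exclude_dirs: set[str], exclude_files: set[str]) -> Iterator[str]:
    """Yield files under `root`, skipping excluded directory and file names."""
    for dirpath, dirnames, filenames in os.walk(os.path.expanduser(root)):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name not in exclude_files:
                yield os.path.join(dirpath, name)
```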

Open a PR. Contributions are welcome.

## Upgrade / uninstall

```bash
pipx upgrade agentscrub
pipx uninstall agentscrub
```

## License

Apache-2.0
